Arnulf Jentzen, The Chinese University of Hong Kong, Shenzhen and University of Münster: Comprehensive convergence analysis for the Adam optimizer
Overview
- Date: 18 May 2026, 13:15–14:00
- Location: MV:L14, Chalmers tvärgata 3
- Language: English
Abstract: In the training of artificial intelligence (AI) systems, the optimization scheme employed is often not the standard gradient descent (GD) method but a suitable accelerated and/or adaptive GD method, such as the momentum or the RMSprop method. The most popular of these accelerated/adaptive optimization methods is presumably the Adam optimizer due to Kingma & Ba (2014). In this talk we introduce the notion of the stability region for general deep learning optimization methods, and we reveal that, among standard GD, momentum, RMSprop, and Adam, Adam is the only optimizer that achieves the optimal higher-order convergence speed and also has the maximal stability region. In another main result of this talk, which we refer to as the Adam symmetry theorem, we show for a simple class of quadratic stochastic optimization problems (SOPs) that Adam converges, as the number of Adam steps increases, to the solution of the SOP (the unique minimizer of the strongly convex objective function) if and only if the random variables in the SOP (the data in the SOP) are symmetrically distributed. In particular, in the standard case where the random variables in the SOP are not symmetrically distributed, we disprove that Adam converges to the minimizer of the SOP as the number of Adam steps increases. The talk is based on joint works with Steffen Dereich, Thang Do, Robin Graeber, Sebastian Kassing, Adrian Riekert, and Philippe von Wurstemberger.
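For readers unfamiliar with the method, the Adam update due to Kingma & Ba (2014) can be summarized as follows (notation ours): with learning rates γ_n, momentum parameters β1, β2 ∈ [0, 1), a regularizing constant ε > 0, and stochastic gradients g_n,

```latex
m_n      = \beta_1 m_{n-1} + (1 - \beta_1) g_n, \\
v_n      = \beta_2 v_{n-1} + (1 - \beta_2) g_n \odot g_n, \\
\theta_n = \theta_{n-1}
           - \gamma_n \, \frac{m_n / (1 - \beta_1^n)}{\sqrt{v_n / (1 - \beta_2^n)} + \varepsilon}.
```

The symmetry dichotomy can also be probed numerically. The following minimal Python sketch is our illustration, not material from the talk: it runs scalar Adam on the quadratic SOP f(θ) = E[(θ − X)^2]/2, whose unique minimizer is θ* = E[X] = 1 for both data distributions used. All hyperparameters and the two-point distributions are illustrative assumptions; in particular, β2 is chosen deliberately small so that the asymmetry-induced offset is visible at this run length.

```python
import numpy as np

rng = np.random.default_rng(0)

def adam_limit(sample_x, theta0=0.0, lr=1e-3, beta1=0.9, beta2=0.9,
               eps=1e-8, steps=400_000, tail=200_000):
    """Scalar Adam on f(theta) = E[(theta - X)^2] / 2; a stochastic
    gradient at theta is g = theta - x for a fresh sample x.  The
    average of the last `tail` iterates serves as a proxy for the
    limit point of the scheme."""
    theta, m, v, acc = theta0, 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = theta - sample_x()                # unbiased gradient sample
        m = beta1 * m + (1 - beta1) * g       # first-moment EMA
        v = beta2 * v + (1 - beta2) * g * g   # second-moment EMA
        mhat = m / (1 - beta1 ** t)           # bias corrections
        vhat = v / (1 - beta2 ** t)
        theta -= lr * mhat / (np.sqrt(vhat) + eps)
        if t > steps - tail:
            acc += theta
    return acc / tail

# Both distributions have mean 1, so the minimizer is theta* = 1.
sym  = lambda: rng.choice([-1.0, 3.0])               # symmetric about 1
asym = lambda: rng.choice([0.0, 3.0], p=[2/3, 1/3])  # skewed, mean 1

print("symmetric  data:", adam_limit(sym))   # lands close to 1.0
print("asymmetric data:", adam_limit(asym))  # systematically offset from 1.0
```

In runs of this kind, the tail average sits near the minimizer for the symmetric distribution but shows a persistent offset for the skewed one, in line with the dichotomy described in the abstract (the talk's precise setting, step-size schedule, and parameter regime may of course differ).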
- Full Professor, Applied Mathematics and Statistics, Mathematical Sciences
