Title: Dissipativity Theory for Optimization and Machine Learning Research
Abstract: Empirical risk minimization (ERM) is a central topic in machine learning research, and is typically solved using first-order optimization methods whose convergence proofs are derived case by case. In this talk, we will present a simple routine that unifies the analysis of such optimization methods, including the gradient descent method, Nesterov's accelerated method, stochastic gradient descent (SGD), stochastic average gradient (SAG), SAGA, Finito, stochastic dual coordinate ascent (SDCA), stochastic variance reduced gradient (SVRG), and SGD with momentum. Specifically, we will view all these optimization methods as dynamical systems and then use a unified dissipativity approach to derive sufficient conditions that certify the convergence rates of such dynamical systems. The derived conditions are all in the form of linear matrix inequalities (LMIs). We solve the resulting LMIs and obtain analytical proofs of new convergence rates for various optimization methods (with or without individual convexity). Our proposed analysis can be automated for a large class of first-order optimization methods under various assumptions. In addition, the derived LMIs can always be solved numerically to provide clues for constructing analytical proofs.
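To make the idea of certifying a rate through an LMI concrete, here is a minimal numerical sketch for plain gradient descent. It is an illustration under stated assumptions, not the speaker's code: the objective is assumed to be m-strongly convex and L-smooth, the storage function is taken as V(e) = |e|^2 (a scalar P normalized to 1), and the gradient is described by the standard sector constraint for that function class. The names `lmi_matrix`, `best_max_eig`, `certified_rate`, and the search bound `lam_hi` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def lmi_matrix(rho, lam, alpha, m, L):
    """2x2 dissipation-inequality matrix for gradient descent, V(e) = |e|^2."""
    # Error dynamics: e_next = A*e + B*u, with u = grad f(x), A = 1, B = -alpha.
    A, B = 1.0, -alpha
    storage = np.array([[A * A - rho**2, A * B],
                        [A * B,          B * B]])
    # Assumed sector constraint: [e; u]^T M [e; u] >= 0 for
    # m-strongly convex, L-smooth f, with e measured from the minimizer.
    M = np.array([[-2.0 * m * L, m + L],
                  [m + L,        -2.0]])
    return storage + lam * M

def best_max_eig(rho, alpha, m, L, lam_hi=1.0, iters=200):
    """Minimize the largest eigenvalue over lam in [0, lam_hi] (ternary search).

    The largest eigenvalue is convex in lam, so ternary search applies.
    lam_hi is an assumed search bound; enlarge it for other parameter scales.
    """
    def g(lam):
        return np.linalg.eigvalsh(lmi_matrix(rho, lam, alpha, m, L))[-1]
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if g(a) < g(b):
            hi = b
        else:
            lo = a
    return g(0.5 * (lo + hi))

def certified_rate(alpha, m, L, tol=1e-9):
    """Bisect on rho for the smallest rate at which the LMI is feasible.

    Returns 1.0 if no rate below 1 is certified within the search range.
    """
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if best_max_eig(mid, alpha, m, L) <= tol:
            hi = mid  # LMI feasible: V(e_next) <= rho^2 * V(e) is certified
        else:
            lo = mid
    return hi

m, L = 1.0, 10.0
rho_lmi = certified_rate(alpha=1.0 / L, m=m, L=L)
print("certified rate:", rho_lmi, " reference 1 - m/L:", 1.0 - m / L)
```

When the matrix is negative semidefinite for some lam >= 0, the error satisfies |e_next|^2 <= rho^2 |e|^2, so rho is a certified linear rate. For the step size 1/L used here, the search should land close to 1 - m/L, the rate usually quoted for this setting. A semidefinite programming solver would replace the one-dimensional search in larger problems, where P is a matrix rather than a scalar.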
Bin Hu received his B.S. in Theoretical and Applied Mechanics from the University of Science and Technology of China and his M.S. in Computational Mechanics from Carnegie Mellon University. He received his Ph.D. in Aerospace Engineering and Mechanics from the University of Minnesota, advised by Peter Seiler. He is currently a postdoctoral researcher in the optimization group of the Wisconsin Institute for Discovery at the University of Wisconsin-Madison, where he works with Laurent Lessard and collaborates closely with Stephen Wright. He is interested in building connections between control theory and machine learning research. His current research focuses on tailoring robust control theory (integral quadratic constraints, dissipation inequalities, jump system theory, etc.) to unify the study of stochastic optimization methods (stochastic gradient, stochastic average gradient, SAGA, SVRG, Katyusha momentum, etc.) and their applications in related machine learning problems (logistic regression, deep neural networks, matrix completion, etc.). He is also particularly interested in the generalization mechanism of deep learning.