Towards a theory of non-commutative optimization: geodesic 1st and 2nd order methods for moment maps and polytopes
Scaling problems, such as operator scaling and tensor scaling, have recently found several applications in diverse areas of math and CS. They have been used to give efficient algorithms for non-commutative rational identity testing, compute the optimal constant in Brascamp-Lieb inequalities, one-body quantum marginal problem, solution to the Paulsen problem, and the search version of the Horn's problem, to name a few, [GGOW15, GGOW18, BFGOWW18, KLLR18, F18]. Scaling problems are natural geodesic convex optimization problems on Riemannian manifolds that arise from the symmetries of non-commutative groups, and the algorithms (and their analyses) developed in these previous works heavily relied on this general form of convexity. In this talk we discuss our recent systematic development of a theory of non-commutative optimization, which greatly extends ordinary (Euclidean) convex optimization. We will see how NC optimization unify and generalize the algorithms developed for the uniform and non-uniform versions of the operator scaling and tensor scaling problems. More specifically, our algorithms minimize the moment map (a non-commutative notion of the usual gradient), and test membership in moment polytopes (a vast class of polytopes, typically of exponential vertex and facet complexity, which quite magically arise from this a priori non-convex, non-linear setting). In the spirit of standard convex optimization, we develop two general methods in the geodesic setting, a first order and a second order method, which respectively receive first and second order information on the ``derivatives'' of the function to be optimized. We will also show the key parameters of the underlying group actions which control convergence to the optimum in each of these methods. The non-commutative analogues of ``smoothness'' (from the commutative case) are far more complex, and require significant algebraic and analytic machinery (much existing and some newly developed in this work). Despite this complexity, we shall see that the way in which these parameters control convergence in both methods is quite simple and elegant. Based on joint work with Peter Buergisser, Cole Franks, Ankit Garg, Michael Walter and Avi Wigderson.