
Barriers for recent methods in geodesic optimization

Cole Franks, Philipp Reichenbach

TL;DR

The results suggest that it is impossible to prove polynomial running time bounds for tensor scaling based on diameter bounds alone, which motivates the search for analogues of more sophisticated algorithms, such as interior point methods, for geodesically convex optimization that do not rely on polynomial diameter bounds.

Abstract

We study a class of optimization problems, including matrix scaling, matrix balancing, multidimensional array scaling, operator scaling, and tensor scaling, that arise frequently in theory and in practice. Some of these problems, such as matrix and array scaling, are convex in the Euclidean sense, while others, such as operator scaling and tensor scaling, are geodesically convex on a different Riemannian manifold. Trust region methods, which include box-constrained Newton's method, are known to produce high-precision solutions very quickly for matrix scaling and matrix balancing (Cohen et al., FOCS 2017; Allen-Zhu et al., FOCS 2017), and result in polynomial-time algorithms for some geodesically convex problems like operator scaling (Garg et al., STOC 2018; Bürgisser et al., FOCS 2019). One is led to ask whether these guarantees also hold for multidimensional array scaling and tensor scaling. We show that this is not the case by exhibiting instances whose diameter bound is exponential: we construct polynomial-size instances of 3-dimensional array scaling and 3-tensor scaling whose approximate solutions all have doubly exponential condition number. Moreover, we study convex-geometric notions of complexity known as margin and gap, which are used to bound the running times of all existing optimization algorithms for such problems. We show that margin and gap are exponentially small for several problems, including array scaling, tensor scaling and polynomial scaling. Our results suggest that it is impossible to prove polynomial running time bounds for tensor scaling based on diameter bounds alone. Therefore, our work motivates the search for analogues of more sophisticated algorithms, such as interior point methods, for geodesically convex optimization that do not rely on polynomial diameter bounds.
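To make "array scaling" concrete, consider a nonnegative $n \times n \times n$ array $p$. 3-dimensional array scaling asks for vectors $x, y, z \in \mathbb R^n$ such that the array with entries $p_{ijk}e^{x_i+y_j+z_k}$, normalized to total mass one, has prescribed marginals.

The Python sketch below makes two assumptions:

- The target marginals are uniform, equal to $1/n$.
- The update is the classical alternating (Sinkhorn-style, or iterative proportional fitting) rule, which rescales one coordinate direction at a time.

It only illustrates the problem. It is not one of the trust-region or box-constrained Newton methods mentioned above, and the function names are hypothetical.

```python
import numpy as np

def scaled(p, s):
    """Normalized array q_ijk proportional to p_ijk * exp(x_i + y_j + z_k)."""
    x, y, z = s
    q = p * np.exp(x[:, None, None] + y[None, :, None] + z[None, None, :])
    return q / q.sum()

def marginals(q):
    """The three one-dimensional marginals of a 3-dimensional array."""
    return [q.sum(axis=(1, 2)), q.sum(axis=(0, 2)), q.sum(axis=(0, 1))]

def scale_array(p, max_iters=10_000, tol=1e-9):
    """Alternating (Sinkhorn-style) scaling of a nonnegative n x n x n array
    toward uniform marginals 1/n (an assumption for this sketch).
    Assumes every slice of p has a positive entry, so no marginal is ever zero.
    Returns (log-scalings [x, y, z], scaled array q, final marginal error)."""
    p = np.asarray(p, dtype=float)
    n = p.shape[0]
    s = [np.zeros(n) for _ in range(3)]  # log-scalings x, y, z
    err = np.inf
    for _ in range(max_iters):
        for axis in range(3):
            # Rescale the slices along `axis` so this marginal becomes exactly 1/n.
            m = marginals(scaled(p, s))[axis]
            s[axis] += np.log(1.0 / n) - np.log(m)
        # Largest deviation of any of the three marginals from the target 1/n.
        err = max(np.abs(mg - 1.0 / n).max() for mg in marginals(scaled(p, s)))
        if err < tol:
            break
    return s, scaled(p, s), err
```

Each update fixes one family of marginals exactly while perturbing the other two. The loop stops once all three marginals are within `tol` of $1/n$. Arrays with many zero entries are a different matter. For them, the required log-scalings can be very large, which is the regime the results above address.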

Paper Structure

This paper contains 43 sections, 42 theorems, 181 equations, 4 figures.

Key Result

Theorem 1.1

There is an absolute constant $C > 0$ and an array $p = (p_{ijk}) \in (\mathbb R_{\geq 0}^n)^{\otimes 3}$ with $O(n)$ nonzero entries, each of bit-complexity $O(n)$, with the following property. For all $0 < \varepsilon \leq \exp(-C n^2 \log n)$ and $(x,y,z) \in \mathbb R^{3n}$, if $(x,y,z)$ is an $\varepsilon$-approximate solution of the array scaling problem for $p$, then $\lVert(x,y,z)\rVert_2 = \Omega\left(2^{n/3}\log(1/\varepsilon)\right)$.
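Theorem 1.1 is a statement about the size of approximate solutions $(x,y,z)$. To make these quantities concrete, we can use one standard convex formulation of array scaling. It is taken here as an assumption, since this section does not state the objective. With uniform target marginals $1/n$ it reads $f_p(x,y,z) = \log \sum_{i,j,k} p_{ijk} e^{x_i+y_j+z_k} - \frac{1}{n}\sum_{i=1}^{n}(x_i+y_i+z_i)$.

The gradient of $f_p$ with respect to $x$ is the first marginal of the normalized array $q_{ijk} \propto p_{ijk}e^{x_i+y_j+z_k}$ minus $1/n$, and similarly for $y$ and $z$. So approximate minimizers give approximately scaled arrays. The actual scaling factors are $e^{x_i}, e^{y_j}, e^{z_k}$, so log-scalings of exponential size correspond to doubly exponential factors.

The hypothetical helpers below evaluate $f_p$ stably via log-sum-exp and return this gradient.

```python
import numpy as np

def _exponents(x, y, z):
    # t[i, j, k] = x_i + y_j + z_k via NumPy broadcasting
    return x[:, None, None] + y[None, :, None] + z[None, None, :]

def objective(p, x, y, z):
    """Hypothetical array scaling objective with uniform target marginals:
    f_p(x, y, z) = log sum_ijk p_ijk exp(x_i + y_j + z_k) - (1/n) sum(x + y + z)."""
    n = p.shape[0]
    mask = p > 0  # only the support of p contributes to the sum
    vals = np.log(p[mask]) + _exponents(x, y, z)[mask]
    vmax = vals.max()  # log-sum-exp shift for numerical stability
    lse = vmax + np.log(np.exp(vals - vmax).sum())
    return lse - (x.sum() + y.sum() + z.sum()) / n

def gradient(p, x, y, z):
    """Gradient of f_p: marginals of the normalized scaled array minus 1/n."""
    n = p.shape[0]
    mask = p > 0
    logq = np.full(p.shape, -np.inf)
    logq[mask] = np.log(p[mask]) + _exponents(x, y, z)[mask]
    q = np.exp(logq - logq[mask].max())  # entries outside the support become 0
    q /= q.sum()
    return (q.sum(axis=(1, 2)) - 1 / n,
            q.sum(axis=(0, 2)) - 1 / n,
            q.sum(axis=(0, 1)) - 1 / n)
```

For an array that is already scaled, such as the all-ones array, the gradient at $x = y = z = 0$ vanishes.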

Figures (4)

  • Figure 2.1: The positions of the ones in $A_4$ and $A_6$ are marked by $*$ in the following figure and the cells are colored according to whether they belong to $A_2, B_1, B_2$ or $B_3$.
  • Figure 3.1: The graph $D_l$ from \ref{dfn:graph} with edge labels proportional to the edge labeling $q$ in \ref{it:stoch} of \ref{lem:diameter} (the constant factor $1/6n$ is omitted for readability). The edge directions, which all point towards the root $r$, are also omitted.
  • Figure 3.2: The matrix $M$ written in the reordered basis described before \ref{lem:diameter}. From the left, the five groups of columns correspond to the $\overline{w}$'s, the $u$'s, the $v$'s, the $w$'s, and $r$ among the vertices of $D_l$. Accordingly, the five column groups have widths $3 \cdot 2$, $3(l-1)$, $3(l-1)$, $3(l-1)$, and $3$ (from left), and the four row groups have heights $3(l-1)$, $3(l-1)$, $3(l-1)$, and $3 \cdot 2$ (from top). $A$ is as in \ref{eq:a-matrix} and $I$ is the $3\times 3$ identity matrix.
  • Figure 3.3: If $v$ is a vertex of $D_l$ with edges weighted $q_1$ and $q_2$ incident to it, then the column $v,i$ of $M$ for $i \in [3]$ sums to $q_1 + 2q_2$. That is, the incoming edge contributes its weight and the outgoing edge contributes twice its weight.

Theorems & Definitions (96)

  • Theorem 1.1
  • Definition 1.2: Margin
  • Theorem 1.3
  • Theorem 1.4: Noncommutative diameter lower bound
  • Definition 1.5: Gap
  • Theorem 1.6
  • Theorem 2.1: Margin for array scaling
  • Lemma 2.2
  • proof
  • Lemma 2.3
  • ...and 86 more