A Compositional Framework for First-Order Optimization

Tyler Hanks; Matthew Klawonn; Evan Patterson; Matthew Hale; James Fairbanks

TL;DR

This paper presents an algebraic framework for hierarchically composing optimization problems defined on hypergraphs and for automatically generating distributed solution algorithms that respect the given hierarchical structure. It also derives a novel sufficient condition for when a problem defined by compositional data is solvable by a decomposition method.

Abstract

Optimization decomposition methods are a fundamental tool for developing distributed solution algorithms for large-scale optimization problems arising in fields such as machine learning and optimal control. In this paper, we present an algebraic framework for hierarchically composing optimization problems defined on hypergraphs and automatically generating distributed solution algorithms that respect the given hierarchical structure. The central abstractions of our framework are operads, operad algebras, and algebra morphisms, which formalize notions of syntax, semantics, and structure-preserving semantic transformations, respectively. These abstractions allow us to formally relate composite optimization problems to the distributed algorithms that solve them. Specifically, we show that certain classes of optimization problems form operad algebras, and that a collection of first-order solution methods, namely gradient descent, Uzawa's algorithm (also called gradient ascent-descent), and their subgradient variants, yield algebra morphisms from these problem algebras to algebras of dynamical systems. Primal and dual decomposition methods are then recovered by applying these morphisms to certain classes of composite problems. Using this framework, we also derive a novel sufficient condition for when a problem defined by compositional data is solvable by a decomposition method. We show that the minimum cost network flow problem satisfies this condition, thereby allowing us to automatically derive a hierarchical dual decomposition algorithm for finding minimum cost flows on composite flow networks. We implement our operads, algebras, and algebra morphisms in a Julia package called AlgebraicOptimization.jl and use our implementation to empirically demonstrate that hierarchical dual decomposition outperforms standard dual decomposition on classes of flow networks with hierarchical structure.
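The claim that gradient descent is structure preserving can be illustrated with a small sketch. This is not the AlgebraicOptimization.jl implementation; it is a plain-Python toy under stated assumptions. The two quadratic subproblems `f(x, y)` and `g(y, z)`, the step size, and all function names are hypothetical choices made for illustration. The subproblems share the variable `y`. One routine takes a gradient step on the summed objective directly; the other composes the two subproblem dynamics by copying the shared variable to each subsystem and summing the resulting changes. The two routines should produce the same iterates.

```python
def grad_f(x, y):
    # Hypothetical subproblem f(x, y) = (x - 1)^2 + (y - 2)^2
    return (2.0 * (x - 1.0), 2.0 * (y - 2.0))


def grad_g(y, z):
    # Hypothetical subproblem g(y, z) = (y - 4)^2 + (z + 1)^2
    return (2.0 * (y - 4.0), 2.0 * (z + 1.0))


def composite_step(state, step):
    """Gradient step computed directly on the composite objective f + g."""
    x, y, z = state
    fx, fy = grad_f(x, y)
    gy, gz = grad_g(y, z)
    return (x - step * fx, y - step * (fy + gy), z - step * gz)


def composed_systems_step(state, step):
    """The same step, assembled from the two subproblem dynamics."""
    x, y, z = state
    # Distribute: each subsystem receives its own copy of the shared variable y.
    f_local = (x, y)
    g_local = (y, z)
    # Each subsystem computes the change to its local state independently.
    df = tuple(-step * d for d in grad_f(*f_local))
    dg = tuple(-step * d for d in grad_g(*g_local))
    # Collect: changes that land on the same shared variable are summed.
    return (x + df[0], y + df[1] + dg[0], z + dg[1])


def run(step_fn, state=(0.0, 0.0, 0.0), step=0.1, iters=200):
    """Iterate a step function from an initial state."""
    for _ in range(iters):
        state = step_fn(state, step)
    return state


direct = run(composite_step)
composed = run(composed_systems_step)
```

For these particular quadratics the composite minimizer is `(1, 3, -1)`, since `y` must balance `(y - 2)^2` against `(y - 4)^2`, and both routines approach it. In a distributed setting, the two subsystem updates in `composed_systems_step` could be evaluated in parallel, which is the point of the decomposition.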

Paper Structure (27 sections, 18 theorems, 93 equations, 9 figures, 3 algorithms)

Introduction
Preliminaries
Notation
First-order Optimization
Algebras of Undirected Wiring Diagrams
Gradient Descent is an Algebra Morphism
Composing Optimization Problems
Solving Composite Problems with Gradient Descent
The Compositional Data Condition
Uzawa's Algorithm is an Algebra Morphism
Composing Saddle Problems
Additional Classes of Optimization Problems
Solving Composite Saddle Problems with Uzawa's Algorithm
Subgradient Descent as a Functor
Non-deterministic Dynamical Systems
...and 12 more sections

Key Result

Lemma 2.8

Given a lax symmetric monoidal functor $(F,\varphi)\colon (\mathrm{\textnormal{FinSet}},+)\to(\mathrm{\textnormal{Set}},\times)$, there is a lax symmetric monoidal functor $(F\textnormal{Csp},\varphi')\colon (\mathrm{\textnormal{Cospan}},+)\to(\mathrm{\textnormal{Set}},\times)$ defined by the follow

Figures (9)

Figure 1: A. An example undirected wiring diagram (UWD), which is a special type of hypergraph. Each box has a finite set of connection points which we call ports. We refer to the small circles as junctions and the edges connecting ports to junctions as wires. Importantly, every UWD has a boundary, visualized by the large outer box. We refer to ports on inner boxes and outer boxes as inner ports and outer ports, respectively. B. The UWD in (A) interpreted as a composite optimization problem. Subproblems inhabit boxes and their optimization variables inhabit wires. Subproblems connected by the same junction share the variables on wires incident to that junction. C. The UWD in (A) interpreted as a composite dynamical system. Subsystems inhabit boxes and their state variables inhabit wires with junctions indicating which state variables are shared. The change in a shared state variable is computed by summing changes from contributing subsystems (encoded by the matrix $K^T$). Gradient descent gives a structure-preserving map from the objective function semantics to the dynamical systems semantics.
Figure 2: The full hierarchy of results presented in this paper. Nodes represent the various UWD-algebras developed including those for composing saddle problems, convex problems, concave problems, all with and without differentiability assumptions, as well as composing deterministic and non-deterministic dynamical systems. Hooked arrows indicate that there is an inclusion of one algebra into another. Non-hooked arrows are the algebra morphisms including gradient descent ($\mathsf{gd}$), gradient ascent-descent ($\mathsf{ga}\text{-}\mathsf{d}$) and the primal-dual subgradient method ($\mathsf{pd}\text{-}\mathsf{subg}$). Composing the inclusions with the gradient algebra morphisms yields (sub)gradient descent for convex problems and (super)gradient ascent for concave problems.
Figure 3: A. An example UWD with two inner boxes. B. The cospan representation of the UWD in (A). The signature of this UWD is $\{1,2\}+\{3,4\}\to \{a,b,c\}\leftarrow\{1',2'\}$.
Figure 4: A visualization of the relationship between a function $\phi\colon[2]+[3]\to [4]$, the action of the pushforward $\phi_*$ on an input pair $(x,y)$, and the action of the pullback $\phi^*$ on an input $z$. The pushforward sums components of $(x,y)$ according to $\phi$ while the pullback duplicates components of $z$ according to $\phi$.
Figure 5: A. An example of the collect algebra acting on a UWD $\Phi$ with two inner boxes. The resulting linear map $\Phi_*$ takes a pair of vectors in $\mathbb{R}^2\times \mathbb{R}^2$ as input and produces a vector in $\mathbb{R}^3$ by summing the components of the inputs which share the same junction in $\Phi$. Arrows are added to emphasize that the flow of information is directed from the inner boxes to the boundary box. B. An example of the distribute algebra acting on $\Phi$. The resulting linear map $\Phi^*$ takes a vector in $\mathbb{R}^3$ and produces a pair of vectors in $\mathbb{R}^2\times \mathbb{R}^2$ by copying components of the input which share the same junction in $\Phi$. Arrows are added to emphasize that the flow of information is directed from the boundary box to the inner boxes. The matrix representations are with respect to the standard bases. Note that these examples satisfy the simplifying assumptions of Remark \ref{rem:simple}.
...and 4 more figures
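
The pushforward and pullback described in the captions of Figures 4 and 5 can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the particular function `phi`, the sample vectors, and the helper names are assumptions. A function $\phi\colon[2]+[3]\to[4]$ is encoded as a list of five target indices (zero-based), the pushforward sums the input components that map to the same target, and the pullback copies target components back to each source index.

```python
def pushforward(phi, v, n):
    """Sum the entries of v that phi sends to the same index in range(n)."""
    out = [0] * n
    for i, j in enumerate(phi):
        out[j] += v[i]
    return out


def pullback(phi, z):
    """Copy entries of z back along phi, duplicating shared targets."""
    return [z[j] for j in phi]


def dot(a, b):
    """Standard inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(a, b))


# Hypothetical phi: the first summand [2] maps to targets 0 and 1,
# the second summand [3] maps to targets 1, 2 and 3, so target 1 is shared.
phi = [0, 1, 1, 2, 3]

x = [1, 2]          # vector indexed by the first summand
y = [10, 20, 30]    # vector indexed by the second summand
v = x + y           # the pair (x, y) as one concatenated vector
z = [5, 6, 7, 8]    # vector indexed by the codomain

pushed = pushforward(phi, v, 4)   # [1, 12, 20, 30]
pulled = pullback(phi, z)         # [5, 6, 6, 7, 8]
```

With respect to the standard bases the two maps are transposes of one another, which shows up numerically as `dot(pushed, z) == dot(v, pulled)` (both equal 457 for the values above).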

Theorems & Definitions (63)

Definition 2.1
Definition 2.2: Definition 2.1.2 in spivak_operad_2013
Definition 2.3: Example 2.1.4 in spivak_operad_2013
Definition 2.4: Example 2.1.7 in spivak_operad_2013
Definition 2.5: Definition 2.2.1 in spivak_operad_2013
Definition 2.6: Definition 2.2.5 in spivak_operad_2013
Example 2.7: Pushforwards and Pullbacks
Lemma 2.8
proof
Remark 2.9
...and 53 more
