DFWLayer: Differentiable Frank-Wolfe Optimization Layer

Zixuan Liu; Liu Liu; Xueqian Wang; Peilin Zhao

TL;DR

Experimental results demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints.

Abstract

Differentiable optimization has received significant attention because of its foundational role in neural-network-based machine learning. This paper proposes a differentiable layer, the Differentiable Frank-Wolfe Layer (DFWLayer), obtained by rolling out the Frank-Wolfe method, a well-known optimization algorithm that solves constrained optimization problems without projections or Hessian matrix computations. This yields an efficient way of handling large-scale convex optimization problems with norm constraints. Experimental results demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints.
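To make the projection-free idea concrete, the following is a minimal sketch of the classical Frank-Wolfe iteration on an $\ell_1$-ball constraint. It is not the DFWLayer itself, which the paper builds by rolling out such iterations into a differentiable layer. The function names, the toy objective $f(x) = \tfrac{1}{2}\|x - b\|^2$, the radius, and the step size $2/(k+2)$ are assumptions chosen for illustration.

```python
def l1_lmo(grad, radius):
    """Linear minimization oracle for the l1 ball.

    Returns a minimizer of <grad, s> over ||s||_1 <= radius, which is a
    signed, scaled coordinate vector (or zero if the gradient is zero).
    """
    i = max(range(len(grad)), key=lambda j: abs(grad[j]))
    s = [0.0] * len(grad)
    if grad[i] != 0.0:
        s[i] = -radius if grad[i] > 0 else radius
    return s


def frank_wolfe_l1(grad_f, x0, radius, num_iters=200):
    """Plain Frank-Wolfe: no projection and no Hessian are needed."""
    x = list(x0)
    for k in range(num_iters):
        g = grad_f(x)                 # first-order information only
        s = l1_lmo(g, radius)         # vertex of the feasible set
        gamma = 2.0 / (k + 2.0)       # standard diminishing step size
        # A convex combination of feasible points stays feasible.
        x = [(1.0 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x


# Toy problem (hypothetical data): minimize 0.5 * ||x - b||^2 over ||x||_1 <= 1.
b = [2.0, 0.5, -0.2]
grad_f = lambda x: [xi - bi for xi, bi in zip(x, b)]
x_hat = frank_wolfe_l1(grad_f, [0.0, 0.0, 0.0], radius=1.0)
```

Because every iterate is a convex combination of feasible points, the constraint holds at every step without any projection. For this toy problem the iterates settle at approximately $(1, 0, 0)$, the point of the unit $\ell_1$ ball closest to $b$.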

Paper Structure (13 sections, 2 theorems, 11 equations, 2 figures, 4 tables, 1 algorithm)

Introduction & Related Work
Methodology
Experimental Results
Conclusions
Appendix
Algorithm Details
Theoretical Results
Experimental Details
Different-Scale Optimization Problems
Efficiency.
Accuracy.
Softmax Temperature.
Robotics Tasks Under Imitation Learning

Key Result

Theorem 2.1

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be an $L$-smooth convex function on a convex region $\mathcal{C}$ with diameter $M$, and let $x^\ast=\arg\min_{x\in\mathcal{C}} f(x)$. Under the assumption labeled assum:dis, the suboptimality gap of DFWLayer for $\ell_1$ norm constraints is bounded by

Figures (2)

Figure 1: Distance between the gradients and solutions of CvxpyLayer and DFWLayer at different temperatures for medium-scale problems. All curves terminate at tolerance $\epsilon = 10^{-4}$. The shaded areas show the standard deviation over 5 trials, and $x^\ast$ denotes the solutions obtained by CvxpyLayer.
Figure 2: MSE loss, mean violation, and violation rate for robotics tasks. The first row is for R+O03 and the second row is for HC+O. Mean violation is computed over violated samples, and violation rate is the ratio of violated samples to all samples in the testing set. The shaded areas show the standard deviation over 5 trials. Lower values are better for all metrics.

Theorems & Definitions (4)

Theorem 2.1
Lemma A.3
Remark A.4
Proof of Theorem 2.1
