
DFWLayer: Differentiable Frank-Wolfe Optimization Layer

Zixuan Liu, Liu Liu, Xueqian Wang, Peilin Zhao

TL;DR

Experimental results demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints.

Abstract

Differentiable optimization has received significant attention because of its foundational role in neural-network-based machine learning. This paper proposes a differentiable layer, the Differentiable Frank-Wolfe Layer (DFWLayer), obtained by rolling out the Frank-Wolfe method, a well-known algorithm that solves constrained optimization problems without projections or Hessian computations. This yields an efficient way to handle large-scale convex optimization problems with norm constraints. Experimental results demonstrate that the DFWLayer not only attains competitive accuracy in solutions and gradients but also consistently adheres to constraints.
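The abstract's central point, that Frank-Wolfe needs no projection, can be seen in a few lines. The sketch below is plain (non-differentiable) Frank-Wolfe over an ℓ1 ball of radius r in NumPy; the names lmo_l1 and frank_wolfe_l1, the classical step size 2/(k+2), and the toy problem are illustrative assumptions, not the paper's implementation. Each step calls a linear minimization oracle, which for the ℓ1 ball simply picks a signed, scaled vertex, and then takes a convex combination, so every iterate stays feasible without any projection.

```python
import numpy as np

def lmo_l1(grad, radius):
    """Linear minimization oracle over the l1 ball {s : ||s||_1 <= radius}.

    argmin_s <grad, s> is attained at a vertex: all mass on the coordinate
    with the largest |grad_i|, with the sign opposite to grad_i.
    """
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def frank_wolfe_l1(grad_f, x0, radius, num_iters=200):
    """Plain Frank-Wolfe with step size 2/(k+2); no projection is ever computed."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(num_iters):
        s = lmo_l1(grad_f(x), radius)      # solve a linear problem instead of projecting
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s  # convex combination of feasible points stays feasible
    return x

# Toy problem: minimize 0.5 * ||x - b||^2 over the unit l1 ball; the solution is (1, 0).
b = np.array([2.0, 0.5])
x_hat = frank_wolfe_l1(lambda x: x - b, np.zeros(2), radius=1.0)
print(x_hat)  # [1. 0.]
```

Rolling out a fixed number of these cheap iterations is what makes an unrolled layer practical; the per-step cost is an argmax rather than a projection.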

Paper Structure

This paper contains 13 sections, 2 theorems, 11 equations, 2 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 2.1

Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be an $L$-smooth convex function on a convex region $\mathcal{C}$ with diameter $M$, and let $x^\ast=\arg\min_{x\in\mathcal{C}} f(x)$. Under Assumption assum:dis, the suboptimality gap of DFWLayer for $\ell_1$ norm constraints is bounded by

Figures (2)

  • Figure 1: Distance of gradients and solutions between CvxpyLayer and DFWLayer at different temperatures for medium-scale problems. All curves terminate at tolerance $\epsilon = 10^{-4}$. Shaded areas denote the standard deviation over 5 trials, and $x^\ast$ denotes the solutions obtained by CvxpyLayer.
  • Figure 2: MSE loss, mean violation, and violation rate for the robotics tasks. The first row shows R+O03 and the second row shows HC+O. Mean violation is averaged over violated samples only, and violation rate is the fraction of samples in the test set that violate the constraints. Shaded areas denote the standard deviation over 5 trials. Lower values are better for all metrics.
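Figure 1 sweeps a temperature parameter, but this summary does not say where it enters DFWLayer. One plausible reading, assumed here rather than taken from the paper, is that the hard argmax in the ℓ1 oracle is replaced by a temperature-scaled softmax, which makes the oracle's output a smooth function of the gradient. The hypothetical soft_lmo_l1 below does this; the stopping rule (stop once the update norm falls below tol = 1e-4, echoing the caption's tolerance) is likewise an assumption. Lower temperatures bring the softened solution closer to the exact one, while the softmax saturates and its derivatives vanish almost everywhere.

```python
import numpy as np

def soft_lmo_l1(grad, radius, temperature):
    """Softmax-smoothed l1 oracle (hypothetical). As temperature -> 0 it approaches
    the exact vertex -radius * sign(g_i) * e_i with i = argmax |g_i|."""
    scores = np.abs(grad) / temperature
    w = np.exp(scores - scores.max())  # subtract the max for numerical stability
    w /= w.sum()                       # weights sum to 1, so ||output||_1 <= radius
    return -radius * np.sign(grad) * w

def soft_frank_wolfe_l1(grad_f, x0, radius, temperature, tol=1e-4, max_iters=10_000):
    """Frank-Wolfe with the smoothed oracle; stops when the update norm drops below tol."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(max_iters):
        s = soft_lmo_l1(grad_f(x), radius, temperature)
        gamma = 2.0 / (k + 2.0)
        step = gamma * (s - x)
        x = x + step
        if np.linalg.norm(step) < tol:
            break
    return x

# Same toy problem as a plain Frank-Wolfe run: the exact solution is (1, 0).
b = np.array([2.0, 0.5])
x_star = np.array([1.0, 0.0])
for tau in (1.0, 0.05):
    x_tau = soft_frank_wolfe_l1(lambda x: x - b, np.zeros(2), radius=1.0, temperature=tau)
    print(tau, np.linalg.norm(x_tau - x_star))  # distance to the exact solution shrinks as tau drops
```

Here x_star plays the role that CvxpyLayer's solution plays in Figure 1: a reference against which the distance of the softened solution is measured.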
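The two constraint metrics in the Figure 2 caption reduce to simple arithmetic. The sketch assumes a norm-ball constraint ||x|| ≤ r (ℓ1 by default) and a small tolerance atol for counting a sample as violated; the function name violation_metrics and these choices are illustrative, since the exact constraints of the R+O03 and HC+O tasks are not given here.

```python
import numpy as np

def violation_metrics(solutions, radius, norm_ord=1, atol=1e-6):
    """Return (mean_violation, violation_rate) for the constraint ||x|| <= radius.

    solutions: array of shape (num_samples, dim), one predicted solution per row.
    A sample counts as violated when its norm exceeds radius by more than atol.
    """
    norms = np.linalg.norm(solutions, ord=norm_ord, axis=1)
    excess = np.maximum(norms - radius, 0.0)
    violated = excess > atol
    # Violation rate: violated samples divided by all samples.
    violation_rate = float(violated.mean())
    # Mean violation: average excess over the violated samples only.
    mean_violation = float(excess[violated].mean()) if violated.any() else 0.0
    return mean_violation, violation_rate

preds = np.array([[0.5, 0.4], [1.0, 0.5], [0.0, 2.0], [0.2, 0.2]])
print(violation_metrics(preds, radius=1.0))  # (0.75, 0.5)
```

Averaging only over violated samples means the two metrics are complementary: the rate says how often constraints break, the mean violation says by how much when they do.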

Theorems & Definitions (4)

  • Theorem 2.1
  • Lemma A.3
  • Remark A.4
  • Proof of Theorem 2.1
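This summary does not reproduce the bound in Theorem 2.1. As a point of reference only, the standard guarantee for exact Frank-Wolfe with step size 2/(k+2) on an L-smooth convex f over a compact convex set of diameter M is f(x_k) - f(x*) ≤ 2LM²/(k+2); this is the textbook result, not necessarily the DFWLayer bound. The NumPy sketch below checks it numerically on a quadratic over the unit ℓ1 ball, with the instance chosen (an illustrative assumption) so that x* is known in closed form.

```python
import numpy as np

def lmo_l1(grad, radius):
    """Exact l1-ball oracle: the signed vertex minimizing <grad, s>."""
    i = int(np.argmax(np.abs(grad)))
    s = np.zeros_like(grad)
    s[i] = -radius * np.sign(grad[i])
    return s

def fw_gaps(b, radius, num_iters):
    """Run exact Frank-Wolfe on f(x) = 0.5 * ||x - b||^2 over the l1 ball.

    b lies strictly inside the ball, so x* = b and f(x*) = 0; the returned list
    holds the suboptimality gap f(x_k) - f(x*) after k = 1..num_iters steps.
    """
    x = np.zeros_like(b)
    gaps = []
    for k in range(num_iters):
        s = lmo_l1(x - b, radius)  # gradient of f is x - b
        gamma = 2.0 / (k + 2.0)
        x = (1.0 - gamma) * x + gamma * s
        gaps.append(0.5 * float(np.sum((x - b) ** 2)))
    return gaps

L = 1.0            # f is 1-smooth
radius = 1.0
M = 2.0 * radius   # Euclidean diameter of an l1 ball of radius r is 2r
b = np.array([0.3, -0.2, 0.1])
num_iters = 500
gaps = fw_gaps(b, radius, num_iters)
bounds = [2.0 * L * M**2 / (k + 2.0) for k in range(1, num_iters + 1)]
print(all(g <= u for g, u in zip(gaps, bounds)))  # True
```

The empirical gap typically sits well below the worst-case curve; the bound only guarantees the O(1/k) envelope.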