On Differentiating Parameterized Argmin and Argmax Problems with Application to Bi-level Optimization

Stephen Gould; Basura Fernando; Anoop Cherian; Peter Anderson; Rodrigo Santa Cruz; Edison Guo

TL;DR

The paper derives exact gradient expressions for the solutions of parameterized argmin and argmax problems, as needed for gradient-based bi-level optimization, covering both unconstrained and constrained lower-level problems. It presents general implicit-differentiation formulas, extends them to equality and inequality constraints via null-space and barrier methods, and illustrates them with scalar and softmax examples. A bi-level learning example shows how to adjust model parameters to steer the location of maximum-likelihood features, illustrating how these gradients can be used in end-to-end learning. The discussion addresses computational considerations and points to scalable and non-smooth settings as directions for further work.

Abstract

Some recent works in machine learning and computer vision involve the solution of a bi-level optimization problem. Here the solution of a parameterized lower-level problem binds variables that appear in the objective of an upper-level problem. The lower-level problem typically appears as an argmin or argmax optimization problem. Many techniques have been proposed to solve bi-level optimization problems, including gradient descent, which is popular with current end-to-end learning approaches. In this technical report we collect some results on differentiating argmin and argmax optimization problems with and without constraints and provide some insightful motivating examples.
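As a minimal sketch of the unconstrained scalar case: if g(x) = argmin_y f(x, y), with f twice differentiable and f_YY > 0 at the minimizer, then differentiating the stationarity condition f_Y(x, g(x)) = 0 with respect to x gives g'(x) = -f_XY(x, g(x)) / f_YY(x, g(x)). The objective below, f(x, y) = y^2/2 + y^4/4 - sin(x) y, is an illustrative assumption chosen because it is strictly convex in y; it is not taken from the report, and the helper names are hypothetical. The script compares the implicit gradient with a central finite difference of the numerically computed argmin.

```python
import math

# Illustrative lower-level objective (an assumption, not from the report):
#   f(x, y) = 0.5*y**2 + 0.25*y**4 - sin(x)*y
# Only its partial derivatives are needed below.

def f_y(x, y):
    """First derivative of f with respect to y."""
    return y + y**3 - math.sin(x)

def f_yy(x, y):
    """Second derivative of f with respect to y (always positive here)."""
    return 1.0 + 3.0 * y**2

def f_xy(x, y):
    """Mixed second derivative of f with respect to x and y."""
    return -math.cos(x)

def argmin_y(x, iters=50):
    """Solve the lower-level problem g(x) = argmin_y f(x, y) by Newton's method."""
    y = 0.0
    for _ in range(iters):
        y -= f_y(x, y) / f_yy(x, y)
    return y

def implicit_grad(x):
    """g'(x) = -f_xy / f_yy, evaluated at the minimizer y = g(x)."""
    y = argmin_y(x)
    return -f_xy(x, y) / f_yy(x, y)

def finite_diff_grad(x, h=1e-6):
    """Central finite-difference estimate of g'(x), used as a sanity check."""
    return (argmin_y(x + h) - argmin_y(x - h)) / (2.0 * h)

x0 = 0.7
print("implicit gradient:   ", implicit_grad(x0))
print("finite-difference:   ", finite_diff_grad(x0))
```

The implicit formula needs only second derivatives at the solution, so the gradient does not depend on how the lower-level problem was solved; Newton's method here is just one convenient choice.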
