From Inverse Optimization to Feasibility to ERM

Saurabh Mishra; Anant Raj; Sharan Vaswani

TL;DR

The paper studies contextual inverse linear programming (CILP): a model predicts LP cost vectors from context, and training enforces that the predicted costs yield the observed optimal decisions. For a linear prediction model, it first recasts CILP as a convex feasibility problem using the KKT conditions (sets $\mathcal{C}$ and $\mathcal{F}$) and solves it with alternating projections, achieving linear convergence without relying on degeneracy or interpolation assumptions. To scale to large problems, it then reduces CILP to empirical risk minimization on a smooth, convex loss that satisfies the Polyak-Lojasiewicz condition, enabling efficient first-order methods with provable generalization guarantees. The approach is validated on synthetic and real-world tasks (e.g., Warcraft SP, MNIST PM), where it outperforms several baselines in decision accuracy while remaining computationally competitive. Overall, the framework offers a principled, scalable way to learn optimization parameters from contextual data.
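For reference, the Polyak-Lojasiewicz (PL) condition named above is usually stated as below. The constants $\mu$ and $L$, the minimum value $h^*$, and the iterate index $k$ are notation assumed here for illustration, not taken from the paper.

```latex
% PL condition for a differentiable loss h with minimum value h^*:
\frac{1}{2}\,\|\nabla h(\theta)\|^2 \;\ge\; \mu\,\bigl(h(\theta) - h^*\bigr)
  \quad \text{for all } \theta, \qquad \mu > 0.

% If h is also L-smooth, gradient descent with step size 1/L satisfies
h(\theta_k) - h^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(h(\theta_0) - h^*\bigr).
```

This is the standard reason a PL loss admits a linear rate for plain gradient descent even when it is not strongly convex.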

Abstract

Inverse optimization involves inferring unknown parameters of an optimization problem from known solutions and is widely used in fields such as transportation, power systems, and healthcare. We study the contextual inverse optimization setting that utilizes additional contextual information to better predict the unknown problem parameters. We focus on contextual inverse linear programming (CILP), addressing the challenges posed by the non-differentiable nature of LPs. For a linear prediction model, we reduce CILP to a convex feasibility problem allowing the use of standard algorithms such as alternating projections. The resulting algorithm for CILP is equipped with theoretical convergence guarantees without additional assumptions such as degeneracy or interpolation. Next, we reduce CILP to empirical risk minimization (ERM) on a smooth, convex loss that satisfies the Polyak-Lojasiewicz condition. This reduction enables the use of scalable first-order optimization methods to solve large non-convex problems while maintaining theoretical guarantees in the convex setting. Subsequently, we use the reduction to ERM to quantify the generalization performance of the proposed algorithm on previously unseen instances. Finally, we experimentally validate our approach on synthetic and real-world problems and demonstrate improved performance compared to existing methods.
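The alternating-projections idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic version of the technique on two toy convex sets (a halfspace and a box), not the paper's sets or algorithm; the function names, tolerance, and example data are assumptions made for illustration.

```python
import numpy as np

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {y : a @ y <= b}."""
    gap = a @ x - b
    if gap <= 0:
        return x  # already inside the halfspace
    return x - gap * a / (a @ a)

def project_box(x, lo, hi):
    """Euclidean projection of x onto the box [lo, hi]^n."""
    return np.clip(x, lo, hi)

def alternating_projections(x0, proj_a, proj_b, max_iters=500, tol=1e-10):
    """Apply proj_a then proj_b repeatedly until the iterate stops moving."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        x_next = proj_b(proj_a(x))
        moved = np.linalg.norm(x_next - x)
        x = x_next
        if moved <= tol:
            break
    return x

# Toy example (hypothetical data): halfspace x1 + x2 <= 1 and box [0, 2]^2.
a, b = np.array([1.0, 1.0]), 1.0
x = alternating_projections(
    [3.0, -1.0],
    lambda v: project_halfspace(v, a, b),
    lambda v: project_box(v, 0.0, 2.0),
)
```

In this toy instance the distance to the limit point halves at every round, which is the kind of linear convergence the abstract refers to; the rate in general depends on the geometry of the two sets.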

Paper Structure (34 sections, 18 theorems, 56 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Related Work
Problem Formulation
Challenge in Gradient Estimation
Reduction to Convex Feasibility
Reduction
Algorithm
Practical considerations: Margin
Handling non-linear optimization problems
Challenges for solving large-scale problems
Reduction to Empirical Risk Minimization
Reduction
Properties of $h(\theta)$
First-order Methods
Generalization Guarantees
...and 19 more sections

Key Result

Proposition 5.0

The point $\hat{c} := (c_1, c_2, \ldots, c_N)$, where $c_i = z_i \tilde{\theta}$ and $\tilde{\theta} \in \arg\min_\theta h(\theta)$, lies in the intersection $\mathcal{C} \cap \mathcal{F}$ if that intersection is non-empty; otherwise, $\hat{c}$ is the point of $\mathcal{F}$ closest to $\mathcal{C}$.
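This summary does not define $h(\theta)$. As a minimal sketch, assume $h$ is the mean half squared Euclidean distance from each predicted cost $z_i \theta$ to a convex set $\mathcal{C}_i$, with halfspaces used as hypothetical stand-ins for the KKT-derived sets. Under that assumption the gradient follows from the standard identity that the gradient of half the squared distance to a closed convex set at a point is the point minus its projection. All names (`loss_and_grad`, `theta_ref`) and the synthetic data are illustrative.

```python
import numpy as np

def project_halfspace(c, a, b):
    """Euclidean projection of c onto {y : a @ y <= b}."""
    gap = a @ c - b
    return c if gap <= 0 else c - gap * a / (a @ a)

def loss_and_grad(theta, Z, A, B):
    """Mean half squared distance of each Z_i @ theta to its set, plus gradient."""
    loss, grad = 0.0, np.zeros_like(theta)
    for Z_i, a_i, b_i in zip(Z, A, B):
        c_i = Z_i @ theta                      # predicted cost for sample i
        residual = c_i - project_halfspace(c_i, a_i, b_i)
        loss += 0.5 * (residual @ residual) / len(Z)
        grad += Z_i.T @ residual / len(Z)      # chain rule through Z_i
    return loss, grad

# Synthetic instance (assumed): theta_ref satisfies every constraint,
# so the stand-in sets have a common realizable point.
rng = np.random.default_rng(0)
N, n, d = 20, 3, 2
Z = rng.normal(size=(N, n, d))
A = rng.normal(size=(N, n))
theta_ref = np.array([1.0, -0.5])
B = np.array([a_i @ (Z_i @ theta_ref) + 0.1 for Z_i, a_i in zip(Z, A)])

# Step size 1 / (upper bound on the smoothness constant).
step = 1.0 / max(np.linalg.norm(Z_i, 2) ** 2 for Z_i in Z)
theta = np.zeros(d)
initial_loss, _ = loss_and_grad(theta, Z, A, B)
for _ in range(2000):
    _, grad = loss_and_grad(theta, Z, A, B)
    theta = theta - step * grad
final_loss, _ = loss_and_grad(theta, Z, A, B)
```

Because the synthetic sets share a realizable point, the loss can be driven toward zero, which corresponds to the first case of the proposition. If no such point existed, a minimizer of this assumed loss would instead give the realizable costs closest to the sets.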

Figures (9)

Figure 1: CIO framework: a model $f_\theta$ takes input $z$ and predicts the cost vector $c = f_\theta(z)$. This cost vector is the input to an optimization procedure that outputs the decision $x(c)$. Given the optimal decision $x^*$, the objective is to learn the model parameters such that the predicted decision $x(c)$ is close to the optimal decision. To train the model end to end, the key challenge is computing the gradient of the decision $x(c)$ with respect to $c$ (shown in red in the figure).
Figure 2: Decision loss: training and test plots for the real-world experiments. Our method significantly outperforms the other methods (ST, BB, MOM, SPO+).
Figure 3: Estimate loss: training and test plots for real-world experiments. Our method significantly outperforms existing methods (ST, BB, MOM) and is comparable to SPO+, which uses the knowledge of $c^*$.
Figure 4: Two points $x, y$ and their projections $x_1, y_1$ onto a linear boundary of the set $C$. The angle between $x, y$ and $x, x_1$ is a right angle, so the two vectors are orthogonal.
Figure 5: Estimate loss
...and 4 more figures

Theorems & Definitions (26)

Proposition 5.0
Proposition 5.0
Proposition 5.0
Proposition 5.0
Theorem 6.1
Corollary 6.1
Corollary 6.1
Proposition 6.1
Proposition 2.0
Lemma 2.1
...and 16 more
