Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels

Khai Nguyen; Petros Ellinas; Anvita Bhagavathula; Priya Donti

TL;DR

The paper proposes a three-stage framework that first collects "cheap" imperfect labels, then performs supervised pretraining, and finally refines the model through self-supervised learning. The approach yields faster convergence; improved accuracy, feasibility, and optimality; and up to 59x reductions in total offline cost.

Abstract

To scale the solution of optimization and simulation problems, prior work has explored machine-learning surrogates that inexpensively map problem parameters to corresponding solutions. Commonly used approaches, including supervised and self-supervised learning with either soft or hard feasibility enforcement, face inherent challenges such as reliance on expensive, high-quality labels or difficult optimization landscapes. To address their trade-offs, we propose a novel framework that first collects "cheap" imperfect labels, then performs supervised pretraining, and finally refines the model through self-supervised learning to improve overall performance. Our theoretical analysis and merit-based criterion show that labeled data need only place the model within a basin of attraction, confirming that only modest numbers of inexact labels and training epochs are required. We empirically validate our simple three-stage strategy across challenging domains, including nonconvex constrained optimization, power-grid operation, and stiff dynamical systems, and show that it yields faster convergence; improved accuracy, feasibility, and optimality; and up to 59x reductions in total offline cost.
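The three-stage strategy described in the abstract can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the authors' implementation: the problem (minimize $(y - x)^2$ subject to $y \le 1$), the linear model, the penalty weight `rho`, and every function name are hypothetical choices made here. The cheap label simply ignores the constraint, which stands in for an approximate solver. Because this toy objective is convex, the sketch shows only the control flow of the pipeline and not the basin-of-attraction effect analyzed in the paper.

```python
# Toy problem (assumed for illustration): for each parameter x in [0, 2],
# minimize (y - x)^2 subject to y <= 1. The exact solution is min(x, 1).

def cheap_label(x):
    # Stage 1: hypothetical cheap procedure that ignores the constraint y <= 1,
    # so its labels are infeasible whenever x > 1.
    return x

def predict(theta, x):
    # Linear surrogate model y = w * x + b.
    w, b = theta
    return w * x + b

def ssl_loss(theta, xs, rho=10.0):
    # Self-supervised objective: task cost plus a soft penalty on violations.
    total = 0.0
    for x in xs:
        y = predict(theta, x)
        total += (y - x) ** 2 + rho * max(0.0, y - 1.0) ** 2
    return total / len(xs)

def gd_step(theta, xs, dloss_dy, lr):
    # One full-batch gradient step; dloss_dy(y, x) is the loss derivative w.r.t. y.
    w, b = theta
    gw = gb = 0.0
    for x in xs:
        g = dloss_dy(predict(theta, x), x)
        gw += g * x
        gb += g
    n = len(xs)
    return (w - lr * gw / n, b - lr * gb / n)

def train_three_stage(xs, pretrain_epochs=50, ssl_epochs=500, lr=0.01, rho=10.0):
    # Stage 1: collect cheap, imperfect labels.
    labels = {x: cheap_label(x) for x in xs}
    theta = (0.0, 0.0)
    # Stage 2: short supervised pretraining on the cheap labels.
    for _ in range(pretrain_epochs):
        theta = gd_step(theta, xs, lambda y, x: 2.0 * (y - labels[x]), lr)
    warm_theta = theta
    # Stage 3: label-free refinement on the penalized objective.
    for _ in range(ssl_epochs):
        theta = gd_step(
            theta, xs,
            lambda y, x: 2.0 * (y - x) + 2.0 * rho * max(0.0, y - 1.0),
            lr,
        )
    return warm_theta, theta

xs = [2.0 * i / 40 for i in range(41)]
warm_theta, final_theta = train_three_stage(xs)
print("SSL loss after warm start:", ssl_loss(warm_theta, xs))
print("SSL loss after refinement:", ssl_loss(final_theta, xs))
```

The soft penalty is only one of the feasibility mechanisms the abstract mentions; a hard-constraint variant would replace the penalty term with a projection or correction step.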

Paper Structure (49 sections, 2 theorems, 44 equations, 12 figures, 8 tables)

Introduction
Problem Formulation
Self-Supervision from Supervised Warm-Starting via Cheap Labels
Theoretical Analysis
How Bad Can Labels Be?
How Many Labels Are Needed?
Empirical Analysis
Methods Evaluated
Evaluation Metrics
Benchmark Problems
Result Discussions
Related Work
Conclusion
Additional Theoretical Results
Proof of Theorem \ref{thm:two_regimes_tight}
...and 34 more sections

Key Result

Theorem 4.2

Supervised warm-starting exhibits two regimes: (i) Globally admissible proxy. If $\Delta_{\text{proxy}} < m_\theta$, then there exists $K$ such that $\varepsilon(K) < m_\theta - \Delta_{\text{proxy}}$ and thus $\pi_{\theta_K} \in \mathcal{B}(y^\star)$. Thus, convergence to $\hat{y}$ yields supervise

Figures (12)

Figure 1: Overview of our approach. We propose a simple but effective three-stage amortized optimization framework, (1) collecting cheap imperfect labels from approximate procedures, (2) pretraining a supervised warm-start, and (3) training with self-supervision, that reduces offline cost by up to $59\times$ while consistently improving accuracy, optimality, and feasibility over existing baselines.
Figure 2: Loss (left) and merit (right) landscapes along two weight directions from our experiments. The surrogate loss facilitates SSL \ref{eq:ssl_obj}, while the task-faithful but ill-conditioned merit \ref{eq:merit} exhibits sharp ridges and multiple basins, explaining the potential failure of vanilla SSL when trained directly on the merit.
Figure 3: Amortized optimization of power grid operation. Our approach of using cheap DCOPF labels to warm-start SSL consistently reduces average optimality gaps and constraint violations, while remaining competitive in worst-case ACOPF problems. The gains are especially pronounced for hard-constraint methods.
Figure 4: Physics-informed learning of stiff dynamical equations. Left and center: aggregate solution error (MSE and MAE) relative to ground truth. Right: temporal evolution of the MSE for the fastest state variable ($E$). Our method that warm-starts SSL with cheap labels reduces errors and stabilizes trajectories.
Figure 5: Average merit over training epochs for pretraining, vanilla SSL and our method. The merit follows a U-shaped trajectory whose minimum defines the start of SSL. Compared to vanilla SSL, which plateaus at higher merit, our approach yields faster convergence and a better final solution.
...and 7 more figures
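
The Figure 5 caption states that the merit follows a U-shaped trajectory during pretraining and that its minimum defines the start of SSL. A minimal sketch of that selection rule follows, assuming the merit has already been evaluated once per pretraining epoch. The function name `select_switch_epoch` and the sample trace are hypothetical, and the precise merit definition is not part of this excerpt.

```python
def select_switch_epoch(merit_trace):
    # Return the index of the pretraining epoch with the lowest merit.
    # Ties resolve to the earliest such epoch.
    if not merit_trace:
        raise ValueError("merit_trace must contain at least one value")
    best_epoch = 0
    for epoch, merit in enumerate(merit_trace):
        if merit < merit_trace[best_epoch]:
            best_epoch = epoch
    return best_epoch

# Illustrative U-shaped trace (made-up numbers, not results from the paper).
example_trace = [5.0, 3.0, 2.0, 2.5, 4.0]
switch_epoch = select_switch_epoch(example_trace)
print("Switch to SSL from the checkpoint saved at epoch", switch_epoch)
```

In practice the checkpoint saved at the selected epoch would serve as the warm start for the self-supervised stage.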

Theorems & Definitions (7)

Remark 2.1: Exact Formulation
Definition 4.1: Error Decomposition
Theorem 4.2: Basin Admissibility under Supervised Warm-Starting
Definition 4.3: Traversability and Effective Target
Proposition 4.5: Geometric Scaling
proof
proof
