Primitive Agentic First-Order Optimization

R. Sala


TL;DR

The paper addresses budget-limited first-order optimization by introducing Sequential Update Selection (SUS), a reinforcement learning (RL) framework that selects one of several simple update operators at each iteration based on a compact state representation. The optimization loop is cast as a partially observable Markov decision process, and epsilon-greedy temporal-difference learning of action values (SARSA) is applied over a small set of discrete update rules (e.g., gradient descent (GD) and Nesterov Accelerated Gradient (NAG)). On unseen quadratic problems under a fixed iteration budget, the learned policies outperform NAG with tuned hyperparameters. The main contributions are the SUS formulation, its low-dimensional state-action design, and empirical evidence that elementary RL methods can yield tangible efficiency gains at modest training cost. The results suggest practical potential for agentic optimization on resource-constrained problems and point to future work on richer state representations, hierarchical policies, and cross-domain generalization, all within the computational-rationality paradigm.
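To make the SUS loop concrete, here is a minimal sketch in Python. It assumes several choices the summary does not specify: diagonal quadratic instances with $\kappa \sim \text{Uniform}[10^2,10^3]$, a two-action set (a gradient step and a Nesterov step, both with step size $1/L$), a partial state made of a progress bin (decades of objective decrease) and a budget bin (fraction of the $K$ iterations used) on an $m_1 \times m_2$ grid, and a per-step reward equal to the decrease in $\log f$. The functions `sample_problem`, `step`, `state`, `train_sarsa`, and `rollout` are hypothetical names; this is an illustration of tabular epsilon-greedy SARSA for update selection, not the authors' implementation.

```python
import numpy as np

# Hypothetical problem class: f(x) = 0.5 * x^T diag(lam) x, eigenvalues in
# [1, kappa] with kappa ~ Uniform[1e2, 1e3]; the minimizer is x* = 0, f* = 0.
def sample_problem(rng, d):
    kappa = rng.uniform(1e2, 1e3)
    lam = rng.uniform(1.0, kappa, size=d)
    lam[0], lam[1] = 1.0, kappa  # pin the extremes so that L / mu = kappa (needs d >= 2)
    return lam, rng.standard_normal(d)

def f(lam, x):
    return 0.5 * np.sum(lam * x * x)

# Hypothetical two-action set: 0 = gradient step, 1 = Nesterov step (step size 1/L).
def step(action, lam, x, x_prev):
    L, mu = lam.max(), lam.min()
    if action == 0:
        y = x
    else:
        q = np.sqrt(L / mu)
        y = x + (q - 1.0) / (q + 1.0) * (x - x_prev)  # momentum extrapolation
    return y - lam * y / L  # the gradient of f at y is lam * y

# Hypothetical partial state: (progress bin, budget bin) on an m1 x m2 grid.
def state(f0, fk, k, K, m1, m2, p_max=4.0):
    decades = np.log10(f0 / fk)  # orders of magnitude of decrease so far
    i = int(np.clip(decades / p_max * m1, 0, m1 - 1))
    j = min(k * m2 // K, m2 - 1)  # fraction of the iteration budget used
    return i, j

def eps_greedy(Q, s, eps, rng):
    if rng.random() < eps:
        return int(rng.integers(Q.shape[-1]))
    return int(np.argmax(Q[s]))

def train_sarsa(n_episodes=2000, d=10, K=20, m1=20, m2=20,
                eps0=0.99, alpha0=0.3, floor=0.005, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((m1, m2, 2))
    for n in range(n_episodes):
        decay = floor ** (n / max(n_episodes - 1, 1))  # 0.5% of initial value at the end
        eps, alpha = eps0 * decay, alpha0 * decay
        lam, x0 = sample_problem(rng, d)
        f0 = f(lam, x0)
        x_prev, x = x0, x0
        s = state(f0, f0, 0, K, m1, m2)
        a = eps_greedy(Q, s, eps, rng)
        for k in range(K):
            x_new = step(a, lam, x, x_prev)
            r = np.log(f(lam, x)) - np.log(f(lam, x_new))  # hypothetical log-progress reward
            x_prev, x = x, x_new
            sa = s + (a,)
            if k == K - 1:  # budget exhausted: terminal update without bootstrapping
                Q[sa] += alpha * (r - Q[sa])
                break
            s2 = state(f0, f(lam, x), k + 1, K, m1, m2)
            a2 = eps_greedy(Q, s2, eps, rng)
            Q[sa] += alpha * (r + Q[s2 + (a2,)] - Q[sa])  # undiscounted SARSA update
            s, a = s2, a2
    return Q

def rollout(Q, lam, x0, K, m1, m2, policy="greedy"):
    """Final objective after K steps of the greedy SUS policy, or of plain NAG."""
    f0 = f(lam, x0)
    x_prev, x = x0, x0
    for k in range(K):
        if policy == "nag":
            a = 1
        else:
            a = int(np.argmax(Q[state(f0, f(lam, x), k, K, m1, m2)]))
        x_prev, x = x, step(a, lam, x, x_prev)
    return f(lam, x)

if __name__ == "__main__":
    Q = train_sarsa()
    rng = np.random.default_rng(1)  # unseen test instances
    gains = []
    for _ in range(100):
        lam, x0 = sample_problem(rng, 10)
        y_sus = rollout(Q, lam, x0, 20, 20, 20)
        y_nag = rollout(Q, lam, x0, 20, 20, 20, policy="nag")
        gains.append((y_nag - y_sus) / y_nag)
    print("median relative improvement over NAG:", np.median(gains))
```

Because the per-step rewards telescope to $\log f(x_0) - \log f(x_K)$, maximizing the undiscounted return is equivalent to minimizing the final log objective within the budget, and the budget bin lets the greedy policy depend on how many iterations remain.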

Abstract

Efficient numerical optimization methods can improve performance and reduce the environmental impact of computing in many applications. This work presents a proof-of-concept study that combines primitive state representations with agent-environment interaction to build first-order optimizers for budget-limited optimization. Using reinforcement learning (RL) over a set of training instances from an optimization problem class, approximately optimal policies for sequentially selecting algorithmic update steps are learned on generally formulated, low-dimensional partial state representations that capture progress and resource use. In the investigated case studies, the trained agents, deployed on unseen instances of the quadratic problem classes, outperformed conventional optimal algorithms with optimized hyperparameters. The results show that elementary RL methods combined with succinct partial state representations can serve as heuristics for managing complexity in RL-based optimization, paving the way for agentic optimization approaches.
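The baseline the abstract refers to, a conventional optimal method with optimized hyperparameters, can be sketched as NAG with a constant step size and momentum tuned by grid search on training instances. For a $\mu$-strongly convex, $L$-smooth problem the textbook constant-momentum choice is step size $1/L$ and momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ with $\kappa = L/\mu$; because $\kappa$ varies across instances here, the sketch instead searches for one step-size scale $c$ (step $c/L$) and one momentum $\beta$ shared by the whole problem class. The instance generator, the grid, the mean-log-objective criterion, and the names `sample_quadratic`, `nag_final_value`, and `tune_nag` are assumptions, not the tuning procedure used in the paper.

```python
import itertools
import numpy as np

# Hypothetical instance generator: f(x) = 0.5 * x^T diag(lam) x, eigenvalues in [1, kappa].
def sample_quadratic(rng, d, kappa_range=(1e2, 1e3)):
    kappa = rng.uniform(*kappa_range)
    lam = rng.uniform(1.0, kappa, size=d)
    lam[0], lam[1] = 1.0, kappa  # mu = 1, L = kappa (needs d >= 2)
    return lam, rng.standard_normal(d)

def nag_final_value(lam, x0, K, c, beta):
    """Objective after K Nesterov steps with step size c / L and constant momentum beta."""
    h = c / lam.max()
    x_prev, x = x0, x0
    for _ in range(K):
        y = x + beta * (x - x_prev)     # extrapolation
        x_prev, x = x, y - h * lam * y  # gradient step at the extrapolated point
    return 0.5 * np.sum(lam * x * x)

def tune_nag(train_set, K, step_scales, betas):
    """Grid search for the (c, beta) pair minimizing the mean log final objective."""
    def score(c, beta):
        return np.mean([np.log(nag_final_value(lam, x0, K, c, beta))
                        for lam, x0 in train_set])
    return min(itertools.product(step_scales, betas), key=lambda p: score(*p))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = [sample_quadratic(rng, 100) for _ in range(50)]
    c, beta = tune_nag(train, K=100, step_scales=[0.25, 0.5, 0.75, 1.0],
                       betas=[0.0, 0.5, 0.8, 0.9, 0.95, 0.98])
    print("tuned step scale and momentum:", c, beta)
```

Restricting the grid to $c \le 1$ and $\beta < 1$ keeps every run stable on these quadratics, so no candidate overflows during the search.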


Paper Structure

This paper contains 17 sections, 14 equations, 5 figures, and 1 algorithm.

Introduction and Motivation
Related Work
Accelerated Gradient Methods
Reinforcement Learning-Based Optimization
Problem Setting and Methods
Conventional Accelerated Gradient Methods
Agent-Environment-Based Sequential Update Selection
SUS Details and Implementation Examples
Action Selection
Q-Values, Returns, and Updates
Concrete Action Sets
State Representations
Reward Models
Case Studies and Results
Setup
...and 2 more sections

Figures (5)

Figure 1: Agent-environment interaction process for Sequential Update Selection (SUS) in unconstrained first-order optimization, for iterations $k=0,1,\ldots,K$.
Figure 2: Medians and (0.25, 0.75) quantiles of the objective value over successive iterations for SUS-NAG and Hyperoptimized-NAG. Problem parameters: $d=100$, $K=100$, $\kappa \sim \text{Uniform}[10^2,10^3]$, $m_1=20$, $m_2=40$, action set $\mathcal{H}_2$. Training parameters: $\epsilon_0=0.99$, $\alpha_0=0.3$, both decaying to 0.5% of their initial values at the final episode $N=12800$.
Figure 3: Example of a greedy policy table; the colors indicate the index of the greedy (highest-valued) action in action set $\mathcal{H}_3$ for each state. Problem parameters: $d=10$, $K=20$, $\kappa \sim \text{Uniform}[10^3,10^3]$, $m_1=20$, $m_2=20$. Training parameters: $\epsilon_0=0.99$, $\alpha_0=0.3$, both decaying to 0.5% of their initial values at the final episode $N=12800$.
Figure 4: Means and standard deviations of the average relative improvement in final objective value, $(y_K^{\text{NAG}}-y_K^{\text{SUS}-\mathcal{H}_1})/y_K^{\text{NAG}}$. Problem parameters: $d=100$, $K=50$, $\kappa \sim \text{Uniform}[10^2,10^3]$, action set $\mathcal{H}_1$, with $(m_1, m_2)=(10, 20)$ and $(m_1, m_2)=(20, 40)$. Training parameters: $\epsilon_0=0.99$, $\alpha_0=0.3$, both decaying to 0.5% of their initial values at the final episode.
Figure 5: Medians and (0.25, 0.75) quantiles of the relative run-time reduction, $(k_T^{\text{NAG}}-k_T^{\text{SUS}-\mathcal{H}_2})/k_T^{\text{NAG}}$, as a function of problem dimension. Problem parameters: $\kappa \sim \text{Uniform}[10^2,10^3]$, $m_1=20$, $m_2=40$, action set $\mathcal{H}_2$. Training parameters: $\epsilon_0=0.99$, $\alpha_0=0.3$, both decaying to 0.5% of their initial values at the final episode $N=12800$.
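The quantities reported in the figure captions can be computed along the lines of the following sketch. The captions give only the endpoints of the $\epsilon$ and $\alpha$ schedules, so the geometric decay form is an assumption, as is reading $k_T$ as the first iteration at which the objective reaches a target value; the function names are hypothetical.

```python
import numpy as np

def decayed(value0, n, n_final, floor=0.005):
    """Assumed geometric schedule: value0 at episode 0, floor * value0 at episode n_final."""
    return value0 * floor ** (n / n_final)

def relative_improvement(y_nag, y_sus):
    """Per-instance (y_K^NAG - y_K^SUS) / y_K^NAG; positive values favor SUS."""
    y_nag, y_sus = np.asarray(y_nag, dtype=float), np.asarray(y_sus, dtype=float)
    return (y_nag - y_sus) / y_nag

def first_hit(objective_trace, target):
    """Hypothetical k_T: first iteration whose objective is <= target (trace length if never)."""
    hits = np.flatnonzero(np.asarray(objective_trace) <= target)
    return int(hits[0]) if hits.size else len(objective_trace)

def relative_runtime_reduction(k_nag, k_sus):
    """Per-instance (k_T^NAG - k_T^SUS) / k_T^NAG; positive values mean SUS stops earlier."""
    k_nag, k_sus = np.asarray(k_nag, dtype=float), np.asarray(k_sus, dtype=float)
    return (k_nag - k_sus) / k_nag

def greedy_policy_table(Q):
    """Index of the highest-valued action for each cell of an (m1, m2, n_actions) Q-table."""
    return np.argmax(Q, axis=-1)

def median_and_quartiles(samples):
    """(0.25 quantile, median, 0.75 quantile), the statistics plotted in Figures 2 and 5."""
    q25, q50, q75 = np.quantile(samples, [0.25, 0.5, 0.75])
    return q25, q50, q75

if __name__ == "__main__":
    print(decayed(0.99, 12800, 12800))  # epsilon at the final episode
    print(relative_improvement([2.0, 4.0], [1.5, 4.0]))
    print(first_hit([5.0, 3.0, 1.0, 0.5], target=1.0))
```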
