
Primitive Agentic First-Order Optimization

R. Sala

TL;DR

The paper addresses budgeted first-order optimization by introducing Sequential Update Selection (SUS), an RL-based framework that chooses among simple update operators at each iteration using a compact state representation. By casting the optimization loop as a partially observable Markov decision process and applying epsilon-greedy temporal-difference learning (SARSA) over a small discrete set of updates (e.g., GD, NAG), the approach learns policies that outperform a hyperparameter-tuned Nesterov Accelerated Gradient (NAG) method on unseen quadratic problems under a fixed budget. The main contributions are the SUS formulation, its low-dimensional state-action design, and empirical evidence that elementary RL methods can yield tangible efficiency gains with modest training complexity. The results suggest practical potential for agentic optimization on resource-constrained problems and point to future work on richer state representations, hierarchical policies, and cross-domain generalization, all within the computational-rationality paradigm.

Abstract

Efficient numerical optimization methods can improve performance and reduce the environmental impact of computing in many applications. This work presents a proof-of-concept study combining primitive state representations and agent-environment interactions as first-order optimizers in the setting of budget-limited optimization. Through reinforcement learning (RL) over a set of training instances of an optimization problem class, optimal policies for sequential update selection of algorithmic iteration steps are approximated in generally formulated low-dimensional partial state representations that consider aspects of progress and resource use. For the investigated case studies, deployment of the trained agents to unseen instances of the quadratic optimization problem classes outperformed conventional optimal algorithms with optimized hyperparameters. The results show that elementary RL methods combined with succinct partial state representations can be used as heuristics to manage complexity in RL-based optimization, paving the way for agentic optimization approaches.
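
The idea of sequential update selection can be illustrated with a small, self-contained sketch. It is not the paper's implementation: the reward (log-decrease of the objective), the state binning, the two-operator action set, the geometric decay schedule, and all names (`sample_problem`, `encode`, `run_episode`, `M1`, `M2`) are assumptions made here for illustration. The sketch trains a tabular agent with the on-policy SARSA update and epsilon-greedy exploration to choose, at each iteration, between a plain gradient step and a Nesterov-style momentum step on random diagonal quadratics.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K = 10, 20        # problem dimension and iteration budget
M1, M2 = 5, 5        # assumed numbers of budget bins and progress bins
N_ACTIONS = 2        # 0: plain gradient step, 1: Nesterov-style momentum step

def sample_problem():
    """Random diagonal quadratic f(x) = 0.5 * sum(eigs * x**2), kappa in [1e2, 1e3]."""
    kappa = rng.uniform(1e2, 1e3)
    eigs = np.linspace(1.0, kappa, D)
    return eigs, rng.standard_normal(D)

def f(eigs, x):
    return 0.5 * np.sum(eigs * x**2)

def step(eigs, x, x_prev, action):
    """Apply one update operator; both use step size 1/L with L the largest eigenvalue."""
    L, mu = eigs[-1], eigs[0]
    if action == 0:
        y = x                                   # gradient descent: no extrapolation
    else:
        beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
        y = x + beta * (x - x_prev)             # momentum extrapolation
    return y - (eigs * y) / L                   # gradient of f at y is eigs * y

def encode(k, fx, f0):
    """Partial state: (budget-used bin, orders-of-magnitude-gained bin)."""
    b = min(k * M1 // K, M1 - 1)
    progress = -np.log10(max(fx / f0, 1e-12))
    p = int(np.clip(progress, 0, M2 - 1))
    return b, p

def epsilon_greedy(Q, s, eps):
    if rng.random() < eps:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def run_episode(Q, eps, alpha, learn=True):
    """One budgeted run on a fresh problem; returns the final ratio f(x_K) / f(x_0)."""
    eigs, x = sample_problem()
    x_prev, f0 = x.copy(), f(eigs, x)
    fx = f0
    s = encode(0, fx, f0)
    a = epsilon_greedy(Q, s, eps)
    for k in range(K):
        x, x_prev = step(eigs, x, x_prev, a), x
        f_new = f(eigs, x)
        r = np.log10(max(fx, 1e-300) / max(f_new, 1e-300))   # assumed reward
        fx = f_new
        s_next = encode(k + 1, fx, f0)
        a_next = epsilon_greedy(Q, s_next, eps)
        if learn:
            # SARSA target (undiscounted); no bootstrap once the budget is spent
            target = r if k == K - 1 else r + Q[s_next][a_next]
            Q[s][a] += alpha * (target - Q[s][a])
        s, a = s_next, a_next
    return fx / f0

Q = np.zeros((M1, M2, N_ACTIONS))
N = 500                                          # training episodes (kept small here)
for n in range(N):
    decay = 0.005 ** (n / (N - 1))               # assumed geometric decay to 0.5%
    run_episode(Q, eps=0.99 * decay, alpha=0.3 * decay)

# Deploy the greedy policy on unseen problem instances
ratios = [run_episode(Q, eps=0.0, alpha=0.0, learn=False) for _ in range(20)]
print("median f(x_K)/f(x_0):", np.median(ratios))
```

The greedy action per state can be read off with `np.argmax(Q, axis=-1)`, which gives an `M1` by `M2` table of operator indices. A fair comparison against a tuned baseline would evaluate both methods on the same held-out instances, which this sketch does not do.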

Paper Structure (17 sections, 14 equations, 5 figures, 1 algorithm)


Figures (5)

  • Figure 1: Agent-environment interaction process for Sequential Update Selection (SUS) in unconstrained first-order optimization, for iterations $k=0,1,\ldots,K$.
  • Figure 2: Medians and (0.25,0.75) quantiles of objective values for successive iterations, of SUS-NAG and Hyperoptimized-NAG, for parameters $d=100$, $K=100$, $\kappa \sim \text{Uniform}[1E2,1E3]$, $m_1=20$, $m_2=40$, $\mathcal{H}_2$. Training parameters $\epsilon_0=0.99$, $\alpha_0=0.3$, decaying to 0.5% of their initial value at final episode 12800.
  • Figure 3: Example of a greedy policy table, where the colors represent the optimal action index in action set $\mathcal{H}_3$ for all states. Problem parameters $d=10$, $K=20$, $\kappa \sim \text{Uniform}[1E3,1E3]$, $m_1=20$, $m_2=20$, $\mathcal{H}_3$. Training parameters $\epsilon_0=0.99$, $\alpha_0=0.3$, decaying to 0.5% of their initial value at final episode $N=12800$.
  • Figure 4: Means and standard deviations of average relative objective value improvement: $(y_K^{\text{NAG}}-y_K^{\text{SUS}-\mathcal{H}_1})/y_K^{\text{NAG}}$, for parameters $d=100$, $K=50$, $\kappa \sim \text{Unif.}[1E2,1E3]$, $\mathcal{H}_1$, $m_1=10$, $m_2=20$, as well as $m_1=20$, $m_2=40$. Training parameters: $\epsilon_0=0.99$, $\alpha_0=0.3$, decaying to 0.5% of their initial value at the final episode.
  • Figure 5: Medians and (0.25, 0.75) quantiles of the relative run time reduction: $(k_T^{\text{NAG}}-k_T^{\text{SUS}-\mathcal{H}_2})/k_T^{\text{NAG}}$ for increasing problem dimension. Parameters: $\kappa \sim \text{Unif.}[1E2,1E3]$, $m_1=20$, $m_2=40$, $\mathcal{H}_2$. Training parameters $\epsilon_0=0.99$, $\alpha_0=0.3$, decaying to 0.5% of their initial value at the final episode 12800.
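
The relative-improvement quantities in the captions of Figures 4 and 5 share the form (baseline - candidate) / baseline. A minimal sketch of how such a metric and its summary statistics could be computed is given below. The numbers are made up for illustration and are not results from the paper, and the helper name `relative_improvement` is hypothetical.

```python
import numpy as np

def relative_improvement(baseline, candidate):
    """Elementwise (baseline - candidate) / baseline; positive means the candidate is better."""
    baseline = np.asarray(baseline, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    return (baseline - candidate) / baseline

# Made-up final objective values y_K on four test instances (illustrative only)
y_nag = np.array([4.0, 2.0, 1.0, 8.0])
y_sus = np.array([3.0, 1.0, 1.0, 2.0])

rel = relative_improvement(y_nag, y_sus)               # [0.25, 0.5, 0.0, 0.75]
mean, std = rel.mean(), rel.std()                      # summary style of Figure 4
q25, med, q75 = np.quantile(rel, [0.25, 0.5, 0.75])    # summary style of Figures 2 and 5
print(mean, std, q25, med, q75)
```

The same function applies to iteration counts $k_T$ in place of objective values, in which case a positive value corresponds to a run time reduction.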