Primitive Agentic First-Order Optimization
R. Sala
TL;DR
The paper addresses budget-limited first-order optimization by introducing Sequential Update Selection (SUS), an RL-based framework that chooses among simple update operators at each iteration using a compact state representation. The optimization loop is cast as a partially observable Markov decision process, and policies are learned with elementary epsilon-greedy temporal-difference methods (Q-learning or SARSA) over a small set of discrete updates (e.g., GD, NAG). Under a fixed budget, the learned policies outperform a hyperparameter-tuned Nesterov Accelerated Gradient method on unseen quadratic problems. The main contributions are the SUS formulation, its low-dimensional state-action design, and empirical evidence that elementary RL methods can yield measurable efficiency gains with modest training complexity. The results suggest practical potential for agentic optimization on resource-constrained problems and point to future work on richer state representations, hierarchical policies, and cross-domain generalization, all within the computational-rationality paradigm.
Abstract
Efficient numerical optimization methods can improve performance and reduce the environmental impact of computing in many applications. This work presents a proof-of-concept study that combines primitive state representations with agent-environment interactions to obtain first-order optimizers for budget-limited optimization. Reinforcement learning (RL) over a set of training instances of an optimization problem class is used to approximate optimal policies for sequential update selection, that is, for choosing which algorithmic iteration step to apply next. The policies are defined on generally formulated, low-dimensional partial state representations that capture aspects of progress and resource use. In the investigated case studies, trained agents deployed on unseen instances of the quadratic optimization problem classes outperformed conventional optimal algorithms with optimized hyperparameters. The results show that elementary RL methods combined with succinct partial state representations can serve as heuristics to manage complexity in RL-based optimization, paving the way for agentic optimization approaches.
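To make the idea of sequential update selection concrete, the following is a minimal sketch and not the paper's implementation. It assumes tabular epsilon-greedy Q-learning, two update operators (a plain gradient step and a Nesterov-style momentum step), a hypothetical partial state made of the binned fraction of budget used plus a flag for whether the last step improved the objective, and a log-decrease reward. The function names, step size 1/L, momentum 0.9, and all other constants are illustrative assumptions.

```python
import numpy as np

def make_quadratic(rng, dim=10, cond=100.0):
    """Random convex quadratic f(x) = 0.5 x^T A x with eigenvalues in [1, cond]."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
    eigs = np.linspace(1.0, cond, dim)
    return (q * eigs) @ q.T, float(eigs.max())

def encode_state(step, budget, improved, n_bins=5):
    """Assumed partial state: (budget-used bin, did the last step improve f?)."""
    return min(n_bins - 1, step * n_bins // budget), int(improved)

def run_episode(A, lip, x0, Q, budget, eps, rng, alpha=0.1, learn=True):
    """Run one budgeted optimization episode; returns the final objective value."""
    f = lambda z: 0.5 * z @ A @ z
    x, v = x0.copy(), np.zeros_like(x0)
    f_prev = f(x)
    s = encode_state(0, budget, True)
    for t in range(budget):
        # Epsilon-greedy choice between the two update operators.
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
        if a == 0:   # action 0: gradient step with step size 1/L, momentum reset
            v = np.zeros_like(x)
            x = x - (A @ x) / lip
        else:        # action 1: Nesterov-style momentum step (momentum 0.9 assumed)
            v = 0.9 * v - (A @ (x + 0.9 * v)) / lip
            x = x + v
        f_new = f(x)
        # Assumed reward: decrease in log objective (tiny offset avoids log(0)).
        reward = np.log(f_prev + 1e-300) - np.log(f_new + 1e-300)
        s_next = encode_state(t + 1, budget, f_new < f_prev)
        if learn:
            # Undiscounted Q-learning target; no bootstrap once the budget is spent.
            bootstrap = np.max(Q[s_next]) if t + 1 < budget else 0.0
            Q[s + (a,)] += alpha * (reward + bootstrap - Q[s + (a,)])
        s, f_prev = s_next, f_new
    return f_prev

def train(n_instances=200, budget=30, dim=10, eps=0.2, seed=0):
    """Learn a Q-table over freshly sampled training instances of the problem class."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((5, 2, 2))  # (budget bin, improved flag, action)
    for _ in range(n_instances):
        A, lip = make_quadratic(rng, dim)
        run_episode(A, lip, rng.standard_normal(dim), Q, budget, eps, rng)
    return Q

if __name__ == "__main__":
    Q = train()
    rng = np.random.default_rng(123)
    A, lip = make_quadratic(rng)  # unseen instance
    f_end = run_episode(A, lip, rng.standard_normal(10), Q, 30, 0.0, rng, learn=False)
    print("final objective under greedy learned policy:", f_end)
```

Setting `eps=0.0` and `learn=False` deploys the learned table greedily on a new instance. The sketch makes no claim about performance relative to a tuned baseline; it only illustrates how a compact state, a discrete set of updates, and a fixed evaluation budget fit together.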
