Continual Learning, Not Training: Online Adaptation For Agents
Aman Jaglan, Jarrod Barnes
TL;DR
The paper addresses the mismatch between traditional continual learning and real-time deployment by proposing ATLAS, a system-centric, gradient-free framework that decouples reasoning from execution and uses a Persistent Learning Memory to enable inference-time adaptation. Through a dual-agent Teacher–Student architecture guided by principled reward signals, ATLAS significantly reduces token usage while matching or surpassing larger models on a cyberthreat investigation benchmark. The approach yields transferable, causally annotated traces that can be used to train explicit world models. Empirical results on ExCyTIn-Bench Incident #5 show substantial gains in both efficacy and efficiency, and cross-incident transfer indicates that the learned guidance generalizes. Overall, system-centric continual learning offers a practical, deployable path toward adaptive AI that learns from use and provides audit-friendly traces for downstream modeling.
Abstract
Continual Learning (CL) methods have traditionally focused on mitigating catastrophic forgetting through gradient-based retraining, an approach ill-suited for deployed agents that must adapt in real time. We introduce our Adaptive Teaching and Learning System (ATLAS), a dual-agent architecture that decouples reasoning (Teacher) from execution (Student) and incorporates a persistent learning memory that stores distilled guidance from experience. This memory informs the orchestration layer, enabling the system to dynamically adjust its operational strategies, such as supervision level or initial plan selection, at inference time. In doing so, ATLAS achieves gradient-free continual learning, shifting the locus of adaptation from model parameters to system-level orchestration. We formulate this as a system-centric paradigm for continual learning, where the objective is adaptive efficiency: maximizing task success while minimizing computational cost through inference-time orchestration rather than parameter updates. Evaluated on Microsoft's ExCyTIn-Bench, an open-source benchmark simulating complex cyberthreat investigation, ATLAS achieves 54.1% success with GPT-5-mini as its Student, outperforming the larger GPT-5 (High) by 13% while reducing cost by 86%. Cross-incident validation demonstrates generalization: frozen pamphlets from Incident #5 improve accuracy from 28% to 41% with zero retraining, while shifting output composition from verbose exploration to structured reasoning. Together, these findings establish gradient-free continual learning as a viable path toward adaptive, deployable AI systems and provide causally annotated traces valuable for training explicit world models.
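
To make the idea of gradient-free, inference-time adaptation concrete, the following is a minimal illustrative sketch and not the authors' implementation. Every name in it (`PersistentLearningMemory`, `choose_supervision`, `run_task`, the supervision levels "full", "review", and "none", and the threshold of three lessons) is a hypothetical assumption introduced here for illustration. The Student and Teacher are passed in as plain callables standing in for model calls, and no model parameters are ever updated: only the stored guidance and the chosen supervision level change between tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PersistentLearningMemory:
    """Hypothetical store of distilled guidance, keyed by task type."""
    guidance: Dict[str, List[str]] = field(default_factory=dict)

    def recall(self, task_type: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored guidance.
        return list(self.guidance.get(task_type, []))

    def store(self, task_type: str, lesson: str) -> None:
        lessons = self.guidance.setdefault(task_type, [])
        if lesson not in lessons:  # skip verbatim duplicates
            lessons.append(lesson)


def choose_supervision(memory: PersistentLearningMemory, task_type: str) -> str:
    """Assumed heuristic: more stored guidance means lighter Teacher supervision."""
    known = len(memory.recall(task_type))
    if known == 0:
        return "full"
    if known < 3:  # illustrative threshold, not taken from the paper
        return "review"
    return "none"


def run_task(
    task_type: str,
    task: str,
    student: Callable[[str, List[str]], str],
    teacher: Callable[[str, str], Optional[str]],
    memory: PersistentLearningMemory,
) -> Dict[str, str]:
    """Run one task; adaptation happens only through memory, never through weights."""
    level = choose_supervision(memory, task_type)
    hints = memory.recall(task_type)
    answer = student(task, hints)  # Student executes, conditioned on past guidance
    if level != "none":
        lesson = teacher(task, answer)  # Teacher distills a lesson from the attempt
        if lesson:
            memory.store(task_type, lesson)
    return {"answer": answer, "supervision": level}
```

In this sketch, repeated tasks of the same type start under full supervision and gradually run without the Teacher once enough guidance has accumulated, which is one simple way to trade supervision cost against task success at inference time.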
