Self-Adapting Language Models
Adam Zweiger, Jyothish Pari, Han Guo, Ekin Akyürek, Yoon Kim, Pulkit Agrawal
TL;DR
This work tackles the rigidity of pretrained LLMs by enabling self-directed adaptation through self-generated training data and update directives. It introduces SEAL, a two-loop framework: in the outer loop, reinforcement learning (RL) trains the model to generate self-edits that maximize downstream performance once applied; in the inner loop, each self-edit is applied as a gradient-based update via supervised finetuning. Across knowledge incorporation and ARC few-shot learning, SEAL achieves notable gains over in-context learning and non-RL baselines, and shows benefits in the continued pretraining (CPT) setting for knowledge tasks. The approach suggests a path toward continual, agentic LLMs that can autonomously augment their knowledge and capabilities with synthetic data, reducing reliance on external supervision.
Abstract
Large language models (LLMs) are powerful but static; they lack mechanisms to adapt their weights in response to new tasks, knowledge, or examples. We introduce Self-Adapting LLMs (SEAL), a framework that enables LLMs to self-adapt by generating their own finetuning data and update directives. Given a new input, the model produces a self-edit, a generation that may restructure the information in different ways, specify optimization hyperparameters, or invoke tools for data augmentation and gradient-based updates. Through supervised finetuning (SFT), these self-edits result in persistent weight updates, enabling lasting adaptation. To train the model to produce effective self-edits, we use a reinforcement learning loop with the downstream performance of the updated model as the reward signal. Unlike prior approaches that rely on separate adaptation modules or auxiliary networks, SEAL directly uses the model's own generation to control its adaptation process. Experiments on knowledge incorporation and few-shot generalization show that SEAL is a promising step toward language models capable of self-directed adaptation. Our website and code are available at https://jyopari.github.io/posts/seal.
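The two-loop structure described above can be illustrated with a toy sketch. This is not the authors' implementation: the "model" is a plain set of facts standing in for weights, `inner_update` stands in for SFT on a self-edit, and the outer loop uses a simple keep-the-best-sample filter in place of an RL algorithm, which the abstract does not specify. All names (`inner_update`, `reward`, `outer_step`, `random_subset`) are hypothetical and chosen for illustration only.

```python
import random
from typing import Callable, List, Set, Tuple

Model = Set[str]      # toy stand-in for model weights: a set of known facts
SelfEdit = List[str]  # toy stand-in for self-generated finetuning data


def inner_update(model: Model, edit: SelfEdit) -> Model:
    """Inner loop: return an updated copy of the model (stand-in for SFT)."""
    return model | set(edit)


def reward(model: Model, eval_facts: List[str]) -> float:
    """Downstream performance: fraction of evaluation facts the model knows."""
    return sum(f in model for f in eval_facts) / len(eval_facts)


def random_subset(context: List[str], rng: random.Random) -> SelfEdit:
    """Hypothetical self-edit generator: propose a random subset of the context."""
    k = rng.randint(1, len(context))
    return rng.sample(context, k)


def outer_step(
    model: Model,
    context: List[str],
    eval_facts: List[str],
    propose: Callable[[List[str], random.Random], SelfEdit],
    n_samples: int,
    rng: random.Random,
) -> Tuple[SelfEdit, float]:
    """Outer loop step: sample self-edits, score each one by the reward of the
    updated model, and return the best edit together with its reward."""
    best_edit: SelfEdit = []
    best_r = reward(model, eval_facts)  # baseline: no update applied
    for _ in range(n_samples):
        edit = propose(context, rng)
        r = reward(inner_update(model, edit), eval_facts)
        if r > best_r:
            best_edit, best_r = edit, r
    return best_edit, best_r
```

In a real system the best-scoring self-edits would then be used to update the generator itself, so that later proposals improve; the sketch only shows how the reward is computed on the model after the inner update rather than on the self-edit directly.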
