Data Agent: Learning to Select Data via End-to-End Dynamic Optimization

Suorong Yang; Fangjian Su; Hai Gan; Ziqi Ye; Jie Li; Baile Xu; Furao Shen; Soujanya Poria

TL;DR

Data Agent is an end-to-end dynamic data selection framework that formulates data selection as a training-aware sequential decision-making problem. Its selection policy is guided by a composite reward that integrates loss-based difficulty and confidence-based uncertainty signals.

Abstract

Dynamic data selection aims to accelerate training by prioritizing informative samples during online training. However, existing methods typically rely on task-specific handcrafted metrics or static/snapshot-based criteria to estimate sample importance, limiting scalability across learning paradigms and making it difficult to capture the evolving utility of data throughout training. To address this challenge, we propose Data Agent, an end-to-end dynamic data selection framework that formulates data selection as a training-aware sequential decision-making problem. The agent learns a sample-wise selection policy that co-evolves with model optimization, guided by a composite reward that integrates loss-based difficulty and confidence-based uncertainty signals. The reward signals capture complementary objectives of optimization impact and information gain, together with a tuning-free adaptive weighting mechanism that balances these signals over training. Extensive experiments across a wide range of datasets and architectures demonstrate that Data Agent consistently accelerates training while preserving or improving performance, e.g., reducing costs by over 50% on ImageNet-1k and MMLU with lossless performance. Moreover, its dataset-agnostic formulation and modular reward make it plug-and-play across tasks and scenarios, e.g., robustness to noisy datasets, highlighting its potential in real-world scenarios.
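
The composite reward described above can be illustrated with a minimal sketch. It assumes that difficulty is the per-sample cross-entropy loss and that uncertainty is one minus the top softmax probability; both are illustrative choices, not definitions taken from the paper. The function names and the fixed `weight` argument are hypothetical, and the fixed weight stands in for the tuning-free adaptive weighting, whose exact form is not given here.

```python
import math

def difficulty_signal(probs, label):
    """Loss-based difficulty: cross-entropy of one sample, -log p(label)."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(probs[label], 1e-12))

def uncertainty_signal(probs):
    """Confidence-based uncertainty (assumed proxy): 1 - max softmax probability."""
    return 1.0 - max(probs)

def composite_reward(probs, label, weight):
    """Convex combination of the two signals.

    `weight` in [0, 1] is a hypothetical fixed mixing coefficient; the paper
    describes an adaptive weighting mechanism that this sketch does not model.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    d = difficulty_signal(probs, label)
    u = uncertainty_signal(probs)
    return weight * d + (1.0 - weight) * u

# Example: a fairly confident, correct prediction yields a small reward.
example = composite_reward([0.7, 0.2, 0.1], label=0, weight=0.5)
```

With `weight=1.0` the reward reduces to the loss alone, and with `weight=0.0` it reduces to the uncertainty term alone, which makes the two signals easy to inspect separately.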

Paper Structure (15 sections, 2 theorems, 10 equations, 4 figures, 8 tables)

Introduction
Related Work
Static Data Selection
Dynamic Data Selection
The Proposed Method
Reinforcement Learning Formulation of Data Selection
Experiment
Experiment Setup
Performance Comparison
Generalization to More Advanced Architectures
Generalization to Different Training Paradigms
Generalization under Distribution Shift
Robustness to Noisy Scenarios
Ablation Study
Conclusion

Key Result

Proposition 3.1

Let $f_\theta$ be a network trained with the cross-entropy (CE) loss $\ell(x,y)$ and softmax outputs $p_\theta(y \mid x)$. Under a first-order SGD update, the expected magnitude of the parameter update induced by a sample $(x_i,y_i)$ is a monotonic function of the training loss $\ell(x_i,y_i) = -\log p_\theta(y_i \mid x_i)$.
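
A standard calculation at the level of the logits suggests why a relation of this kind is plausible. It is offered as a sketch only and is not the paper's statement or proof. The symbols $z$ (the logit vector), $K$ (the number of classes), and $e_{y_i}$ (the one-hot vector of the true label) are notation introduced here for illustration.

```latex
\nabla_{z}\,\ell(x_i,y_i) = p_\theta(\cdot \mid x_i) - e_{y_i},
\qquad
\bigl\lVert p_\theta(\cdot \mid x_i) - e_{y_i} \bigr\rVert_2^2
  = \bigl(1 - p_\theta(y_i \mid x_i)\bigr)^2 + \sum_{k \neq y_i} p_\theta(k \mid x_i)^2 .

% Since the off-label probabilities are nonnegative and sum to 1 - p_\theta(y_i \mid x_i):
1 - p_\theta(y_i \mid x_i)
  \;\le\; \bigl\lVert p_\theta(\cdot \mid x_i) - e_{y_i} \bigr\rVert_2
  \;\le\; \sqrt{2}\,\bigl(1 - p_\theta(y_i \mid x_i)\bigr),
\qquad
1 - p_\theta(y_i \mid x_i) = 1 - e^{-\ell(x_i,y_i)} .
```

Both bounds increase with $\ell(x_i,y_i)$, so the logit-gradient norm is sandwiched between two monotonic functions of the loss. How this carries over to the full parameter update depends on the network Jacobian, which this sketch does not address.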

Figures (4)

Figure 1: (a) End-to-end dynamic data selection. Existing methods often rely on handcrafted, task-specific static heuristics to estimate sample importance, limiting the scalability across learning paradigms. In contrast, our framework formulates data selection as a learning problem and jointly optimizes it with model training in a plug-and-play manner, forming a closed-loop, training-aware selection process. (b) Illustration of data points prioritized by difficulty and uncertainty signals. The uncertainty signal concentrates on the inter-cluster boundaries and transitional regions, while the difficulty signal focuses more on the sparse cluster areas.
Figure 2: The framework of the proposed Data Agent. At each training stage, the agent observes the model state and derives reward signals from standard forward passes. These signals are combined using an adaptive weighting mechanism to guide a PPO-based actor-critic agent, which learns the selection policy. The selected data is used in subsequent training, forming a closed-loop training pipeline where data selection evolves alongside model optimization. Notably, the modular reward design enables the framework to be easily adapted to various learning paradigms.
Figure 3: Performance and saved costs on ImageNet-1k across Swin-T, ViT-B, and ViT-L on a 4-A100-GPU server. We report the total GPU hours.
Figure 4: Effect of the RL agent on CIFAR-100 and Tiny-ImageNet under different selection ratios.
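
The role of a selection ratio in the closed loop of Figure 2 can be sketched as follows. The sketch assumes that the selection ratio is the fraction of samples kept per stage and that each sample already has a scalar score. It uses deterministic top-scoring selection as a stand-in for the learned PPO policy, which is not reproduced here. The function name `select_top_fraction` is hypothetical.

```python
def select_top_fraction(scores, ratio):
    """Return sorted indices of the highest-scoring samples.

    `ratio` is assumed to be the fraction of samples kept (0 < ratio <= 1).
    This deterministic rule is an illustrative stand-in, not the paper's policy.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must lie in (0, 1]")
    # Keep at least one sample so a training step is always possible.
    keep = max(1, int(ratio * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

# Example: keep half of four samples, ranked by score.
kept = select_top_fraction([0.1, 0.9, 0.4, 0.7], ratio=0.5)
```

In a training loop, the returned indices would pick the subset used for the next stage, after which scores are recomputed from fresh forward passes.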

Theorems & Definitions (2)

Proposition 3.1
Proposition 3.2
