Dataset Distillation for Offline Reinforcement Learning
Jonathan Light, Yuanzhe Liu, Ziniu Hu
TL;DR
This paper tackles offline reinforcement learning (RL) by using dataset distillation via gradient matching to synthesize a compact, high-quality training dataset from a given offline collection. By aligning the behavioral cloning (BC) gradients computed on real and synthetic data, the method enables a policy trained on a small synthetic set to achieve performance comparable to or better than a policy trained on the full offline dataset or on percentile-filtered baselines, as demonstrated on Procgen tasks. The key contribution is the synthetic-data approach, which improves data efficiency and generalization in offline RL, while also revealing practical limitations such as action-imbalance effects in certain environments. The work suggests a promising direction for data-centric RL, in which smaller, curated datasets enable robust policy learning with reduced computational and data collection demands.
Abstract
Offline reinforcement learning often requires a high-quality dataset on which to train a policy. However, in many situations it is not possible to obtain such a dataset, nor is it easy to train a policy that performs well in the actual environment given the offline data. We propose using dataset distillation to synthesize a better dataset, which can then be used to train a better policy model. We show that our method is able to synthesize a dataset such that a model trained on it achieves performance similar to a model trained on the full dataset or a model trained using percentile behavioral cloning. Our project site is available at https://datasetdistillation4rl.github.io. We also provide our implementation at https://github.com/ggflow123/DDRL.
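To make the gradient-matching idea from the TL;DR concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear softmax policy, a cross-entropy BC loss, and cosine distance as the matching objective; the function names (`bc_gradient`, `gradient_match_loss`), the toy data, and the choice of distance are all illustrative assumptions rather than details taken from the paper.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bc_gradient(weights, states, actions):
    """Mean gradient of the cross-entropy BC loss for a linear softmax policy.

    weights: num_actions rows, each of length state_dim (assumed policy form).
    Returns the gradient flattened row by row into a single list.
    """
    num_actions, dim = len(weights), len(weights[0])
    grad = [[0.0] * dim for _ in range(num_actions)]
    for s, a in zip(states, actions):
        logits = [sum(w * x for w, x in zip(row, s)) for row in weights]
        probs = softmax(logits)
        for k in range(num_actions):
            # d(loss)/d(logit_k) = p_k - 1[k == a]
            err = probs[k] - (1.0 if k == a else 0.0)
            for j in range(dim):
                grad[k][j] += err * s[j] / len(states)
    return [g for row in grad for g in row]

def gradient_match_loss(g_real, g_syn):
    """Cosine distance between two flattened gradients (assumed objective)."""
    dot = sum(a * b for a, b in zip(g_real, g_syn))
    norm = math.sqrt(sum(a * a for a in g_real)) * math.sqrt(sum(b * b for b in g_syn))
    return 1.0 - dot / (norm + 1e-12)

# Toy example: 3 actions, 2-dimensional states (all values are made up).
weights = [[0.1, -0.2], [0.0, 0.3], [-0.1, 0.1]]
real_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
real_actions = [0, 1, 2, 0]
syn_states = [[0.8, 0.1], [0.1, 0.9]]  # small synthetic set
syn_actions = [0, 1]

g_real = bc_gradient(weights, real_states, real_actions)
g_syn = bc_gradient(weights, syn_states, syn_actions)
loss = gradient_match_loss(g_real, g_syn)
print(f"gradient-matching loss: {loss:.4f}")
```

In a full distillation loop, the synthetic states (and possibly the action labels) would be treated as learnable parameters and updated to reduce this loss across many policy initializations; that outer optimization is omitted here for brevity.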
