Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control
Thomas T. Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, Max Simchowitz
TL;DR
This work addresses exponential trajectory-error growth in continuous-control imitation learning and analyzes two practical interventions: action chunking (AC) and exploratory data collection via noise injection. It shows that AC yields horizon-stable performance when the open-loop dynamics are exponentially incrementally input-to-state stable (EISS), with trajectory error bounded as $\bm{\mathsf{J}}_{\textsc{Traj},T}(\tilde{\pi}) \lesssim \bm{\mathsf{J}}_{\textsc{Demo},T}(\tilde{\pi})$ whenever the chunk length $\ell$ exceeds a stability-dependent threshold, specifically $\ell > \log(1/\rho)^{-1} \log(\mathrm{poly}(L_{\pi}, C_{\mathrm{ISS}}))$. When the ambient dynamics are not open-loop stable, the authors show that a simple mixture data-collection strategy (noise injection with mixture parameter $\alpha$ and noise level $\sigma_u$) guarantees a sharp bound of the form $\bm{\mathsf{J}}_{\textsc{Traj},T}(\hat{\pi}) \lesssim O(1) \cdot \bm{\mathsf{J}}_{\textsc{Demo},T}(\hat{\pi}; \mathbb{P}_{\pi^*,\sigma_u,\alpha})$, where the demonstration error incorporates first-order (Jacobian) supervision on the excitable subspace. The theoretical results are complemented by experiments on robot-learning benchmarks (e.g., HalfCheetah, Humanoid, Robomimic), which show that both AC and noise-based exploration improve imitation performance, with AC requiring end-effector stabilization to realize its gains in practice. Overall, the paper provides a stability-centric lens that yields tighter, horizon-robust imitation guarantees than prior coverage-based analyses and offers a practical recipe for horizon-insensitive imitation in continuous control.
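The mixture data-collection strategy can be illustrated with a minimal sketch. The paper's exact procedure is not given in this summary, so the following is an assumption: each trajectory is rolled out with Gaussian noise of scale $\sigma_u$ added to the executed action with probability $\alpha$, while the recorded label is always the clean expert action. The function name `collect_mixture_demos`, the scalar toy system, and the expert gain are all hypothetical.

```python
import random

def collect_mixture_demos(expert, step, x0, horizon, n_traj, alpha, sigma_u, seed=0):
    """Collect (state, expert_action) pairs (illustrative sketch, not the paper's code).

    With probability alpha a trajectory is rolled out with Gaussian noise added
    to the *executed* action; the stored label is always the clean expert action.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_traj):
        noisy = rng.random() < alpha          # pick the trajectory type once
        x = x0
        for _ in range(horizon):
            u_star = expert(x)                # clean expert label
            data.append((x, u_star))
            noise = rng.gauss(0.0, sigma_u) if noisy else 0.0
            x = step(x, u_star + noise)       # noise perturbs the visited states
    return data

# Hypothetical toy system: open-loop unstable (1.2), stabilized by the expert (1.2 - 0.7 = 0.5).
expert = lambda x: -0.7 * x
step = lambda x, u: 1.2 * x + u

clean = collect_mixture_demos(expert, step, 0.0, 20, 50, alpha=0.0, sigma_u=0.1)
mixed = collect_mixture_demos(expert, step, 0.0, 20, 50, alpha=0.5, sigma_u=0.1)
```

In this toy setup the clean dataset only ever visits the state `0.0`, whereas the mixed dataset also contains expert labels at perturbed states near the expert trajectory, which is the kind of coverage the noise injection is meant to provide.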
Abstract
This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
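As a rough illustration of action chunking as described above (predicting a sequence of actions and executing it open-loop), the sketch below re-queries a policy only every `chunk_len` steps. The helper names `rollout_chunked` and `chunk_policy`, the scalar dynamics, and the gain are assumptions made for the example and are not taken from the paper.

```python
def rollout_chunked(chunk_policy, step, x0, horizon, chunk_len):
    """Roll out a chunked policy (illustrative sketch).

    The policy is queried every chunk_len steps; in between, the predicted
    actions are played open-loop, with no feedback from the current state.
    """
    xs, x, chunk = [x0], x0, []
    for t in range(horizon):
        if t % chunk_len == 0:
            chunk = chunk_policy(x, chunk_len)   # list of chunk_len actions
        x = step(x, chunk[t % chunk_len])
        xs.append(x)
    return xs

# Hypothetical open-loop-stable scalar system: x' = 0.5 * x + u.
step = lambda x, u: 0.5 * x + u

def chunk_policy(x, chunk_len):
    """Hypothetical chunk predictor: plans u_k = -0.3 * x_k on an internal model."""
    actions = []
    for _ in range(chunk_len):
        u = -0.3 * x
        actions.append(u)
        x = step(x, u)                           # advance the internal prediction
    return actions

traj = rollout_chunked(chunk_policy, step, 1.0, 12, chunk_len=4)
```

Here the open-loop dynamics are contractive, so the state stays well behaved between policy queries; this mirrors the regime in which the abstract credits stability for the benefit of chunking.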
