Action Chunking and Exploratory Data Collection Yield Exponential Improvements in Behavior Cloning for Continuous Control

Thomas T. Zhang, Daniel Pfrommer, Chaoyi Pan, Nikolai Matni, Max Simchowitz

TL;DR

This work addresses exponential trajectory-error growth in continuous-control imitation learning and analyzes two practical interventions: action chunking (AC) and exploratory data collection via noise injection. It shows that AC yields horizon-stable performance when the open-loop dynamics are incrementally stable (EISS), with trajectory error bounded as $\bm{\mathsf{J}}_{\textsc{Traj},T}(\tilde{\pi}) \lesssim \bm{\mathsf{J}}_{\textsc{Demo},T}(\tilde{\pi})$ for chunk lengths $\ell$ exceeding a stability-dependent threshold, specifically $\ell > \log(1/\rho)^{-1} \log(\mathrm{poly}(L_{\pi}, C_{\mathrm{ISS}}))$. When the ambient dynamics are not open-loop stable, the authors show that a simple mixture data-collection strategy (noise injection with a mixture parameter $\alpha$ and noise level $\sigma_u$) guarantees a sharp bound of the form $\bm{\mathsf{J}}_{\textsc{Traj},T}(\hat{\pi}) \lesssim O(1) \bm{\mathsf{J}}_{\textsc{Demo},T}(\hat{\pi}; \mathbb{P}_{\pi^*,\sigma_u,\alpha})$, incorporating first-order supervision on the excitable subspace via the Jacobian. The theoretical results are complemented by experimental validation on robot-learning benchmarks (e.g., HalfCheetah, Humanoid, Robomimic), showing that both AC and noise-based exploration improve imitation performance, with AC requiring end-effector stabilization to realize gains in practice. Overall, the paper provides a stability-centric lens that yields tighter, horizon-robust imitation guarantees than prior coverage-based analyses and demonstrates a practical recipe for horizon-insensitive imitation in continuous control.
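To make the chunk-length threshold concrete, the following minimal sketch evaluates $\log(\mathrm{poly})/\log(1/\rho)$ for given numbers. The summary does not state the polynomial in $L_{\pi}$ and $C_{\mathrm{ISS}}$, so `poly_term` is a placeholder for its value, and the function name `min_chunk_length` is hypothetical.

```python
import math


def min_chunk_length(rho: float, poly_term: float) -> int:
    """Smallest integer ell with ell > log(poly_term) / log(1 / rho).

    rho is the contraction rate of the open-loop dynamics (0 < rho < 1).
    poly_term stands in for poly(L_pi, C_ISS); its exact form is not given
    in this summary, so it is passed in as a plain number (assumption).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    if poly_term < 1.0:
        raise ValueError("poly_term is assumed to be at least 1")
    threshold = math.log(poly_term) / math.log(1.0 / rho)
    # Strict inequality: take the next integer above the threshold.
    return math.floor(threshold) + 1


# Slow contraction (rho = 0.9) needs far longer chunks than fast
# contraction (rho = 0.5) for the same placeholder constant.
print(min_chunk_length(0.9, 100.0))  # 44
print(min_chunk_length(0.5, 100.0))  # 7
```

The dependence on the constants is only logarithmic, while the dependence on $\rho$ enters through $1/\log(1/\rho)$, so the required chunk length grows quickly as $\rho$ approaches 1.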

Abstract

This paper presents a theoretical analysis of two of the most impactful interventions in modern learning from demonstration in robotics and continuous control: the practice of action-chunking (predicting sequences of actions in open-loop) and exploratory augmentation of expert demonstrations. Though recent results show that learning from demonstration, also known as imitation learning (IL), can suffer errors that compound exponentially with task horizon in continuous settings, we demonstrate that action chunking and exploratory data collection circumvent exponential compounding errors in different regimes. Our results identify control-theoretic stability as the key mechanism underlying the benefits of these interventions. On the empirical side, we validate our predictions and the role of control-theoretic stability through experimentation on popular robot learning benchmarks. On the theoretical side, we demonstrate that the control-theoretic lens provides fine-grained insights into how compounding error arises, leading to tighter statistical guarantees on imitation learning error when these interventions are applied than previous techniques based on information-theoretic considerations alone.
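As an illustration of exploratory augmentation, here is a minimal sketch of mixture data collection on a scalar toy system. Several choices are assumptions rather than details from the paper: `alpha` is read as the fraction of trajectories collected with noise, the noise is Gaussian with standard deviation `sigma_u`, the recorded label is always the clean expert action, and the dynamics, expert gain, and function names are hypothetical.

```python
import random


def collect_mixture_dataset(expert, step, x0, horizon, num_traj,
                            alpha, sigma_u, seed=0):
    """Collect (state, expert_action) pairs from a clean/noisy mixture.

    With probability alpha a trajectory is executed with Gaussian noise
    added to the expert action; the stored label stays the clean expert
    action (assumed labeling convention, as in standard noise injection).
    """
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_traj):
        noisy = rng.random() < alpha
        x = x0
        for _ in range(horizon):
            u_expert = expert(x)
            dataset.append((x, u_expert))
            noise = sigma_u * rng.gauss(0.0, 1.0) if noisy else 0.0
            x = step(x, u_expert + noise)
    return dataset


# Toy system (hypothetical): unstable open loop, stabilizing expert.
def toy_step(x, u):
    return 1.2 * x + u


def toy_expert(x):
    return -0.7 * x


clean = collect_mixture_dataset(toy_expert, toy_step, 0.0, 20, 10,
                                alpha=0.0, sigma_u=0.1)
mixed = collect_mixture_dataset(toy_expert, toy_step, 0.0, 20, 10,
                                alpha=1.0, sigma_u=0.1)
print(len({x for x, _ in clean}), "distinct state(s) in clean data")
print(len({x for x, _ in mixed}) > 1, "noisy data visits off-nominal states")
```

In this toy, the nominal expert trajectory never leaves the origin, so clean data says nothing about how the expert reacts to deviations. The noisy trajectories visit nearby states and therefore expose the expert's feedback behavior there.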

Paper Structure

This paper contains 40 sections, 34 theorems, 185 equations, 12 figures, 2 algorithms.

Key Result

Theorem A

There exist families $\mathcal{P}_{\mathrm{stab}}$ and $\mathcal{P}_{\mathrm{unst}}$ of policies and dynamics such that:

Figures (12)

  • Figure 1: We analyze two common practices in Imitation Learning: Action Chunking (left) and Exploratory Data Collection via Noise Injection (right). We show in the action-chunking section how Action Chunking guarantees stable behavior of learned policies by chaining sufficiently long open-loop segments of predicted actions, provided the open-loop dynamics are stable. We show in the noise-injection section how augmenting some expert trajectories with Noise Injection provides supervision on directions around expert trajectories that are most susceptible to compounding errors, which may not be witnessed in nominal (optimal!) expert execution.
  • Figure 2: Visualization of the benefits of action chunking and noise injection. Left: even on synthetic globally stable (EISS, Definition 2.1) dynamics $f$, frequent feedback can cause exponential compounding error, which action chunking mitigates. Center: HalfCheetah-v5 environment. Sufficiently large white-noise injection yields a significant performance improvement, on par with more advanced iterative methods. Right: Humanoid-v5 environment. Iterative methods like DAgger and DART can be suboptimal due to poor learned policy rollouts or aggressive noise-covariance shaping, while naive noise injection reliably provides the necessary local exploration; error bars omitted for clarity. Experiment details are in the experiments section and the experiments appendix.
  • Figure 3: A comparison of open-loop control, where the policy generates actions without accessing the system state, and closed-loop control, where the policy's generated actions condition on the system state. While action-chunks are generated closed-loop, the actions within a chunk are executed "open-loop."
  • Figure 4: A visualization of EISS (Definition 2.1), which guarantees pairwise contraction of trajectories.
  • Figure 5: Success rates as a function of evaluated action-chunk lengths on the challenging $\texttt{robomimic}$ "tool_hang" environment with full-state observations. Left: Each line corresponds to a model trained for a given prediction horizon on 100 expert trajectories. Each point corresponds to the model evaluated at a given chunk length, ranging from receding-horizon ($\ell = 1$) to the full chunk. While the prediction horizon has some (transient) effect, evaluating slightly longer chunks improves success drastically. Right: We repeat a similar set-up with 50 expert training trajectories. We see that noise injection can also synergize in this open-loop stable setting (see the experiments section), though it requires modifying the data-collection procedure rather than simply adjusting policy parameterization and evaluation as in AC.
  • ...and 7 more figures
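The execution pattern in Figure 3 and the pairwise contraction in Figure 4 can be sketched on a scalar toy system. This is an illustration under stated assumptions, not the paper's experimental setup: the dynamics `stable_step`, the policy `toy_chunk_policy`, and both helper functions are hypothetical.

```python
def open_loop_gaps(step, x_a, x_b, actions):
    """Distance between two trajectories driven by the SAME action sequence."""
    gaps = [abs(x_a - x_b)]
    for u in actions:
        x_a, x_b = step(x_a, u), step(x_b, u)
        gaps.append(abs(x_a - x_b))
    return gaps


def rollout_chunked(chunk_policy, step, x0, horizon, ell):
    """Query the policy once per chunk, then run ell actions open-loop."""
    if ell < 1:
        raise ValueError("chunk length ell must be at least 1")
    x, states, t = x0, [x0], 0
    while t < horizon:
        chunk = list(chunk_policy(x))[:ell]  # closed-loop: reads the state
        if not chunk:
            raise ValueError("chunk_policy returned no actions")
        for u in chunk:                      # open-loop: no state reads
            if t == horizon:
                break
            x = step(x, u)
            states.append(x)
            t += 1
    return states


def stable_step(x, u):
    # Hypothetical open-loop stable dynamics with contraction rate 0.5.
    return 0.5 * x + u


def toy_chunk_policy(x):
    # Hypothetical policy that predicts four zero actions from any state.
    return [0.0, 0.0, 0.0, 0.0]


gaps = open_loop_gaps(stable_step, 1.0, -1.0, [0.3, -0.2, 0.1, 0.0])
states = rollout_chunked(toy_chunk_policy, stable_step, 1.0, horizon=10, ell=4)
print(gaps)         # the gap halves at every step, whatever the shared actions
print(len(states))  # horizon + 1 states, with 3 policy queries in total
```

Setting `ell=1` recovers receding-horizon execution with a policy query at every step, while larger `ell` gives the open-loop dynamics more steps to contract any deviation before the next query.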

Theorems & Definitions (65)

  • Definition 2.1: EISS (Figure 4)
  • Theorem A: Motivating lower bounds, informal version of simchowitz2025pitfalls
  • Definition 3.1: Chunking Policy
  • Definition 3.2: Induced Chunking Policy
  • Proposition 3.0
  • Proposition 3.1
  • Theorem A
  • Definition 4.1
  • Definition 4.2: Jacobian Linearization
  • Definition 4.3: Linearized Controllability Gramian
  • ...and 55 more
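The statements of Definitions 4.2 and 4.3 are not reproduced in this summary. As background only, the sketch below computes the textbook finite-horizon controllability Gramian $W_k = \sum_{i=0}^{k-1} A^i B B^\top (A^\top)^i$ for a time-invariant linear system, which may differ from the paper's linearized, trajectory-dependent definition. The double-integrator matrices and all helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def controllability_gramian(A, B, steps):
    """Textbook Gramian: sum over i < steps of A^i B B^T (A^T)^i."""
    n = len(A)
    W = [[0.0] * n for _ in range(n)]
    AiB = B  # holds A^i B, starting at i = 0
    for _ in range(steps):
        term = matmul(AiB, transpose(AiB))
        W = [[W[r][c] + term[r][c] for c in range(n)] for r in range(n)]
        AiB = matmul(A, AiB)
    return W


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Hypothetical double integrator: state = (position, velocity).
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[0.0], [1.0]]

W1 = controllability_gramian(A, B, 1)
W2 = controllability_gramian(A, B, 2)
print(W1, det2(W1))  # rank-deficient: one input step only excites velocity
print(W2, det2(W2))  # full rank: two steps excite both directions
```

Directions in the range of the Gramian are those that input noise can excite. In this toy, a single noisy input only perturbs velocity, while two consecutive noisy inputs reach the whole state space.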