Bidirectional Decoding: Improving Action Chunking via Guided Test-Time Sampling

Yuejiang Liu; Jubayer Ibn Hamid; Annie Xie; Yoonho Lee; Maximilian Du; Chelsea Finn

TL;DR

This paper analyzes the tradeoffs of action chunking in imitation learning for robotics, showing that longer chunks improve temporal dependency modeling but reduce reactivity to unforeseen changes. It introduces Bidirectional Decoding (BID), a test-time sampling strategy that combines backward coherence with forward contrast to select action chunks that balance long-term consistency and short-term reactivity. Across diagnostic, simulation, and real-world tasks, BID consistently improves performance and is shown to be scalable and compatible with existing inference methods, albeit with added computational cost. The work provides a practical plug-in for enhancing generative behavioral cloning in dynamic environments.

Abstract

Predicting and executing a sequence of actions without intermediate replanning, known as action chunking, is increasingly used in robot learning from human demonstrations. Yet, its effects on the learned policy remain inconsistent: some studies find it crucial for achieving strong results, while others observe decreased performance. In this paper, we first dissect how action chunking impacts the divergence between a learner and a demonstrator. We find that action chunking allows the learner to better capture the temporal dependencies in demonstrations but at the cost of reduced reactivity to unexpected states. To address this tradeoff, we propose Bidirectional Decoding (BID), a test-time inference algorithm that bridges action chunking with closed-loop adaptation. At each timestep, BID samples multiple candidate predictions and searches for the optimal one based on two criteria: (i) backward coherence, which favors samples that align with previous decisions; (ii) forward contrast, which seeks samples of high likelihood for future plans. By coupling decisions within and across action chunks, BID promotes both long-term consistency and short-term reactivity. Experimental results show that our method boosts the performance of two state-of-the-art generative policies across seven simulation benchmarks and two real-world tasks. Code and videos are available at https://bid-robot.github.io.
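The selection step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that candidate chunks are NumPy arrays of shape (N, T, D), that backward coherence is a decay-weighted squared distance to the unexecuted tail of the previously chosen chunk, and that forward contrast is approximated by the mean distance to the other candidates minus the mean distance to a set of less desirable reference samples. The function names, the `decay` parameter, and the `negatives` argument are all hypothetical.

```python
import numpy as np

def backward_coherence(candidates, prev_tail, decay=0.5):
    """Decay-weighted squared distance to the unexecuted tail of the previous chunk.

    candidates: (N, T, D) sampled action chunks.
    prev_tail:  (K, D) actions from the previous chunk that overlap in time.
    The weighting scheme is an assumption made for illustration.
    """
    overlap = min(candidates.shape[1], prev_tail.shape[0])
    if overlap == 0:
        return np.zeros(candidates.shape[0])
    weights = decay ** np.arange(overlap)
    diff = candidates[:, :overlap] - prev_tail[:overlap]  # (N, overlap, D)
    return np.einsum("t,ntd->n", weights, diff ** 2)

def mean_distance(a, b):
    """Mean Euclidean distance from each chunk in a (N, T, D) to all chunks in b (M, T, D)."""
    d = a[:, None] - b[None]  # (N, M, T, D)
    return np.sqrt((d ** 2).sum(axis=(2, 3))).mean(axis=1)

def forward_contrast(candidates, positives, negatives):
    """Lower is better: close to the positive samples, far from the negative ones."""
    return mean_distance(candidates, positives) - mean_distance(candidates, negatives)

def select_chunk(candidates, prev_tail, negatives, decay=0.5):
    """Return the index of the candidate with the lowest combined cost."""
    cost = backward_coherence(candidates, prev_tail, decay)
    # Assumption: the candidates themselves serve as the positive set.
    cost = cost + forward_contrast(candidates, candidates, negatives)
    return int(np.argmin(cost))
```

In this sketch the two criteria are simply summed with equal weight. A real system would likely need to tune their relative weighting and the number of samples.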

Paper Structure (51 sections, 10 theorems, 59 equations, 15 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Analysis: Tradeoffs in Action Chunking
Preliminaries
Analysis
Method: Bidirectional Decoding
Test-Time Search
Bidirectional Criteria
Experiments
One-Dimensional Diagnostic Experiments
Simulation Experiments with Stochastic Noise
Comparison with Existing Inference Methods
Scalability and Compatibility of BID
Generality and Efficiency of BID
Real-world Experiments with Dynamic Objects
...and 36 more sections

Key Result

Proposition 1

Let $\mathcal{L}$ be a non-linear and non-negative convex function measuring the prediction error with respect to demonstrations. Let $C \coloneqq \{a_{t-h:t-1}\} \cup \mathcal{S}^+$ where $\mathcal{S}^+$ are the common states that both $\pi_h$ and $\pi_{h+d}$ observe. For the ease of notation, let

Figures (15)

Figure 1: Illustration of different inference methods applied to a robot policy with action chunking. The robot is tasked with catching a moving trolley. (a) Vanilla action chunking (Zhao et al., 2023) executes actions based on previous predictions, resulting in delayed reactions to object motions. (b) Receding horizon (Chi et al., 2023) enables faster reactions, but leads to a jittery trajectory in the presence of multimodal demonstrations (e.g., both left- and right-handers). (c) Our Bidirectional Decoding explicitly searches for the optimal action from multiple predictions sampled at each time step, achieving both long-term consistency and short-term reactivity.
Figure 2: Illustration of the expert decision process, where a latent variable introduces temporal dependencies in actions.
Figure 3: Illustration of the $(k, 1)$-expert, $(c, h)$-learner, and $(c, h+d)$-learner. Shaded regions represent the observed history; darker shading indicates greater influence on the current decision.
Figure 4: Illustration of bidirectional decoding.
Figure 5: Effect of action horizon $h$ on idle actions in 1-dimensional simulations. All policies share the same prediction length $l$. Long action horizons lead to idle distributions closer to the long-idle expert in low-noise environments, whereas shorter action horizons align more closely with the short-idle expert in high-noise environments. When both idling and noise are non-negligible, a moderate action horizon performs the best.
...and 10 more figures
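
The captions of Figures 1 and 5 contrast inference schemes that differ in how many predicted actions are executed before the policy is queried again. The sketch below illustrates that control loop only. It assumes a hypothetical `policy(obs)` that returns a list of predicted actions and a hypothetical `step(obs, action)` that returns the next observation; neither name comes from the paper.

```python
def rollout(policy, step, obs, total_steps, action_horizon):
    """Execute `action_horizon` actions of each predicted chunk before replanning.

    Returns the executed actions and the number of policy queries.
    """
    if action_horizon < 1:
        raise ValueError("action_horizon must be at least 1")
    executed = []
    queries = 0
    while len(executed) < total_steps:
        chunk = policy(obs)  # predicted actions for the upcoming steps
        queries += 1
        # Open-loop execution: no new observation reaches the policy here.
        for action in chunk[:action_horizon]:
            obs = step(obs, action)
            executed.append(action)
            if len(executed) == total_steps:
                break
    return executed, queries
```

Setting `action_horizon` to the full prediction length corresponds to open-loop chunk execution, while a smaller value replans more often, in the spirit of receding-horizon execution.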

Theorems & Definitions (20)

Proposition 1: Consistency-Reactivity Inequalities
Corollary 2: Consistency
Corollary 3: Reactivity
Lemma 4
proof
Lemma 5
proof
Proposition 6
proof
Proposition 7
...and 10 more
