$χ_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies

Checheng Yu; Chonghao Sima; Gangcheng Jiang; Hai Zhang; Haoguang Mai; Hongyang Li; Huijie Wang; Jin Chen; Kaiyang Wu; Li Chen; Lirui Zhao; Modi Shi; Ping Luo; Qingwen Bu; Shijia Peng; Tianyu Li; Yibo Yuan

TL;DR

The paper addresses robustness gaps in real-world, long-horizon robotic manipulation by framing them as mismatches among three distributions: training data ($P_{train}$), model inductive bias ($Q_{model}$), and deployment trajectories ($P_{test}$). It proposes $χ_{0}$, a resource-efficient framework that combines three modules to align them: Model Arithmetic (MA; weight-space merging of policies trained on data subsets), Stage Advantage (SA; stage-conditioned, low-variance progress signals for long-horizon tasks), and Train-Deploy Alignment (TDA; inference-augmented data and temporal smoothing). Empirical results on two dual-arm garment manipulation systems show that $χ_{0}$ surpasses the open-source $π_{0.5}$ baseline by about 250% in success rate, using only ~20 hours of demonstrations and 8 A100 GPUs, and that the system can operate autonomously for 24 hours. Ablations demonstrate the complementary value of MA, SA, and TDA. The work offers a practical, data-efficient path toward production-level robustness in complex manipulation tasks, and the authors state that code, data, and models will be released.

Abstract

High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary bottleneck to real-world robustness is not resource scale alone, but the distributional shift among the human demonstration distribution, the inductive bias learned by the policy, and the test-time execution distribution, a systematic inconsistency that causes compounding errors in multi-stage tasks. To mitigate these inconsistencies, we propose $χ_{0}$, a resource-efficient framework with modules designed to achieve production-level robustness in robotic manipulation. Our approach builds on three technical pillars: (i) Model Arithmetic, a weight-space merging strategy that efficiently absorbs the diverse distributions of different demonstrations, ranging from object appearance to state variations; (ii) Stage Advantage, a stage-aware advantage estimator that provides stable, dense progress signals, overcoming the numerical instability of prior non-stage approaches; and (iii) Train-Deploy Alignment, which bridges the distribution gap via spatio-temporal augmentation, heuristic DAgger corrections, and temporal chunk-wise smoothing. $χ_{0}$ enables two sets of dual-arm robots to collaboratively carry out long-horizon garment manipulation, spanning flattening, folding, and hanging different clothes. Our method exhibits high-reliability autonomy: we are able to run the system from an arbitrary initial state for 24 consecutive hours. Experiments validate that $χ_{0}$ surpasses the state-of-the-art $π_{0.5}$ in success rate by nearly 250%, with only 20 hours of data and 8 A100 GPUs. Code, data, and models will be released to benefit the community.
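To make the idea of weight-space merging concrete, the following is a minimal sketch of weighted checkpoint interpolation, assuming every policy shares one architecture. The `merge_checkpoints` helper, the flat `name -> list of floats` checkpoint layout, and the toy values are all hypothetical and are not taken from the paper's released code; the abstract does not specify how the merge coefficients are chosen, so they are simply passed in.

```python
from typing import Dict, List, Sequence

# Hypothetical flat checkpoint layout: parameter name -> list of values.
Weights = Dict[str, List[float]]


def merge_checkpoints(checkpoints: Sequence[Weights], coeffs: Sequence[float]) -> Weights:
    """Weighted interpolation of checkpoints that share one architecture."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    if len(checkpoints) != len(coeffs):
        raise ValueError("one coefficient per checkpoint is required")
    if any(c < 0 for c in coeffs) or sum(coeffs) <= 0:
        raise ValueError("coefficients must be non-negative with a positive sum")

    total = sum(coeffs)
    norm = [c / total for c in coeffs]  # normalise to a convex combination

    merged: Weights = {}
    for name, ref in checkpoints[0].items():
        # Every checkpoint must hold the same parameter with the same size.
        for ckpt in checkpoints:
            if name not in ckpt or len(ckpt[name]) != len(ref):
                raise ValueError(f"parameter {name!r} does not match across checkpoints")
        merged[name] = [
            sum(w * ckpt[name][i] for w, ckpt in zip(norm, checkpoints))
            for i in range(len(ref))
        ]
    return merged


# Toy example: two "policies" trained on different data subsets.
policy_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
policy_b = {"layer.weight": [3.0, 6.0], "layer.bias": [2.0]}
merged = merge_checkpoints([policy_a, policy_b], coeffs=[1.0, 1.0])
print(merged)
```

With equal coefficients this reduces to a plain parameter average. In a real setting the same loop would run over framework tensors rather than Python lists.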

Paper Structure (40 sections, 2 equations, 19 figures, 2 tables, 1 algorithm)


Introduction
Related Work
Imitation Learning and Policy Deployment in Real-world
Model Merging and Weight Interpolation
Advantage Estimation for Long-Horizon Tasks
Methodology
Preliminary and Problem Setup
Pipeline of $\chi_0$ system
Model Arithmetic
Stage Advantage
Train-Deploy-Alignment
Experiments
Evaluation Tasks and Metrics
Data Collection and Training Strategy
Baselines and Ablation Design
...and 25 more sections

Figures (19)

Figure 1: Top: System overview. A robot teamwork system with two dual-arm ALOHA robots performing long-horizon collaborative garment manipulation, including flattening, folding, and hanging. Bottom: Technical philosophy and performance. Distributional inconsistencies are inherent to robot learning ($P_\text{train}$: expert demonstrations; $Q_\text{model}$: policy inductive bias; $P_\text{test}$: deployment trajectories). $\chi_0$ systematically resolves these pairwise mismatches: Model Arithmetic aligns $Q_\text{model}$ with $P_\text{train}$; Train-Deploy Alignment bridges $P_\text{train}$ and $P_\text{test}$; and Stage Advantage optimizes $Q_\text{model}$ for $P_\text{test}$. The contributions of these modules collectively enable $\chi_0$ to surpass the baseline $\pi_{0.5}$ [black2025pi05] in terms of success rate by approximately 250%.
Figure 2: Pipeline of $\chi_0$. Our framework addresses distributional inconsistencies across three stages. (Left) $P_{\text{train}}$: heuristic DAgger and spatio-temporal augmentation expand training coverage, with Stage Annotation for advantage estimation; (Middle) $Q_{\text{model}}$: Model Arithmetic merges complementary policies in weight space, guided by stage-aware advantage; (Right) $P_{\text{test}}$: temporal chunk-wise smoothing ensures execution accuracy, while on-policy DAgger enables closed-loop refinement.
Figure 3: Souping strategies in Model Arithmetic. Policies trained on separate subsets are merged via weighted interpolation. (Top) Inverse Loss assigns higher coefficients to models with lower validation loss. (Bottom) Other strategies.
Figure 4: Cumulative value based on SA. Red and green indicate negative and positive values, respectively. Top: Task A shows slip failures and recovery; Middle: Task B shows fetching and cloth misplacement; Bottom: Task C shows pull-over and visual occlusion.
Figure 5: Train-Deploy-Alignment strategies and t-SNE visualizations. Left: three complementary strategies for distribution alignment. Right: t-SNE visualizations showing progressive distribution alignment as each strategy is applied.
...and 14 more figures
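
The Figure 3 caption says the Inverse Loss strategy assigns higher coefficients to models with lower validation loss. One plausible reading, sketched below, is to normalise the reciprocals of the validation losses. The `inverse_loss_coeffs` helper, the `eps` guard, and the loss values are illustrative assumptions; the exact formula used in the paper is not given in this summary.

```python
from typing import List, Sequence


def inverse_loss_coeffs(val_losses: Sequence[float], eps: float = 1e-8) -> List[float]:
    """Return merge coefficients proportional to 1 / validation loss.

    Assumed form (not confirmed by the source): c_i = (1 / L_i) / sum_j (1 / L_j).
    """
    if not val_losses:
        raise ValueError("need at least one validation loss")
    if any(loss < 0 for loss in val_losses):
        raise ValueError("validation losses must be non-negative")

    inv = [1.0 / (loss + eps) for loss in val_losses]  # eps guards against division by zero
    total = sum(inv)
    return [v / total for v in inv]


# Made-up validation losses for three policies trained on separate subsets.
coeffs = inverse_loss_coeffs([0.1, 0.2, 0.4])
print(coeffs)
```

The coefficients sum to one and decrease as validation loss increases, so the lowest-loss policy contributes most to the weighted interpolation.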
