OminiAdapt: Learning Cross-Task Invariance for Robust and Environment-Aware Robotic Manipulation
Yongxu Wang, Weiyun Yi, Xinhao Kong, Wanting Li
TL;DR
The paper tackles covariate shift in imitation learning for humanoid robot manipulation in unstructured environments. It introduces OminiAdapt, a multimodal framework that combines cross-view feature fusion with CBAM-based attention, continuous object tracking for background masking, and Dynamic Adaptive Batch Normalization for rapid adaptation to new tasks. Experiments on clothes folding, apple picking, flower arrangement, and water pouring show notable improvements over the HIT and ACT baselines, and ablations confirm the contributions of the masking strategy, the attention modules, and partial BN freezing. The approach offers a scalable, environment-aware path toward robust autonomous manipulation, although multi-view consistency and the integration of tactile sensing remain open limitations.
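To make the masking-plus-attention idea concrete, here is a minimal NumPy sketch. It assumes a single feature map of shape (C, H, W), a binary object mask produced by the tracker, and the standard CBAM formulation: channel attention from a shared MLP over average- and max-pooled descriptors, followed by spatial attention from a convolution over channel-pooled maps. The function names, the random weights, and the choice to zero background features before attention are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w1, w2):
    """CBAM channel attention. feat: (C, H, W); shared MLP w1: (C//r, C), w2: (C, C//r)."""
    avg = feat.mean(axis=(1, 2))  # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))    # (C,) max-pooled descriptor

    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer

    scale = sigmoid(mlp(avg) + mlp(mx))  # per-channel weights in (0, 1)
    return feat * scale[:, None, None]


def spatial_attention(feat, kernel):
    """CBAM spatial attention. kernel: (2, k, k) over [mean, max] maps, 'same' padding."""
    stacked = np.stack([feat.mean(axis=0), feat.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(stacked, ((0, 0), (pad, pad), (pad, pad)))
    H, W = stacked.shape[1:]
    logits = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # Cross-correlation, as in CNN convolution layers
            logits[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return feat * sigmoid(logits)[None, :, :]


def masked_cbam(feat, mask, w1, w2, kernel):
    """Hypothetical pipeline: zero background features (mask == 0), then apply CBAM."""
    masked = feat * mask[None, :, :]
    return spatial_attention(channel_attention(masked, w1, w2), kernel)
```

Because both attention maps lie in (0, 1) and background features are zeroed before attention, masked positions stay exactly zero and no feature grows in magnitude. Whether OminiAdapt masks before, after, or inside its attention modules is an assumption here.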
Abstract
With the rapid development of embodied intelligence, leveraging large-scale human data for high-level imitation learning on humanoid robots has become a focus of both academia and industry. However, applying humanoid robots to precision manipulation remains challenging for three reasons: the complexity of perception and control, persistent differences in morphology and actuation mechanisms between humanoid robots and humans, and the scarcity of task-relevant features in egocentric visual input. To address covariate shift in imitation learning, this paper proposes an imitation learning algorithm tailored to humanoid robots. By focusing on the primary task objectives, filtering out background information, and fusing channel features with spatial attention, the algorithm suppresses environmental disturbances; combined with a dynamic weight update strategy, this significantly improves the success rate of humanoid robots on the target tasks. Experimental results show that the method is robust and scalable across a range of typical task scenarios, offering a new approach to autonomous learning and control for humanoid robots. The project will be open-sourced on GitHub.
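The abstract does not spell out the dynamic weight update strategy, while the TL;DR attributes fast task adaptation to Dynamic Adaptive Batch Normalization with partial BN freezing. In the absence of the paper's details, the sketch below shows a generic AdaBN-style stand-in: BN running statistics are re-estimated on batches from the new task, while a hypothetical prefix of layers keeps its frozen statistics. The `BatchNorm1d` class, the `adapt_bn` helper, the momentum value, and the frozen/unfrozen split are all assumptions for illustration.

```python
import numpy as np


class BatchNorm1d:
    """Minimal BN layer over feature vectors (N, C) that normalizes with running stats."""

    def __init__(self, num_features, momentum=0.1, eps=1e-5):
        self.running_mean = np.zeros(num_features)
        self.running_var = np.ones(num_features)
        self.gamma = np.ones(num_features)
        self.beta = np.zeros(num_features)
        self.momentum = momentum
        self.eps = eps
        self.frozen = False

    def update_stats(self, x):
        # Exponential moving average of batch statistics; skipped when frozen
        if self.frozen:
            return
        m = self.momentum
        self.running_mean = (1 - m) * self.running_mean + m * x.mean(axis=0)
        self.running_var = (1 - m) * self.running_var + m * x.var(axis=0)

    def __call__(self, x):
        x_hat = (x - self.running_mean) / np.sqrt(self.running_var + self.eps)
        return self.gamma * x_hat + self.beta


def adapt_bn(layers, batches, n_frozen):
    """Hypothetical partial freezing: keep stats of the first n_frozen layers,
    re-estimate the remaining layers' stats on new-task batches."""
    for i, layer in enumerate(layers):
        layer.frozen = i < n_frozen
    for x in batches:
        h = x
        for layer in layers:
            layer.update_stats(h)
            h = layer(h)
    return layers
```

Freezing early layers preserves statistics learned on the source tasks, while unfrozen later layers track the new task's feature distribution; the split and update rule used in OminiAdapt may differ.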
