X-IL: Exploring the Design Space of Imitation Learning Policies
Xiaogang Jia, Atalay Donat, Xi Huang, Xuan Zhao, Denis Blessing, Hongyi Zhou, Han A. Wang, Hanyi Zhang, Qian Wang, Rudolf Lioutikov, Gerhard Neumann
TL;DR
X-IL introduces a modular, open-source framework for systematic exploration of imitation learning policies, decomposing the pipeline into observation representations, backbones, architectures, and policy representations. By supporting multi-modal inputs (RGB, point clouds, language), diverse backbones (Transformer, Mamba, xLSTM), and policy forms (BC, diffusion, flow), it enables controlled ablations and rapid prototyping. Empirical results on LIBERO and RoboCasa show state-of-the-art performance and data efficiency, with insights that sequence models like Mamba and xLSTM can outperform Transformers under comparable budgets and that robot-specific encoders and well-designed multi-modal fusion are crucial. Overall, X-IL provides a practical, scalable resource for practitioners and researchers to design, compare, and generalize IL policies across varied robotic tasks.
Abstract
Designing modern imitation learning (IL) policies requires making numerous decisions, including the selection of feature encoding, architecture, policy representation, and more. As the field rapidly advances, the range of available options continues to grow, creating a vast and largely unexplored design space for IL policies. In this work, we present X-IL, an accessible open-source framework designed to systematically explore this design space. The framework's modular design enables seamless swapping of policy components, such as backbones (e.g., Transformer, Mamba, xLSTM) and policy optimization techniques (e.g., score matching, flow matching). This flexibility facilitates comprehensive experimentation and has led to the discovery of novel policy configurations that outperform existing methods on recent robot learning benchmarks. Our experiments not only demonstrate significant performance gains but also provide valuable insights into the strengths and weaknesses of various design choices. This study serves as both a practical reference for practitioners and a foundation for guiding future research in imitation learning.
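
To make the idea of swappable policy components concrete, the following is a minimal sketch of one way such modularity could be organized: a name-based registry per component type, plus a config that selects one entry from each. It is not X-IL's actual API. All names here (`BACKBONES`, `HEADS`, `register`, `PolicyConfig`, `build_policy`) are hypothetical, and the registered functions are toy stand-ins operating on plain lists of floats, not real Transformer, Mamba, or xLSTM backbones or diffusion/flow policy heads.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical registries (illustrative only, not taken from X-IL).
BACKBONES: Dict[str, Callable[[List[float]], List[float]]] = {}
HEADS: Dict[str, Callable[[List[float]], float]] = {}


def register(registry: dict, name: str):
    """Decorator that stores a component in a registry under a string key."""
    def decorator(fn):
        registry[name] = fn
        return fn
    return decorator


@register(BACKBONES, "identity")
def identity_backbone(features: List[float]) -> List[float]:
    # Toy stand-in for a sequence backbone: passes features through unchanged.
    return list(features)


@register(BACKBONES, "running_mean")
def running_mean_backbone(features: List[float]) -> List[float]:
    # Toy stand-in for a recurrent backbone: causal running mean over the sequence.
    out, total = [], 0.0
    for i, x in enumerate(features, start=1):
        total += x
        out.append(total / i)
    return out


@register(HEADS, "last")
def last_head(hidden: List[float]) -> float:
    # Toy stand-in for a policy head: reads the action off the final step.
    return hidden[-1]


@register(HEADS, "mean")
def mean_head(hidden: List[float]) -> float:
    # Toy stand-in for a policy head: averages over all steps.
    return sum(hidden) / len(hidden)


@dataclass
class PolicyConfig:
    backbone: str  # key into BACKBONES
    head: str      # key into HEADS


def build_policy(cfg: PolicyConfig) -> Callable[[List[float]], float]:
    """Compose a policy from registered components selected by name."""
    backbone = BACKBONES[cfg.backbone]
    head = HEADS[cfg.head]

    def policy(observation_features: List[float]) -> float:
        return head(backbone(observation_features))

    return policy


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0]
    # Sweep every backbone/head combination, as in a controlled ablation.
    for b in BACKBONES:
        for h in HEADS:
            action = build_policy(PolicyConfig(backbone=b, head=h))(obs)
            print(f"{b} + {h}: {action}")
```

Under this kind of design, changing one field of the config swaps a single component while everything else stays fixed, which is what makes controlled comparisons between design choices straightforward.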
