Pelican-VL 1.0: A Foundation Brain Model for Embodied Intelligence
Yi Zhang, Che Liu, Xiancong Ren, Hanchu Ni, Shuai Zhang, Zeyuan Ding, Jiayu Hu, Hanzhe Shan, Zhenwei Niu, Zhaoyang Liu, Shuang Liu, Yue Zhao, Junbo Qi, Qinfan Zhang, Dengjie Li, Yidong Wang, Jiachen Luo, Yong Dai, Zenglin Xu, Bin Shen, Qifan Wang, Jian Tang, Xiaozhu Ju
TL;DR
Pelican-VL 1.0 introduces an open-source family of embodied brain models spanning 7B to 72B parameters, trained with the Deliberate Practice Policy Optimization (DPPO) framework, which couples RL-based skill discovery with supervised consolidation. Through a metaloop of Exploratory Grounding and Targeted Remediation, the approach leverages large-scale, mixed-modal data and a unified preference-learning objective to achieve robust spatial, temporal, and planning capabilities in real-world embodied tasks. Extensive experiments on real hardware demonstrate state-of-the-art performance on contact-rich manipulation, affordance-based reasoning, and long-horizon multi-agent planning, and provide richer diagnostic benchmarking across nine embodied capability dimensions. By open-sourcing both the models and the DPPO toolchain, the work lays a foundation for scalable, self-improving embodied AI and a pathway toward autonomous, real-world robotic intelligence.
Abstract
This report presents Pelican-VL 1.0, a new family of open-source embodied brain models with parameter scales ranging from 7 billion to 72 billion. Our mission is to embed powerful intelligence into various embodiments. Pelican-VL 1.0 is currently the largest-scale open-source embodied multimodal brain model. Its core advantage lies in the deep integration of data power and intelligent adaptive learning mechanisms. Specifically, the metaloop distilled a high-quality dataset from a raw dataset containing 4+ billion tokens. Pelican-VL 1.0 is trained on a large-scale cluster of 1000+ A800 GPUs, consuming over 50k A800 GPU-hours per checkpoint. The result is a 20.3% performance uplift over its base model; the model also outperforms 100B-level open-source counterparts by 10.6%, placing it on par with leading proprietary systems on well-known embodied benchmarks. To train Pelican-VL 1.0, we establish a novel framework, DPPO (Deliberate Practice Policy Optimization), inspired by human metacognition. We operationalize it as a metaloop that teaches the AI to practice deliberately: an RL-Refine-Diagnose-SFT loop.
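The control flow of an RL-Refine-Diagnose-SFT loop can be illustrated with a toy sketch. This is not the DPPO implementation: the abstract names the four stages but does not specify them, so everything here is a hypothetical stand-in. That includes `ToyModel`, the per-task success probabilities, the interpretation of each stage, the `threshold`, and the fixed `step` improvement.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    # Hypothetical stand-in for a policy: task name -> success probability.
    skill: dict = field(default_factory=dict)


def rl_phase(model, tasks, rollouts, rng):
    """RL stage (assumed): explore by sampling rollouts, record raw outcomes."""
    return {
        t: [rng.random() < model.skill.get(t, 0.0) for _ in range(rollouts)]
        for t in tasks
    }


def refine(outcomes):
    """Refine stage (assumed): reduce raw rollouts to per-task success rates."""
    return {t: sum(wins) / len(wins) for t, wins in outcomes.items()}


def diagnose(rates, threshold):
    """Diagnose stage (assumed): flag tasks below the success threshold."""
    return [t for t, rate in rates.items() if rate < threshold]


def sft_phase(model, weak_tasks, step=0.2):
    """SFT stage (assumed): consolidate by improving only the weak tasks."""
    for t in weak_tasks:
        model.skill[t] = min(1.0, model.skill.get(t, 0.0) + step)


def metaloop(model, tasks, cycles=5, rollouts=50, threshold=0.8, seed=0):
    """Run the four stages in order until no weakness is found or cycles run out."""
    rng = random.Random(seed)
    history = []
    for _ in range(cycles):
        outcomes = rl_phase(model, tasks, rollouts, rng)
        rates = refine(outcomes)
        weak = diagnose(rates, threshold)
        history.append(weak)
        if not weak:
            break
        sft_phase(model, weak)
    return history


if __name__ == "__main__":
    model = ToyModel({"grasp": 0.9, "plan": 0.3})
    for i, weak in enumerate(metaloop(model, ["grasp", "plan"])):
        print(f"cycle {i}: weak tasks = {weak}")
    print(model.skill)
```

The sketch is meant to show only the shape of the loop: exploration surfaces failures, diagnosis selects what to practice, and supervised consolidation targets those weaknesses before the next cycle.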
