Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang, Han Xue, Hao-Shu Fang, Yong-Lu Li, Cewu Lu, Lixin Yang
TL;DR
Dense Policy introduces a bidirectional autoregressive framework for robotic action prediction that expands sparse keyframes into dense action sequences through a coarse-to-fine process with logarithmic-time inference. Built on an encoder-only architecture, it fuses observation features through cross-attention at each expansion level, which keeps training efficient and inference fast while maintaining high accuracy. Across 11 simulation tasks in 3 benchmarks and 4 real-world tasks, Dense Policy outperforms both holistic generative baselines and unidirectional autoregressive approaches, and ablations confirm the value of bidirectional dependencies for long-horizon manipulation. The work demonstrates strong generalization across 2D and 3D perception and in real-world settings, and it notes potential extensions to broader vision-language-action tasks and to larger models.
Abstract
Mainstream visuomotor policies predominantly rely on generative models for holistic action prediction, while current autoregressive policies, which predict the next token or chunk, have shown suboptimal results. This motivates a search for more effective learning methods to unleash the potential of autoregressive policies for robotic manipulation. This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction. It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner with logarithmic-time inference. Extensive experiments validate that Dense Policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies. Our policy, example data, and training code will be publicly available upon publication. Project page: https://selen-suyue.github.io/DspNet/.
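To make the logarithmic-time claim concrete, the following is a minimal sketch of coarse-to-fine expansion, not the authors' implementation. It assumes the horizon is a power of two and that each level doubles the sequence by midpoint interpolation. The names `upsample`, `dense_expand`, and `identity_refine` are hypothetical, and the `refine` callback is only a placeholder for the encoder that, in the paper, attends over the whole sequence and fuses observation features through cross-attention.

```python
from typing import Callable, List, Tuple

Action = List[float]  # one action frame, e.g. an end-effector pose


def upsample(seq: List[Action]) -> List[Action]:
    """Double the length by inserting a midpoint after every frame.

    The last frame has no right neighbor, so it is simply repeated.
    Midpoint interpolation is an assumption made for this sketch.
    """
    out: List[Action] = []
    for i, frame in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else frame
        out.append(list(frame))
        out.append([(a + b) / 2.0 for a, b in zip(frame, nxt)])
    return out


def dense_expand(
    initial: Action,
    horizon: int,
    refine: Callable[[List[Action], int], List[Action]],
) -> Tuple[List[Action], int]:
    """Expand a single frame into `horizon` frames, one doubling per level.

    `refine` receives the full upsampled sequence at each level, so every
    frame can depend on frames both before and after it (bidirectional).
    Returns the dense sequence and the number of refinement passes used.
    """
    if horizon < 1 or horizon & (horizon - 1):
        raise ValueError("horizon must be a power of two in this sketch")
    seq: List[Action] = [list(initial)]
    levels = 0
    while len(seq) < horizon:
        seq = refine(upsample(seq), levels)
        levels += 1
    return seq, levels


def identity_refine(seq: List[Action], level: int) -> List[Action]:
    """Placeholder for the learned encoder; returns its input unchanged."""
    return seq


seq, levels = dense_expand([0.0, 0.0], horizon=16, refine=identity_refine)
print(len(seq), levels)  # 16 4
```

A 16-step sequence is reached after 4 refinement passes, that is, log2(16) passes, whereas a next-token autoregressive policy would need one pass per step.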
