PointNSP: Autoregressive 3D Point Cloud Generation with Next-Scale Level-of-Detail Prediction
Ziqiao Meng, Qichao Wang, Zhiyang Dou, Zixing Song, Zhipeng Zhou, Irwin King, Peilin Zhao
TL;DR
PointNSP is a permutation-invariant autoregressive framework that generates 3D point clouds coarse to fine by predicting the next scale's level of detail (LoD). It combines a multi-scale LoD representation, farthest point sampling (FPS), a shared VQ-VAE tokenizer, and a transformer with bidirectional intra-scale plus masked inter-scale attention, achieving state-of-the-art quality while being more efficient than diffusion baselines. Extensive ShapeNet experiments show strong results on single-class generation, many-class generation, completion, and upsampling, with notable parameter and speed advantages. The work also analyzes permutation invariance theoretically and empirically, and offers a scalable path toward foundation-scale 3D generation.
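To make "bidirectional intra-scale plus masked inter-scale" attention concrete, here is a minimal sketch of a block-wise attention mask over a coarse-to-fine token sequence. It assumes that tokens attend freely within their own scale and to all coarser scales, and never to finer ones. The function name `next_scale_attention_mask` and the example scale sizes are hypothetical, not taken from the paper's code.

```python
def next_scale_attention_mask(scale_sizes):
    """Build a boolean mask where mask[i][j] is True if token i may attend to token j.

    scale_sizes lists the number of tokens per scale, coarsest first
    (illustrative values; the paper's actual scale schedule is not assumed here).
    """
    # Scale index of every token in the flattened coarse-to-fine sequence.
    scale_of = [s for s, size in enumerate(scale_sizes) for _ in range(size)]
    # Allowed: same scale (bidirectional) or any coarser scale. Blocked: finer scales.
    return [[sj <= si for sj in scale_of] for si in scale_of]


# Two scales with 1 and 2 tokens.
mask = next_scale_attention_mask([1, 2])
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

Because the mask depends only on which scale a token belongs to, reordering tokens within a scale does not change which context each token sees, which is the intuition behind avoiding a fixed point ordering.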
Abstract
Autoregressive point cloud generation has long lagged behind diffusion-based approaches in quality. The gap arises because autoregressive models impose an artificial ordering on inherently unordered point sets, forcing shape generation to proceed as a sequence of local predictions. This sequential bias emphasizes short-range continuity but undermines the model's capacity to capture long-range dependencies, hindering its ability to enforce global structural properties such as symmetry, consistent topology, and large-scale geometric regularities. Inspired by the level-of-detail (LoD) principle in shape modeling, we propose PointNSP, a coarse-to-fine generative framework that preserves global shape structure at low resolutions and progressively refines fine-grained geometry at higher scales through a next-scale prediction paradigm. This multi-scale factorization aligns the autoregressive objective with the permutation-invariant nature of point sets, enabling rich intra-scale interactions while avoiding brittle fixed orderings. Experiments on ShapeNet show that PointNSP establishes state-of-the-art (SOTA) generation quality for the first time within the autoregressive paradigm. It also surpasses strong diffusion-based baselines in parameter, training, and inference efficiency. Finally, in dense generation with 8,192 points, PointNSP's advantages become even more pronounced, underscoring its scalability potential.
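As an illustration of the coarse-to-fine LoD idea, the sketch below builds a multi-scale pyramid from one point cloud with greedy farthest point sampling. It is a standard-library sketch under stated assumptions, not the authors' implementation: the helper names `farthest_point_sampling` and `build_lod_pyramid`, the scale sizes, and the choice to make the levels nested subsets are all illustrative.

```python
import math
import random


def farthest_point_sampling(points, k, start=0):
    """Return indices of k points picked by greedy farthest point sampling."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(points)")
    chosen = [start]
    # min_dist[i] is the distance from point i to its nearest chosen point.
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        # Pick the point that is farthest from everything chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return chosen


def build_lod_pyramid(points, scales):
    """Return coarse-to-fine levels; scales must be increasing point counts."""
    # FPS is greedy, so each prefix of the order is itself an FPS sample.
    # Slicing prefixes therefore yields nested levels (an assumption of this sketch).
    order = farthest_point_sampling(points, scales[-1])
    return [[points[i] for i in order[:s]] for s in scales]


random.seed(0)
cloud = [(random.random(), random.random(), random.random()) for _ in range(256)]
levels = build_lod_pyramid(cloud, [16, 64, 256])
print([len(level) for level in levels])
```

The coarsest level spreads a few points across the whole shape, so it captures global structure, while later levels add points that fill in local detail. This pure-Python version costs O(n·k) distance evaluations and is meant for exposition only.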
