PointNSP: Autoregressive 3D Point Cloud Generation with Next-Scale Level-of-Detail Prediction

Ziqiao Meng, Qichao Wang, Zhiyang Dou, Zixing Song, Zhipeng Zhou, Irwin King, Peilin Zhao

TL;DR

PointNSP introduces a permutation-invariant autoregressive framework for 3D point clouds that predicts the next level-of-detail (LoD) scale in a coarse-to-fine sequence. It combines a multi-scale LoD representation built with farthest point sampling (FPS), a shared VQVAE tokenizer, and a transformer that attends bidirectionally within each scale and causally (masked) across scales. The result is state-of-the-art quality with better efficiency than diffusion baselines. Extensive ShapeNet experiments show strong results on single-class generation, many-class generation, completion, and upsampling, with notable advantages in parameter count and speed. The work also provides theoretical and empirical analyses of permutation invariance and offers a scalable path toward foundation-scale 3D generation.
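
A useful property behind an FPS-based LoD hierarchy is that every prefix of a single farthest-point-sampling ordering is itself a well-spread subsample, so coarse levels are nested inside finer ones. The sketch below illustrates that idea in NumPy; it is not the authors' code, and the scale sizes (16, 64, 256, 2048) and the helper names `farthest_point_sampling` and `build_lod_hierarchy` are hypothetical choices for illustration.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, n_samples: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: return indices of n_samples points, each farthest from those already chosen."""
    n = points.shape[0]
    selected = np.empty(n_samples, dtype=np.int64)
    selected[0] = start
    # Squared distance from every point to its nearest already-selected point.
    min_dist = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, n_samples):
        nxt = int(np.argmax(min_dist))  # the point worst covered so far
        selected[i] = nxt
        d = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, d)
    return selected

def build_lod_hierarchy(points: np.ndarray, scale_sizes):
    """Coarse-to-fine LoD levels as prefixes of one FPS ordering (ascending sizes => nested sets)."""
    order = farthest_point_sampling(points, max(scale_sizes))
    return [points[order[:n]] for n in scale_sizes]

# Usage: hypothetical scale schedule for a 2,048-point cloud.
rng = np.random.default_rng(0)
cloud = rng.standard_normal((2048, 3))
levels = build_lod_hierarchy(cloud, [16, 64, 256, 2048])
print([lvl.shape for lvl in levels])  # [(16, 3), (64, 3), (256, 3), (2048, 3)]
```

Because the levels share one ordering, the 16-point level is exactly the first 16 points of the 64-point level, which is what lets each scale refine, rather than replace, the previous one.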

Abstract

Autoregressive point cloud generation has long lagged behind diffusion-based approaches in quality. The performance gap stems from the fact that autoregressive models impose an artificial ordering on inherently unordered point sets, forcing shape generation to proceed as a sequence of local predictions. This sequential bias emphasizes short-range continuity but undermines the model's capacity to capture long-range dependencies, hindering its ability to enforce global structural properties such as symmetry, consistent topology, and large-scale geometric regularities. Inspired by the level-of-detail (LOD) principle in shape modeling, we propose PointNSP, a coarse-to-fine generative framework that preserves global shape structure at low resolutions and progressively refines fine-grained geometry at higher scales through a next-scale prediction paradigm. This multi-scale factorization aligns the autoregressive objective with the permutation-invariant nature of point sets, enabling rich intra-scale interactions while avoiding brittle fixed orderings. Experiments on ShapeNet show that PointNSP establishes state-of-the-art (SOTA) generation quality for the first time within the autoregressive paradigm. In addition, it surpasses strong diffusion-based baselines in parameter, training, and inference efficiency. Finally, in dense generation with 8,192 points, PointNSP's advantages become even more pronounced, underscoring its scalability potential.
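
Assuming the standard next-scale factorization $p(q_1, \dots, q_K) = \prod_{k=1}^{K} p(q_k \mid q_1, \dots, q_{k-1})$ over the per-scale token sets $q_k$, the attention pattern can be expressed as a block-wise causal mask: a token may attend to every token of its own scale and of all coarser scales, but to none of the finer ones. The sketch below builds such a boolean mask in NumPy (True means "may attend"). The function name is hypothetical, and the position-aware soft masks $\mathbf{M}^{P}_{k}$ mentioned in the paper are deliberately omitted.

```python
import numpy as np

def block_causal_mask(scale_sizes):
    """Boolean attention mask for tokens laid out scale by scale.

    Entry [i, j] is True when query token i may attend to key token j, i.e. when
    j belongs to the same scale as i (bidirectional intra-scale attention) or to a
    coarser scale (causal inter-scale attention).
    """
    scale_id = np.repeat(np.arange(len(scale_sizes)), scale_sizes)  # scale index per token
    return scale_id[:, None] >= scale_id[None, :]

# Usage: three scales holding 1, 2 and 3 tokens.
mask = block_causal_mask([1, 2, 3])
print(mask.astype(int))
# [[1 0 0 0 0 0]
#  [1 1 1 0 0 0]
#  [1 1 1 0 0 0]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]
#  [1 1 1 1 1 1]]

# Shuffling tokens *within* each scale leaves the mask unchanged, so the mask
# itself imposes no ordering inside a scale.
perm = np.array([0, 2, 1, 5, 3, 4])
print(np.array_equal(mask[np.ix_(perm, perm)], mask))  # True
```

The within-scale shuffle check is only a property of the mask; whether the full model is permutation invariant also depends on how positions are encoded, which this sketch does not model.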

Paper Structure

This paper contains 36 sections, 21 equations, 19 figures, 8 tables, 2 algorithms.

Figures (19)

  • Figure 1: PointNSP achieves state-of-the-art (SOTA) performance compared with strong recent baselines across six key evaluation metrics.
  • Figure 2: Three types of point cloud generative models: (a) diffusion-based methods that iteratively denoise shapes starting from Gaussian noise; (b) vanilla autoregressive (AR) methods that predict the next point by flattening the 3D shape into a sequence; and (c) our proposed PointNSP, which predicts next-scale level-of-detail in a coarse-to-fine manner.
  • Figure 3: (a) Illustration of training a multi-scale VQVAE in a residual manner for point cloud representation across scales $s_{1}$ to $s_{3}$, resulting in a multi-scale token sequence $Q = (q_{1}, \dots, q_{3})$; (b) Illustration of training a causal transformer with intermediate shape decoding, scale token upsampling ($s_{1}\rightarrow s_{2}$ and $s_{2}\rightarrow s_{3}$), position-aware soft masks $\mathbf{M}^{P}_{k}$, and block-wise causal masks $\mathbf{M}$.
  • Figure 4: Visualization of generation results compared with baseline models. PointNSP produces high-quality and diverse 3D point clouds.
  • Figure 5: Visualization of multi-scale point clouds during the PointNSP generation process as the scale $K$ increases.
  • ...and 14 more figures
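
Figure 3(a) describes tokenizing a point cloud with a multi-scale VQVAE trained in a residual manner. The sketch below shows one plausible residual multi-scale quantization pass under several assumptions that the caption does not confirm: per-point latent features come from an (omitted) encoder, each scale's anchors are a prefix of a point ordering (FPS in practice, a random permutation here as a stand-in), downsampling averages the current residual over each anchor's nearest-neighbour cell, upsampling copies each anchor's codeword back to its cell, and one codebook is shared across scales. All function names are hypothetical, and codebook learning and commitment losses are not shown.

```python
import numpy as np

def nearest_anchor(points, anchor_idx):
    """For every point, the position (0..len(anchor_idx)-1) of its nearest anchor point."""
    d = np.sum((points[:, None, :] - points[anchor_idx][None, :, :]) ** 2, axis=-1)
    return np.argmin(d, axis=1)

def quantize(x, codebook):
    """Nearest-codeword vector quantization: returns code ids and the chosen codewords."""
    d = np.sum((x[:, None, :] - codebook[None, :, :]) ** 2, axis=-1)
    ids = np.argmin(d, axis=1)
    return ids, codebook[ids]

def residual_multiscale_tokenize(points, feats, order, scale_sizes, codebook):
    """Hypothetical residual tokenizer: one token set q_k per scale, coarse to fine."""
    residual = feats.copy()
    recon = np.zeros_like(feats)
    tokens = []
    for n_k in scale_sizes:
        assign = nearest_anchor(points, order[:n_k])  # cell index for every point
        # Downsample: average the current residual over each anchor's cell.
        counts = np.bincount(assign, minlength=n_k)[:, None]
        pooled = np.zeros((n_k, feats.shape[1]))
        np.add.at(pooled, assign, residual)
        pooled /= np.maximum(counts, 1)
        ids, codes = quantize(pooled, codebook)  # q_k: n_k token ids
        tokens.append(ids)
        # Upsample: every point inherits its anchor's codeword; the rest stays residual.
        up = codes[assign]
        recon += up
        residual -= up
    return tokens, recon

# Usage with random stand-ins for the encoder features, FPS ordering and codebook.
rng = np.random.default_rng(0)
pts = rng.standard_normal((512, 3))
feats = rng.standard_normal((512, 8))
order = rng.permutation(512)
codebook = rng.standard_normal((32, 8))
tokens, recon = residual_multiscale_tokenize(pts, feats, order, [4, 16, 64, 512], codebook)
print([t.shape for t in tokens])  # [(4,), (16,), (64,), (512,)]
```

In this residual design each finer scale only encodes what the coarser scales failed to capture, so the token sets $q_1, \dots, q_K$ naturally form the coarse-to-fine sequence that the transformer then predicts scale by scale.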