Adaptive Point Transformer
Alessandro Baiocchi, Indro Spinelli, Alessandro Nicolosi, Simone Scardapane
TL;DR
The paper tackles the scalability challenge of point cloud transformers by introducing AdaPT, which uses learnable drop predictors to adaptively reduce the number of tokens during inference. A budget mechanism gives flexible control over computational cost without retraining, supported by a differentiable Gumbel-Softmax sampling strategy and a regularization term that encourages token elimination. Empirical results on ModelNet40 show AdaPT achieving competitive accuracy across varying budgets while reducing FLOPs, which makes it useful in resource-constrained scenarios. The results also indicate that learnable token selection can preserve accuracy while lowering the cost of deploying point cloud transformers on large inputs.
Abstract
The recent surge in 3D data acquisition has spurred the development of geometric deep learning models for point cloud processing, boosted by the remarkable success of transformers in natural language processing. While point cloud transformers (PTs) have recently achieved impressive results, their quadratic scaling with respect to the point cloud size poses a significant scalability challenge for real-world applications. To address this issue, we propose the Adaptive Point Cloud Transformer (AdaPT), a standard PT model augmented by an adaptive token selection mechanism. AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds. Furthermore, we introduce a budget mechanism to flexibly adjust the computational cost of the model at inference time without the need for retraining or fine-tuning separate models. Our extensive experimental evaluation on point cloud classification tasks demonstrates that AdaPT significantly reduces computational complexity while maintaining competitive accuracy compared to standard PTs. The code for AdaPT is made publicly available.
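The token selection and budget ideas summarized above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the function names, the two-class (keep, drop) logit layout, the 0.5 threshold, and the squared-gap form of the budget penalty are all assumptions chosen for illustration. The sketch only shows how a Gumbel-Softmax relaxation can turn per-token logits into keep/drop decisions and how a regularizer could pull the kept fraction toward a target budget.

```python
import math
import random


def gumbel_noise(rng):
    # Sample from Gumbel(0, 1) by inverse transform; clamp to avoid log(0).
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax_keep_prob(keep_logit, drop_logit, tau, rng):
    """Relaxed probability of keeping one token (two-class Gumbel-Softmax)."""
    k = (keep_logit + gumbel_noise(rng)) / tau
    d = (drop_logit + gumbel_noise(rng)) / tau
    m = max(k, d)  # subtract the max for numerical stability
    ek, ed = math.exp(k - m), math.exp(d - m)
    return ek / (ek + ed)


def select_tokens(logits, tau=1.0, seed=0):
    """Return (hard keep mask, soft keep probabilities).

    `logits` is a list of hypothetical (keep_logit, drop_logit) pairs,
    one pair per token, as a drop predictor might produce.
    """
    rng = random.Random(seed)
    soft = [gumbel_softmax_keep_prob(k, d, tau, rng) for k, d in logits]
    hard = [p >= 0.5 for p in soft]  # assumed threshold for the hard decision
    return hard, soft


def budget_penalty(soft, budget):
    """Assumed regularizer: squared gap between kept fraction and budget."""
    kept_fraction = sum(soft) / len(soft)
    return (kept_fraction - budget) ** 2


if __name__ == "__main__":
    # Three tokens: strongly keep, strongly drop, undecided.
    mask, probs = select_tokens([(5.0, -5.0), (-5.0, 5.0), (0.0, 0.0)], tau=0.5)
    print(mask, [round(p, 3) for p in probs])
    print(budget_penalty(probs, budget=0.5))
```

Because self-attention cost grows quadratically with the number of tokens, keeping a fraction r of the tokens in a layer reduces that layer's pairwise attention computations to roughly r² of the original, which is why dropping tokens early has a large effect on total cost.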
