Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai
TL;DR
This work tackles test-time distribution shifts in open-vocabulary point cloud recognition by introducing Point-Cache, a training-free, plug-and-play hierarchical cache built from online test data. The cache comprises a global component that stores coarse fingerprints of each point cloud's overall structure and a local component that captures part-level details; both are dynamically updated to prioritize high-quality samples. By querying the global and local caches and fusing their adaptation logits with the zero-shot predictions of large multimodal 3D models, Point-Cache achieves robust and generalizable recognition across seen and unseen classes with minimal computational overhead. Extensive experiments across eight benchmarks and four large 3D backbones demonstrate consistent gains, favorable memory/throughput trade-offs, and ablations that support the design choices. This approach leverages powerful pre-trained 3D-language models to enable practical open-vocabulary point cloud analysis at test time.
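The query-and-fuse step summarized above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the names `cache_logits` and `fuse`, the exponential cosine affinity, the sharpness `beta`, and the weights `alpha_g` and `alpha_l` are all hypothetical, and each sample's part-level descriptor is assumed to be pooled into a single vector.

```python
import numpy as np

def cache_logits(query, keys, values, beta=5.0):
    """Adaptation logits from one cache (hypothetical affinity function).

    query:  (d,) feature of the current test sample
    keys:   (n, d) cached features
    values: (n, C) one-hot pseudo-labels of the cached samples
    """
    if keys.shape[0] == 0:
        # An empty cache contributes nothing.
        return np.zeros(values.shape[1])
    q = query / np.linalg.norm(query)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    # Assumed affinity: close to 1 for near-identical features, decaying with distance.
    affinity = np.exp(-beta * (1.0 - k @ q))
    return affinity @ values

def fuse(zero_shot, global_logits, local_logits, alpha_g=1.0, alpha_l=1.0):
    """Add the global and local cache logits to the zero-shot logits."""
    return zero_shot + alpha_g * global_logits + alpha_l * local_logits
```

In this sketch, a test sample whose feature matches a confidently cached sample can have its zero-shot prediction corrected by the cache term, while an empty cache leaves the zero-shot logits unchanged.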
Abstract
This paper proposes a general solution to enable point cloud recognition models to handle distribution shifts at test time. Unlike prior methods, which rely heavily on training data (often inaccessible during online inference) and are limited to recognizing a fixed set of point cloud classes predefined during training, we explore a more practical and challenging scenario: adapting the model solely based on online test data to recognize both previously seen classes and novel, unseen classes at test time. To this end, we develop Point-Cache, a hierarchical cache model that captures essential clues of online test samples, particularly focusing on the global structure of point clouds and their local-part details. Point-Cache, which serves as a rich 3D knowledge base, is dynamically managed to prioritize the inclusion of high-quality samples. Designed as a plug-and-play module, our method can be flexibly integrated into large multimodal 3D models to support open-vocabulary point cloud recognition. Notably, our solution operates with efficiency comparable to zero-shot inference, as it is entirely training-free. Point-Cache demonstrates substantial gains across 8 challenging benchmarks and 4 representative large 3D models, highlighting its effectiveness. Code is available at https://github.com/auniquesun/Point-Cache.
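One way to picture a cache that is "dynamically managed to prioritize the inclusion of high-quality samples" is a bounded per-class store that keeps the most confident test samples seen so far. The sketch below is an assumption-laden illustration rather than the released code: the abstract does not state how quality is measured, so prediction entropy is used here as a stand-in, and `PriorityCache`, `capacity_per_class`, and `update` are hypothetical names.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of a probability vector (lower means more confident)."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

class PriorityCache:
    """Per-class cache that keeps the lowest-entropy samples (assumed quality proxy)."""

    def __init__(self, capacity_per_class=3):
        self.capacity = capacity_per_class
        self.slots = {}  # pseudo-label -> list of (entropy, feature)

    def update(self, feature, probs):
        """Try to cache a test sample; return True if it was stored."""
        label = int(np.argmax(probs))  # pseudo-label from the model's own prediction
        h = entropy(probs)
        entries = self.slots.setdefault(label, [])
        if len(entries) < self.capacity:
            entries.append((h, feature))
            return True
        # Cache is full: replace the least confident entry only if the new one is better.
        worst = max(range(len(entries)), key=lambda i: entries[i][0])
        if h < entries[worst][0]:
            entries[worst] = (h, feature)
            return True
        return False
```

Because the update needs only the current sample's feature and predicted probabilities, a scheme like this requires no gradient steps, which is consistent with the training-free claim in the abstract.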
