Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis

Hongyu Sun, Qiuhong Ke, Ming Cheng, Yongcai Wang, Deying Li, Chenhui Gou, Jianfei Cai

TL;DR

This work tackles the challenge of test-time distribution shifts in open-vocabulary point cloud recognition by introducing Point-Cache, a training-free, plug-and-play hierarchical cache built from online test data. The cache comprises a global component storing coarse fingerprints and a local component capturing part-level details, which are dynamically updated to prioritize high-quality samples. By querying the global and local caches and fusing their adaptation logits with the zero-shot predictions from large multimodal 3D models, Point-Cache achieves robust and generalizable recognition across seen and unseen classes with minimal computational overhead. Extensive experiments across eight benchmarks and multiple backbones demonstrate consistent gains, favorable memory/throughput trade-offs, and clear ablation-supported design choices. This approach leverages powerful pre-trained 3D-language models to enable practical open-vocabulary point cloud analysis at test time.

Abstract

This paper proposes a general solution to enable point cloud recognition models to handle distribution shifts at test time. Unlike prior methods, which rely heavily on training data (often inaccessible during online inference) and are limited to recognizing a fixed set of point cloud classes predefined during training, we explore a more practical and challenging scenario: adapting the model solely based on online test data to recognize both previously seen classes and novel, unseen classes at test time. To this end, we develop Point-Cache, a hierarchical cache model that captures essential clues of online test samples, particularly focusing on the global structure of point clouds and their local-part details. Point-Cache, which serves as a rich 3D knowledge base, is dynamically managed to prioritize the inclusion of high-quality samples. Designed as a plug-and-play module, our method can be flexibly integrated into large multimodal 3D models to support open-vocabulary point cloud recognition. Notably, our solution operates with efficiency comparable to zero-shot inference, as it is entirely training-free. Point-Cache demonstrates substantial gains across 8 challenging benchmarks and 4 representative large 3D models, highlighting its effectiveness. Code is available at https://github.com/auniquesun/Point-Cache.
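
The abstract says the cache is "dynamically managed to prioritize the inclusion of high-quality samples" but does not spell out the rule here. The sketch below illustrates one plausible reading and is not the authors' implementation: it assumes that quality is measured by the entropy of the zero-shot prediction (lower is better), that samples are filed under their pseudo-label, and that each class holds at most a fixed number of entries. The names `DynamicCache`, `shots_per_class`, and `update` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector; lower means more confident."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


class DynamicCache:
    """Hypothetical per-class cache keeping the lowest-entropy samples.

    Assumption: "high quality" is approximated by low prediction entropy.
    """

    def __init__(self, shots_per_class):
        self.k = shots_per_class
        self.slots = {}  # pseudo-label -> list of (entropy, feature)

    def update(self, feature, probs):
        """File one test sample under its pseudo-label and return that label."""
        label = max(range(len(probs)), key=probs.__getitem__)
        h = entropy(probs)
        bucket = self.slots.setdefault(label, [])
        if len(bucket) < self.k:
            # Free slot: always accept the sample.
            bucket.append((h, feature))
        else:
            # Full: replace the least confident entry only if the new one is better.
            worst = max(range(len(bucket)), key=lambda i: bucket[i][0])
            if h < bucket[worst][0]:
                bucket[worst] = (h, feature)
        return label


cache = DynamicCache(shots_per_class=2)
cache.update([1.0, 0.0], [0.60, 0.40])  # uncertain prediction, later evicted
cache.update([0.9, 0.1], [0.90, 0.10])
cache.update([0.8, 0.2], [0.99, 0.01])  # confident prediction replaces the first
```

Because an update only touches one class bucket and involves no gradient step, a scheme of this kind stays consistent with the training-free, near zero-shot cost the abstract describes.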

Paper Structure

This paper contains 25 sections, 4 equations, 14 figures, 18 tables, 1 algorithm.

Figures (14)

  • Figure 1: Recognition accuracy comparison on clean and corrupted point cloud datasets. The suffix -C indicates datasets with corruptions. Models experience a severe performance drop when data corruptions arise. Point-Cache effectively narrows down the performance gap between clean and corrupted data. The hardest split of ScanObjNN is used.
  • Figure 2: The overall pipeline of Point-Cache. The zero-shot predictions $\hat{\textbf{y}}_{zs}$ of large 3D models are effectively adapted by our global cache logits $\hat{\textbf{y}}_{g}$ and local cache logits $\hat{\textbf{y}}_{l}$ to handle the distribution shifts, enabling robust and generalizable point cloud analysis.
  • Figure 3: Challenges in encoding part features for various 3D objects of different classes: Part Feature Capture & Storage.
  • Figure 4: Ablation studies on the hyper-parameters in the cache design, including the shot size $K$ per class, the number of parts $m$ per object, the balance factors in the final prediction logits and the sharpness coefficients in affinity computation.
  • Figure 5: The average recognition accuracy of accumulated samples during online inference. The curve changes significantly in the initial stage due to the small number of samples. Models with our global and hierarchical caches receive perceptible performance gains.
  • ...and 9 more figures
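
The captions of Figures 2 and 4 mention zero-shot logits $\hat{\textbf{y}}_{zs}$, global and local cache logits $\hat{\textbf{y}}_{g}$ and $\hat{\textbf{y}}_{l}$, balance factors, and sharpness coefficients in the affinity computation. The sketch below shows how such pieces could fit together. It assumes an exponential affinity over cosine similarity, `exp(-beta * (1 - cos))`, and a weighted sum of the three logit vectors. Neither formula is quoted from the paper, and the names `cache_logits`, `fuse`, `alpha_g`, `alpha_l`, and `beta` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def cache_logits(query, keys, labels, num_classes, beta):
    """Sum the affinities of cached keys into the class of each key.

    Assumed affinity: exp(-beta * (1 - cosine)), where beta is the sharpness
    coefficient. Larger beta concentrates weight on the closest keys.
    """
    logits = [0.0] * num_classes
    for key, label in zip(keys, labels):
        logits[label] += math.exp(-beta * (1.0 - cosine(query, key)))
    return logits


def fuse(y_zs, y_g, y_l, alpha_g, alpha_l):
    """Assumed fusion: zero-shot logits plus weighted global and local cache logits."""
    return [zs + alpha_g * g + alpha_l * l for zs, g, l in zip(y_zs, y_g, y_l)]


# Toy example with two classes and one cached key per class.
query = [1.0, 0.0]
y_zs = [0.2, 0.3]  # zero-shot logits slightly favor class 1
y_g = cache_logits(query, [[1.0, 0.0], [0.0, 1.0]], [0, 1], num_classes=2, beta=2.0)
y_l = [0.0, 0.0]   # local (part-level) logits omitted in this toy example
y = fuse(y_zs, y_g, y_l, alpha_g=1.0, alpha_l=1.0)
prediction = max(range(len(y)), key=y.__getitem__)
```

In this toy case the cached key that matches the query moves the decision from class 1 to class 0, which is the kind of correction the cache logits are meant to provide. Under the same assumption, local logits would come from the same lookup applied to part-level features.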