Towards Training-Free Open-World Classification with 3D Generative Models
Xinzhe Xia, Weiguang Zhao, Yuyao Yan, Guanyu Yang, Rui Zhang, Kaizhu Huang, Xi Yang
TL;DR
This work tackles 3D open-world classification with open-category and open-pose challenges by proposing a training-free framework that leverages 3D generative priors to synthesize rotation-invariant anchor samples. A rotation-invariant 3D encoder, combined with LLM-derived category descriptions and diffusion-based 3D generation, forms a lightweight classifier that needs no retraining and whose predictions rely on cosine similarity to anchor prototypes. The approach achieves state-of-the-art results on ModelNet10 and McGill, demonstrates robustness to pose variation, and analyzes how anchor quantity, generative models, and representation backbones influence performance. It also explores the role of LLM prompts in generating category descriptions, highlighting both potential gains and challenges in prompt reliability and prompt-based diversity.
Abstract
3D open-world classification is a challenging yet essential task in dynamic and unstructured real-world scenarios, requiring both open-category and open-pose recognition. To address these challenges, recent methods often rely on sophisticated 2D pre-trained models to provide enriched and stable representations. However, these methods depend heavily on how 3D objects are projected into 2D space, a problem that remains poorly solved, and this significantly limits their performance. Unlike these existing efforts, in this paper we make a pioneering exploration of 3D generative models for 3D open-world classification. Drawing on abundant prior knowledge from 3D generative models, we additionally craft a rotation-invariant feature extractor. This synergy makes our pipeline training-free, open-category, and pose-invariant, and thus well suited to 3D open-world classification. Extensive experiments on benchmark datasets demonstrate the potential of generative models in 3D open-world classification, achieving state-of-the-art performance on ModelNet10 and McGill with overall accuracy improvements of 32.0% and 8.7%, respectively.
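As an illustration of the training-free classification step summarized in the TL;DR (cosine similarity to anchor prototypes), the following is a minimal Python sketch, not the authors' implementation. It assumes that embeddings have already been extracted by some rotation-invariant encoder, which is not implemented here, and that each category's generated anchors are averaged into a single prototype; the averaging step and all names (`l2_normalize`, `build_prototypes`, `classify`) are hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def build_prototypes(anchors: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Average the normalized anchor embeddings of each category.

    Assumption: one mean prototype per category. The paper may aggregate
    anchors differently (e.g. by comparing against every anchor).
    """
    prototypes: Dict[str, List[float]] = {}
    for label, embeddings in anchors.items():
        normalized = [l2_normalize(e) for e in embeddings]
        dim = len(normalized[0])
        mean = [sum(e[i] for e in normalized) / len(normalized) for i in range(dim)]
        prototypes[label] = l2_normalize(mean)
    return prototypes


def classify(query: Vector, prototypes: Dict[str, List[float]]) -> str:
    """Return the category whose prototype has the highest cosine similarity."""
    q = l2_normalize(query)
    scores = {
        label: sum(a * b for a, b in zip(q, proto))
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get)
```

Because no parameters are learned, adding a new category in this sketch only requires generating anchors for it and calling `build_prototypes` again, which is consistent with the training-free, open-category property the abstract describes.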
