Proto-FG3D: Prototype-based Interpretable Fine-Grained 3D Shape Classification

Shuxian Ma, Zihao Dong, Runmin Cong, Sam Kwong, Xiuli Shao

TL;DR

Proto-FG3D tackles fine-grained 3D shape classification by shifting from parametric softmax to non-parametric prototypes. It projects 3D shapes into multi-view 2D images, encodes them with a shared backbone, and learns a class-specific prototype pool through Prototype Association and online clustering, updated via EMA. Training optimizes intra-class prototype alignment and inter-prototype separation using a combination of cross-entropy and view-prototype contrastive losses, while inference relies on nearest prototype matching for transparent decisions. Experiments on FG3D and ModelNet40 demonstrate state-of-the-art accuracy, improved robustness to class imbalance, and built-in interpretability via global prototypes and local view-level explanations.
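The two mechanisms named above, EMA prototype updates and nearest-prototype inference, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the momentum value of 0.9, the use of cosine similarity as the matching score, the averaging of per-view scores into a shape-level score, and the class names are all assumptions made for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))

def ema_update(prototype, assigned_views, momentum=0.9):
    """Move a prototype toward the mean of the view embeddings assigned to it.

    Assumption: momentum=0.9 and re-normalization are illustrative choices.
    """
    if not assigned_views:
        return list(prototype)
    dim = len(prototype)
    mean = [sum(v[i] for v in assigned_views) / len(assigned_views)
            for i in range(dim)]
    blended = [momentum * p + (1 - momentum) * m
               for p, m in zip(prototype, mean)]
    return l2_normalize(blended)

def classify_shape(view_embeddings, prototypes):
    """Nearest-prototype inference over a pool of class-specific prototypes.

    prototypes maps a class name to a list of prototype vectors. Each view
    scores a class by its most similar prototype in that class's pool;
    averaging those scores over views is an assumed aggregation rule.
    """
    scores = {}
    for cls, pool in prototypes.items():
        per_view = [max(cosine(v, p) for p in pool) for v in view_embeddings]
        scores[cls] = sum(per_view) / len(per_view)
    return max(scores, key=scores.get), scores

# Hypothetical 2-D embeddings: two prototypes for one subclass, one for another.
prototypes = {"airliner": [[1.0, 0.0], [0.8, 0.6]], "fighter": [[0.0, 1.0]]}
views = [[0.9, 0.1], [0.7, 0.5]]  # embeddings of two rendered views
label, scores = classify_shape(views, prototypes)
updated = ema_update([1.0, 0.0], [[0.0, 1.0]], momentum=0.9)
```

Because the decision reduces to a similarity lookup, the prototype that produced the highest score for each view can be reported alongside the label, which is the sense in which the prediction is transparent.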

Abstract

Deep learning-based multi-view coarse-grained 3D shape classification has achieved remarkable success over the past decade, leveraging the powerful feature learning capabilities of CNN-based and ViT-based backbones. However, fine-grained 3D classification, a challenging research area critical for detailed shape understanding, remains understudied. The obstacles are the limited discriminative information captured during multi-view feature aggregation, which matters most for subtle inter-class variations, together with class imbalance and the inherent interpretability limitations of parametric models. To address these problems, we propose the first prototype-based framework named Proto-FG3D for fine-grained 3D shape classification, achieving a paradigm shift from parametric softmax to non-parametric prototype learning. Firstly, Proto-FG3D establishes joint multi-view and multi-category representation learning via Prototype Association. Secondly, prototypes are refined via Online Clustering, improving both the robustness of multi-view feature allocation and inter-subclass balance. Finally, prototype-guided supervised learning enhances fine-grained discrimination via prototype-view correlation analysis and enables ad-hoc interpretability through transparent case-based reasoning. Experiments on FG3D and ModelNet40 show that Proto-FG3D surpasses state-of-the-art methods in accuracy, transparent predictions, and ad-hoc interpretability with visualizations, challenging conventional fine-grained 3D recognition approaches.

Paper Structure

This paper contains 12 sections, 10 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Multi-view 3D shape classification paradigms: (a) Parametric softmax can be interpreted as a learnable prototype-based approach, where class- and view-coupled prototypes are learned in a fully parametric manner. (b) Non-parametric prototype learning directly identifies subcluster centers of embedded features as prototypes, enabling per-view predictions through non-parametric nearest prototype retrieval.
  • Figure 2: Architecture of prototype-based fine-grained 3D shape classification model. The visualization of initial prototypes (e.g., $\mathbf{Q}_{t=0}^c$) is reserved for future experimental results.
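
The contrast drawn in Figure 1 can be made concrete with a short worked identity. The notation here is assumed for illustration and may differ from the paper's: $\mathbf{x}$ is a unit-norm view embedding, $\mathbf{w}_c$ is the unit-norm softmax weight vector of class $c$, and $\mathbf{q}_{c,k}$ is the $k$-th unit-norm prototype of class $c$.

$$
p(c \mid \mathbf{x}) = \frac{\exp(\mathbf{w}_c^\top \mathbf{x})}{\sum_{c'} \exp(\mathbf{w}_{c'}^\top \mathbf{x})},
\qquad
\mathbf{w}_c^\top \mathbf{x} = 1 - \tfrac{1}{2}\,\lVert \mathbf{x} - \mathbf{w}_c \rVert_2^2
\;\Longrightarrow\;
\arg\max_c \, p(c \mid \mathbf{x}) = \arg\min_c \, \lVert \mathbf{x} - \mathbf{w}_c \rVert_2
$$

$$
\hat{c} = \arg\max_c \, \max_k \; \mathbf{x}^\top \mathbf{q}_{c,k}
$$

Under the unit-norm assumption, the first line shows that a softmax classifier behaves like nearest-prototype retrieval with a single learned prototype $\mathbf{w}_c$ per class. The second line is the non-parametric counterpart, in which each class keeps several prototypes that are cluster centers of embedded features rather than trained weights.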