Geodesic Prototype Matching via Diffusion Maps for Interpretable Fine-Grained Recognition

Junhao Jia, Yunyou Liu, Yifei Sun, Huangwei Chen, Feiwei Qin, Changmiao Wang, Yong Peng

TL;DR

This work presents a prototype-based recognition paradigm that grounds similarity in the intrinsic geometry of deep features: it distills the latent manifold structure of each class into a diffusion space and devises a differentiable Nyström interpolation that makes this geometry accessible to both unseen samples and learnable prototypes.

Abstract

Nonlinear manifolds are pervasive in deep visual features, where Euclidean distances can misrepresent true similarity. This mismatch is particularly detrimental to prototype-based interpretable fine-grained recognition, where even subtle semantic distinctions are crucial. To mitigate this issue, this work presents a novel paradigm for prototype-based recognition by grounding similarity in the intrinsic geometry of deep features. Concretely, we distill the latent manifold structure of each class into a diffusion space and, critically, devise a differentiable Nyström interpolation to make this geometry accessible to both unseen samples and learnable prototypes. To maintain efficiency, we employ compact per-class landmark sets with periodic updates. This strategy keeps the embedding synchronized with the evolving backbone, enabling fast inference at scale. Comprehensive experiments on two benchmark datasets demonstrate that our GeoProto yields prototypes focusing on semantically corresponding parts, significantly outperforming Euclidean prototype networks.
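
The two ingredients named in the abstract, a per-class diffusion embedding and a Nyström extension to points outside the landmark set, can be illustrated with a minimal NumPy sketch. This is a textbook diffusion map under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `eps`, the diffusion time `t`, and the function names `fit_diffusion_map` and `nystrom_embed` are all hypothetical choices made for illustration. The extension uses only smooth operations (exponentials, sums, a matrix product), which is why the same expression would be differentiable if written in an autodiff framework.

```python
import numpy as np


def gaussian_kernel(X, Y, eps):
    """Gaussian affinities between rows of X and rows of Y (assumed kernel)."""
    d2 = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / eps)


def fit_diffusion_map(landmarks, eps, n_components, t=1):
    """Build a diffusion embedding from one class's landmark features."""
    K = gaussian_kernel(landmarks, landmarks, eps)
    d = K.sum(1)
    # S = D^{-1/2} K D^{-1/2} is symmetric and shares its eigenvalues with
    # the Markov matrix P = D^{-1} K, so a symmetric eigensolver can be used.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Right eigenvectors of P are D^{-1/2} times the eigenvectors of S.
    psi = vecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return {
        "landmarks": landmarks,
        "eps": eps,
        "t": t,
        "lam": vals[1:n_components + 1],
        "psi": psi[:, 1:n_components + 1],
    }


def nystrom_embed(model, X):
    """Nystrom extension: diffusion coordinates for arbitrary points X.

    psi(x) = (1 / lam) * sum_j p(x, x_j) * psi(x_j), and the diffusion
    coordinate is lam^t * psi(x), i.e. lam^(t-1) * (P_x @ psi).
    """
    K = gaussian_kernel(X, model["landmarks"], model["eps"])
    P = K / K.sum(1, keepdims=True)  # transition probabilities to landmarks
    return (P @ model["psi"]) * model["lam"] ** (model["t"] - 1)
```

Applied to the landmarks themselves, `nystrom_embed` reproduces the original diffusion coordinates, which is a convenient sanity check; the same call can embed unseen patch features or prototype vectors, matching the role the abstract assigns to the interpolation.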

Paper Structure

This paper contains 11 sections, 6 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Diffusion (geodesic) similarity respects the class-manifold, avoiding Euclidean “shortcuts” and yielding semantically consistent prototype–part matches.
  • Figure 2: Overview of the proposed GeoProto framework. (a) Training: build class-wise diffusion (geodesic) manifolds from CNN features and embed prototypes via Nyström. (b) Inference: map a query into each class manifold via Nyström, compute geodesic similarity to prototypes, then max-pool and aggregate to produce the class score and prototype–part explanations.
  • Figure 3: Each panel shows a prototype and its five nearest patches. GeoProto yields more class-consistent parts, while Euclidean tends to include off-manifold textures.
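
The inference step summarized in the Figure 2(b) caption (similarity to prototypes, max-pooling over patches, aggregation into a class score) might look roughly like the sketch below. It assumes the query patches and the class prototypes have already been mapped into the same diffusion space. The similarity `exp(-d^2)`, the weighted-sum aggregation, and the name `class_score` are assumptions for illustration; the page does not specify the exact forms used in the paper.

```python
import numpy as np


def class_score(patch_emb, proto_emb, weights=None):
    """Score one class from embedded patches and embedded prototypes.

    patch_emb: (n_patches, k) diffusion coordinates of the query's patches.
    proto_emb: (n_protos, k) diffusion coordinates of the class prototypes.
    Returns the class score and, per prototype, the index of its best patch.
    """
    diff = patch_emb[:, None, :] - proto_emb[None, :, :]
    d2 = (diff ** 2).sum(-1)          # squared diffusion distances
    sim = np.exp(-d2)                 # assumed similarity, in (0, 1]
    best_patch = sim.argmax(axis=0)   # which patch explains each prototype
    pooled = sim.max(axis=0)          # max-pool over patches
    if weights is None:
        weights = np.ones(proto_emb.shape[0])
    return float(pooled @ weights), best_patch
```

The returned `best_patch` indices are what would back a prototype–part explanation: each prototype points to the single query patch it matched most strongly.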