Learning Fine-grained Domain Generalization via Hyperbolic State Space Hallucination

Qi Bi, Jingjun Yi, Haolan Zhan, Wei Ji, Gui-Song Xia

TL;DR

This work tackles fine-grained domain generalization (FGDG), where a model must generalize to unseen domains while still distinguishing subtle fine-grained categories. It introduces Hyperbolic State Space Hallucination (HSSH), which extends a VMamba-based state-space backbone with two components: State Space Hallucination (SSH), which broadens the style diversity of the state embeddings, and Hyperbolic Manifold Consistency (HMC), which enforces hierarchical alignment on a hyperbolic manifold by mapping embeddings into the Poincaré ball via the exponential map and comparing them with the geodesic distance $d_h$. The training objective combines standard classification losses with the hyperbolic consistency loss $\mathcal{L}_{HMC}$ (weighted by $\lambda=0.5$). Experiments on three FGDG benchmarks show state-of-the-art results, and ablations and visualizations highlight the contributions of SSH and HMC. The findings indicate that jointly expanding style diversity and using hyperbolic geometry to capture high-order statistics substantially improve fine-grained discrimination under cross-domain shifts, with practical implications for fine-grained visual categorization (FGVC) in diverse real-world settings.
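
For orientation, the quantities named in the summary can be written out in their standard textbook forms. This is a sketch under stated assumptions and not a transcription of the paper's equations: it assumes a Poincaré ball of unit curvature, and the symbols $\mathcal{L}_{cls}$, $z$ (pre-hallucination state embedding), and $\tilde{z}$ (post-hallucination state embedding) are introduced here for illustration only.

$$
\exp_0(v) = \tanh(\lVert v \rVert)\,\frac{v}{\lVert v \rVert},
\qquad
d_h(x, y) = \operatorname{arcosh}\!\left(1 + \frac{2\,\lVert x - y \rVert^{2}}{\left(1 - \lVert x \rVert^{2}\right)\left(1 - \lVert y \rVert^{2}\right)}\right)
$$

$$
\mathcal{L} = \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{HMC},
\qquad
\mathcal{L}_{HMC} = d_h\!\big(\exp_0(z),\, \exp_0(\tilde{z})\big),
\qquad
\lambda = 0.5
$$

Under these assumptions, a point at Euclidean norm $\lVert v \rVert$ in the tangent space lands at hyperbolic distance $2\lVert v \rVert$ from the origin, so distances grow rapidly toward the boundary of the ball. This is the property usually cited when hyperbolic space is used to embed hierarchies.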

Abstract

Fine-grained domain generalization (FGDG) aims to learn a fine-grained representation that generalizes well to unseen target domains when trained only on source-domain data. Compared with generic domain generalization, FGDG is particularly challenging because fine-grained categories can be discerned only through subtle, tiny patterns. Such patterns are particularly fragile under cross-domain style shifts caused by illumination, color, and similar factors. To push this frontier, this paper presents a novel Hyperbolic State Space Hallucination (HSSH) method. It consists of two key components: state space hallucination (SSH) and hyperbolic manifold consistency (HMC). SSH enriches the style diversity of the state embeddings by first extrapolating and then hallucinating the source images. The state embeddings before and after style hallucination are then projected into the hyperbolic manifold. The hyperbolic state space models high-order statistics and allows better discernment of fine-grained patterns. Finally, the hyperbolic distance between the two embeddings is minimized, so that the impact of style variation on fine-grained patterns is suppressed. Experiments on three FGDG benchmarks demonstrate state-of-the-art performance.
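
The pipeline described in the abstract (hallucinate a new style, project both embeddings into hyperbolic space, penalize their distance) can be illustrated with a small NumPy sketch. It is not the authors' implementation. The style perturbation below is a generic channel-wise mean/standard-deviation re-styling used as a hypothetical stand-in for SSH, the Poincaré ball is assumed to have unit curvature, and all function names (`hallucinate_style`, `expmap0`, `poincare_distance`, `hmc_loss`) and the `scale` parameter are invented for this example.

```python
import numpy as np


def style_stats(feat, eps=1e-6):
    """Channel-wise mean and std of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    return mu, sigma


def hallucinate_style(feat, rng, scale=0.5):
    """Hypothetical stand-in for SSH (an assumption, not the paper's method):
    keep the normalized content and swap in randomly perturbed statistics."""
    mu, sigma = style_stats(feat)
    normalized = (feat - mu) / sigma
    new_mu = mu * (1.0 + scale * rng.standard_normal(mu.shape))
    new_sigma = sigma * np.abs(1.0 + scale * rng.standard_normal(sigma.shape))
    return normalized * new_sigma + new_mu


def expmap0(v, eps=1e-9, max_norm=1.0 - 1e-5):
    """Exponential map at the origin of the unit-curvature Poincare ball."""
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    out = np.tanh(norm) * v / norm
    # Clip to stay strictly inside the ball for numerical stability.
    out_norm = np.maximum(np.linalg.norm(out, axis=-1, keepdims=True), eps)
    return out * np.minimum(1.0, max_norm / out_norm)


def poincare_distance(x, y, eps=1e-9):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - np.sum(x ** 2, axis=-1)) * (1.0 - np.sum(y ** 2, axis=-1))
    return np.arccosh(1.0 + 2.0 * sq_diff / np.maximum(denom, eps))


def hmc_loss(feat, feat_hallucinated):
    """Hyperbolic distance between pooled pre- and post-hallucination features."""
    z = feat.mean(axis=(1, 2))                    # global average pool -> (C,)
    z_tilde = feat_hallucinated.mean(axis=(1, 2))
    return poincare_distance(expmap0(z), expmap0(z_tilde))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((8, 4, 4)) * 0.1   # toy (C, H, W) feature map
    hallucinated = hallucinate_style(feat, rng)
    print("consistency loss:", hmc_loss(feat, hallucinated))
```

In a training loop, a loss of this kind would be added to the classification loss with some weight, so that the encoder is pushed to produce embeddings that stay close in hyperbolic space regardless of the style applied. Pooling before projection is a simplification chosen here to keep the example short.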

Paper Structure

This paper contains 13 sections, 10 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: In each fine-grained domain (first row), the embedding style from the VMamba baseline (second row) is limited because of inter-category similarity. The proposed HSSH (third row) enriches the style diversity of the state embedding, which improves robustness to unseen domains. The x- and y-axes show the mean and standard deviation in the style space, each ranging from 0 to 1.
  • Figure 2: A hyperbolic manifold provides a feasible way to learn hierarchical and high-order statistics, which has the potential to distinguish a fine-grained category from other fine-grained categories under the same coarse-grained category.
  • Figure 3: The proposed hyperbolic state space hallucination (HSSH) is built on two key components: state space hallucination (SSH) and hyperbolic manifold consistency (HMC). SSH enriches the style diversity of the state embeddings by first extrapolating and then hallucinating from the source images. HMC minimizes the hyperbolic distance between the pre- and post-hallucination samples, so that the impact of style variation on fine-grained patterns is suppressed.
  • Figure 4: t-SNE feature visualization. Blue: source domain; red: unseen target domain. Different marker shapes denote different fine-grained categories.
  • Figure 5: Grad-CAM heatmap visualization of the proposed HSSH on unseen target domains.