Learning Fine-grained Domain Generalization via Hyperbolic State Space Hallucination
Qi Bi, Jingjun Yi, Haolan Zhan, Wei Ji, Gui-Song Xia
TL;DR
This work tackles Fine-grained Domain Generalization (FGDG), where a model must generalize to unseen domains while distinguishing subtle fine-grained categories. It introduces Hyperbolic State Space Hallucination (HSSH), which extends a VMamba-based state-space backbone with two components: State Space Hallucination (SSH) to broaden style diversity of state embeddings and Hyperbolic Manifold Consistency (HMC) to enforce hierarchical alignment on a hyperbolic manifold via the exponential map into the Poincaré ball and geodesic distance $d_h$. The training objective combines standard classification losses with the hyperbolic consistency loss $\mathcal{L}_{HMC}$ (weighted by $\lambda=0.5$), and experiments on three FGDG benchmarks show state-of-the-art results, supported by ablations and visualizations that highlight the benefits of SSH and HMC. The findings indicate that jointly expanding style diversity and leveraging hyperbolic geometry to capture high-order statistics substantially improves fine-grained discrimination under cross-domain shifts, with practical implications for FGVC in diverse real-world settings.
Abstract
Fine-grained domain generalization (FGDG) aims to learn a fine-grained representation that generalizes well to unseen target domains when trained only on source-domain data. Compared with generic domain generalization, FGDG is particularly challenging because fine-grained categories can only be discerned by subtle, tiny patterns. Such patterns are especially fragile under cross-domain style shifts caused by illumination, color, and similar factors. To push this frontier, this paper presents a novel Hyperbolic State Space Hallucination (HSSH) method. It consists of two key components: state space hallucination (SSH) and hyperbolic manifold consistency (HMC). SSH enriches the style diversity of the state embeddings by first extrapolating and then hallucinating the source images. The state embeddings before and after style hallucination are then projected into the hyperbolic manifold. The hyperbolic state space models the high-order statistics and allows better discernment of the fine-grained patterns. Finally, the hyperbolic distance between the two projected embeddings is minimized, so that the impact of style variation on fine-grained patterns can be eliminated. Experiments on three FGDG benchmarks demonstrate its state-of-the-art performance.
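
To make the projection and distance-minimization steps concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the standard exponential map at the origin of the Poincaré ball and the standard closed-form geodesic distance. The function names (`expmap0`, `poincare_distance`, `hmc_loss`), the curvature parameter `c`, and the mean reduction over the batch are illustrative assumptions; the actual loss form and any per-layer details may differ in the paper.

```python
import numpy as np

def expmap0(v, c=1.0, eps=1e-9):
    """Exponential map at the origin of the Poincare ball (curvature -c).

    Maps Euclidean vectors v (last axis = feature dim) into the open ball
    of radius 1/sqrt(c).
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

def poincare_distance(x, y, c=1.0, eps=1e-9):
    """Geodesic distance between points x and y inside the Poincare ball."""
    sq_diff = np.sum((x - y) ** 2, axis=-1)
    denom = (1.0 - c * np.sum(x ** 2, axis=-1)) * (1.0 - c * np.sum(y ** 2, axis=-1))
    arg = 1.0 + 2.0 * c * sq_diff / np.maximum(denom, eps)
    # Guard against round-off pushing the argument slightly below 1.
    return np.arccosh(np.maximum(arg, 1.0)) / np.sqrt(c)

def hmc_loss(z, z_hall, c=1.0):
    """Hypothetical consistency term: mean hyperbolic distance between
    embeddings before (z) and after (z_hall) style hallucination."""
    return float(np.mean(poincare_distance(expmap0(z, c), expmap0(z_hall, c), c)))

# Toy embeddings standing in for state embeddings (assumed shapes).
rng = np.random.default_rng(0)
z = rng.normal(scale=0.1, size=(4, 8))
z_hall = z + rng.normal(scale=0.05, size=(4, 8))

lam = 0.5          # weight of the consistency term, as stated in the TL;DR
cls_loss = 1.0     # placeholder for the classification loss (assumption)
total = cls_loss + lam * hmc_loss(z, z_hall)
print(total)
```

Under this convention the distance from the origin to `expmap0(v)` is `2 * ||v||` when `c = 1`, which gives a quick sanity check that the map and the distance agree with each other.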
