Fine-Grained Domain Generalization with Feature Structuralization

Wenlong Yu; Dongyue Chen; Qilong Wang; Qinghua Hu

TL;DR

This work tackles Fine-Grained Domain Generalization (FGDG) by introducing Feature Structuralization (FS), which decomposes learned features into common, specific, and confounding parts and aligns them with multi-granularity knowledge. The framework combines a decorrelation term, three alignment losses, and a prediction calibration term to promote invariant yet discriminative representations, and is implemented with a Granularity Transition Layer and two backbones (coarse and fine branches). On three benchmarks, it improves FGDG performance by 6.2% on average over state-of-the-art DG methods. Extensive ablations confirm the value of each component, and an explainability analysis based on Concept Relevance Propagation validates the semantic structuring. The approach remains robust across model architectures and yields interpretable features that separate common, category-specific, and confounding information.

Abstract

Fine-grained domain generalization (FGDG) is a more challenging task than traditional DG tasks due to its small inter-class variations and relatively large intra-class disparities. When domain distribution changes, the vulnerability of subtle features leads to a severe deterioration in model performance. Nevertheless, humans inherently demonstrate the capacity for generalizing to out-of-distribution data, leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Likewise, we propose a Feature Structuralized Domain Generalization (FSDG) model, wherein features experience structuralization into common, specific, and confounding segments, harmoniously aligned with their relevant semantic concepts, to elevate performance in FGDG. Specifically, feature structuralization (FS) is accomplished through joint optimization of five constraints: a decorrelation function applied to disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. By imposing these stipulations, FSDG is prompted to disentangle and align features based on multi-granularity knowledge, facilitating robust subtle distinctions among categories. Extensive experimentation on three benchmarks consistently validates the superiority of FSDG over state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. Beyond that, the explainability analysis on explicit concept matching intensity between the shared concepts among categories and the model channels, along with experiments on various mainstream model architectures, substantiates the validity of FS.
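
To make the idea of decorrelating disentangled segments more concrete, here is a minimal NumPy sketch. It is not the paper's loss: this summary does not give the exact decorrelation function, so the sketch assumes a simple channel-wise split and a squared Pearson correlation penalty between segments. The function names (`split_segments`, `cross_correlation_penalty`, `decorrelation_loss`) and the segment sizes are hypothetical.

```python
import numpy as np


def split_segments(features, n_common, n_specific):
    """Split feature channels into common, specific, and confounding blocks.

    Assumption: a fixed channel-wise split; the remaining channels after the
    common and specific blocks are treated as confounding.
    """
    common = features[:, :n_common]
    specific = features[:, n_common:n_common + n_specific]
    confounding = features[:, n_common + n_specific:]
    return common, specific, confounding


def cross_correlation_penalty(a, b, eps=1e-8):
    """Mean squared Pearson correlation between channels of a and channels of b."""
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + eps)
    b = (b - b.mean(axis=0)) / (b.std(axis=0) + eps)
    corr = a.T @ b / a.shape[0]  # shape: (channels_a, channels_b)
    return float(np.mean(corr ** 2))


def decorrelation_loss(features, n_common, n_specific):
    """Sum of pairwise penalties across the three segments (illustrative only)."""
    common, specific, confounding = split_segments(features, n_common, n_specific)
    return (
        cross_correlation_penalty(common, specific)
        + cross_correlation_penalty(common, confounding)
        + cross_correlation_penalty(specific, confounding)
    )


rng = np.random.default_rng(0)
feats = rng.normal(size=(256, 12))  # 256 samples, 12 feature channels

# Independent channels: the penalty should be small.
loss_independent = decorrelation_loss(feats, n_common=4, n_specific=4)

# Entangled case: the "specific" block is a copy of the "common" block.
entangled = feats.copy()
entangled[:, 4:8] = entangled[:, 0:4]
loss_entangled = decorrelation_loss(entangled, n_common=4, n_specific=4)

print(loss_independent, loss_entangled)
```

In this toy setting the entangled features incur a clearly larger penalty than the independent ones, which is the behavior a decorrelation constraint is meant to discourage. The alignment and calibration terms described in the abstract are not covered by this sketch.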

Paper Structure (15 sections, 14 equations, 7 figures, 11 tables)

Introduction
Related Work
Domain Generalization
Fine-Grained Visual Categorization
Method
Problem Statement
Disentanglement and Decorrelation of Three Partitions
Alignment for Commonality, Specificity and Prediction
Experiments
Datasets
Implementation and Evaluation
Main Results
Analysis
Explainability Analysis
Conclusion

Figures (7)

Figure 1: Instance of multi-granularity knowledge. Four animals are categorized into various classes across three granularity levels according to their commonalities and specificities, as described in the green region. The numbers represent species labels: labels 4, 5, 6, and 7 correspond to the $f_0$ granularity level; 2 and 3 to the $c_1$ level; and 1 to the coarsest $c_2$ level. For commonalities, the number outside parentheses indicates the parent category, while the two inside are its sub-categories.
Figure 2: Illustration of the model exemplified in a three-granularity hierarchy (i.e., $G=3$). The FS module is highlighted in the middle box with solid arrows depicting the operation dimensions. $\oplus$ represents the disentanglement operator. Given five input images, FSDG outputs multi-granular results, as indicated in the right section. The box showing $\mathcal{L}_{lf}$ illustrates the alignment operation for coarse-fine predicted distributions and shows $\varepsilon$ during the training process. Conf. is an abbreviation of confounding and Granu. Trans. means the Granularity Transition Layer. The coarse branches and the FS optimization module are excluded during model inference.
Figure 3: Distance analyses of commonality and specificity. Under various combinations of losses, the distances among common features, (a) $S_{cs}$ and (b) $S_{cd}$, and the similarity of the specific parts, (c) $S_{p}$, are computed to illustrate the effectiveness of FS.
Figure 4: Analyses of the proportions of common, specific, and confounding components.
Figure 5: Performance of various loss functions under different coefficients. (a) presents the performance of each loss. (b) is the progressive optimization of the 4 losses.
...and 2 more figures
