Fine-Grained Domain Generalization with Feature Structuralization
Wenlong Yu, Dongyue Chen, Qilong Wang, Qinghua Hu
TL;DR
This work tackles Fine-Grained Domain Generalization (FGDG) by introducing Feature Structuralization (FS), which decomposes learned features into common, specific, and confounding parts and aligns them across multi-granularity knowledge. The framework uses a decorrelation term plus three alignment losses and a prediction calibration term to promote invariant yet discriminative representations, implemented via a Granularity Transition Layer and two backbones (coarse and fine branches). Empirical results on three benchmarks show substantial generalization gains (approximately 6.2% on average) over state-of-the-art DG methods, with extensive ablations confirming the value of each component and an explainability analysis validating the semantic structuring through Concept Relevance Propagation. The approach offers robust FGDG performance across architectures and provides interpretable feature structuring that isolates common, category-specific, and confounding information, enabling insights for future granularity-aware learning and explainability enhancements.
Abstract
Fine-grained domain generalization (FGDG) is more challenging than traditional DG because inter-class variations are small while intra-class disparities are relatively large. When the domain distribution changes, the fragility of subtle features causes severe deterioration in model performance. Humans, by contrast, generalize well to out-of-distribution data by leveraging structured multi-granularity knowledge that emerges from discerning the commonality and specificity within categories. Inspired by this, we propose a Feature Structuralized Domain Generalization (FSDG) model, in which features are structuralized into common, specific, and confounding segments and aligned with their relevant semantic concepts to improve FGDG performance. Specifically, feature structuralization (FS) is achieved by jointly optimizing five constraints: a decorrelation function applied to the disentangled segments, three constraints ensuring common feature consistency and specific feature distinctiveness, and a prediction calibration term. These constraints prompt FSDG to disentangle and align features based on multi-granularity knowledge, enabling robust discrimination of subtle differences among categories. Extensive experiments on three benchmarks consistently show that FSDG outperforms state-of-the-art counterparts, with an average improvement of 6.2% in FGDG performance. In addition, an explainability analysis of the explicit concept matching intensity between the concepts shared among categories and the model channels, together with experiments on various mainstream model architectures, substantiates the validity of FS.
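
To make the idea of structuralized features concrete, the sketch below splits a batch of feature vectors into common, specific, and confounding channel segments and scores how correlated the segments are with one another. It is an illustration only, not the paper's formulation: the abstract does not specify the decorrelation function, so the mean squared Pearson correlation used here is a stand-in, and the names split_segments, cross_correlation_penalty, decorrelation_loss, n_common, and n_specific are hypothetical.

```python
import numpy as np


def split_segments(features, n_common, n_specific):
    """Split the channel axis into (common, specific, confounding) segments.

    The contiguous split and the segment sizes are assumptions for this sketch.
    """
    common = features[:, :n_common]
    specific = features[:, n_common:n_common + n_specific]
    confounding = features[:, n_common + n_specific:]
    return common, specific, confounding


def cross_correlation_penalty(a, b, eps=1e-8):
    """Mean squared Pearson correlation between each channel of a and each channel of b."""
    # Standardize every channel over the batch dimension.
    a = (a - a.mean(axis=0)) / (a.std(axis=0) + eps)
    b = (b - b.mean(axis=0)) / (b.std(axis=0) + eps)
    # Entry (i, j) is the correlation between channel i of a and channel j of b.
    corr = a.T @ b / a.shape[0]
    return float(np.mean(corr ** 2))


def decorrelation_loss(features, n_common, n_specific):
    """Sum of pairwise penalties over the three segments (a stand-in loss, not the paper's)."""
    common, specific, confounding = split_segments(features, n_common, n_specific)
    return (
        cross_correlation_penalty(common, specific)
        + cross_correlation_penalty(common, confounding)
        + cross_correlation_penalty(specific, confounding)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    independent = rng.normal(size=(512, 12))
    # Copy the common channels into the specific segment to create strong correlation.
    entangled = independent.copy()
    entangled[:, 4:8] = entangled[:, 0:4]
    print("independent segments:", decorrelation_loss(independent, 4, 4))
    print("entangled segments:  ", decorrelation_loss(entangled, 4, 4))
```

Under these assumptions, the penalty stays near zero when the segments carry unrelated information and grows when one segment duplicates another, which is the behavior a decorrelation term is meant to discourage. The alignment and prediction calibration terms mentioned in the abstract are not sketched here because the abstract gives no detail on their form.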
