A Language Anchor-Guided Method for Robust Noisy Domain Generalization

Zilin Dai, Lehong Wang, Fangzhou Lin, Yidong Wang, Zhigang Li, Kazunori D Yamada, Ziming Zhang, Wang Lu

TL;DR

Domain generalization under distribution shift and label noise remains challenging due to spurious correlations. The authors propose $A^3W$, an NLP-anchor guided framework that aligns image features with class-specific semantic anchors derived from CLIP and employs a softmax-weighted loss to down-weight noisy samples, thereby improving robustness. Empirical results across multiple DG benchmarks show consistent improvements over state-of-the-art methods, with notable gains under higher noise and in semantically rich settings. This knowledge-guided approach demonstrates the practical value of integrating external semantic cues into domain generalization and opens avenues for dynamic anchors and multi-modal extensions.

Abstract

Real-world machine learning applications often struggle with two major challenges: distribution shift and label noise. Models tend to overfit by focusing on redundant and uninformative features in the training data, which makes it hard for them to generalize to the target domain. Noisy data worsens this problem by causing further overfitting to the noise, meaning that existing methods often fail to tell the difference between true, invariant features and misleading, spurious ones. To tackle these issues, we introduce Anchor Alignment and Adaptive Weighting (A3W). This new algorithm uses sample reweighting guided by natural language processing (NLP) anchors to extract more representative features. In simple terms, A3W leverages semantic representations from natural language models as a source of domain-invariant prior knowledge. Additionally, it employs a weighted loss function that adjusts each sample's contribution based on its similarity to the corresponding NLP anchor. This adjustment makes the model more robust to noisy labels. Extensive experiments on standard benchmark datasets show that A3W consistently outperforms state-of-the-art domain generalization methods, offering significant improvements in both accuracy and robustness across different datasets and noise levels.
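The weighting idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that each sample's weight is a softmax over the batch of the cosine similarity between its projected feature and the anchor of its labelled class, and that the weights scale a standard cross-entropy loss. The function name `anchor_weighted_loss`, the `temperature` parameter, and the toy anchors (standing in for CLIP text embeddings) are all hypothetical.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def anchor_weighted_loss(logits, features, anchors, labels, temperature=1.0):
    """Cross-entropy reweighted by feature-to-anchor similarity (illustrative).

    logits:   (N, C) classifier outputs
    features: (N, D) projected image features
    anchors:  (C, D) one semantic anchor per class (assumed precomputed)
    labels:   (N,)   possibly noisy integer class labels
    """
    # Cosine similarity between each feature and the anchor of its labelled class.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    sim = np.sum(f * a[labels], axis=1)

    # Softmax over the batch: samples far from their anchor get smaller weights.
    weights = softmax(sim / temperature, axis=0)

    # Standard per-sample cross-entropy.
    log_probs = np.log(softmax(logits, axis=1))
    ce = -log_probs[np.arange(len(labels)), labels]

    return float(np.sum(weights * ce)), weights


# Toy batch: both samples look like class 0, but the second is labelled class 1.
anchors = np.eye(2)                       # stand-in for text-derived anchors
features = np.array([[1.0, 0.0], [1.0, 0.0]])
logits = np.array([[2.0, 0.0], [2.0, 0.0]])
labels = np.array([0, 1])

loss, weights = anchor_weighted_loss(logits, features, anchors, labels)
```

In this toy batch the second sample's feature is orthogonal to the anchor of its assigned label, so it receives the smaller weight and contributes less to the loss, which is the behaviour the abstract attributes to the adaptive weighting.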

Paper Structure

This paper contains 25 sections, 2 theorems, 21 equations, 6 figures, 4 tables, 1 algorithm.

Key Result

Lemma 3.1

Suppose $\mathcal{H}$ is the space of predictors induced by $(f, \{Proj_c\})$. If $\max_{x,c}\,\|\nabla \mathcal{L}_{\text{anchor}}(x,c)\|\leq \gamma$, then $\mathcal{H}$ excludes functions whose representations deviate from the anchors by more than a constant factor related to $\gamma$. In particul

Figures (6)

  • Figure 1: Illustration of domain shift using four distinct domains (painting, photo, cartoon, and sketch) for the same object class. In (a), the visual appearance of “elephant” varies substantially across domains, underscoring significant style discrepancies. In (b), the t-SNE projection shows that even for the same class, the distribution of features differs across domains, highlighting the inherent challenges of domain generalization. In (c), unclear or mislabeled samples introduce additional noise, further exacerbating the difficulty of achieving robust generalization. In (d), we show the accuracy of a network trained on noisy data: ideally, the model should resist overfitting to noise, but the graph indicates a steady increase in noise accuracy over time, suggesting progressive overfitting to noisy labels.
  • Figure 2: Architecture of $A^3W$. This diagram illustrates the end-to-end workflow and key components of $A^3W$. The encoder (Enc.) converts input text into embeddings, which are then refined by the featurizer (Fea.) into a more informative representation. The classifier (CLs.) leverages these refined features for prediction. Additionally, the similarity module (sim.) computes the cosine similarity between the embedding anchor and the projected features, while the alignment module (align.) creates deep copies of this similarity for weight ($w$) computation.
  • Figure 3: Noise analysis. (a) The effect of increasing noise levels on classification accuracy reveals that higher noise leads to a sharper decline, reflecting an increased tendency to overfit. (b) $A^3W$ is most robust to noise injection, with its accuracy decreasing by only 0.2 when noise increases from 0.1 to 0.25, in contrast to other algorithms, which show declines between 0.427 and 0.527.
  • Figure 4: Convergence trajectories under three different random seeds and training configurations for two datasets.
  • Figure 5: t-SNE embeddings of four methods on the PACS dataset. From left to right, top to bottom: ERM, IRM, Mixup, and $A^3W$.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Lemma 3.1: Semantic Prior Restricts Hypothesis Space
  • proof : Proof
  • Theorem 3.1: Robustness under Weighted ERM
  • proof : Proof