Language as an Anchor: Preserving Relative Visual Geometry for Domain Incremental Learning

Shuyi Geng, Tao Zhou, Yi Zhou

TL;DR

Domain Incremental Learning (DIL) suffers from inter-domain interference and knowledge fragmentation as distributions shift across domains. The proposed LAVA framework takes a language-anchored approach: it preserves the relative geometry of visual features by aligning each domain's visual relations to a fixed text-based semantic structure. Two modules carry this out: VL-RSA, trained with a KL-based structural loss, and CA-CDFA, which aggregates features across domains. At inference, multi-level feature integration (MLFI) identifies the domain of the input, while memory-efficient prototype-based retrieval supports cross-domain knowledge reuse. Across four standard DIL benchmarks, LAVA achieves state-of-the-art performance, remains robust to domain order, and is efficient in memory and compute, demonstrating the value of language as a stable semantic compass for continual learning.

Abstract

A key challenge in Domain Incremental Learning (DIL) is to continually learn under shifting distributions while preserving knowledge from previous domains. Existing methods face a fundamental dilemma. On one hand, projecting all domains into a single unified visual space leads to inter-domain interference and semantic distortion, as large shifts may affect not only visual appearance but also underlying semantics. On the other hand, isolating domain-specific parameters causes knowledge fragmentation, creating "knowledge islands" that hamper knowledge reuse and exacerbate forgetting. To address this issue, we propose LAVA (Language-Anchored Visual Alignment), a novel DIL framework that replaces direct feature alignment with relative alignment driven by a text-based reference anchor. LAVA guides the visual representations of each incoming domain to preserve a consistent relative geometry, which is defined by mirroring the pairwise semantic similarities between the class names. This anchored geometric structure acts as a bridge across domains, enabling the retrieval of class-aware prior knowledge and facilitating robust feature aggregation. Extensive experiments on standard DIL benchmarks demonstrate that LAVA achieves significant performance improvements over state-of-the-art methods. Code is available at https://github.com/ShuyiGeng/LAVA.
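
The idea of mirroring the pairwise semantic similarities between class names can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of cosine similarity, the row-wise softmax, the temperature value, and the direction of the KL term are all assumptions made for illustration. The sketch turns each similarity matrix into per-class relational distributions and penalizes the divergence of the visual structure from the fixed text structure.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between row vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def softmax(row, temperature):
    """Numerically stable softmax over one row of similarities."""
    scaled = [x / temperature for x in row]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def structural_kl_loss(text_embeds, visual_protos, temperature=0.1):
    """Mean row-wise KL(P_text || P_visual) between relational distributions.

    text_embeds:   one embedding per class name (the fixed anchor, assumed given)
    visual_protos: one visual prototype per class for the current domain
    Both the temperature and the KL direction are illustrative assumptions.
    """
    sim_text = cosine_similarity_matrix(text_embeds)
    sim_vis = cosine_similarity_matrix(visual_protos)
    loss = 0.0
    for row_t, row_v in zip(sim_text, sim_vis):
        p = softmax(row_t, temperature)
        q = softmax(row_v, temperature)
        loss += sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return loss / len(sim_text)
```

Because the loss depends only on pairwise similarities, a rotated copy of the text embeddings yields a loss of zero (up to rounding). This is the sense in which the alignment is relative rather than direct: the visual features need not coincide with the text features, only reproduce their geometry.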

Paper Structure

This paper contains 29 sections, 14 equations, 12 figures, 12 tables.

Figures (12)

  • Figure 1: (a) Unified Space Paradigm: Projecting features from all domains into a shared visual space risks interference and semantic distortion. (b) Isolated Space Paradigm: Learning domain-specific subspaces via prompts or adapters leads to fragmented knowledge. (c) Our LAVA: A Relative Space Framework. LAVA aligns each domain’s relative visual geometry (e.g., the learnable visual angles $\theta^t$) with a stable semantic structure derived from a frozen text-based reference anchor (e.g., the fixed semantic angles $\theta^{\text{text}}$), preserving a consistent semantic map across domains.
  • Figure 2: Overview of the proposed LAVA framework. LAVA consists of two branches: the text branch establishes a text-based reference anchor from class names and computes relative encodings that capture their pairwise relationships; the visual branch employs domain-specific prompts to extract features from a frozen encoder. These two branches feed into two core modules: (a) The VL-RSA module aligns the visual representations with the text-based geometry via a structural alignment loss, creating a domain-invariant relational space; (b) The CA-CDFA module aggregates class-aware features from previously seen domains using an attention mechanism, enabling effective cross-domain knowledge reuse and aggregation.
  • Figure 3: Overview of the inference pipeline. The model first identifies the domain of an input image via the (c) Multi-Level Feature Integration (MLFI) module (bottom), and then employs the domain-specific modules (top) for the final classification task.
  • Figure 4: Performance progression on the ImageNet-R and ImageNet-C benchmarks.
  • Figure 5: Impact of Domain Order on DomainNet. LAVA shows robust stability and performance across domain orders, including the challenging Quickdraw-first order.
  • ...and 7 more figures
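
The Figure 2 caption describes CA-CDFA as aggregating class-aware features from previously seen domains with an attention mechanism. The sketch below shows one generic way such an aggregation could work; it is not taken from the paper. The function name `attention_aggregate`, the use of scaled dot-product attention, and the choice of stored per-domain class prototypes as both keys and values are assumptions for illustration only.

```python
import math


def attention_aggregate(query, prototypes, temperature=1.0):
    """Attend from one query feature over stored prototypes.

    query:      feature vector from the current domain
    prototypes: prototypes of the same class from earlier domains
                (hypothetical storage format: a list of vectors)
    Returns the attention-weighted prototype and the weights.
    """
    dim = len(query)
    scores = [
        sum(q * p for q, p in zip(query, proto)) / (math.sqrt(dim) * temperature)
        for proto in prototypes
    ]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    aggregated = [
        sum(w * proto[i] for w, proto in zip(weights, prototypes))
        for i in range(dim)
    ]
    return aggregated, weights
```

A call such as `attention_aggregate([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` assigns the larger weight to the first prototype, since it is more similar to the query. In this reading, prior-domain knowledge that resembles the current feature contributes more to the aggregated representation.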