SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning

Seokju Yun, Seunghye Chae, Dongheon Lee, Youngmin Ro

TL;DR

Domain generalization (DG) in vision with vision foundation models (VFMs) remains challenging due to domain shifts. SoMA introduces a spectral, parameter-efficient fine-tuning approach that leverages singular value decomposition (SVD) to identify and preserve generalizable components while tuning only minor singular components via a low-rank adapter, coupled with freezing of early blocks and an annealing weight decay schedule. The method yields state-of-the-art results on DG for semantic segmentation (DGSS) and object detection (DGOD), with no inference overhead and strong data/model scalability, and extends to subject personalization in generative settings. Collectively, SoMA demonstrates how spectral structure and block-level dynamics of VFMs can be exploited to achieve robust, efficient domain-generalizable representation learning across diverse tasks.

Abstract

Domain generalization (DG) aims to adapt a model using one or multiple source domains to ensure robust performance in unseen target domains. Recently, Parameter-Efficient Fine-Tuning (PEFT) of foundation models has shown promising results on the DG problem. Nevertheless, existing PEFT methods still struggle to strike a balance between preserving generalizable components of the pre-trained model and learning task-specific features. To gain insights into the distribution of generalizable components, we begin by analyzing the pre-trained weights through the lens of singular value decomposition. Building on these insights, we introduce Singular Value Decomposed Minor Components Adaptation (SoMA), an approach that selectively tunes minor singular components while keeping the residual parts frozen. SoMA effectively retains the generalization ability of the pre-trained model while efficiently acquiring task-specific skills. Moreover, we freeze domain-generalizable blocks and employ an annealing weight decay strategy, thereby achieving an optimal balance in the delicate trade-off between generalizability and discriminability. SoMA attains state-of-the-art results on multiple benchmarks spanning both domain generalized semantic segmentation and domain generalized object detection. In addition, our methods introduce no additional inference overhead or regularization loss, maintain compatibility with any backbone or head, and are designed to be versatile, allowing easy integration into a wide range of tasks.
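
The selective tuning described in the abstract can be written out as a short worked decomposition. This is a sketch under stated assumptions: the symbols $W$ (a pre-trained weight matrix), $W_{\text{res}}$ (the frozen residual), $B$ and $A$ (the trainable low-rank factors), and the rank $r$ are notation chosen here for illustration, following the minor-component expression given in the Figure 1 caption.

$$
\begin{aligned}
W &= U \Sigma V^T, \\
W_{\text{res}} &= U_{[:,:-r]}\,\Sigma_{[:-r]}\,(V^T)_{[:-r,:]} \quad \text{(frozen)}, \\
BA &= U_{[:,-r:]}\,\Sigma_{[-r:]}\,(V^T)_{[-r:,:]} \quad \text{(trainable, value at initialization)}, \\
W &= W_{\text{res}} + BA.
\end{aligned}
$$

If $B'$ and $A'$ denote the factors after training, they can be folded back into a single matrix $W' = W_{\text{res}} + B'A'$ with the same shape as $W$, which is consistent with the claim of no additional inference overhead.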

Paper Structure

This paper contains 26 sections, 6 equations, 11 figures, 20 tables.

Figures (11)

  • Figure 1: Overview of the SoMA framework. Left. SoMA achieves state-of-the-art results across diverse tasks, ranging from domain-generalized semantic segmentation (DGSS) to object detection (DGOD), and performs well in both synthetic-to-real and real-to-real scenarios. Middle. Our method, trained solely on synthetic datasets, demonstrates strong generalization capabilities in complex real-world scenes. Right. We present SoMA, a method that applies SVD to the pre-trained weights, decomposing them into $r$ minor singular components $U_{[:,-r:]}\Sigma_{[-r:]} (V^T)_{[-r:,:]}$ and residuals. SoMA selectively tunes the smallest $r$ components, effectively preserving the world knowledge of foundation models. Since SoMA shares the same forward architecture as LoRA (Hu et al., 2022), it adds no extra latency during the inference phase.
  • Figure 2: Distribution of generalizable components. Top. Number of classes exhibiting specific accuracy drops after applying SVD to DINOv2-large weights and reconstructing by truncating the smallest $r$ singular components. Bottom. Distinct roles of singular components across levels. Numbers in parentheses represent the count of classes with an accuracy drop ratio exceeding 50%. The average WordNet hierarchy depth of these classes is shown, with higher values indicating greater context specificity.
  • Figure 3: The inherent generalization capabilities of the early blocks of VFM. Top. We apply PCA on the extracted intermediate features (8th block of DINOv2-Large) and visualize the top three leading components. Bottom. Class-wise IoU comparison for rare classes (Hoyer et al., 2022) under the GTAV$\rightarrow$Cityscapes setting, focusing on the impact of freezing the first eight blocks. “NFEB” stands for the Number of Frozen Early Blocks. A darker shade of blue indicates enhanced generalization performance relative to the baseline.
  • Figure 4: DGOD qualitative results.
  • Figure 5: Subject Personalization. 1. A dog gracefully leaping in origami style, 2. A dog in watercolor painting style, and 3. A dog soaring through a digital landscape in vector illustration style.
  • ...and 6 more figures
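
The decomposition in the Figure 1 caption can be sketched numerically as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names `soma_init`, `forward`, and `merge` are hypothetical, a random matrix stands in for a pre-trained weight, and splitting the minor singular values evenly (via square roots) between the two factors is an assumption made here rather than something stated in the captions.

```python
import numpy as np


def soma_init(weight, r):
    """Split `weight` into a frozen residual and low-rank factors (B, A).

    B @ A equals the r smallest singular components of `weight`.
    The sqrt split of the singular values is an assumption of this sketch.
    """
    # NumPy returns singular values in descending order, so the last r
    # entries are the minor components.
    U, S, Vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(S[-r:])
    B = U[:, -r:] * sqrt_s            # shape (out_features, r)
    A = sqrt_s[:, None] * Vt[-r:, :]  # shape (r, in_features)
    residual = weight - B @ A         # frozen part (principal components)
    return residual, B, A


def forward(x, residual, B, A):
    """LoRA-style forward pass: frozen path plus low-rank trainable path."""
    return x @ residual.T + (x @ A.T) @ B.T


def merge(residual, B, A):
    """Fold the low-rank factors back into a single weight matrix."""
    return residual + B @ A


# Hypothetical stand-in for a pre-trained linear layer weight.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 12))
x = rng.standard_normal((4, 12))

residual, B, A = soma_init(W, r=3)
merged = merge(residual, B, A)
```

Before any training step, `merged` reproduces `W` up to floating-point error, so the adapted layer starts from the pre-trained function. In a training loop only `B` and `A` would receive gradients, and `merge` would be called once afterwards so that inference uses a single matrix.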