Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning

Yixiong Zou; Yicong Liu; Yiman Hu; Yuhua Li; Ruixuan Li

TL;DR

Cross-domain few-shot learning (CDFSL) must both transfer knowledge across domain shifts and finetune on scarce target data. The authors extend loss-landscape analysis from the parameter space to the representation space (the representation-space loss landscape, RSLL) and show that short-range flatness is insufficient for generalization. This motivates long-range flattening, which the FLoR layer achieves by interpolating differently normalized representations. They instantiate FLoR for both CNNs and Vision Transformers, using a Beta-distributed mixing parameter during base training and a learnable mixing parameter during finetuning, and report state-of-the-art average accuracy across 8 cross-domain datasets, with gains of up to about 9% on individual datasets. The approach improves both transferability and few-shot finetuning and is supported by ablations and analyses; the authors state that code will be released.

Abstract

Cross-domain few-shot learning (CDFSL) aims to acquire knowledge from limited training data in the target domain by leveraging prior knowledge transferred from source domains with abundant training samples. CDFSL faces challenges in transferring knowledge across dissimilar domains and fine-tuning models with limited training data. To address these challenges, we first extend the analysis of loss landscapes from the parameter space to the representation space, which allows us to simultaneously interpret the transferring and fine-tuning difficulties of CDFSL models. We observe that sharp minima in the loss landscapes of the representation space result in representations that are hard to transfer and fine-tune. Moreover, existing flatness-based methods have limited generalization ability due to their short-range flatness. To enhance the transferability and facilitate fine-tuning, we introduce a simple yet effective approach to achieve long-range flattening of the minima in the loss landscape. This approach considers representations that are differently normalized as minima in the loss landscape and flattens the high-loss region in the middle by randomly sampling interpolated representations. We implement this method as a new normalization layer that replaces the original one in both CNNs and ViTs. This layer is simple and lightweight, introducing only a minimal number of additional parameters. Experimental results on 8 datasets demonstrate that our approach outperforms state-of-the-art methods in terms of average accuracy. Moreover, our method achieves performance improvements of up to 9% compared to the current best approaches on individual datasets. Our code will be released.
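
The interpolation idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the two differently normalized representations are the batch-normalized and instance-normalized versions of a CNN feature map, that a single scalar mixing ratio is drawn from a Beta distribution during base training, and that a fixed scalar stands in for the learnable ratio used during finetuning. The function names, the Beta parameters, the direction of the weighting, and the omission of affine parameters are all illustrative assumptions.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    # Normalize each channel over the batch and spatial axes (N, H, W).
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def instance_norm(x, eps=1e-5):
    # Normalize each channel of each sample over the spatial axes (H, W).
    mean = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def flor_like_mix(x, delta=None, alpha=1.0, beta=1.0, rng=None):
    """Interpolate BN- and IN-normalized versions of x with shape (N, C, H, W).

    delta=None : sample the mixing ratio from Beta(alpha, beta), mimicking
                 the random interpolation used during base training.
    delta=float: use the given ratio, a stand-in for the learnable ratio
                 used during finetuning.
    Returns the mixed representation and the ratio that was used.
    """
    if delta is None:
        if rng is None:
            rng = np.random.default_rng()
        delta = float(rng.beta(alpha, beta))
    # Assumption: delta weights the BN branch, (1 - delta) the IN branch.
    mixed = delta * batch_norm(x) + (1.0 - delta) * instance_norm(x)
    return mixed, delta


if __name__ == "__main__":
    features = np.random.default_rng(0).normal(size=(4, 3, 5, 5))
    mixed, used = flor_like_mix(features, rng=np.random.default_rng(1))
    print(mixed.shape, round(used, 3))
```

Sampling a new ratio at every forward pass exposes the model to points along the whole segment between the two normalized representations, which is the sense in which the region between the two minima is flattened.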

Paper Structure (22 sections, 8 equations, 6 figures, 7 tables)

Introduction
Analyzing Generalization from the Aspect of Representation-Space Loss Landscapes
Preliminaries
Representation-Space Loss Landscape
Verification and Interpretation
Short-Range Flatness Limits Generalization
Flatten Long-Range Loss Landscapes
Flattening for Convolutional Neural Networks
Flattening for Vision Transformers
Implementation
Experiments
Dataset and evaluation setup
Implementation details
Comparison with state-of-the-art works
Ablation study
...and 7 more sections

Figures (6)

Figure 1: (a) Representation-space loss landscape (RSLL): Given an input sample, the model maps it into a representation space, where effective representations correspond to low classification losses, i.e., minima in the landscape. Since domain shifts can be reflected by the landscape shift and representation shift, a sharp minimum in the landscape corresponds to a representation vulnerable to domain shifts, making training and finetuning difficult. (b) Different minima are easy to find in the RSLL, and they span a longer range than current flatness-based methods cover. This inspires us to flatten a long-range loss landscape by randomly interpolating these minima. (c) Given a flattened minimum, the representation and the model are more transferable against domain shifts and easier to finetune on the target domain.
Figure 2: To validate the analysis based on representation-space loss landscapes (RSLL), we apply low-frequency noise to the representation space (i.e., pixels and features) to simulate domain shifts on training data as a sanity check. The perturbation variance measures the distance between the perturbed representation and the original representation (a minimum in RSLL). We use the performance drop against perturbation variance to measure the sharpness of the landscape around the minimum, where a larger drop indicates a sharper minimum. The model based on Instance Normalization (IN) is located in a flatter minimum than the model based on Batch Normalization (BN), which explains the strong performance of the IN-based model in cross-domain tasks. This result is consistent with prior work (fu2023styleadv) and validates the rationale of the RSLL analysis. (a) Samples of the pixel perturbation. (b) Perturbation on pixels. (c) Perturbation on features.
Figure 3: Directly applying SAM to each representation of ResNet10 on CDFSL datasets. Only marginal improvements in the average CDFSL 5-way 5-shot accuracy are observed, and the perturbation step size is small, indicating that the complex loss landscape allows SAM to learn only short-range flatness.
Figure 4: We implement our method as a normalization layer (FLoR layer) to replace the ordinary normalization layer in the backbone network (e.g., CNNs or ViTs). This layer interpolates two differently normalized representations, which flattens the intermediate high-loss region between two minima in RSLL.
Figure 5: To verify the loss landscape between BN and IN representations, we train a baseline model with separate streams of IN and BN, by means of specifying different mixing ratios $\delta$. Two peaks and one valley can be observed in the baseline curve (gray), indicating the high-loss region between the two representations. In contrast, only one peak is observed in our curve (purple), indicating we can effectively flatten the loss landscapes.
...and 1 more figure
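
The sanity check described in the Figure 2 caption (performance drop as a function of perturbation variance) can be sketched in a few lines of NumPy. The summary does not say how the low-frequency noise is generated, so the block-upsampled Gaussian grid below is an assumption, as are the function names and the `evaluate` callback, which is a hypothetical stand-in for evaluating a model's accuracy on a perturbed input.

```python
import numpy as np


def low_freq_noise(shape, std, coarse=4, rng=None):
    """Slowly varying noise of shape (H, W) with the requested standard deviation.

    Assumption: low-frequency content is approximated by drawing a coarse
    Gaussian grid and repeating each cell into a constant block.
    """
    if rng is None:
        rng = np.random.default_rng()
    h, w = shape
    grid = rng.normal(size=(coarse, coarse))
    reps_h = -(-h // coarse)  # ceiling division
    reps_w = -(-w // coarse)
    noise = np.repeat(np.repeat(grid, reps_h, axis=0), reps_w, axis=1)[:h, :w]
    noise = noise - noise.mean()
    return std * noise / (noise.std() + 1e-12)


def sharpness_curve(evaluate, x, stds, rng=None):
    """Drop in evaluate(x) for each perturbation std; larger drops suggest a sharper minimum."""
    base = evaluate(x)
    return [base - evaluate(x + low_freq_noise(x.shape, s, rng=rng)) for s in stds]


if __name__ == "__main__":
    # Toy score that peaks at the all-zero input, standing in for model accuracy.
    toy_score = lambda a: -float((a ** 2).mean())
    drops = sharpness_curve(toy_score, np.zeros((8, 8)), [0.0, 0.5, 1.0],
                            rng=np.random.default_rng(0))
    print(drops)
```

Comparing such curves for two models, as the caption does for the BN-based and IN-based models, gives a relative measure of flatness around each model's minimum.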
