Flatten Long-Range Loss Landscapes for Cross-Domain Few-Shot Learning
Yixiong Zou, Yicong Liu, Yiman Hu, Yuhua Li, Ruixuan Li
TL;DR
Cross-domain few-shot learning (CDFSL) struggles both to transfer knowledge across domain shifts and to fine-tune on scarce target data. The authors extend loss-landscape analysis from parameter space to representation space (the representation-space loss landscape, RSLL) and show that short-range flatness is insufficient for generalization. This motivates long-range flattening, implemented by the FLoR layer, which interpolates between differently normalized representations. They instantiate FLoR for both CNNs and Vision Transformers, sampling the mixing coefficient from a Beta distribution during base training and learning it during fine-tuning. The method achieves state-of-the-art average accuracy across 8 cross-domain datasets, with gains of up to about 9% on individual datasets. It improves both transferability and few-shot fine-tuning, is supported by extensive ablations and analyses, and its code is to be released.
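To make the distinction between short-range and long-range flatness concrete, the toy probe below compares two measurements on a hand-made loss. It rests on several assumptions. `double_well_loss` is a hypothetical stand-in for the loss as a function of a 2-D representation, with minima at (1, 1) and (-1, -1). `local_sharpness` perturbs one minimum by a small radius along each axis. `loss_barrier` evaluates the loss along the straight line to the other minimum. The probe illustrates the concept only and is not the paper's RSLL analysis.

```python
def double_well_loss(z):
    """Hypothetical toy loss: zero wherever every coordinate is +1 or -1."""
    return sum((v * v - 1.0) ** 2 for v in z)


def path_losses(loss_fn, z_a, z_b, steps=11):
    """Loss at evenly spaced points on the straight segment from z_a to z_b (steps >= 2)."""
    losses = []
    for i in range(steps):
        t = i / (steps - 1)
        z = [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]
        losses.append(loss_fn(z))
    return losses


def loss_barrier(loss_fn, z_a, z_b, steps=11):
    """Long-range probe: highest loss on the path minus the higher endpoint loss."""
    losses = path_losses(loss_fn, z_a, z_b, steps)
    return max(losses) - max(losses[0], losses[-1])


def local_sharpness(loss_fn, z, radius):
    """Short-range probe: worst loss increase at +/- radius along each coordinate axis."""
    base = loss_fn(z)
    worst = 0.0
    for k in range(len(z)):
        for sign in (-1.0, 1.0):
            p = list(z)
            p[k] += sign * radius
            worst = max(worst, loss_fn(p) - base)
    return worst


z_a, z_b = [1.0, 1.0], [-1.0, -1.0]
print(local_sharpness(double_well_loss, z_a, radius=0.1))  # about 0.044
print(loss_barrier(double_well_loss, z_a, z_b))             # 2.0
```

Here the minimum at (1, 1) looks flat to the short-range probe, but the path to the other minimum crosses a high-loss region at its midpoint. This is the kind of region that long-range flattening targets.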
Abstract
Cross-domain few-shot learning (CDFSL) aims to acquire knowledge from limited training data in the target domain by leveraging prior knowledge transferred from source domains with abundant training samples. CDFSL faces two challenges: transferring knowledge across dissimilar domains, and fine-tuning models with limited training data. To address these challenges, we first extend the analysis of loss landscapes from the parameter space to the representation space, which allows us to interpret the transferring and fine-tuning difficulties of CDFSL models simultaneously. We observe that sharp minima in the loss landscapes of the representation space result in representations that are hard to transfer and fine-tune. Moreover, existing flatness-based methods have limited generalization ability because the flatness they achieve is only short-range. To enhance transferability and facilitate fine-tuning, we introduce a simple yet effective approach that flattens the minima of the loss landscape over a long range. This approach treats differently normalized representations as minima in the loss landscape and flattens the high-loss region between them by randomly sampling interpolated representations. We implement this method as a new normalization layer that replaces the original one in both CNNs and ViTs. The layer is simple and lightweight, adding only a small number of parameters. Experimental results on 8 datasets demonstrate that our approach outperforms state-of-the-art methods in average accuracy. Moreover, on individual datasets our method improves performance by up to 9% over the current best approaches. Our code will be released.
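The abstract describes the normalization layer only at a high level. The following minimal pure-Python sketch shows the interpolation idea under stated assumptions and is not the authors' FLoR implementation. The two endpoints are assumed to be a batch-style normalization, where each feature is normalized across the batch, and a layer-style normalization, where each sample is normalized across its features. The class name `MixedNorm`, the symmetric Beta(alpha, alpha) prior, and the sigmoid-parameterized `mix_logit` are hypothetical choices. Affine parameters and running statistics are omitted. As the TL;DR describes, the mixing coefficient is sampled from a Beta distribution during base training; otherwise it comes from a learnable parameter.

```python
import math
import random


def batch_style_norm(x, eps=1e-5):
    """Normalize each feature (column) across the batch, like BatchNorm in training mode."""
    n, c = len(x), len(x[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [x[i][j] for i in range(n)]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n
        for i in range(n):
            out[i][j] = (col[i] - mean) / math.sqrt(var + eps)
    return out


def layer_style_norm(x, eps=1e-5):
    """Normalize each sample (row) across its features, like LayerNorm."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


class MixedNorm:
    """Hypothetical layer interpolating between two normalizations of its input.

    Assumptions (not from the paper): the choice of endpoint normalizations,
    the symmetric Beta(alpha, alpha) prior, and the sigmoid-parameterized mix.
    """

    def __init__(self, alpha=1.0, seed=None):
        self.alpha = alpha          # Beta concentration used during base training
        self.mix_logit = 0.0        # learnable during fine-tuning; sigmoid(0.0) = 0.5
        self.base_training = True
        self.rng = random.Random(seed)

    def mix_weight(self):
        if self.base_training:
            # Base training: a fresh random coefficient in [0, 1] on every call.
            return self.rng.betavariate(self.alpha, self.alpha)
        # Fine-tuning / inference: deterministic coefficient from the learnable logit.
        return 1.0 / (1.0 + math.exp(-self.mix_logit))

    def __call__(self, x):
        lam = self.mix_weight()
        a, b = batch_style_norm(x), layer_style_norm(x)
        # Element-wise convex combination of the two normalized representations.
        return [[lam * u + (1.0 - lam) * v for u, v in zip(ra, rb)]
                for ra, rb in zip(a, b)]


layer = MixedNorm(seed=0)
x = [[1.0, 2.0], [3.0, 6.0]]
print(layer(x))                 # random interpolation during base training
layer.base_training = False
print(layer(x))                 # approximately [[-1, 0], [0, 1]] with mix_logit = 0
```

Sampling a new coefficient on every forward pass exposes the model to representations anywhere on the segment between the two normalizations, rather than only at its ends. In a real network this would be a framework module, so that `mix_logit` receives gradients during fine-tuning. The pure-Python version shows only the forward pass.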
