Boosting Few-Shot Segmentation via Instance-Aware Data Augmentation and Local Consensus Guided Cross Attention
Li Guo, Haoming Liu, Yuxuan Xia, Chengyu Zhang, Xiaochen Lu
TL;DR
The paper tackles generalization in few-shot semantic segmentation (FSS), where prototype-based methods struggle under domain shifts. It revisits fine-tuning and adds two components: Instance-Aware Data Augmentation (IDA), which diversifies the small support set, and Local Consensus Guided Cross Attention (LCCA), which aligns query and support features via dense cross-image correlations. The approach is implemented within a two-stage training framework and extended to the K-shot setting. Combining IDA with LCCA yields substantial gains on PASCAL-$5^i$ and COCO-$20^i$, particularly in 5-shot scenarios. The results indicate improved generalization and robustness, suggesting that targeted augmentation combined with cross-image correspondence can close the gap between fine-tuning and prototype-based methods in FSS.
Abstract
Few-shot segmentation aims to train a segmentation model that can quickly adapt to a novel task for which only a few annotated images are provided. Most recent models have adopted a prototype-based paradigm for few-shot inference. These approaches may have limited generalization capacity beyond the standard 1- or 5-shot settings. In this paper, we closely examine and reevaluate the fine-tuning-based learning scheme, which fine-tunes the classification layer of a deep segmentation network pre-trained on diverse base classes. To improve the generalizability of the classification layer optimized with sparsely annotated samples, we introduce an instance-aware data augmentation (IDA) strategy that augments the support images based on the relative sizes of the target objects. The proposed IDA effectively increases the support set's diversity and promotes distribution consistency between support and query images. In addition, the large visual difference between query and support images may hinder knowledge transfer and degrade segmentation performance. To cope with this challenge, we introduce local consensus guided cross attention (LCCA), which aligns the query feature with the support features based on their dense correlation, further improving the model's generalizability to the query image. Significant performance improvements on the standard few-shot segmentation benchmarks PASCAL-$5^i$ and COCO-$20^i$ verify the efficacy of the proposed method.
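To make the idea of aligning a query feature with support features through dense correlation more concrete, the following is a minimal sketch of plain cross attention driven by cosine similarity. It is not the paper's LCCA: the local consensus step is omitted because the abstract does not specify it, and the function names `l2_normalize` and `cross_attend`, the `temperature` parameter, and the flattened list-of-vectors feature layout are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_attend(query_feats, support_feats, temperature=0.1):
    """Align each query location with the support features.

    query_feats:   list of N_q feature vectors (flattened spatial locations)
    support_feats: list of N_s feature vectors (flattened spatial locations)
    Returns (aligned, attn), where attn is an N_q x N_s weight matrix and
    aligned holds one attention-weighted support feature per query location.
    The temperature value is a hypothetical choice, not taken from the paper.
    """
    q_unit = [l2_normalize(q) for q in query_feats]
    s_unit = [l2_normalize(s) for s in support_feats]
    dim = len(support_feats[0])

    aligned, attn = [], []
    for q in q_unit:
        # Dense correlation: cosine similarity to every support location.
        corr = [sum(a * b for a, b in zip(q, s)) for s in s_unit]
        # Numerically stable softmax over support locations.
        peak = max(corr)
        exps = [math.exp((c - peak) / temperature) for c in corr]
        total = sum(exps)
        weights = [e / total for e in exps]
        attn.append(weights)
        # Aggregate the original support features with the attention weights.
        aligned.append([
            sum(w * s[d] for w, s in zip(weights, support_feats))
            for d in range(dim)
        ])
    return aligned, attn


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    support = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
    aligned, attn = cross_attend(query, support)
    for row in attn:
        print([round(w, 3) for w in row])
```

Each row of `attn` is a distribution over support locations, so a query location is pulled toward the support features it correlates with most strongly. A local consensus step would presumably filter or reweight these correlations using neighbouring matches before the softmax, but that part is left out here.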
