Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction

Senqiao Yang, Jiarui Wu, Jiaming Liu, Xiaoqi Li, Qizhe Zhang, Mingjie Pan, Yulu Gan, Zehui Chen, Shanghang Zhang

TL;DR

This work tackles domain shifts in dense prediction through Test-Time Adaptation by introducing Sparse Visual Domain Prompts (SVDP), which place minimal, pixel-level prompts to preserve spatial details. To effectively leverage SVDP, the authors develop Domain Prompt Placement (DPP) to target high-uncertainty regions and Domain Prompt Updating (DPU) to adapt prompts per target sample via adaptive EMA weighting. Empirical results on semantic segmentation and depth estimation demonstrate state-of-the-art performance on TTA and CTTA benchmarks, showcasing robustness to diverse domain shifts with minimal parameter updates. The approach offers a practical, privacy-conscious path for deploying pre-trained models in changing real-world environments without source data access.

Abstract

Visual prompts offer an efficient way to address visual cross-domain problems. Previously, Visual Domain Prompt (VDP) first introduced domain prompts to tackle the classification Test-Time Adaptation (TTA) problem by warping image-level prompts onto the input and fine-tuning the prompts for each target domain. However, because image-level prompts mask out continuous spatial details in the prompt-allocated region, VDP suffers from inaccurate contextual information and limited domain knowledge extraction, particularly in dense prediction TTA problems. To overcome these challenges, we propose a novel Sparse Visual Domain Prompts (SVDP) approach, which holds minimal trainable parameters (e.g., 0.1%) in the image-level prompt and preserves more spatial information of the input. To better apply SVDP in extracting domain-specific knowledge, we introduce the Domain Prompt Placement (DPP) method, which adaptively allocates the trainable parameters of SVDP to pixels with large distribution shifts. Furthermore, recognizing that each target domain sample exhibits a unique domain shift, we design a Domain Prompt Updating (DPU) strategy that optimizes prompt parameters differently for each sample, facilitating efficient adaptation to the target domain. Extensive experiments on widely used TTA and continual TTA benchmarks show that our proposed method achieves state-of-the-art performance in both semantic segmentation and depth estimation.
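
The placement idea in the abstract (put a very small number of trainable prompt values on the pixels with the largest distribution shift) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that per-pixel variance across several stochastic predictions stands in for the shift measure, that the prompt is added to the selected pixels, and that ties are broken by pixel position. The function names `pixel_uncertainty`, `place_prompts`, and `apply_prompts` and the `ratio` parameter are hypothetical.

```python
def pixel_uncertainty(pred_maps):
    """Per-pixel variance over several stochastic predictions (each H x W)."""
    n = len(pred_maps)
    h, w = len(pred_maps[0]), len(pred_maps[0][0])
    unc = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [p[i][j] for p in pred_maps]
            mean = sum(vals) / n
            unc[i][j] = sum((v - mean) ** 2 for v in vals) / n
    return unc


def place_prompts(unc, ratio=0.001):
    """Pick the top `ratio` fraction of pixels by uncertainty (at least one)."""
    h, w = len(unc), len(unc[0])
    k = max(1, int(round(ratio * h * w)))
    # Sort by descending uncertainty; ties fall back to (row, col) order.
    ranked = sorted(
        ((unc[i][j], i, j) for i in range(h) for j in range(w)),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    return [(i, j) for _, i, j in ranked[:k]]


def apply_prompts(image, prompt, positions):
    """Add prompt values only at the selected pixels (additive form is an assumption)."""
    out = [row[:] for row in image]
    for i, j in positions:
        out[i][j] = image[i][j] + prompt[i][j]
    return out


# Two stochastic predictions that disagree only at pixel (1, 0).
preds = [
    [[0.0, 0.5], [0.2, 0.9]],
    [[0.0, 0.5], [0.8, 0.9]],
]
unc = pixel_uncertainty(preds)
positions = place_prompts(unc, ratio=0.25)
prompted = apply_prompts([[1.0, 1.0], [1.0, 1.0]], [[0.5, 0.5], [0.5, 0.5]], positions)
```

In this toy case only the pixel where the predictions disagree receives a prompt value; every other pixel of the input is left untouched, which is the property the sparse design is meant to preserve.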

Paper Structure (26 sections, 6 equations, 9 figures, 9 tables)

Figures (9)

  • Figure 1: The motivation and main idea of our method. (a) Previous dense visual domain prompts (VDP) mask out consecutive spatial details in the placed regions, as shown in the red circles. In dense prediction DA problems, applying dense VDP leads to inaccurate context information extraction and severe performance degradation. (b) We introduce Sparse Visual Domain Prompts (SVDP), which are tailored to address the occlusion of pixel-wise information and can better extract local domain knowledge for cross-domain learning. Although SVDP has fewer parameters than VDP, it achieves better semantic segmentation performance in Test-Time Adaptation.
  • Figure 2: The overall framework. Left: We warp the SVDP into the image and place prompt parameters on the pixels selected by the Domain Prompt Placement (DPP) method. The reformulated image serves as the input to the teacher and student models. We obtain the uncertainty map through the teacher model, as described in Eq. \ref{eq:mc}. The uncertainty map is used to evaluate the degree of pixel-level distribution shift. SVDP adopts a consistency loss (Eq. \ref{eq:loss}) and exponential moving average (EMA) as its optimization strategies. Right: Domain Prompt Updating (DPU). Based on the image-level uncertainty value, we adopt different EMA weights to update the SVDP parameters stably, facilitating efficient adaptation to the target domain.
  • Figure 3: The detailed process of Domain Prompt Placement. The uncertainty map is estimated by MC Dropout \cite{gal2016dropout}. The SVDP parameters are placed on the pixels with high uncertainty and then warped into the raw image.
  • Figure 4: The process of Domain Prompt Updating. We adaptively adjust the prompt EMA updating rate for each target domain sample based on its image-level uncertainty value.
  • Figure 5: Qualitative comparison of our method with previous SOTA methods on the ACDC dataset. Our method segments the different pixel-wise classes more accurately, as shown in the white box.
  • ...and 4 more figures
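
The Domain Prompt Updating step described in the captions for Figures 2 and 4 (choose an EMA weight per sample from an image-level uncertainty value) can be sketched as follows. This is a hedged illustration rather than the paper's procedure. It assumes that image-level uncertainty is the mean of the pixel-level map, that a single threshold switches between two momenta, and that a more uncertain sample gets the smaller momentum (a larger update). The captions only state that different EMA weights are used. The names `image_uncertainty`, `select_momentum`, and `ema_update` and all numeric values are hypothetical.

```python
def image_uncertainty(unc_map):
    """Image-level uncertainty as the mean of the pixel-level map (assumption)."""
    h, w = len(unc_map), len(unc_map[0])
    return sum(sum(row) for row in unc_map) / (h * w)


def select_momentum(u, threshold=0.1, slow=0.999, fast=0.99):
    """Pick an EMA momentum per sample; threshold and values are placeholders."""
    # Assumed direction: higher uncertainty -> smaller momentum -> larger update.
    return fast if u > threshold else slow


def ema_update(teacher_prompt, student_prompt, momentum):
    """Standard EMA: teacher <- momentum * teacher + (1 - momentum) * student."""
    return [
        momentum * t + (1.0 - momentum) * s
        for t, s in zip(teacher_prompt, student_prompt)
    ]


u = image_uncertainty([[0.0, 0.2], [0.4, 0.2]])
momentum = select_momentum(u, threshold=0.1)
updated = ema_update([1.0, 2.0], [0.0, 0.0], 0.5)
```

Keeping the momentum choice in its own function makes it easy to swap the hard threshold for a smooth schedule without touching the EMA update itself.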