DiffPrompter: Differentiable Implicit Visual Prompts for Semantic-Segmentation in Adverse Conditions

Sanket Kalwar, Mihir Ungarala, Shruti Jain, Aaron Monis, Krishna Reddy Konda, Sourav Garg, K Madhava Krishna

TL;DR

DiffPrompter addresses semantic segmentation in adverse weather by introducing differentiable visual prompts and a differentiable adaptor framework that augments foundation-model backbones. It provides a $\nabla$HFC image processing block and a shallow vision encoder to jointly learn visual prompts and latent embeddings, enabling both parallel (PDA) and sequential (SDA) adaptor architectures. Through extensive experiments on datasets like BDD100K, ACDC, Wild-Dash, Dark-Zurich, COD10K, and CAMO, the approach achieves superior out-of-distribution generalization and improved segmentation performance on both high-level and low-level tasks. The work demonstrates the importance of integrating local and global representations via differentiable prompts and suggests future directions toward visual-language prompting and 3D scene understanding for robust perception in autonomous driving.

Abstract

Semantic segmentation in adverse weather scenarios is a critical task for autonomous driving systems. While foundation models have shown promise, the need for specialized adaptors becomes evident for handling more challenging scenarios. We introduce DiffPrompter, a novel differentiable visual and latent prompting mechanism aimed at expanding the learning capabilities of existing adaptors in foundation models. Our proposed $\nabla$HFC image processing block excels particularly in adverse weather conditions, where conventional methods often fall short. Furthermore, we investigate the advantages of jointly training visual and latent prompts, demonstrating that this combined approach significantly enhances performance in out-of-distribution scenarios. Our differentiable visual prompts leverage parallel and series architectures to generate prompts, effectively improving object segmentation tasks in adverse conditions. Through a comprehensive series of experiments and evaluations, we provide empirical evidence to support the efficacy of our approach. Project page at https://diffprompter.github.io.

Paper Structure (28 sections, 8 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Overview: To tackle semantic segmentation in adverse weather conditions, we introduce the DiffPrompter framework (explained later in Fig. \ref{fig:diffprompter}), through which we create a Serial Differentiable Adaptor (SDA) and a Parallel Differentiable Adaptor (PDA). Both adaptors achieve better results than the current state-of-the-art (SOTA) methods: SAM-Adapter \cite{samAdapt} (left) and EVP \cite{evp} (right). In each case, the columns show, respectively, the ground-truth segmentation mask, the baseline method (SAM-Adapter/EVP), and our proposed method (PDA/SDA). The model's masked output is shown in red; green bounding boxes highlight correct predictions, and red bounding boxes indicate missed segmentation outputs (false negatives) or incorrect segmentation outputs (false positives). Our proposed methods, PDA and SDA, show better qualitative performance than the SOTA methods, EVP and SAM-Adapter.
  • Figure 2: We propose DiffPrompter, a semantic segmentation method that utilizes the visual and latent prompts generated by the prompt generator. These prompts are then used by the prompt decoder to generate a semantic mask for segmenting objects, especially in adverse conditions and low-level segmentation tasks.
  • Figure 3: Framework overview. The input image is passed to a Parallel Differentiable Adaptor ($PDA$, Sec. \ref{sec:pda}) or Sequential Differentiable Adaptor ($SDA$, Sec. \ref{sec:sda}) segmentation model, whose transformer encoder and decoder stages produce the semantic segmentation mask as output (details in Sec. \ref{sec:psm}). $DiffAdaptor$ is the building block of both $PDA$ and $SDA$; it contains the $DiffVP$ block, which generates the differentiable visual prompt. The two are discussed in Sec. \ref{sec:da} and Sec. \ref{sec:dvp}, respectively.
  • Figure 4: Precision and recall graphs. Row 1 shows precision and row 2 shows recall; columns 1-4 correspond to the BDD100K, ACDC, Dark-Zurich, and Wild-Dash datasets, respectively. Red denotes the EVP method and blue denotes SDA (ours).
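
The captions for Figures 1 and 4 rely on false positives, false negatives, precision, and recall. As a reminder of how these quantities relate, here is a minimal sketch of the standard pixel-wise definitions for a pair of binary masks. It is an illustration only: the function name `precision_recall` and the toy masks are hypothetical, and this summary does not state how the paper itself computes the values plotted in Figure 4.

```python
def precision_recall(pred, gt):
    """Pixel-wise precision and recall for two binary masks of equal shape.

    pred, gt: nested lists (rows of 0/1 values). Hypothetical inputs for
    illustration; not the paper's evaluation code.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # predicted foreground, truly foreground
            elif p and not g:
                fp += 1  # predicted foreground, truly background
            elif g and not p:
                fn += 1  # foreground pixel the prediction missed
    # Guard against empty denominators (no predicted or no true foreground).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy 2x3 masks: two true positives, one false positive, one false negative.
gt = [[1, 1, 0],
      [0, 1, 0]]
pred = [[1, 0, 0],
        [0, 1, 1]]
precision, recall = precision_recall(pred, gt)
```

In this toy case both precision and recall equal 2/3. A prediction with more false positives lowers precision, while one with more false negatives lowers recall, which is why the two rows of Figure 4 are reported separately.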