SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching

Xinghui Li, Jingyi Lu, Kai Han, Victor Prisacariu

TL;DR

This paper shows that a basic prompt tuning technique is enough to unlock the inherent potential of Stable Diffusion for semantic matching, yielding a significant accuracy gain over previous approaches. It also introduces a novel conditional prompting module that conditions the prompt on the local details of the input image pair, which improves performance further.

Abstract

In this paper, we address the challenge of matching semantically similar keypoints across image pairs. Existing research indicates that the intermediate output of the UNet within Stable Diffusion (SD) can serve as robust image feature maps for such a matching task. We demonstrate that by employing a basic prompt tuning technique, the inherent potential of Stable Diffusion can be harnessed, resulting in a significant enhancement in accuracy over previous approaches. We further introduce a novel conditional prompting module that conditions the prompt on the local details of the input image pairs, leading to a further improvement in performance. We designate our approach as SD4Match, short for Stable Diffusion for Semantic Matching. Comprehensive evaluations of SD4Match on the PF-Pascal, PF-Willow, and SPair-71k datasets show that it sets new benchmarks in accuracy across all these datasets. In particular, SD4Match outperforms the previous state-of-the-art by a margin of 12 percentage points on the challenging SPair-71k dataset.

Paper Structure (26 sections, 8 equations, 11 figures, 4 tables)


Figures (11)

  • Figure 1: The general pipeline of SD4Match. We present three prompt tuning options for our method: Single, Class, and conditional prompting module (CPM). The prompt is tuned by the cross-entropy loss between the predicted probability map and the ground-truth probability map of given query points. During inference, we use Kernel-Softmax, proposed in SFNet (Lee et al., 2019), to localize correspondences.
  • Figure 2: Illustration of the architecture of our conditional prompting module.
  • Figure 4: Visualization of the learned class-specific prompt in SD4Match-Class.
  • Figure 5: Visualization of the learned conditional prompt in SD4Match-CPM.
  • Figure 6: Correlation between the query feature and the target image with different prompts. Warmer colors indicate a higher correlation.
  • ...and 6 more figures
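
The training loss and inference step described in the Figure 1 caption can be illustrated with a small, dependency-free sketch. This is not the authors' implementation. It assumes toy 2-D feature vectors in place of Stable Diffusion UNet features, a one-hot ground-truth map (the paper may use a different target), and a Gaussian-masked soft-argmax as one plausible reading of Kernel-Softmax. All function names and the `sigma` and `beta` values are hypothetical. In the actual method the cross-entropy loss would be backpropagated to the prompt; here only its value is computed.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-8)

def correlation_map(query, target):
    """Correlate one query feature with every cell of an H x W target feature grid."""
    return [[cosine(query, feat) for feat in row] for row in target]

def softmax2d(scores, beta=1.0):
    """Softmax over all cells of a 2-D score map (beta is an assumed inverse temperature)."""
    peak = max(max(row) for row in scores)
    exps = [[math.exp(beta * (s - peak)) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]

def cross_entropy(pred, gt):
    """Cross-entropy between a predicted and a ground-truth probability map."""
    return -sum(g * math.log(p + 1e-12)
                for pred_row, gt_row in zip(pred, gt)
                for p, g in zip(pred_row, gt_row))

def kernel_softmax_localize(scores, sigma=1.0, beta=20.0):
    """Assumed Kernel-Softmax form: Gaussian mask around the argmax, then soft-argmax."""
    h, w = len(scores), len(scores[0])
    cells = [(i, j) for i in range(h) for j in range(w)]
    pi, pj = max(cells, key=lambda c: scores[c[0]][c[1]])
    masked = [[scores[i][j] * math.exp(-((i - pi) ** 2 + (j - pj) ** 2) / (2 * sigma ** 2))
               for j in range(w)] for i in range(h)]
    prob = softmax2d(masked, beta)
    row = sum(i * prob[i][j] for i, j in cells)  # expected row coordinate
    col = sum(j * prob[i][j] for i, j in cells)  # expected column coordinate
    return row, col

# Toy data: a 3 x 3 target grid in which only cell (1, 2) matches the query.
query = [1.0, 0.0]
target = [[[0.0, 1.0] for _ in range(3)] for _ in range(3)]
target[1][2] = [1.0, 0.0]

scores = correlation_map(query, target)
pred = softmax2d(scores)                      # predicted probability map
gt = [[0.0] * 3 for _ in range(3)]
gt[1][2] = 1.0                                # one-hot ground truth (assumption)
loss = cross_entropy(pred, gt)                # training signal for the prompt
row, col = kernel_softmax_localize(scores)    # inference-time localization
print(loss, row, col)
```

The Gaussian mask suppresses distant, spurious peaks in the correlation map before the soft-argmax, so the expected coordinate stays close to the strongest match while remaining sub-pixel.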