CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection

Jiayi Zhu, Qing Guo, Felix Juefei-Xu, Yihao Huang, Yang Liu, Geguang Pu

TL;DR

CosalPure tackles the fragility of co-salient object detection (CoSOD) under adversarial perturbations by introducing a concept-driven purification framework. It first learns a group-consistent co-salient concept c via a pre-trained text-to-image diffusion model using textual inversion, and then purifies group images by diffusion generation conditioned on c to improve CoSOD robustness. Empirical results across Cosal2015, iCoseg, CoSOD3k, and CoCA show CosalPure outperforming diffusion-based baselines like DiffPure and DDA in adversarial settings and under motion blur, validating the approach’s effectiveness. The work demonstrates that incorporating object-level semantics into purification yields clearer co-salient maps and more reliable CoSOD performance in manipulated or corrupted imagery.
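As a rough guide to what "learning a concept via textual inversion" means, the standard textual-inversion objective can be adapted to a group of images as sketched below. The symbols $\mathcal{G}$ (the image group), $z_t$ (the noised latent of an image $x$ at timestep $t$), and $\epsilon_\theta$ (the frozen noise predictor of the pre-trained diffusion model) are notation assumed here for illustration; the paper's own equations are not part of this excerpt and may differ in detail.

```latex
% Sketch: one concept embedding c shared by all images in the group G.
% The diffusion model parameters \theta stay frozen; only c is optimized.
c^{*} = \arg\min_{c} \;
  \mathbb{E}_{x \in \mathcal{G},\; \epsilon \sim \mathcal{N}(0, I),\; t}
  \left[ \left\| \epsilon - \epsilon_\theta(z_t, t, c) \right\|_2^2 \right]
```

Because a single $c$ must explain every image in the group, it is pushed toward what the images share, which is the intuition behind calling it a group-consistent concept.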

Abstract

Co-salient object detection (CoSOD) aims to identify the common and salient (usually foreground) regions across a given group of images. Despite significant progress, state-of-the-art CoSOD methods are easily affected by adversarial perturbations, which substantially reduce their accuracy. Such perturbations can mislead CoSOD methods but do not change the high-level semantic information (e.g., the concept) of the co-salient objects. In this paper, we propose a novel robustness enhancement framework that first learns the concept of the co-salient objects from the input group images and then leverages this concept to purify the adversarial perturbations; the purified images are subsequently fed to CoSOD methods for robustness enhancement. Specifically, we propose CosalPure, which contains two modules: group-image concept learning and concept-guided diffusion purification. For the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of the co-salient objects within the group images, where the learned concept is robust to adversarial examples. For the second module, we map the adversarial image to the latent space and then perform diffusion generation, embedding the learned concept into the noise prediction function as an extra condition. Our method effectively alleviates the influence of the SOTA adversarial attack, which contains different adversarial patterns, including exposure and noise. Extensive results demonstrate that our method significantly enhances the robustness of CoSOD methods.
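The second module (map the image to a latent, then generate with the concept as an extra condition) can be illustrated with a toy control-flow sketch. It assumes deterministic DDIM-style inversion and sampling, which this excerpt does not confirm as the paper's exact procedure. The names `ddim_step`, `toy_eps`, `invert`, and `generate` are hypothetical, and `toy_eps` is a stand-in for a real noise predictor, not a trained model.

```python
import numpy as np

def ddim_step(x, eps, a_from, a_to):
    """Deterministic DDIM update between two cumulative-alpha noise levels."""
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

def toy_eps(x, t, concept):
    """Hypothetical noise predictor eps(x_t, t, c); this toy ignores x and t."""
    return np.tanh(concept)

def invert(x, concept, alphas, eps_fn=toy_eps):
    """Map an image to a noisier latent (alphas decrease along the schedule)."""
    for t in range(len(alphas) - 1):
        x = ddim_step(x, eps_fn(x, t, concept), alphas[t], alphas[t + 1])
    return x

def generate(z, concept, alphas, eps_fn=toy_eps):
    """Run the schedule backwards from the latent, conditioned on the concept."""
    for t in range(len(alphas) - 1, 0, -1):
        z = ddim_step(z, eps_fn(z, t, concept), alphas[t], alphas[t - 1])
    return z

# Toy usage: a 4-dimensional "image" and "concept" (illustrative values only).
x_adv = np.array([0.2, -1.0, 0.5, 1.5])
concept = np.array([0.1, 0.3, -0.2, 0.0])
alphas = np.linspace(0.999, 0.5, 6)

latent = invert(x_adv, concept, alphas)
x_out = generate(latent, concept, alphas)
```

Because the toy predictor ignores its image input, the round trip simply returns the input; it shows only where the concept enters the noise prediction. With a real, image-dependent predictor, the generated output would generally differ from the input.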

Paper Structure

This paper contains 17 sections, 11 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Examples of our method CosalPure and comparative results before and after purification. CosalPure comprises two modules: group-image concept learning and concept-guided purification. Firstly, the concept learning module takes as input a group of images that contains some adversarial cases and obtains their shared co-salient semantic information (i.e., the learned concept), denoted as c. We can validate the effectiveness of the learned c by visualizing it with a text-to-image (T2I) diffusion model. Secondly, steered by the learned concept, we employ diffusion generation techniques to purify the entire group of images. Before our purification, the co-salient object detection results are poor, but after purification, the detection results are satisfactory. Please enlarge to see more details.
  • Figure 2: CoSOD results for DiffPure. The input images are attacked by the method of gao2022can. After purification by DiffPure nie2022diffusion, the images still yield inferior CoSOD results when processed together with their respective group images.
  • Figure 3: Overview of CosalPure. The details of (a) are in Sec. \ref{subsec:concept_learn}, while the details of (b) are in Sec. \ref{subsec:concept_inversion}.
  • Figure 4: Demonstration of the effectiveness of concept learning. (a) Five clean images are utilized for concept learning, and the learned concept can be reconstructed into an image through a pre-trained text-to-image model. (b) The first two images are attacked by Jadena gao2022can while the subsequent three images are clean, and the learned concept can also be reconstructed into a high-quality image. (a) and (b) use the same random seed.
  • Figure 5: Attention maps for learned concepts on processed images.
  • ...and 2 more figures