CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection
Jiayi Zhu, Qing Guo, Felix Juefei-Xu, Yihao Huang, Yang Liu, Geguang Pu
TL;DR
CosalPure tackles the fragility of co-salient object detection (CoSOD) under adversarial perturbations by introducing a concept-driven purification framework. It first learns a group-consistent co-salient concept c via a pre-trained text-to-image diffusion model using textual inversion, and then purifies group images by diffusion generation conditioned on c to improve CoSOD robustness. Empirical results across Cosal2015, iCoseg, CoSOD3k, and CoCA show CosalPure outperforming diffusion-based baselines like DiffPure and DDA in adversarial settings and under motion blur, validating the approach’s effectiveness. The work demonstrates that incorporating object-level semantics into purification yields clearer co-salient maps and more reliable CoSOD performance in manipulated or corrupted imagery.
Abstract
Co-salient object detection (CoSOD) aims to identify the common and salient (usually foreground) regions across a given group of images. Despite significant progress, state-of-the-art CoSOD methods are easily affected by adversarial perturbations, which substantially reduce their accuracy. Such perturbations can mislead CoSOD methods but do not change the high-level semantic information (e.g., the concept) of the co-salient objects. In this paper, we propose a novel robustness enhancement framework that first learns the concept of the co-salient objects from the input group images and then leverages this concept to purify the adversarially perturbed images, which are subsequently fed to CoSOD methods. Specifically, we propose CosalPure, which contains two modules: group-image concept learning and concept-guided diffusion purification. For the first module, we adopt a pre-trained text-to-image diffusion model to learn the concept of the co-salient objects within the group images, where the learned concept is robust to adversarial examples. For the second module, we map the adversarial image to the latent space and then perform diffusion generation, embedding the learned concept into the noise prediction function as an extra condition. Our method effectively alleviates the influence of a SOTA adversarial attack that combines different adversarial patterns, including exposure and noise. Extensive results demonstrate that our method significantly enhances the robustness of CoSOD methods.
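The two modules can be sketched in standard latent-diffusion notation. This is an illustrative formulation, not one taken from the text above: the symbols are assumptions, namely the image group \mathcal{G}, the latent encoder \mathcal{E}, the frozen noise predictor \epsilon_\theta, the cumulative noise schedule \bar{\alpha}_t, and the learned concept embedding c^{*}. The first line assumes the usual textual-inversion objective, and the second assumes a deterministic DDIM-style reverse step; the actual sampler may differ.

```latex
% Module 1 (assumed textual-inversion objective): optimize only the concept
% embedding c over the group images, keeping the diffusion weights \theta frozen.
c^{*} = \arg\min_{c}\;
  \mathbb{E}_{x \in \mathcal{G},\; \epsilon \sim \mathcal{N}(0, I),\; t}
  \left\| \epsilon - \epsilon_{\theta}(z_t, t, c) \right\|_2^2,
\qquad
z_t = \sqrt{\bar{\alpha}_t}\,\mathcal{E}(x) + \sqrt{1 - \bar{\alpha}_t}\,\epsilon

% Module 2 (assumed DDIM-style update): starting from the noised latent of the
% adversarial image, denoise step by step with c^{*} as the extra condition.
\hat{z}_0 = \frac{z_t - \sqrt{1 - \bar{\alpha}_t}\,\epsilon_{\theta}(z_t, t, c^{*})}
                 {\sqrt{\bar{\alpha}_t}},
\qquad
z_{t-1} = \sqrt{\bar{\alpha}_{t-1}}\,\hat{z}_0
        + \sqrt{1 - \bar{\alpha}_{t-1}}\,\epsilon_{\theta}(z_t, t, c^{*})
```

Under these assumptions, the only change relative to unconditional diffusion purification is the third argument of \epsilon_\theta: the group-level concept steers every denoising step toward the co-salient object's semantics.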
