
CONDA: Condensed Deep Association Learning for Co-Salient Object Detection

Long Li, Nian Liu, Dingwen Zhang, Zhongyu Li, Salman Khan, Rao Anwer, Hisham Cholakkal, Junwei Han, Fahad Shahbaz Khan

TL;DR

This work addresses co-salient object detection by tackling the limitations of relying on raw inter-image associations. It introduces CONDA, a deep association learning framework that converts pixel-wise hyperassociations into deep association features within an FPN-based decoder, leveraging a Progressive Association Generation (PAG) module, Correspondence-induced Association Condensation (CAC) to condense associations, and an Object-aware Cycle Consistency (OCC) loss to supervise pixel-level correspondences. The approach achieves state-of-the-art results across CoCA, CoSal2015, and CoSOD3k under various training setups, with ablations confirming the importance of PAG, CAC, and OCC and demonstrating reduced computation via condensation. By explicitly modeling inter-image association knowledge and pixel-level correspondences, CONDA offers a robust and scalable pathway for improving co-saliency detection and related inter-image tasks.

Abstract

Inter-image association modeling is crucial for co-salient object detection. Despite satisfactory performance, previous methods still fall short of modeling inter-image associations sufficiently, because most of them focus on image feature optimization under the guidance of heuristically calculated raw inter-image associations. They rely directly on raw associations, which are unreliable in complex scenarios, and their image feature optimization approach does not model inter-image associations explicitly. To alleviate these limitations, this paper proposes a deep association learning strategy that deploys deep networks on raw associations to explicitly transform them into deep association features. Specifically, we first create hyperassociations to collect dense pixel-pair-wise raw associations and then deploy deep aggregation networks on them. We design a progressive association generation module for this purpose, with additional enhancement of the hyperassociation calculation. More importantly, we propose a correspondence-induced association condensation module that introduces a pretext task, i.e., semantic correspondence estimation, to condense the hyperassociations, reducing the computational burden and eliminating noise. We also design an object-aware cycle consistency loss for high-quality correspondence estimation. Experimental results on three benchmark datasets demonstrate the effectiveness of our proposed method under various training settings.
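
To make "dense pixel-pair-wise raw associations" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a raw association is the cosine similarity between two pixel features, and it handles only one image pair at a single feature scale. The function name `raw_hyperassociation` and the `(C, H, W)` feature layout are illustrative assumptions.

```python
import numpy as np

def raw_hyperassociation(feat_i, feat_j, eps=1e-8):
    """Dense pixel-pair-wise associations between two feature maps.

    Assumption: association = cosine similarity (not confirmed by the source).
    feat_i, feat_j: arrays of shape (C, H, W) with identical shapes.
    Returns an array of shape (H, W, H, W); entry [hi, wi, hj, wj] is the
    similarity between pixel (hi, wi) of image i and pixel (hj, wj) of image j.
    """
    c, h, w = feat_i.shape
    a = feat_i.reshape(c, h * w)
    b = feat_j.reshape(c, h * w)
    # L2-normalise every pixel feature so the dot product is a cosine.
    a = a / (np.linalg.norm(a, axis=0, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=0, keepdims=True) + eps)
    assoc = a.T @ b  # (H*W, H*W): every pixel of i against every pixel of j
    return assoc.reshape(h, w, h, w)
```

The output grows as (H·W)² per image pair, which suggests why the abstract lists computational burden reduction as a goal of condensation.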

Paper Structure

This paper contains 15 sections, 16 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Difference between the raw-association-based image feature optimization strategy (a) and our proposed deep association learning strategy (b). Our deep association learning deploys deep learning networks on raw associations to obtain deep association features. We also present visual samples of our calculated raw associations (Raw Asso.), optimized image features (Opt. Fea.), and our generated deep association features (Asso. Fea.) in (c).
  • Figure 2: Overall flowchart of our CONDA model. Specifically, CONDA first uses the image features to calculate hyperassociations. Then, the full-pixel hyperassociations are condensed by CAC and fed into the aggregation networks to obtain deep association features. These features are then used in the FPN decoder to produce the final prediction. For conciseness, only three related images are shown.
  • Figure 3: Difference between full-pixel hyperassociation (a) and condensed hyperassociation (b). We provide an example of collecting the pixel associations from image $I_j$ for a pixel $(h_i, w_i)$ in image $I_i$. Full-pixel hyperassociation collects all pixel associations in $I_j$, while our condensed hyperassociation only collects the associations of its correspondence pixel $(h_j, w_j)$ (red dot) and surrounding pixels (green dots). We first heuristically find an initial pixel $(h_j^0, w_j^0)$ with a fixed surrounding window and then learn coordinate offsets to locate the optimized correspondence and surrounding pixels.
  • Figure 4: Visual samples for the correspondence estimations. Correspondences I, II, and III visually display estimated correspondences between the main image and three related images. Sparse co-salient pixels were selected and connected to their corresponding pixels using colored lines for clear visualization.
  • Figure 5: Qualitative comparisons of our model with other SOTA methods.
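
The contrast drawn in Figure 3 between full-pixel and condensed hyperassociation can be sketched as follows. This is a simplified illustration under stated assumptions, not the CAC module itself: the initial correspondence is taken to be the arg-max of the raw association (one possible heuristic; the source does not specify it here), the surrounding window is a fixed square of hypothetical size `2 * radius + 1`, and the learned coordinate offsets are omitted entirely.

```python
import numpy as np

def condense_association(assoc, radius=1):
    """Keep only the associations around an initial correspondence pixel.

    assoc: (H, W, Hj, Wj) full-pixel associations from image i to image j.
    Assumption: the initial correspondence of each source pixel is the
    arg-max location in image j; learned offsets are NOT modelled.
    Returns (condensed, init) with shapes (H, W, K) and (H, W, 2),
    where K = (2 * radius + 1) ** 2.
    """
    h, w, hj, wj = assoc.shape
    # Initial correspondence (h0, w0) for every source pixel.
    idx = assoc.reshape(h, w, hj * wj).argmax(axis=-1)
    h0, w0 = np.unravel_index(idx, (hj, wj))
    # Fixed window of relative offsets around the initial pixel.
    offs = np.arange(-radius, radius + 1)
    dh, dw = np.meshgrid(offs, offs, indexing="ij")
    dh, dw = dh.ravel(), dw.ravel()
    # Clamp window coordinates to stay inside image j.
    hh = np.clip(h0[..., None] + dh, 0, hj - 1)
    ww = np.clip(w0[..., None] + dw, 0, wj - 1)
    # Gather the K associations for each source pixel.
    src_h = np.arange(h)[:, None, None]
    src_w = np.arange(w)[None, :, None]
    condensed = assoc[src_h, src_w, hh, ww]
    return condensed, np.stack([h0, w0], axis=-1)
```

With `radius=1`, each source pixel keeps 9 association values instead of all Hj·Wj, which illustrates the kind of reduction the caption describes.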