Unified Unsupervised Salient Object Detection via Knowledge Transfer

Yao Yuan, Wutao Liu, Pan Gao, Qun Dai, Jie Qin

TL;DR

This work tackles unsupervised salient object detection (USOD) across diverse tasks with a unified framework that learns saliency knowledge from Natural Still Image (NSI) SOD and transfers it to non-NSI tasks. It has three core components: Progressive Curriculum Learning-based Saliency Distilling (PCL-SD), which extracts saliency cues by starting with easy samples and progressively adding harder ones; Self-rectify Pseudo-label Refinement (SPR), which progressively improves pseudo-labels via posterior and prior rectifications; and an adapter-tuning strategy that transfers the learned knowledge to non-NSI domains. The approach achieves state-of-the-art or competitive results on RGB, RGB-D, RGB-T, video, and remote sensing image (RSI) SOD benchmarks, demonstrating strong cross-task generalization and effective transfer, both zero-shot and with targeted fine-tuning. The modality-agnostic, knowledge-sharing pipeline is of practical use for data-scarce SOD tasks and real-world applications where annotated data is limited or unavailable.

Abstract

Recently, unsupervised salient object detection (USOD) has gained increasing attention due to its annotation-free nature. However, current methods mainly focus on specific tasks such as RGB and RGB-D, neglecting the potential for task migration. In this paper, we propose a unified USOD framework for generic USOD tasks. Firstly, we propose a Progressive Curriculum Learning-based Saliency Distilling (PCL-SD) mechanism to extract saliency cues from a pre-trained deep network. This mechanism starts with easy samples and progressively moves towards harder ones, to avoid initial interference caused by hard samples. Afterwards, the obtained saliency cues are utilized to train a saliency detector, and we employ a Self-rectify Pseudo-label Refinement (SPR) mechanism to improve the quality of pseudo-labels. Finally, an adapter-tuning method is devised to transfer the acquired saliency knowledge, leveraging shared knowledge to attain superior transfer performance on the target tasks. Extensive experiments on five representative SOD tasks confirm the effectiveness and feasibility of our proposed method. Code and supplementary materials are available at https://github.com/I2-Multimedia-Lab/A2S-v3.
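
To make the easy-to-hard idea behind PCL-SD more concrete, the following is a minimal sketch of a curriculum-style loss and not the authors' formulation. It assumes that a pixel's "easiness" can be measured by how far its predicted saliency probability is from 0.5, and that a confidence threshold decays linearly over training so that ambiguous (hard) pixels are admitted later. The function names, the linear schedule, and the `start`/`end` values are all hypothetical choices made for illustration.

```python
def hard_pixel_threshold(epoch, total_epochs, start=0.8, end=0.0):
    """Confidence threshold that decays linearly from `start` to `end`.

    Assumption: a linear schedule; the paper may use a different one.
    """
    if total_epochs <= 1:
        return end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start + (end - start) * frac


def curriculum_distilling_loss(probs, epoch, total_epochs):
    """Loss over the pixels currently admitted by the curriculum.

    probs: flat list of predicted saliency probabilities in [0, 1].
    Returns (loss, number_of_pixels_used).
    """
    threshold = hard_pixel_threshold(epoch, total_epochs)
    # Confidence is 0 at p = 0.5 (ambiguous) and 1 at p = 0 or p = 1.
    confidences = [abs(2.0 * p - 1.0) for p in probs]
    # Early epochs keep only confident (easy) pixels; later epochs keep more.
    selected = [c for c in confidences if c >= threshold]
    if not selected:
        return 0.0, 0
    # Minimising this pushes the admitted pixels away from 0.5.
    loss = 1.0 - sum(selected) / len(selected)
    return loss, len(selected)


if __name__ == "__main__":
    probs = [0.95, 0.05, 0.6, 0.5]
    for epoch in range(5):
        loss, used = curriculum_distilling_loss(probs, epoch, total_epochs=5)
        print(f"epoch {epoch}: loss={loss:.3f}, pixels used={used}")
```

In this toy run only the two confident pixels contribute at the first epoch, while all four contribute by the last one, which mirrors the stated goal of avoiding early interference from hard samples.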

Paper Structure (33 sections, 15 equations, 14 figures, 8 tables)

Figures (14)

  • Figure 1: The proposed framework includes two types of knowledge transfer: (1) from a pre-trained deep network to the saliency cue extractor; (2) from Natural Still Image (NSI) SOD to non-NSI SOD.
  • Figure 2: Overview of the proposed method. The left side represents the training process on NSI SOD, while the right side shows the training process of transferring to non-NSI SOD tasks.
  • Figure 3: Illustration of the proposed PCL-SD. Hard samples are progressively incorporated as the training progresses.
  • Figure 4: Comparison between the initial pseudo-label, the saliency prediction, and the prior rectification.
  • Figure 5: The relevance between different SOD tasks. The overlaps can be seen as shared common knowledge.
  • ...and 9 more figures