Pluralistic Salient Object Detection

Xuelu Feng, Yunsheng Li, Dongdong Chen, Chunming Qiao, Junsong Yuan, Lu Yuan, Gang Hua

TL;DR

Two new SOD datasets “DUTS-MM” and “DUTS-MQ” are presented, along with newly designed evaluation metrics, and a simple yet effective pluralistic SOD baseline based on a Mixture-of-Experts (MOE) design is proposed.

Abstract

We introduce pluralistic salient object detection (PSOD), a novel task aimed at generating multiple plausible salient segmentation results for a given input image. Unlike conventional SOD methods, which produce a single segmentation mask for salient objects, this new setting acknowledges both the inherent complexity of real-world images, which often contain multiple objects, and the ambiguity in defining salient objects that arises from differing user intentions. To study this task, we present two new SOD datasets, "DUTS-MM" and "DUTS-MQ", along with newly designed evaluation metrics. DUTS-MM builds upon the DUTS dataset but enriches its ground-truth mask annotations in three respects: it 1) improves mask quality, especially for boundaries and fine-grained structures; 2) alleviates annotation inconsistency; and 3) provides multiple ground-truth masks for images with saliency ambiguity. DUTS-MQ consists of approximately 100K image-mask pairs with human-annotated preference scores, enabling a model to learn real human preferences when measuring mask quality. Building upon these two datasets, we propose a simple yet effective pluralistic SOD baseline based on a Mixture-of-Experts (MOE) design. Equipped with two prediction heads, it simultaneously predicts multiple masks using different query prompts and predicts a human preference score for each mask candidate. Extensive experiments and analyses underscore the significance of our proposed datasets and affirm the effectiveness of our PSOD framework.
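The abstract describes inference that yields several candidate masks, each paired with a predicted human preference score, evaluated on images that may have more than one ground-truth mask. The plain-Python sketch below illustrates that setting under stated assumptions: masks are small binary grids, candidates are ranked simply by sorting on their scores, and `best_match_iou` is a hypothetical best-match metric chosen for illustration. It is not the evaluation metric defined in the paper, which this page does not specify.

```python
from typing import List, Sequence, Tuple

# Assumption: a mask is a 2-D grid of 0/1 values (1 = salient pixel).
Mask = List[List[int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks with identical shape."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def rank_candidates(
    masks: Sequence[Mask], scores: Sequence[float]
) -> List[Tuple[float, Mask]]:
    """Order mask candidates by predicted preference score, best first."""
    return sorted(zip(scores, masks), key=lambda pair: pair[0], reverse=True)


def best_match_iou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """Hypothetical pluralistic metric (illustration only): for every
    ground-truth mask, take the best IoU reached by any prediction,
    then average over the ground-truth masks."""
    return sum(max(iou(p, g) for p in preds) for g in gts) / len(gts)
```

For example, with two ground-truth masks, a candidate set that exactly recovers one object and half-recovers the other scores (1.0 + 0.5) / 2 = 0.75. Because this recall-style design takes a maximum over predictions, extra unmatched candidates are not penalized.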

Paper Structure (25 sections, 3 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: Three representative examples that illustrate the inherent ambiguity in defining salient objects. Salient object detection is an inherently ambiguous task: an image with an indistinct background or with more than two objects can contain more than one salient region and therefore admit more than one saliency map.
  • Figure 2: Comparison of the inference process of our method and previous SOD methods. When encountering ambiguous images, such as the displayed image with a blurred background, previous methods may predict grey values in ambiguous regions, whereas our method outputs multiple masks and also grades their quality during inference.
  • Figure 3: The overall architecture of our method, consisting of three parts: backbone encoder, FPN-based neck and prompt-driven mask decoder. To enable the model to handle two different tasks, we employ an MoE (Mixture of Experts) mechanism in the encoder.
  • Figure 4: Visualization of the coarse annotation issue (a) and the inconsistent annotation issue (b) in DUTS, compared with our newly labeled DUTS-MM dataset. Red boxes denote the coarsely or inconsistently annotated regions.
  • Figure 5: Distribution of the number of ground-truth (GT) masks per image in the DUTS-MM training (top) and test (bottom) splits.
  • ...and 6 more figures
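The Figure 3 caption states that an MoE mechanism in the encoder lets one model handle two tasks, but this page does not describe how experts are selected. As an illustration only, the sketch below assumes a soft, task-conditioned gate: each task (hypothetically named `"mask"` and `"quality"`, after the abstract's two prediction heads) owns its own logits over a shared pool of experts, and the layer output is the gate-weighted sum of expert outputs. The class name `TaskConditionedMoE`, the toy ReLU experts, and the fixed gate logits are all assumptions, not the paper's design.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_relu(weights: Sequence[Sequence[float]]) -> Callable[[Vector], Vector]:
    """Build a toy expert (assumption): y = ReLU(W x)."""
    def expert(x: Vector) -> Vector:
        return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]
    return expert


class TaskConditionedMoE:
    """Hypothetical MoE layer: every task has gate logits over shared experts,
    and the output is the softmax-weighted sum of all expert outputs."""

    def __init__(
        self,
        experts: Sequence[Callable[[Vector], Vector]],
        gate_logits: Dict[str, Sequence[float]],
    ) -> None:
        self.experts = list(experts)
        self.gate_logits = gate_logits  # task name -> one logit per expert

    def __call__(self, x: Vector, task: str) -> Vector:
        gates = softmax(self.gate_logits[task])
        outputs = [expert(x) for expert in self.experts]
        dim = len(outputs[0])
        # Combine expert outputs dimension by dimension using the task's gates.
        return [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]
```

Because both tasks draw on the same experts, parameters are shared while each task can still emphasize different experts. In large models a sparse top-k gate, which runs only the highest-weighted experts, is a common alternative to the dense weighting used here.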