Salient Objects in Clutter

Deng-Ping Fan, Jing Zhang, Gang Xu, Ming-Ming Cheng, Ling Shao

TL;DR

This work addresses the data selection bias in existing salient object detection (SOD) benchmarks, which favor images with a single clear salient object. It introduces the SOC dataset, a large-scale, instance-level SOD collection that includes non-salient content and rich object attributes to reflect cluttered real-world scenes, alongside a comprehensive benchmark of 46 traditional and 54 deep models. The authors propose three dataset-enhancement strategies (label smoothing, random data augmentation, and self-supervised learning) and show that they yield measurable gains across models, with a final objective that combines these techniques. Through extensive attribute-based analyses and cross-dataset generalization tests, the paper demonstrates SOC's utility for deeper insight into SOD and outlines several future directions for the field.

Abstract

This paper identifies and addresses a serious design bias of existing salient object detection (SOD) datasets, which unrealistically assume that each image should contain at least one clear and uncluttered salient object. This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets. However, these models are still far from satisfactory when applied to real-world scenes. Based on our analyses, we propose a new high-quality dataset and update the previous saliency benchmark. Specifically, our dataset, called Salient Objects in Clutter (SOC), includes images with both salient and non-salient objects from several common object categories. In addition to object category annotations, each salient image is accompanied by attributes that reflect common challenges in common scenes, which can help provide deeper insight into the SOD problem. Further, with a given saliency encoder, e.g., the backbone network, existing saliency models are designed to achieve mapping from the training image set to the training ground-truth set. We, therefore, argue that improving the dataset can yield higher performance gains than focusing only on the decoder design. With this in mind, we investigate several dataset-enhancement strategies, including label smoothing to implicitly emphasize salient boundaries, random image augmentation to adapt saliency models to various scenarios, and self-supervised learning as a regularization strategy to learn from small datasets. Our extensive results demonstrate the effectiveness of these tricks. We also provide a comprehensive benchmark for SOD, which can be found in our repository: https://github.com/DengPingFan/SODBenchmark.
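
As a rough illustration of the first strategy named above, the sketch below applies generic uniform label smoothing to a binary saliency mask. The abstract does not give the paper's exact formulation or explain how smoothing is tied to salient boundaries, so the function name `smooth_labels`, the parameter `epsilon`, and the toy mask are all assumptions made for illustration only.

```python
from typing import List


def smooth_labels(mask: List[List[int]], epsilon: float = 0.1) -> List[List[float]]:
    """Apply uniform label smoothing to a binary saliency mask.

    Each hard label y in {0, 1} becomes y * (1 - epsilon) + epsilon / 2,
    so salient pixels map to 1 - epsilon / 2 and background to epsilon / 2.
    This is the generic two-class form, not necessarily the paper's variant.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    return [[y * (1.0 - epsilon) + 0.5 * epsilon for y in row] for row in mask]


# Hypothetical 2x3 ground-truth mask (1 = salient, 0 = background).
toy_mask = [[0, 1, 1],
            [0, 0, 1]]
smoothed = smooth_labels(toy_mask, epsilon=0.2)
```

With `epsilon=0.2`, background pixels move to roughly 0.1 and salient pixels to roughly 0.9, so a model trained against these soft targets is penalized less for moderately uncertain predictions.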

Paper Structure

This paper contains 24 sections, 11 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: Examples from our new SOC dataset, including non-salient (first row) and salient object images (rows 2 to 4). For salient object images, an instance-level ground-truth map (different color), object attributes (Attr) and category labels are provided.
  • Figure 2: Taxonomy of the saliency detection task. We highlight the scope of this study in gray. See the related work section for details.
  • Figure 3: Previous SOD datasets only annotate the images by drawing pixel-accurate silhouettes around salient objects (b). Unlike object segmentation datasets [lin2014microsoft] (d), where objects are not necessarily salient, our SOC provides salient instances (c). We provide a high-quality and large-scale annotated dataset comprised of images that better capture the properties of real-world scenes.
  • Figure 4: (a) Number of annotated instances per category in our SOC dataset. (b, c) Global and local color contrast statistics, respectively. (d) A set of saliency maps from our dataset and their overlay map. (e) Location distribution of the salient objects in SOC. (f) Distribution of instance sizes in the SOC and ILSO [li2017instance] datasets. (g) Visual examples of attributes. Best viewed on screen and zoomed in for details.
  • Figure 5: Examples of non-salient objects in our dataset: (a) crowded scene, (b) motion blur, and (c) background with non-interesting regions.
  • ...and 4 more figures