Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection

Feiran Li; Qianqian Xu; Shilong Bao; Zhiyong Yang; Runmin Cong; Xiaochun Cao; Qingming Huang

TL;DR

This work reveals that standard salient object detection metrics are biased toward larger objects in images with multiple salient targets due to size-weighted contributions. It introduces a size-invariant evaluation protocol and per-object metrics ($\mathsf{SI\text{-}MAE}, \mathsf{SI\text{-}F}, \mathsf{SI\text{-}AUC}$) by partitioning images into foreground frames and a background frame, effectively removing the weight $P_{X_i}$. It also proposes a generic size-invariant optimization objective $\mathcal{L}_{\mathsf{SI}}(f)=\sum_{k=1}^K \ell(f_k^{fore}) + \alpha \ell(f_{K+1}^{back})$ and provides a generalization bound showing favorable scaling with sample size $N$ and image size $K=H\times W$. Empirically, SI-SOD yields consistent improvements across benchmarks (MSOD, DUTS-TE) for multiple backbones, notably enhancing small-object and multi-object detection while maintaining competitive traditional metrics; code is available at the authors' GitHub repository.
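The objective $\mathcal{L}_{\mathsf{SI}}$ can be illustrated with a minimal NumPy sketch. Several choices here are assumptions for illustration, not details confirmed by the summary above: foreground frames are given as axis-aligned boxes `(top, left, bottom, right)`, the background frame is every pixel outside all boxes, and the per-frame loss $\ell$ is a mean binary cross-entropy. The names `bce` and `si_loss` are hypothetical.

```python
import numpy as np

def bce(pred, gt, eps=1e-7):
    """Mean binary cross-entropy over the given pixels (assumed choice of per-frame loss)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p))))

def si_loss(pred, gt, boxes, alpha=1.0, loss=bce):
    """Sum of per-frame losses over K foreground frames plus alpha times the background loss.

    boxes: list of (top, left, bottom, right), half-open, one box per salient object (assumption).
    """
    in_fore = np.zeros(gt.shape, dtype=bool)
    total = 0.0
    for top, left, bottom, right in boxes:
        in_fore[top:bottom, left:right] = True
        # Each frame is averaged over its own pixels, so a small object
        # contributes as much as a large one.
        total += loss(pred[top:bottom, left:right], gt[top:bottom, left:right])
    back = ~in_fore  # background frame: pixels outside every foreground box
    if back.any():
        total += alpha * loss(pred[back], gt[back])
    return total

# Toy example: one 4x4 object and one 2x2 object in an 8x8 image.
gt = np.zeros((8, 8))
gt[0:4, 0:4] = 1.0
gt[6:8, 6:8] = 1.0
boxes = [(0, 0, 4, 4), (6, 6, 8, 8)]
uninformed = np.full((8, 8), 0.5)  # a predictor that outputs 0.5 everywhere
value = si_loss(uninformed, gt, boxes, alpha=1.0)
```

With the constant 0.5 predictor, each of the three frames contributes $\log 2$ regardless of its area, which is the size-invariance the objective is designed to provide.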

Abstract

This paper explores the size-invariance of evaluation metrics in Salient Object Detection (SOD), especially when multiple targets of diverse sizes co-exist in the same image. We observe that current metrics are size-sensitive: larger objects receive most of the weight, while smaller ones tend to be ignored. We argue that evaluation should be size-invariant, because a bias based on size is unjustified without additional semantic information. To this end, we propose a generic approach that evaluates each salient object separately and then combines the results, effectively alleviating the imbalance. We further develop an optimization framework tailored to this goal, achieving considerable improvements in detecting objects of different sizes. Theoretically, we provide evidence supporting the validity of our new metrics and present a generalization analysis of SOD. Extensive experiments demonstrate the effectiveness of our method. The code is available at https://github.com/Ferry-Li/SI-SOD.
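The idea of evaluating each salient object separately and then combining the results can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' reference implementation: objects are described by hypothetical bounding boxes, the background is every pixel outside them, and the per-frame errors are combined as $(\sum_k \mathsf{MAE}_k + \alpha\,\mathsf{MAE}_{back})/(K+\alpha)$, where both the normalization and the default `alpha=1.0` are assumptions.

```python
import numpy as np

def mae(pred, gt):
    """Standard MAE: every pixel has equal weight, so large objects dominate."""
    return float(np.mean(np.abs(pred - gt)))

def si_mae(pred, gt, boxes, alpha=1.0):
    """Per-object MAE sketch: average the error inside each frame, then combine.

    boxes: list of (top, left, bottom, right), half-open, one per salient object (assumption).
    """
    in_fore = np.zeros(gt.shape, dtype=bool)
    fore_errs = []
    for top, left, bottom, right in boxes:
        in_fore[top:bottom, left:right] = True
        fore_errs.append(mae(pred[top:bottom, left:right], gt[top:bottom, left:right]))
    back = ~in_fore  # background frame: pixels outside every box
    back_err = float(np.mean(np.abs(pred[back] - gt[back]))) if back.any() else 0.0
    # Assumed combination rule: normalize by K + alpha.
    return (sum(fore_errs) + alpha * back_err) / (len(boxes) + alpha)

# One 4x4 object and one 2x2 object in an 8x8 image.
gt = np.zeros((8, 8))
gt[0:4, 0:4] = 1.0
gt[6:8, 6:8] = 1.0
boxes = [(0, 0, 4, 4), (6, 6, 8, 8)]

pred_a = gt.copy()
pred_a[6:8, 6:8] = 0.0   # misses the small object entirely (4 wrong pixels)
pred_b = gt.copy()
pred_b[0:2, 0:2] = 0.0   # finds both objects, 4 wrong pixels on the large one

print(mae(pred_a, gt), mae(pred_b, gt))                      # both 0.0625
print(si_mae(pred_a, gt, boxes), si_mae(pred_b, gt, boxes))  # pred_a scores worse
```

Both predictions get the same plain MAE because they have the same number of wrong pixels, while the per-object score penalizes the prediction that drops the small object.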

Paper Structure (41 sections, 9 theorems, 67 equations, 18 figures, 11 tables)

This paper contains 41 sections, 9 theorems, 67 equations, 18 figures, 11 tables.

Introduction
Related Work
A Novel Size-invariant Evaluation Protocol
Revisiting Current SOD Evaluation Metrics
Principles of Size-Invariant Evaluation
Size-Invariant $\mathsf{MAE}$
Size-Invariant Composite Metrics
How to Practically Pursue Size-Invariance?
A Generic Size-Invariant Optimization Goal
Generalization Bound
Experiments
Experimental Setups
Overall Performance
Fine-grained Analysis
Performance with Respect to Sizes
...and 26 more sections

Key Result

Proposition 3.3

Given two different predictors $f_{A}$ and $f_{B}$, the following two possible cases suggest that $\mathsf{SI\text{-}MAE}$ is more effective than $\mathsf{MAE}$ during evaluation. Case 1: Assume that there is a single salient object (i.e., $K=1$), with two different results from predictors $f_A$ and

Figures (18)

Figure 1: Statistics on the MSOD dataset. The object-area subfigure illustrates how common small salient objects are, with Size(%) denoting the proportion of an object's size to the whole image. The object-count subfigure reveals that practical SOD scenarios usually involve multiple salient objects.
Figure 2: (c) is the result of the EDN backbone, and (d) is the prediction optimized by our approach. (c) detects fewer salient objects, yet has a lower $\mathsf{MAE}$ than (d). However, $\mathsf{SI\text{-}MAE}$ correctly distinguishes the two detections.
Figure 3: Examples of partitions. In the first example, there is one foreground frame ➀ and a background frame ➁. In the second example, there are five foreground frames, ➀ to ➄, and a background frame ➅.
Figure 4: $\mathsf{SI\text{-}MAE}$ performance on objects with different sizes on two representative datasets, with EDN and PoolNet as backbones.
Figure 5: $\mathsf{SI\text{-}MAE}$ performance with different object numbers on two representative datasets, with EDN and PoolNet as backbones.
...and 13 more figures

Theorems & Definitions (18)

Definition 3.1: Separable Function
Definition 3.2: Composite Function
Proposition 3.3: Informal
Proposition 4.1: Mechanism of SI-SOD
Theorem 4.2: Generalization Bound for SI-SOD
Proposition 2.1: Informal
proof
proof
proof
proof
...and 8 more
