Robust Salient Object Detection on Compressed Images Using Convolutional Neural Networks

Guibiao Liao, Wei Gao

TL;DR

This paper addresses the robustness challenges of CNN-based salient object detection when inputs are compressed images. It introduces Hybrid Prior Learning (HPL) and Location-aware Graph Reasoning (LGR) to enhance robust feature representation, training a Hybrid Prior Generator on clean images and a Target Network on compressed images, with three guiding strategies (RPL, LPL, SML) and a graph-based reasoning module. Extensive experiments on approximately 2.64 million images show that conventional SOD models suffer under compression, while the proposed approach achieves superior robustness across degradation levels and maintains competitive accuracy on clean data. The work provides large-scale CI SOD benchmarks, analytical insights into CI distortions, and a practical baseline that can guide future research on robustness for CNN-based SOD methods.

Abstract

Salient object detection (SOD) has achieved substantial progress in recent years. In practical scenarios, compressed images (CI) serve as the primary medium for data transmission and storage. However, scant attention has been paid to SOD on compressed images using convolutional neural networks (CNNs). In this paper, we rigorously benchmark and analyze CNN-based salient object detection on compressed images. To study this issue comprehensively, we construct several CI SOD datasets from existing public SOD datasets. We then investigate representative CNN-based SOD methods and assess their robustness on compressed images (approximately 2.64 million images). Our evaluation reveals two key findings: 1) current state-of-the-art CNN-based SOD models, while excelling on clean images, exhibit significant performance bottlenecks when applied to compressed images; 2) the principal factors influencing the robustness of CI SOD are rooted in the characteristics of compressed images and the limitations of saliency feature learning. Based on these observations, we propose a simple yet promising baseline framework that focuses on robust feature representation learning to achieve robust CNN-based CI SOD. Extensive experiments demonstrate the effectiveness of our approach, which shows markedly improved robustness across various levels of image degradation while maintaining competitive accuracy on clean data. We hope that our benchmarking efforts, analytical insights, and proposed techniques will contribute to a more comprehensive understanding of the robustness of CNN-based SOD algorithms and inspire future research in the community.

Paper Structure (20 sections, 10 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: Performance of representative CNN-based SOD methods on different datasets, including the original test results from models trained on the original clean image dataset (i.e., Original), and the compressed test results from models retrained on the corresponding compressed image datasets (i.e., QP22, QP27, QP32, QP37, QP42). Here, QP denotes the quantization parameter, and a higher QP means more severe image degradation. The benchmark results show that 1) current SOTA models suffer from large performance bottlenecks on compressed images, although they achieve strong performance on clean images; 2) as the compression distortion increases (i.e., from QP22 to QP42), model performance gradually decreases.
  • Figure 2: Visualization of the hierarchical feature maps of the feature extractor and the predicted saliency results for the representative EDN [wu2022edn]. For the original clean image, the salient regions are more strongly highlighted and accurate saliency results are obtained. In contrast, the broken structures and blurring of compressed images make contextual understanding and salient region detection much harder, leading to poor results.
  • Figure 3: The overall architecture of our proposed approach. (a) Illustration of our framework. In particular, the Hybrid Prior Generator network is used only to train the Target Network and is omitted during inference. (b) The overall architecture of the Hybrid Prior Generator network and the Target Network, both of which have the same structure.
  • Figure 4: Detailed architecture of connection module (CM).
  • Figure 5: Visual results of setting 1, including our proposed approach and other SOTA methods. From top to bottom, the degradation effect of images gradually increases (i.e., QP22, QP27, QP32, QP37, and QP42), resulting in increased difficulty in detection. In general, it can be observed that our approach can generate more robust and accurate saliency results under different degradation levels.
  • ...and 1 more figures
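
To give a rough sense of what the QP levels in the Figure 1 caption mean, the sketch below converts each QP into an approximate quantization step size. It is an illustration under an assumption, not part of the paper: the text above does not name the codec, so the code assumes an H.264/HEVC-style quantizer, where the step size roughly doubles for every increase of 6 in QP. The function names `approx_qstep` and `relative_coarseness` are hypothetical helpers introduced here.

```python
# Illustrative only: assumes an H.264/HEVC-style codec.
# The source text does not state which codec produced the compressed datasets.
QP_LEVELS = [22, 27, 32, 37, 42]  # levels listed in the Figure 1 caption


def approx_qstep(qp: int) -> float:
    """Approximate quantization step size for an H.264/HEVC-style QP.

    Uses the common rule of thumb that the step doubles every 6 QP units,
    normalized so that QP 4 corresponds to a step of 1.
    """
    return 2.0 ** ((qp - 4) / 6.0)


def relative_coarseness(qp: int, reference_qp: int = 22) -> float:
    """How many times coarser the quantization is than at reference_qp."""
    return approx_qstep(qp) / approx_qstep(reference_qp)


if __name__ == "__main__":
    for qp in QP_LEVELS:
        print(
            f"QP{qp}: step ~ {approx_qstep(qp):6.2f}, "
            f"{relative_coarseness(qp):5.2f}x coarser than QP22"
        )
```

Under this assumption, the step size at QP42 is about 10 times that at QP22 (a factor of 2^(20/6)), which is consistent with the caption's statement that a higher QP means more severe degradation.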