
What makes for effective detection proposals?

Jan Hosang, Rodrigo Benenson, Piotr Dollár, Bernt Schiele

TL;DR

This study investigates why detection proposals improve object detection by systematically evaluating 12 proposal methods (plus 4 baselines) on PASCAL, ImageNet, and COCO with several detectors. It introduces Average Recall ($AR$), a metric that captures both recall and localisation, and shows that $AR$ correlates strongly with detection performance for LM-LLDA (a DPM variant), R-CNN, and Fast R-CNN. The results show that localisation accuracy is as critical as recall: top methods (e.g., MCG, SelectiveSearch, EdgeBoxes, Geodesic, Rigor) deliver robust detection performance yet still benefit from AR-guided tuning. The work offers practical guidance for selecting and tuning proposal methods, highlights potential upper bounds through oracle-like refinements, and points to AR-driven optimization and closer integration with detectors as directions for future gains.

Abstract

Current top performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM, R-CNN, and Fast R-CNN detection performance. Our analysis shows that for object detection improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detection performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.
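To make the AR metric concrete, here is a minimal sketch, assuming AR is defined as twice the integral of recall over IoU thresholds from 0.5 to 1, i.e. $AR = 2\int_{0.5}^{1} \mathrm{recall}(o)\,do$, where $\mathrm{recall}(o)$ is the fraction of ground-truth boxes whose best-matching proposal has IoU at least $o$. This is not the authors' evaluation code. The helper names (`iou_matrix`, `best_overlaps`, `recall_at`, `average_recall`) and the `[x1, y1, x2, y2]` continuous box format are assumptions for illustration.

```python
import numpy as np


def iou_matrix(gt, proposals):
    """Pairwise IoU between gt boxes (G, 4) and proposals (P, 4).

    Boxes are [x1, y1, x2, y2] in continuous coordinates (an assumption;
    some toolkits add +1 to widths and heights).
    """
    g = np.asarray(gt, dtype=float)[:, None, :]         # shape (G, 1, 4)
    p = np.asarray(proposals, dtype=float)[None, :, :]  # shape (1, P, 4)
    iw = np.clip(np.minimum(g[..., 2], p[..., 2]) - np.maximum(g[..., 0], p[..., 0]), 0, None)
    ih = np.clip(np.minimum(g[..., 3], p[..., 3]) - np.maximum(g[..., 1], p[..., 1]), 0, None)
    inter = iw * ih
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
    return inter / (area_g + area_p - inter)


def best_overlaps(gt, proposals):
    """For each ground-truth box, the IoU of its best-matching proposal."""
    if len(proposals) == 0:
        return np.zeros(len(gt))
    return iou_matrix(gt, proposals).max(axis=1)


def recall_at(overlaps, threshold):
    """Fraction of ground-truth boxes covered at the given IoU threshold."""
    return float(np.mean(np.asarray(overlaps) >= threshold))


def average_recall(overlaps):
    """AR = 2 * integral of recall(o) for o in [0.5, 1].

    Integrating the indicator [o_i >= o] over [0.5, 1] gives
    max(o_i - 0.5, 0), so the integral has this closed form.
    """
    o = np.asarray(overlaps, dtype=float)
    return float(np.mean(2.0 * np.clip(o - 0.5, 0.0, None)))


gt = [[0, 0, 10, 10], [20, 20, 30, 30]]
proposals = [[0, 0, 10, 10], [20, 20, 30, 25]]  # second box is poorly localised
best = best_overlaps(gt, proposals)
print(best, recall_at(best, 0.5), average_recall(best))
```

In this toy example both objects are "found" at IoU 0.5 (recall 1.0), yet AR is only 0.5 because the second proposal barely reaches the threshold. This is the localisation effect that recall at a single threshold hides. To evaluate a dataset, the best overlaps of all ground-truth boxes would be pooled before calling `average_recall`.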

Paper Structure

This paper contains 23 sections, 1 equation, 15 figures, 5 tables.

Figures (15)

  • Figure 1: What makes object detection proposals effective?
  • Figure 2: Examples of rotation perturbation. (a) shows the largest rectangle with the same aspect ratio as the original image that fits inside the image under a $20^{\circ}$ rotation, and (b) the resulting crop. All other rotations are cropped to the same dimensions, e.g. the $-5^{\circ}$ rotation in (c) to the crop in (d).
  • Figure 3: Illustration of the perturbation ranges used for the repeatability experiments.
  • Figure 4: Example of the image perturbations considered. Top to bottom, left to right: original, blur, illumination, JPEG artefact, rotation, scale perturbations, and "salt and pepper" noise.
  • Figure 5: Repeatability results under various perturbations.
  • ...and 10 more figures
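The rotation crop described for Figure 2 has a closed form. Here is a minimal sketch, assuming the crop is centred and axis-aligned and keeps the original aspect ratio; the caption does not say how the authors computed it. The helper name `rotation_crop_size` is hypothetical. An axis-aligned $w \times h$ rectangle fits inside a $W \times H$ image rotated by $\theta$ when $w|\cos\theta| + h|\sin\theta| \le W$ and $w|\sin\theta| + h|\cos\theta| \le H$. Fixing $w/h = W/H$ and solving both constraints for the largest common scale gives the result.

```python
import math


def rotation_crop_size(width, height, angle_deg):
    """Largest centred crop with the image's aspect ratio that stays fully
    inside the image content after rotating it by angle_deg.

    Hypothetical helper: assumes the crop is centred and axis-aligned.
    """
    aspect = width / height
    theta = math.radians(angle_deg)
    s, c = abs(math.sin(theta)), abs(math.cos(theta))
    # Scale limited by the rotated width constraint and by the rotated height constraint.
    scale = min(aspect / (aspect * c + s), 1.0 / (aspect * s + c))
    return width * scale, height * scale


# Following the caption, compute the crop once for the largest rotation (20 degrees)
# and reuse those dimensions for every other angle, e.g. -5 degrees.
crop_w, crop_h = rotation_crop_size(640, 480, 20)
print(round(crop_w, 1), round(crop_h, 1))
```

Sizing the crop by the extreme angle means every rotated version is compared over an identically sized region without black border pixels. This matters for repeatability, since border artefacts could otherwise trigger spurious proposals.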