What makes for effective detection proposals?
Jan Hosang, Rodrigo Benenson, Piotr Dollár, Bernt Schiele
TL;DR
This study examines what makes detection proposals effective for object detection by systematically evaluating 12 proposal methods (plus 4 baselines) on PASCAL, ImageNet, and COCO, and with multiple detectors. It introduces Average Recall (AR), a metric that captures both recall and localisation, and shows that AR correlates strongly with the detection performance of LM-LLDA (a DPM variant), R-CNN, and Fast R-CNN. The results show that localisation accuracy is as important as recall. Top methods (e.g., MCG, SelectiveSearch, EdgeBoxes, Geodesic, Rigor) deliver strong detection performance, and AR can be used to tune their parameters further. The work offers practical guidance for selecting and tuning proposal methods, uses oracle-like refinements to indicate how much headroom remains, and suggests optimizing proposals for AR and integrating them more closely with detectors as directions for future gains.
Abstract
Current top-performing object detectors employ detection proposals to guide the search for objects, thereby avoiding exhaustive sliding window search across images. Despite the popularity and widespread use of detection proposals, it is unclear which trade-offs are made when using them during object detection. We provide an in-depth analysis of twelve proposal methods along with four baselines regarding proposal repeatability, ground truth annotation recall on PASCAL, ImageNet, and MS COCO, and their impact on DPM, R-CNN, and Fast R-CNN detection performance. Our analysis shows that, for object detection, improving proposal localisation accuracy is as important as improving recall. We introduce a novel metric, the average recall (AR), which rewards both high recall and good localisation and correlates surprisingly well with detection performance. Our findings show common strengths and weaknesses of existing methods, and provide insights and metrics for selecting and tuning proposal methods.
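To make the idea of a metric that rewards both recall and localisation concrete, the following is a minimal sketch of how AR could be computed. It assumes that AR is recall averaged over IoU thresholds from 0.5 to 1, which reduces to twice the mean of max(IoU - 0.5, 0) over ground-truth boxes, and that each ground-truth box is scored independently by its best-overlapping proposal. The function names (`iou_matrix`, `best_iou_per_gt`, `recall_at`, `average_recall`) and the `[x1, y1, x2, y2]` box layout are illustrative choices, not taken from the authors' code.

```python
import numpy as np


def iou_matrix(gt, props):
    """Pairwise IoU between ground-truth boxes (G, 4) and proposals (P, 4).

    Boxes are [x1, y1, x2, y2] with positive area (an assumption of this sketch).
    """
    gt = np.asarray(gt, dtype=float).reshape(-1, 4)
    props = np.asarray(props, dtype=float).reshape(-1, 4)
    # Corners of the intersection rectangle for every (gt, proposal) pair.
    x1 = np.maximum(gt[:, None, 0], props[None, :, 0])
    y1 = np.maximum(gt[:, None, 1], props[None, :, 1])
    x2 = np.minimum(gt[:, None, 2], props[None, :, 2])
    y2 = np.minimum(gt[:, None, 3], props[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    area_pr = (props[:, 2] - props[:, 0]) * (props[:, 3] - props[:, 1])
    union = area_gt[:, None] + area_pr[None, :] - inter
    return inter / union


def best_iou_per_gt(gt, props):
    """Best IoU achieved by any proposal for each ground-truth box."""
    ious = iou_matrix(gt, props)
    if ious.shape[1] == 0:
        # No proposals: every ground-truth box is missed.
        return np.zeros(ious.shape[0])
    return ious.max(axis=1)


def recall_at(best_iou, threshold):
    """Fraction of ground-truth boxes covered at the given IoU threshold."""
    return float(np.mean(best_iou >= threshold))


def average_recall(best_iou):
    """Recall averaged over IoU thresholds in [0.5, 1] (assumed definition).

    Uses the closed form 2 * mean(max(IoU - 0.5, 0)), which equals
    2 * integral of recall(o) for o from 0.5 to 1.
    """
    return float(2.0 * np.mean(np.clip(best_iou - 0.5, 0.0, None)))


# Toy data: one perfectly localised object and one covered at exactly IoU 0.5.
gt_boxes = [[0, 0, 10, 10], [20, 20, 30, 30]]
proposals = [[0, 0, 10, 10], [20, 20, 30, 25], [100, 100, 110, 110]]
best = best_iou_per_gt(gt_boxes, proposals)
print("recall@0.5:", recall_at(best, 0.5))
print("recall@0.7:", recall_at(best, 0.7))
print("AR:", average_recall(best))
```

In the toy data, both objects count as recalled at an IoU threshold of 0.5, yet the loosely localised one contributes nothing to AR. This illustrates why a metric of this form separates methods that recall at a single threshold would rate equally.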
