Enhancing Quantum-ready QUBO-based Suppression for Object Detection with Appearance and Confidence Features
Keiichiro Yamamura, Toru Mitsutake, Hiroki Ishikura, Daiki Kusuhara, Akihiro Yoshida, Katsuki Fujisawa
TL;DR
The paper tackles the limitation of greedy NMS in crowded scenes by refining QUBO-based suppression to distinguish occluded objects from redundant predictions. It introduces QAQS and QAQS-C, which incorporate an SSIM-based appearance feature and the product of confidences into the QUBO coefficient matrix, with a faster SSIM implementation to maintain throughput. Empirical results on COCO and CrowdHuman show consistent improvements in mAP and mAR, especially in crowded scenarios, while preserving reasonable runtime using a classical solver and GPU-accelerated SSIM. The work also demonstrates a quantum-ready software design and outlines strategies for accelerating suppression via sparsity, paving the way for future quantum hardware benefits.
Abstract
Quadratic Unconstrained Binary Optimization (QUBO)-based suppression in object detection is known to outperform conventional Non-Maximum Suppression (NMS), especially in crowded scenes, where NMS may suppress (partially) occluded true positives that have low confidence scores. Although existing QUBO formulations are less likely than NMS to miss occluded objects, there is room for improvement: they rely only on confidence scores and on pairwise scores based on the spatial overlap between predictions. This study proposes new QUBO formulations that aim to distinguish whether the overlap between predictions is due to occlusion between objects or to redundancy in prediction, i.e., multiple predictions for a single object. The proposed QUBO formulation integrates two features into the pairwise score of the existing QUBO formulation: i) an appearance feature calculated with an image similarity metric and ii) the product of confidence scores. These features derive, respectively, from the hypotheses that redundant predictions share a similar appearance and that (partially) occluded objects have low confidence scores. The proposed methods demonstrate a significant advance over state-of-the-art QUBO-based suppression without a notable increase in runtime, achieving up to a 4.54-point improvement in mAP and a 9.89-point gain in mAR.
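
The abstract does not state the exact coefficients, so the following is only a minimal sketch of how a pairwise score combining spatial overlap, appearance similarity, and the product of confidences could enter a QUBO matrix. Everything specific here is an assumption made for illustration: the weighted-sum form of the penalty, the weight names `w_iou`, `w_app`, and `w_conf` and their values, the precomputed `appearance` matrix (a stand-in for an image similarity such as SSIM between box crops), and the brute-force solver, which replaces a real QUBO solver and is only usable for a handful of boxes.

```python
from itertools import product


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_qubo(boxes, scores, appearance, w_iou=0.4, w_app=0.4, w_conf=0.4):
    """Build an upper-triangular QUBO matrix to be maximised.

    Diagonal: reward for keeping a box (its confidence score).
    Off-diagonal: penalty for keeping two overlapping boxes together.
    The weighted-sum penalty is an illustrative assumption, not the
    paper's exact formulation.
    """
    n = len(boxes)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][i] = scores[i]
        for j in range(i + 1, n):
            overlap = iou(boxes[i], boxes[j])
            if overlap == 0.0:
                continue  # disjoint boxes are never penalised
            penalty = (
                w_iou * overlap                    # spatial overlap
                + w_app * appearance[i][j]         # similar look -> likely redundant
                + w_conf * scores[i] * scores[j]   # low product -> possibly occluded
            )
            Q[i][j] = -penalty
    return Q


def solve_brute_force(Q):
    """Enumerate all binary vectors and return the maximiser (tiny n only)."""
    n = len(Q)
    best_x, best_val = None, float("-inf")
    for x in product((0, 1), repeat=n):
        val = sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(i, n))
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x, best_val


# Toy input (made-up numbers): box 1 nearly duplicates box 0,
# box 2 is a low-confidence neighbour that partly overlaps both.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (6, 0, 16, 10)]
scores = [0.9, 0.8, 0.4]
appearance = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]

selection, value = solve_brute_force(build_qubo(boxes, scores, appearance))
print(selection)  # 1 = keep the box, 0 = suppress it
```

In this toy input the near-duplicate pair receives a large penalty because it is high in all three terms, while the low-confidence neighbour is penalised only lightly against either of the other boxes, which is the behaviour the two hypotheses in the abstract are meant to encourage. The outcome depends on the assumed weights; different values can change which boxes are kept.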
