Enhancing Quantum-ready QUBO-based Suppression for Object Detection with Appearance and Confidence Features

Keiichiro Yamamura, Toru Mitsutake, Hiroki Ishikura, Daiki Kusuhara, Akihiro Yoshida, Katsuki Fujisawa

TL;DR

The paper tackles the limitation of greedy NMS in crowded scenes by refining QUBO-based suppression to distinguish occluded objects from redundant predictions. It introduces QAQS and QAQS-C, which incorporate an SSIM-based appearance feature and the product of confidences into the QUBO coefficient matrix, with a faster SSIM implementation to maintain throughput. Empirical results on COCO and CrowdHuman show consistent improvements in mAP and mAR, especially in crowded scenarios, while preserving reasonable runtime using a classical solver and GPU-accelerated SSIM. The work also demonstrates a quantum-ready software design and outlines strategies for accelerating suppression via sparsity, paving the way for future quantum hardware benefits.

Abstract

Quadratic Unconstrained Binary Optimization (QUBO)-based suppression in object detection is known to outperform conventional Non-Maximum Suppression (NMS), especially in crowded scenes, where NMS may suppress (partially) occluded true positives with low confidence scores. Although existing QUBO formulations are less likely than NMS to miss occluded objects, there is room for improvement because they rely naively on confidence scores and on pairwise scores based on spatial overlap between predictions. This study proposes new QUBO formulations that aim to distinguish whether the overlap between predictions is due to occlusion of objects or to redundancy in prediction, i.e., multiple predictions for a single object. The proposed QUBO formulation integrates two features into the pairwise score of the existing QUBO formulation: i) an appearance feature calculated by an image similarity metric and ii) the product of confidence scores. These features are derived from the hypotheses that redundant predictions share a similar appearance feature and that (partially) occluded objects have low confidence scores, respectively. The proposed methods demonstrate a significant advance over state-of-the-art QUBO-based suppression without a notable increase in runtime, achieving improvements of up to 4.54 points in mAP and 9.89 points in mAR.
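
As a rough illustration of the idea in the abstract, the sketch below builds a small QUBO matrix whose diagonal rewards confident predictions and whose off-diagonal entries penalize overlapping pairs by IoU, appearance similarity, and the product of confidences. The function names (`build_qubo`, `solve_brute_force`), the weights, and the exact way the three pairwise terms are combined are assumptions made for this example and are not taken from the paper; the appearance matrix is supplied by hand rather than computed with SSIM, and an exhaustive search stands in for the classical solver.

```python
import itertools
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def build_qubo(boxes, scores, appearance,
               w_linear=0.4, w_iou=0.3, w_app=0.15, w_conf=0.15):
    """Assemble a symmetric QUBO matrix Q for maximizing x^T Q x.

    All weights are hypothetical values chosen for this toy example.
    """
    n = len(boxes)
    Q = np.zeros((n, n))
    for i in range(n):
        Q[i, i] = w_linear * scores[i]  # reward for keeping prediction i
    for i in range(n):
        for j in range(i + 1, n):
            overlap = iou(boxes[i], boxes[j])
            if overlap == 0.0:
                continue  # non-overlapping pairs do not interact
            penalty = (w_iou * overlap
                       + w_app * appearance[i, j]
                       + w_conf * scores[i] * scores[j])
            # Split the penalty over both symmetric entries.
            Q[i, j] = Q[j, i] = -penalty / 2.0
    return Q


def solve_brute_force(Q):
    """Exhaustively maximize x^T Q x over binary x (only viable for tiny n)."""
    n = Q.shape[0]
    best_x, best_val = None, -np.inf
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        val = x @ Q @ x
        if val > best_val:
            best_x, best_val = x, val
    return best_x


if __name__ == "__main__":
    # Two near-duplicate boxes plus one isolated box.
    boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    appearance = np.zeros((3, 3))
    appearance[0, 1] = appearance[1, 0] = 0.9  # hand-set similarity
    print(solve_brute_force(build_qubo(boxes, scores, appearance)))
```

In this toy setup, a pair with high overlap, high appearance similarity, and two high confidences receives a large penalty, so only one of the two is kept, while a pair with moderate overlap, dissimilar appearance, and one low confidence can survive together. Real instances are far too large for exhaustive search and would need a dedicated QUBO solver.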

Paper Structure

This paper contains 21 sections, 8 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Illustration of our divide-and-conquer algorithm for SSIM calculation. First, the matrix of appearance feature ($A$) is recursively divided into four blocks at each recursion step. Second, the SSIM values of each block are calculated in parallel when the size of the divided blocks becomes sufficiently small. Blocks of lower triangular parts are excluded from the computation target because this part can be completed using the symmetry of $A$. Finally, computation results for small blocks are merged.
  • Figure 2: Visualization of Intersection Matrix $I\in\{0,1\}^{n\times n}$. Zero-value elements are colored in purple, and one-value elements are colored in yellow.
  • Figure 3: Visualization of suppressed predictions. Confidence scores of false negatives (\ref{fig:qualitative}) and false positives (\ref{fig:potential_drawback}) are shown outside of each picture.
  • Figure 4: Breakdown of QUBO-based suppression runtime per image of COCO dataset shown in \ref{tab:ablation_qubo}. Results on the CrowdHuman dataset show a similar tendency.
  • Figure 5: Ablation study of SSIM computation. Naive is the sequential computation on the CPU based on the implementation of scikit-image. GPU represents the GPU parallelization. Rec represents the recursive computation using the divide-and-conquer algorithm. Ord shows the reordering of Intersection Matrix to avoid redundant computation.
  • ...and 2 more figures
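
The divide-and-conquer scheme described in the Figure 1 caption can be sketched roughly as follows. This is a minimal CPU-only illustration, assuming a generic symmetric similarity function `sim` in place of SSIM; the function name `symmetric_pairwise`, the `leaf` threshold, and the sequential loop over leaf blocks (which the caption says are processed in parallel) are choices made for this example, not details from the paper.

```python
import numpy as np


def symmetric_pairwise(items, sim, leaf=2):
    """Fill a symmetric n x n matrix by recursively splitting it into four blocks.

    Blocks lying strictly below the diagonal are skipped and later recovered
    by symmetry; blocks no larger than `leaf` are collected as leaves.
    """
    n = len(items)
    A = np.zeros((n, n))
    leaves = []

    def split(r0, r1, c0, c1):
        if r0 >= r1 or c0 >= c1:
            return  # empty block
        if r0 >= c1:
            return  # strictly lower-triangular block: filled in by symmetry
        if max(r1 - r0, c1 - c0) <= leaf:
            leaves.append((r0, r1, c0, c1))
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        for rs, re in ((r0, rm), (rm, r1)):
            for cs, ce in ((c0, cm), (cm, c1)):
                split(rs, re, cs, ce)

    split(0, n, 0, n)

    # Leaf blocks are independent of each other, so this loop is the part
    # that could be parallelized; here it simply runs sequentially.
    for r0, r1, c0, c1 in leaves:
        for i in range(r0, r1):
            for j in range(max(c0, i), c1):  # only entries with j >= i
                A[i, j] = sim(items[i], items[j])

    # Merge: mirror the strict upper triangle into the lower triangle.
    return np.triu(A) + np.triu(A, 1).T
```

For n items this evaluates `sim` on n(n+1)/2 pairs instead of n², which is the saving that symmetry provides; how the block size is chosen and how leaves are scheduled on the GPU are not specified here.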