MCAQ-YOLO: Morphological Complexity-Aware Quantization for Efficient Object Detection with Curriculum Learning
Yoonjae Seo, Ermal Elbasani, Jaehong Lee
TL;DR
MCAQ-YOLO addresses efficient object detection under resource constraints by introducing morphology-aware tile-wise spatial mixed-precision quantization. It defines a signal-centric morphological complexity score from five descriptors and learns a monotonic mapping to per-tile bit-widths, enabled by a calibration-time bit-map and a CUDA kernel for real-time inference. A curriculum-based quantization-aware training scheme stabilizes optimization and accelerates convergence, yielding substantial gains on a construction-safety dataset (85.6% mAP@0.5 with an average of 4.2 bits and 7.63x compression, outperforming uniform 4-bit quantization by 3.5 points) and consistent improvements on COCO 2017 and Pascal VOC 2012. The work demonstrates that spatial mixed-precision quantization is practically deployable in real-time detectors, with gains correlating to intra-image complexity variation, while highlighting limitations in theoretical grounding, hardware overhead, and dataset dependence.
Abstract
Most neural network quantization methods apply uniform bit precision across spatial regions, disregarding the heterogeneous complexity inherent in visual data. This paper introduces MCAQ-YOLO, a practical framework for tile-wise spatial mixed-precision quantization in real-time object detectors. Morphological complexity--quantified through five complementary metrics (fractal dimension, texture entropy, gradient variance, edge density, and contour complexity)--is proposed as a signal-centric predictor of spatial quantization sensitivity. A calibration-time analysis design enables spatial bit allocation with only 0.3ms inference overhead, achieving 151 FPS throughput. Additionally, a curriculum-based training scheme that progressively increases quantization difficulty is introduced to stabilize optimization and accelerate convergence. On a construction safety equipment dataset exhibiting high morphological variability, MCAQ-YOLO achieves 85.6% mAP@0.5 with an average bit-width of 4.2 bits and a 7.6x compression ratio, outperforming uniform 4-bit quantization by 3.5 percentage points. Cross-dataset evaluation on COCO 2017 (+2.9%) and Pascal VOC 2012 (+2.3%) demonstrates consistent improvements, with performance gains correlating with within-image complexity variation.
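To make the idea of complexity-driven tile-wise bit allocation concrete, the following is a minimal sketch and not the authors' implementation. It assumes a grayscale image with values in [0, 1], uses only three of the five descriptors named above (gradient variance, edge density, texture entropy), combines them by a plain average, and replaces the learned monotonic mapping with a fixed linear one. The function names, the 16-pixel tile size, the 0.1 edge threshold, the 32-bin histogram, and the 2 to 8 bit range are all hypothetical choices made for illustration.

```python
import numpy as np


def tile_descriptors(tile):
    """Three simple complexity descriptors for one grayscale tile in [0, 1]."""
    gy, gx = np.gradient(tile.astype(np.float64))
    mag = np.hypot(gx, gy)                      # gradient magnitude per pixel
    grad_var = mag.var()                        # gradient variance
    edge_density = (mag > 0.1).mean()           # hypothetical edge threshold
    hist, _ = np.histogram(tile, bins=32, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    entropy = -(p * np.log2(p)).sum() / np.log2(32)  # normalised to [0, 1]
    return np.array([grad_var, edge_density, entropy])


def complexity_map(image, tile=16):
    """Per-tile complexity score in [0, 1] (assumed: equal-weight average)."""
    rows, cols = image.shape[0] // tile, image.shape[1] // tile
    desc = np.zeros((rows, cols, 3))
    for r in range(rows):
        for c in range(cols):
            patch = image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            desc[r, c] = tile_descriptors(patch)
    # Min-max normalise each descriptor across tiles so they are comparable.
    lo = desc.min(axis=(0, 1))
    hi = desc.max(axis=(0, 1))
    norm = (desc - lo) / np.maximum(hi - lo, 1e-12)
    return norm.mean(axis=2)


def bits_from_complexity(score, min_bits=2, max_bits=8):
    """Monotonic non-decreasing map from score to integer bit-width.

    A fixed linear map stands in for the learned mapping described in the paper.
    """
    return np.rint(min_bits + score * (max_bits - min_bits)).astype(int)


# Synthetic image: flat left half, noisy right half.
rng = np.random.default_rng(0)
image = np.full((64, 64), 0.5)
image[:, 32:] = rng.random((64, 32))

bit_map = bits_from_complexity(complexity_map(image))
avg_bits = bit_map.mean()
```

In this sketch the bit map would be computed once at calibration time and then reused, so its cost would not fall on the per-frame inference path. The flat tiles receive the minimum bit-width and the noisy tiles receive more, which is the qualitative behaviour the abstract describes. As a rough arithmetic check under the assumption of a 32-bit floating-point baseline (which the abstract does not state), an average of 4.2 bits gives 32 / 4.2 ≈ 7.6, in line with the reported compression ratio.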
