MCAQ-YOLO: Morphological Complexity-Aware Quantization for Efficient Object Detection with Curriculum Learning

Yoonjae Seo, Ermal Elbasani, Jaehong Lee

TL;DR

MCAQ-YOLO addresses efficient object detection under resource constraints by introducing morphology-aware tile-wise spatial mixed-precision quantization. It defines a signal-centric morphological complexity score from five descriptors and learns a monotonic mapping to per-tile bit-widths, enabled by a calibration-time bit-map and a CUDA kernel for real-time inference. A curriculum-based quantization-aware training scheme stabilizes optimization and accelerates convergence, yielding substantial gains on a construction-safety dataset ($85.6\%$ mAP@0.5 with an average of $4.2$ bits and $7.63\times$ compression, outperforming uniform 4-bit quantization by $3.5$ points) and consistent improvements on COCO 2017 and Pascal VOC 2012. The work demonstrates that spatial mixed-precision quantization is practically deployable in real-time detectors, with gains correlating to intra-image complexity variation, while highlighting limitations in theoretical grounding, hardware overhead, and dataset dependence.

Abstract

Most neural network quantization methods apply uniform bit precision across spatial regions, disregarding the heterogeneous complexity inherent in visual data. This paper introduces MCAQ-YOLO, a practical framework for tile-wise spatial mixed-precision quantization in real-time object detectors. Morphological complexity, quantified through five complementary metrics (fractal dimension, texture entropy, gradient variance, edge density, and contour complexity), is proposed as a signal-centric predictor of spatial quantization sensitivity. A calibration-time analysis design enables spatial bit allocation with only 0.3 ms of inference overhead while sustaining 151 FPS throughput. Additionally, a curriculum-based training scheme that progressively increases quantization difficulty is introduced to stabilize optimization and accelerate convergence. On a construction safety equipment dataset exhibiting high morphological variability, MCAQ-YOLO achieves 85.6% mAP@0.5 with an average bit-width of 4.2 bits and a 7.6x compression ratio, outperforming uniform 4-bit quantization by 3.5 percentage points. Cross-dataset evaluation on COCO 2017 (+2.9%) and Pascal VOC 2012 (+2.3%) shows consistent improvements, with performance gains correlating with within-image complexity variation.
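
As a rough illustration of the tile-wise allocation described above, the following sketch scores each tile with two of the five descriptors (texture entropy and gradient variance) and maps the score to a bit-width through a fixed monotone function. The equal weights, the 2 to 8 bit range, the tile size, and all function names are assumptions made for illustration; the paper uses all five descriptors and learns its mapping, so this is not the authors' implementation.

```python
import math

def tile_entropy(tile, bins=16):
    """Shannon entropy (in bits) of the intensity histogram; values assumed in [0, 1]."""
    counts = [0] * bins
    n = 0
    for row in tile:
        for v in row:
            counts[min(int(v * bins), bins - 1)] += 1
            n += 1
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

def tile_gradient_variance(tile):
    """Variance of horizontal and vertical finite differences inside the tile."""
    grads = []
    for y, row in enumerate(tile):
        for x, v in enumerate(row):
            if x + 1 < len(row):
                grads.append(row[x + 1] - v)
            if y + 1 < len(tile):
                grads.append(tile[y + 1][x] - v)
    if not grads:
        return 0.0
    mean = sum(grads) / len(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)

def complexity_score(tile, bins=16):
    """Hypothetical score in [0, 1]: equal-weight mix of two normalized descriptors."""
    ent = tile_entropy(tile, bins) / math.log2(bins)  # maximum entropy is log2(bins)
    gv = min(tile_gradient_variance(tile), 1.0)       # differences lie in [-1, 1], so variance <= 1
    return 0.5 * ent + 0.5 * gv

def bits_for_score(score, min_bits=2, max_bits=8):
    """Monotone non-decreasing map from a score in [0, 1] to an integer bit-width."""
    score = max(0.0, min(1.0, score))
    return min_bits + int(score * (max_bits - min_bits) + 0.5)

def bit_map(image, tile=4):
    """Split a 2D grayscale image (list of rows) into tiles and assign bits per tile."""
    h, w = len(image), len(image[0])
    out = []
    for ty in range(0, h, tile):
        row_bits = []
        for tx in range(0, w, tile):
            t = [r[tx:tx + tile] for r in image[ty:ty + tile]]
            row_bits.append(bits_for_score(complexity_score(t)))
        out.append(row_bits)
    return out

# Left tile: flat gray (low complexity). Right tile: checkerboard (high complexity).
image = [[0.5] * 4 + [float((x + y) % 2) for x in range(4)] for y in range(4)]
bits = bit_map(image, tile=4)
print(bits)
```

The flat tile ends up with fewer bits than the checkerboard tile, which is the qualitative behavior the abstract describes. Because the map is computed once per image layout, it could be cached at calibration time rather than recomputed per frame.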

Paper Structure

This paper contains 51 sections, 20 equations, 5 figures, 11 tables, 3 algorithms.

Figures (5)

  • Figure 1: Relationship between morphological complexity and quantization error. (a) Low-complexity regions (e.g., sky, wall) exhibit narrow activation distributions and low quantization error even at 2 to 4 bits. (b) High-complexity regions (e.g., edges, textures) produce wide activation distributions with sharp transitions, leading to high quantization error at low bit-widths. (c) This connection motivates complexity-aware bit allocation.
  • Figure 2: Overview of MCAQ-YOLO. A hierarchical morphology analyzer produces a spatial complexity map $\mathcal{C}$, which is mapped to tile-wise bit-widths by a learnable function. The quantization module applies mixed precision before the detection head.
  • Figure 3: Three-stage curriculum schedule for quantization-aware training. Stage 1 (warm-up) uses only low-complexity samples with high precision. Stage 2 (transition) introduces mixed-complexity samples with dynamic bit allocation. Stage 3 (full MCAQ) applies aggressive quantization across all samples.
  • Figure 4: Training convergence comparison. Curriculum learning accelerates convergence ($2.5\times$ faster to reach 80% mAP) and reduces validation mAP variance by approximately 60%.
  • Figure 5: Qualitative visualization of MCAQ-YOLO. From left to right: input image, complexity heatmap $\mathcal{C}$, bit allocation map, and detection result. High-complexity regions (person boundaries, textured areas) receive more bits (red), while simple backgrounds use fewer bits (blue).
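
The intuition behind Figure 1 can be checked with a small numerical experiment. The sketch below quantizes two synthetic activation sets, one narrow and one wide, with a uniform min/max quantizer and compares the mean squared error at several bit-widths. The Gaussian stand-ins, the per-set min/max range, and the helper names are assumptions for illustration only and are not taken from the paper.

```python
import random

def quantize(values, bits):
    """Uniform min/max quantization to 2**bits levels, followed by dequantization."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels
    return [lo + round((v - lo) / step) * step for v in values]

def mse(a, b):
    """Mean squared error between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
narrow = [rng.gauss(0.0, 0.05) for _ in range(2000)]  # stand-in for a smooth region
wide = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # stand-in for a textured region

errors = {}
for bits in (2, 4, 8):
    errors[bits] = (mse(narrow, quantize(narrow, bits)),
                    mse(wide, quantize(wide, bits)))
    print(f"{bits} bits: narrow MSE={errors[bits][0]:.2e}, wide MSE={errors[bits][1]:.2e}")
```

At every bit-width the narrow set incurs a smaller absolute error than the wide set, since the quantization step scales with the value range. This is the asymmetry that motivates spending more bits on high-complexity tiles.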