YOLO-NAS-Bench: A Surrogate Benchmark with Self-Evolving Predictors for YOLO Architecture Search

Zhe Li; Xiaoyu Ding; Jiaxin Zheng; Yongtao Wang

TL;DR

We introduce YOLO-NAS-Bench, the first surrogate benchmark tailored to YOLO-style detectors, and propose a Self-Evolving Mechanism that progressively aligns the predictor's training distribution with the high-performance frontier by using the predictor itself to discover and evaluate informative architectures in each iteration.
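A minimal Python sketch of the Self-Evolving Mechanism follows, using the per-round settings given in the abstract and Figure 3 (10 latency buckets, top 5 architectures per bucket, 10 rounds, so 10 × 5 × 10 = 500 new architectures grow a 1,000-architecture pool to 1,500). It is an illustration under stated assumptions, not the authors' implementation: random scoring of sampled candidates stands in for the paper's evolutionary search, the bucket edges are assumed to be fixed from the initial pool's latency range, and `self_evolve`, `fit`, `latency`, `sample`, and `train_and_eval` are hypothetical names. The `toy_*` functions are made-up stand-ins for the search space, the LightGBM predictor, latency measurement, and COCO-mini training.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch: names and encodings are illustrative, not from the paper's code.
Arch = Tuple[int, ...]       # one integer choice per searchable slot
Record = Tuple[Arch, float]  # (architecture, measured mAP)
Predictor = Callable[[Arch], float]


def self_evolve(
    pool: List[Record],
    fit: Callable[[List[Record]], Predictor],
    latency: Callable[[Arch], float],
    sample: Callable[[random.Random], Arch],
    train_and_eval: Callable[[Arch], float],
    rounds: int = 10,
    n_buckets: int = 10,
    top_k: int = 5,
    n_candidates: int = 2000,
    seed: int = 0,
) -> Tuple[List[Record], Predictor]:
    rng = random.Random(seed)
    pool = list(pool)  # leave the caller's pool untouched
    # Assumption: bucket edges are fixed once, from the initial pool's latency range.
    lats = [latency(a) for a, _ in pool]
    lo, hi = min(lats), max(lats)
    width = (hi - lo) / n_buckets or 1.0
    for _ in range(rounds):
        predictor = fit(pool)  # retrain the surrogate on everything trained so far
        seen = {a for a, _ in pool}
        # Stand-in for EA search: score a batch of random, not-yet-trained candidates.
        candidates = sorted({sample(rng) for _ in range(n_candidates)} - seen)
        buckets: Dict[int, List[Arch]] = {}
        for a in candidates:
            lat = latency(a)  # measured latency acts as the constraint
            if lo <= lat <= hi:
                b = min(int((lat - lo) / width), n_buckets - 1)
                buckets.setdefault(b, []).append(a)
        new: List[Arch] = []
        for members in buckets.values():
            members.sort(key=predictor, reverse=True)  # predicted mAP acts as the fitness
            new.extend(members[:top_k])
        # The expensive step: only the selected architectures are actually trained.
        pool.extend((a, train_and_eval(a)) for a in new)
    return pool, fit(pool)


# Toy usage: every toy_* function is a made-up stand-in, not the paper's setup.
def toy_sample(rng: random.Random) -> Arch:
    return tuple(rng.randrange(8) for _ in range(4))  # 4 slots, 8 options each


def toy_latency(a: Arch) -> float:
    return float(sum(a))  # pretend larger choices cost more latency


def toy_map(a: Arch) -> float:
    return 0.30 + 0.01 * sum(a) - 0.004 * (a[1] - 3) ** 2  # pretend ground-truth mAP


def toy_fit(data: List[Record]) -> Predictor:
    snapshot = list(data)

    def predict(a: Arch) -> float:  # 1-nearest-neighbour regressor on L1 distance
        return min(snapshot, key=lambda r: sum(abs(x - y) for x, y in zip(r[0], a)))[1]

    return predict


init_rng = random.Random(1)
pool0 = [(a, toy_map(a)) for a in sorted({toy_sample(init_rng) for _ in range(40)})]
pool, predictor = self_evolve(pool0, toy_fit, toy_latency, toy_sample, toy_map,
                              rounds=3, n_candidates=500)
print(len(pool0), "->", len(pool))
```

Retraining `fit` at the start of every round is what lets later rounds concentrate on regions the previous predictor ranked highly. The sketch returns a single predictor, whereas the source reports an ensemble of 10 LightGBM models; that ensembling step is not reproduced here.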

Abstract

Neural Architecture Search (NAS) for object detection is severely bottlenecked by evaluation cost: fully training each candidate YOLO architecture on COCO demands days of GPU time. Meanwhile, existing NAS benchmarks largely target image classification, leaving the detection community without a comparable benchmark for NAS evaluation. To address this gap, we introduce YOLO-NAS-Bench, the first surrogate benchmark tailored to YOLO-style detectors. YOLO-NAS-Bench defines a search space spanning channel width, block depth, and operator type across both backbone and neck, covering the core modules of YOLOv8 through YOLO12. We sample 1,000 architectures via random, stratified, and Latin Hypercube strategies, train them on COCO-mini, and build a LightGBM surrogate predictor. To sharpen the predictor in the high-performance regime most relevant to NAS, we propose a Self-Evolving Mechanism that progressively aligns the predictor's training distribution with the high-performance frontier by using the predictor itself to discover and evaluate informative architectures in each iteration. This mechanism grows the pool from 1,000 to 1,500 architectures and raises the ensemble predictor's R² from 0.770 to 0.815 and its Sparse Kendall Tau from 0.694 to 0.752, demonstrating strong predictive accuracy and ranking consistency. Using the final predictor as the fitness function for evolutionary search, we discover architectures that surpass all official YOLOv8-YOLO12 baselines at comparable latency on COCO-mini, confirming the predictor's discriminative power for top-performing detection architectures.
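The Latin Hypercube strategy mentioned above can be sketched for a discrete search space as follows. This is a minimal illustration, not the paper's sampler: the six slots and their option counts are hypothetical placeholders for the backbone and neck width, depth, and operator choices, and `latin_hypercube`, `lhs_architectures`, and `CHOICES` are illustrative names. Each dimension of [0, 1) is cut into n equal strata with exactly one point per stratum, and each point is then mapped to a choice index.

```python
import random
from typing import List, Sequence, Tuple


def latin_hypercube(n: int, n_dims: int, rng: random.Random) -> List[Tuple[float, ...]]:
    """Return n points in [0, 1)^n_dims with exactly one point in each of the
    n equal-width strata of every dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n))
        rng.shuffle(strata)  # random, independent pairing of strata across dimensions
        columns.append([(s + rng.random()) / n for s in strata])  # jitter within each stratum
    return list(zip(*columns))


def lhs_architectures(n: int, choices_per_dim: Sequence[int], seed: int = 0) -> List[Tuple[int, ...]]:
    """Map Latin Hypercube points to one discrete choice index per searchable slot."""
    rng = random.Random(seed)
    points = latin_hypercube(n, len(choices_per_dim), rng)
    # min() guards against u * k rounding up to k when u is within float error of 1.0.
    return [tuple(min(int(u * k), k - 1) for u, k in zip(p, choices_per_dim)) for p in points]


# Hypothetical slots (backbone width, depth, operator; neck width, depth, operator)
# with illustrative option counts; the paper's real search space is not reproduced here.
CHOICES = [4, 3, 6, 4, 3, 2]
archs = lhs_architectures(12, CHOICES)
for arch in archs:
    print(arch)
```

Because every stratum is hit exactly once, a slot whose option count k divides n receives each option exactly n / k times (3 times each for the 4-option slots with n = 12). Plain random sampling does not guarantee this marginal balance.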

Paper Structure (21 sections, 4 figures, 6 tables)

Introduction
Related Work
Real-Time Object Detectors.
NAS Benchmarks.
NAS for Object Detection.
Method
Benchmark Construction
Search Space Design.
Architecture Sampling.
Architecture Encoding and Surrogate Predictor.
Self-Evolving Predictor
Experiments
Experimental Setup
Main Results
Predictor Quality.
...and 6 more sections

Figures (4)

Figure 1: Latency vs. mAP on COCO-mini. Architectures discovered by our predictor-guided EA search consistently Pareto-dominate all official YOLO baselines (v8-v12) across the full latency spectrum, demonstrating the strong discriminative power of the YOLO-NAS-Bench surrogate predictor.
Figure 2: Overview of the YOLO-NAS-Bench pipeline. (1) A YOLO-style search space spanning channel, depth, and operator dimensions across both backbone and neck is defined. (2) 1,000 architectures are sampled via three complementary strategies and trained on COCO-mini. (3) A LightGBM predictor is trained on the resulting {architecture, mAP} pairs. (4) Over 10 rounds, the Self-Evolving Predictor expands the pool with high-value architectures discovered by evolutionary search and retrains the predictor, yielding an ensemble of 10 LightGBM models over 1,500 architectures.
Figure 3: Self-Evolving Predictor. Starting from 1,000 architectures, the loop partitions the latency range into 10 buckets. In each round, EA search selects the top 5 architectures in each bucket, using predicted mAP as the fitness and measured latency as the constraint; the resulting 50 architectures (10 buckets × 5) are trained on COCO-mini and merged into the pool, and the predictor is retrained. After 10 rounds (500 new architectures), the pool grows to 1,500 architectures, yielding an enhanced benchmark enriched in the high-performance regime most relevant to NAS.
Figure 4: Predicted vs. ground-truth mAP on the full 1,500-architecture pool. Each point is an architecture colored by its sampling source. Points cluster closely around the y = x diagonal, confirming strong agreement between the ensemble predictor and ground-truth performance.
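The two predictor-quality metrics reported in the abstract, R² and Sparse Kendall Tau, can be computed as in the following stdlib-only sketch. The sparse variant assumes the usual definition, in which predicted and true mAP are rounded to a fixed precision before Kendall's tau-b is computed, so that rank flips between near-identical architectures count as ties. The 0.1-point precision (`decimals=1`, with mAP in percent) is an assumption, not a value stated in the source, and the function names are illustrative.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b over all pairs (O(n^2)); tied pairs shrink the denominator."""
    n = len(x)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference
            dy = (y[i] > y[j]) - (y[i] < y[j])
            if dx == 0:
                ties_x += 1
            if dy == 0:
                ties_y += 1
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    n0 = n * (n - 1) // 2
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return (concordant - discordant) / denom if denom else float("nan")


def sparse_kendall_tau(y_true: Sequence[float], y_pred: Sequence[float], decimals: int = 1) -> float:
    """Assumed definition: round both sides to `decimals` places, then take tau-b."""
    return kendall_tau_b([round(v, decimals) for v in y_true],
                         [round(v, decimals) for v in y_pred])


true_map = [30.02, 30.04, 31.0]  # hypothetical mAP values in percent
pred_map = [30.03, 30.01, 31.2]
print(kendall_tau_b(true_map, pred_map))       # → 0.3333333333333333
print(sparse_kendall_tau(true_map, pred_map))  # → 1.0
```

In this toy case, plain tau-b is 1/3 because the first two architectures swap order, while the sparse variant returns 1.0 because both pairs round to 30.0 and are treated as ties. This is the behavior that makes the sparse metric less sensitive to noise-level differences between architectures.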
