CF-DETR: Coarse-to-Fine Transformer for Real-Time Object Detection

Woojin Shin; Donghwa Kang; Byeongyun Park; Brent Byunghoon Kang; Jinkyu Lee; Hyeongboo Baek

TL;DR

CF-DETR addresses the challenge of running multiple DETR tasks in real time for autonomous driving by combining a coarse-to-fine Transformer architecture with a dedicated scheduler, NPFP**. It relies on four mechanisms (coarse-to-fine inference, selective fine inference, multi-level batch inference, and batch-aware scheduling) to adjust patch granularity and attention scope dynamically while preserving safety-critical deadlines. The key contributions are the CF-DETR architecture, the NPFP** scheduling framework, and an evaluation showing improved critical-object and overall mAP at competitive throughput, including an emergency-braking case study. The result is a Transformer-aware approach that meets real-time constraints for safety-critical AV perception without relying on additional sensors.

Abstract

Detection Transformers (DETR) are increasingly adopted in autonomous vehicle (AV) perception systems due to their superior accuracy over convolutional networks. However, concurrently executing multiple DETR tasks presents significant challenges in meeting firm real-time deadlines (R1) and high accuracy requirements (R2), particularly for safety-critical objects, while navigating the inherent latency-accuracy trade-off under resource constraints. Existing real-time DNN scheduling approaches often treat models generically, failing to leverage Transformer-specific properties for efficient resource allocation. To address these challenges, we propose CF-DETR, an integrated system featuring a novel coarse-to-fine Transformer architecture and a dedicated real-time scheduling framework NPFP**. CF-DETR employs three key strategies (A1: coarse-to-fine inference, A2: selective fine inference, A3: multi-level batch inference) that exploit Transformer properties to dynamically adjust patch granularity and attention scope based on object criticality, aiming to satisfy R2. The NPFP** scheduling framework (A4) orchestrates these adaptive mechanisms A1-A3. It partitions each DETR task into a safety-critical coarse subtask for guaranteed critical object detection within its deadline (ensuring R1), and an optional fine subtask for enhanced overall accuracy (R2), while managing individual and batched execution. Our extensive evaluations on server, GPU-enabled embedded platforms, and actual AV platforms demonstrate that CF-DETR, under an NPFP** policy, successfully meets strict timing guarantees for critical operations and achieves significantly higher overall and critical object detection accuracy compared to existing baselines across diverse AV workloads.
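
The split into a mandatory coarse subtask and an optional fine subtask can be illustrated with a small simulation. This is a minimal sketch and not the paper's NPFP** algorithm: it assumes NPFP denotes non-preemptive fixed-priority scheduling (the usual reading in real-time systems), it ignores batching (A3), and the `Task` fields, the `simulate` function, and the admission rule for fine subtasks are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Task:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    priority: int      # lower value = higher priority
    release: float     # release time (ms)
    deadline: float    # absolute deadline (ms)
    coarse_ms: float   # run time of the mandatory coarse subtask
    fine_ms: float     # run time of the optional fine subtask


def simulate(tasks):
    """Non-preemptive fixed-priority simulation with optional fine subtasks.

    Coarse subtasks always run. A fine subtask runs only when no coarse
    subtask is ready, it finishes before its own deadline, and it finishes
    before the next release (a conservative rule so that it never delays
    a coarse subtask that arrives later).
    """
    t = 0.0
    coarse_done, fine_decided = set(), set()
    log, missed = [], []
    while True:
        released = [x for x in tasks if x.release <= t]
        ready_coarse = [x for x in released if x.name not in coarse_done]
        ready_fine = [x for x in released
                      if x.name in coarse_done and x.name not in fine_decided]
        future = [x.release for x in tasks if x.release > t]
        if ready_coarse:
            # Highest-priority ready coarse subtask runs to completion.
            job = min(ready_coarse, key=lambda x: x.priority)
            t += job.coarse_ms
            coarse_done.add(job.name)
            log.append((job.name, "coarse", t))
            if t > job.deadline:
                missed.append(job.name)
        elif ready_fine:
            job = min(ready_fine, key=lambda x: x.priority)
            fine_decided.add(job.name)
            finish = t + job.fine_ms
            next_release = min(future, default=float("inf"))
            if finish <= job.deadline and finish <= next_release:
                t = finish
                log.append((job.name, "fine", t))
            else:
                log.append((job.name, "skip-fine", t))
        elif future:
            t = min(future)  # idle until the next task is released
        else:
            return log, missed


if __name__ == "__main__":
    cams = [Task("front", 0, 0, 50, 10, 20), Task("rear", 1, 0, 40, 10, 30)]
    for entry in simulate(cams)[0]:
        print(entry)
```

In this toy workload both coarse subtasks finish before their deadlines, the front camera's fine subtask fits in the remaining slack, and the rear camera's fine subtask is skipped because it would overrun its deadline. The sketch only shows why making the fine subtask optional lets the critical (coarse) path keep its timing guarantee while accuracy improves opportunistically.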
