Test-time Correction: An Online 3D Detection System via Visual Prompting

Hanxue Zhang, Zetong Yang, Yanan Sun, Li Chen, Fei Xia, Fatma Güney, Hongyang Li

TL;DR

Test-time Correction (TTC) introduces an online, prompt-based framework that rectifies missed or incorrect 3D detections in autonomous driving without retraining. It augments frozen detectors with an Online Adapter (OA) and a visual prompt buffer to process diverse auxiliary feedback as visual prompts, enabling continuous error rectification across streaming frames. On nuScenes, TTC yields notable gains in mAP and robustness under limited-label, zero-shot, and domain-shift conditions, while maintaining low latency and a modest impact on false positives. The work demonstrates the practicality of online rectification via visual prompts and motivates future exploration of prompt-driven adaptation for safe, post-deployment perception systems.

Abstract

This paper introduces Test-time Correction (TTC), an online 3D detection system designed to rectify test-time errors using various auxiliary feedback, aiming to enhance the safety of deployed autonomous driving systems. Unlike conventional offline 3D detectors that remain fixed during inference, TTC enables immediate online error correction without retraining, allowing autonomous vehicles to adapt to new scenarios and reduce deployment risks. To achieve this, we equip existing 3D detectors with an Online Adapter (OA) module -- a prompt-driven query generator for real-time correction. At the core of the OA module are visual prompts: image-based descriptions of objects of interest derived from auxiliary feedback such as mismatches with 2D detections, road descriptions, or user clicks. These visual prompts, collected from risky objects during inference, are maintained in a visual prompt buffer to enable continuous correction in future frames. By leveraging this mechanism, TTC consistently detects risky objects, achieving reliable, adaptive, and versatile driving autonomy. Extensive experiments show that TTC significantly improves instant error rectification over frozen 3D detectors, even under limited labels, zero-shot settings, and adverse conditions. We hope this work inspires future research on post-deployment online rectification systems for autonomous driving.
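
The buffering idea in the abstract, where prompts collected from risky objects in one frame drive correction in later frames, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `VisualPrompt`, `VisualPromptBuffer`, `run_stream`, the fixed capacity, the FIFO eviction, and the duplicate check are all assumptions made for illustration. The sketch tracks only which prompts would be active per frame; it does not model the detector or the OA module.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualPrompt:
    """Hypothetical record standing in for one image crop of a risky object."""
    object_id: str    # illustrative identifier, not from the paper
    source: str       # e.g. "2d_mismatch", "road_description", "user_click"
    frame_index: int  # frame in which the feedback arrived


class VisualPromptBuffer:
    """Bounded FIFO store of prompts (capacity and eviction are assumptions)."""

    def __init__(self, capacity: int = 8) -> None:
        # A deque with maxlen silently drops the oldest entry when full.
        self._prompts = deque(maxlen=capacity)

    def add(self, prompt: VisualPrompt) -> None:
        # Skip objects already buffered so one object cannot crowd out others.
        if all(p.object_id != prompt.object_id for p in self._prompts):
            self._prompts.append(prompt)

    def active_prompts(self) -> list:
        return list(self._prompts)


def run_stream(feedback_per_frame, capacity=8):
    """Return, for each frame, the object ids the buffer would prompt for."""
    buffer = VisualPromptBuffer(capacity)
    prompted = []
    for frame_index, feedback in enumerate(feedback_per_frame):
        # Prompts gathered in earlier frames apply to the current frame.
        prompted.append([p.object_id for p in buffer.active_prompts()])
        # Feedback received on this frame becomes available to later frames.
        for object_id, source in feedback:
            buffer.add(VisualPrompt(object_id, source, frame_index))
    return prompted
```

For example, `run_stream([[("deer", "user_click")], [], [("cone", "2d_mismatch")], []])` reports no prompts on the first frame, the deer prompt on the next two, and both prompts on the last. A real system would store image crops or their embeddings rather than string ids.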

Paper Structure

This paper contains 27 sections, 25 figures, 15 tables.

Figures (25)

  • Figure 1: Comparison of error correction between the conventional offline loop (left) and the newly proposed online TTC system (right). The offline error correction pipeline improves model capability during the development stage, which typically requires expensive workloads and computational overhead over days or weeks for model updates. The TTC system additionally gives deployed 3D detectors the ability to rectify errors on the fly.
  • Figure 2: Visual prompts could be arbitrary views of objects, across zones, styles, timestamps, etc.
  • Figure 3: Overall Framework. (Left:) The TTC system centers on a TTC-3D Detector which utilizes visual prompts $\mathcal{P}_v$ from the visual prompt buffer for test-time error rectification. (Right:) The TTC-3D Detector can be based on any traditional detector (BEV or monocular). It supports 3D detection from any combination of four prompts, i.e., object $\mathcal{P}_o$, box $\mathcal{P}_b$, point $\mathcal{P}_p$, and novel visual prompts $\mathcal{P}_v$, arbitrary views of target objects across scenarios and timestamps.
  • Figure 4: Concrete mechanism of visual prompt alignment. This figure illustrates monocular input. When multi-view images are employed, this alignment operation flattens the different views and still generates $N$ peak candidate positions.
  • Figure 5: Qualitative visualization of real-world scenes (collected from YouTube). We visualize the zero-shot 3D detection results in a real-world scenario. In this case, the prompt buffer contains a visual prompt of a deer. Higher responses from the visual prompt alignment are highlighted by brighter colors. As shown, although trained solely on nuScenes, the TTC system can still accurately localize "unseen" objects in the input image. Best viewed in color.
  • ...and 20 more figures
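
The captions of Figures 4 and 5 describe visual prompt alignment as producing a response map and $N$ peak candidate positions, with multi-view inputs flattened before the peaks are chosen. The captions do not specify how responses are computed, so the following is only a rough sketch under stated assumptions: cosine similarity as the response, plain top-$N$ selection standing in for peak detection, and nested Python lists standing in for feature maps. The names `cosine` and `top_n_peaks` are hypothetical.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_n_peaks(views, prompt, n):
    """Score every feature cell against the prompt and keep the n best.

    views:  list of feature maps, each a list of rows of feature vectors
    prompt: one embedding vector for the visual prompt (assumed given)
    Returns (view, row, col) positions, highest response first.
    """
    scored = []
    # Flatten all views into one candidate pool, so n is a global budget
    # rather than a per-view budget.
    for v, feature_map in enumerate(views):
        for r, row in enumerate(feature_map):
            for c, feature in enumerate(row):
                scored.append((cosine(feature, prompt), (v, r, c)))
    # Assumption: top-n by score approximates "peaks"; a real system would
    # likely suppress neighbouring cells around each maximum.
    return [pos for _, pos in heapq.nlargest(n, scored)]
```

With a single view, the same function covers the monocular case; with several views, the candidate count stays at `n` regardless of how many cameras contribute.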