AIDOVECL: AI-generated Dataset of Outpainted Vehicles for Eye-level Classification and Localization

Amir Kazemi, Qurat ul ain Fatima, Volodymyr Kindratenko, Christopher Tessum

TL;DR

This work tackles the data-annotation bottleneck for eye-level vehicle classification and localization by introducing AIDOVECL, an automatically generated dataset in which vehicle crops detected in seed images are outpainted onto larger canvases to produce diverse contexts. The pipeline uses prompt-guided latent-diffusion inpainting, a multi-model consensus step for seed detection, and strict quality controls to produce high-quality, automatically labeled data. Across multiple detectors, including YOLOv8 and FCOS, adding AIDOVECL improves detection performance consistently (up to 10%) and more substantially (up to 40%) under distribution shifts, with underrepresented classes such as vans showing notable true-positive increases. The authors also release the full dataset and a modular pipeline for replication and extension, supporting rapid, scalable generation of fine-grained, context-rich vehicle datasets for autonomous driving and urban analysis.

Abstract

Image labeling is a critical bottleneck in the development of computer vision technologies, often constraining the potential of machine learning models due to the time-intensive nature of manual annotation. This work introduces a novel approach that leverages outpainting to mitigate the scarcity of annotated data by generating artificial contexts and annotations, significantly reducing manual labeling effort. We apply this technique to a particularly acute challenge in autonomous driving, urban planning, and environmental monitoring: the lack of diverse, eye-level vehicle images in desired classes. Our dataset comprises AI-generated vehicle images obtained by detecting and cropping vehicles from manually selected seed images, which are then outpainted onto larger canvases to simulate varied real-world conditions. The outpainted images include detailed annotations, providing high-quality ground truth data. Advanced outpainting techniques and image quality assessments ensure visual fidelity and contextual relevance. Ablation results show that incorporating AIDOVECL improves overall detection performance by up to 10%, and delivers gains of up to 40% in settings with greater diversity of context, object scale, and placement, with underrepresented classes achieving up to 50% higher true positives. AIDOVECL enhances vehicle detection by augmenting real training data and supporting evaluation across diverse scenarios. By demonstrating outpainting as an automatic annotation paradigm, it offers a practical and versatile solution for building fine-grained datasets with reduced labeling effort across multiple machine learning domains. The code and links to datasets used in this study are available for further research and replication at https://github.com/amir-kazemi/aidovecl.
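
The reason outpainting can act as automatic annotation is that the vehicle crop is placed on the canvas at a known location, so the bounding box is available before any background is generated. The following is a minimal sketch of that placement step, not the authors' implementation: the function name `place_crop`, the 512×512 canvas, the scale range, and the YOLO-style normalized label format are all assumptions made for illustration.

```python
import random


def place_crop(crop_w, crop_h, canvas_w=512, canvas_h=512,
               scale_range=(0.3, 0.8), rng=None):
    """Choose a random size and position for a crop on a canvas.

    Returns the pixel box (x0, y0, x1, y1) and a normalized
    (cx, cy, w, h) box. All defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    # Largest uniform scale at which the crop still fits on the canvas.
    fit = min(canvas_w / crop_w, canvas_h / crop_h)
    # Use a random fraction of that maximum size (aspect ratio preserved).
    frac = rng.uniform(*scale_range)
    w = max(1, round(crop_w * fit * frac))
    h = max(1, round(crop_h * fit * frac))
    # Random top-left corner that keeps the crop fully inside the canvas.
    x0 = rng.randint(0, canvas_w - w)
    y0 = rng.randint(0, canvas_h - h)
    box = (x0, y0, x0 + w, y0 + h)
    # Normalized center/size label, known without any manual annotation.
    yolo = ((x0 + w / 2) / canvas_w, (y0 + h / 2) / canvas_h,
            w / canvas_w, h / canvas_h)
    return box, yolo


box, yolo = place_crop(200, 100, rng=random.Random(0))
print(box, yolo)
```

Everything outside `box` is the region an outpainting model would fill; the label itself depends only on the placement, which is why no manual annotation is needed for the generated image.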

Paper Structure

This paper contains 34 sections, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Vehicles from authentic images are randomly scaled and positioned on a canvas, then outpainted using structured prompts and blurred masks.
  • Figure 2: Consensus level of every detection model is defined as the percentage of affirmative votes received from other models, averaged across all objects in that class.
  • Figure 3: Outpainted images of various vehicle classes satisfying criteria for BRISQUE, CLIP-IQA, and QualiCLIP.
  • Figure 4: KDE plots for visual quality scores (a–c) and semantic similarities of prompt and captions (d–f). Larger values generally indicate better quality and semantics, except for BRISQUE where smaller values are better.
  • Figure 5: Confusion matrices of the YOLOv8 model tested on the real dataset under different augmentation settings, including dataset augmentation with AIDOVECL and on-the-fly MixUp/Mosaic.
  • ...and 5 more figures
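
The consensus level defined in the Figure 2 caption can be illustrated with a small worked example. This is a sketch under stated assumptions: the function name `consensus_level` and the input layout are hypothetical, and how one model "affirms" another model's detection (for example, by box overlap) is not specified here, so the votes are simply taken as given booleans.

```python
def consensus_level(votes_per_object):
    """Average percentage of affirmative votes from the other models.

    votes_per_object: one inner list per object detected by the model
    under evaluation (for a single class); each inner list holds one
    boolean per *other* model, True if that model affirmed the object.
    Assumes every inner list is non-empty (at least one other model).
    """
    if not votes_per_object:
        return 0.0
    # Percentage of affirmative votes for each object.
    per_object = [100.0 * sum(votes) / len(votes) for votes in votes_per_object]
    # Average across all objects in the class.
    return sum(per_object) / len(per_object)


# Three objects, two other models voting on each:
# 100% + 50% + 0% averaged over three objects.
votes = [[True, True], [True, False], [False, False]]
print(consensus_level(votes))  # → 50.0
```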