Exploring Model Quantization in GenAI-based Image Inpainting and Detection of Arable Plants

Sourav Modak, Ahmet Oğuz Saltık, Anthony Stein

TL;DR

The paper addresses two constraints on GenAI-based weed detection: limited training data diversity and limited on-device compute. It introduces a data-augmentation pipeline based on Stable Diffusion inpainting that adds synthetic images progressively, up to an additional 200% of the original dataset, and applies post-training quantization (FP16/INT8) to both the generative and the detection models. Evaluations on YOLO11(l) and RT-DETR(l) show that quantization effects depend on the model and on the augmentation level, with inpainting helping to recover accuracy at lower precision and enabling edge deployment on the Jetson Orin Nano. The work demonstrates practical feasibility and points to broader quantization and annotation strategies as directions for robust field deployment.
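
To make the INT8 idea concrete, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is a generic illustration only: this summary does not state which quantization scheme or toolchain the authors used, and the function names `quantize_int8` and `dequantize` as well as the example weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme, not the paper's).

    Maps floats to integers in [-127, 127] with a single shared scale.
    """
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale works.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Map the stored integers back to approximate float values."""
    return [x * scale for x in q]


# Hypothetical weights, chosen only for illustration.
weights = [0.5, -1.27, 0.0, 1.27]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
```

Each restored value differs from the original by at most half a quantization step (`scale / 2`), which is the rounding error that lower precision trades for smaller and faster models.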

Abstract

Deep learning-based weed control systems often suffer from limited training data diversity and constrained on-board computation, impacting their real-world performance. To overcome these challenges, we propose a framework that leverages Stable Diffusion-based inpainting to augment training data progressively in 10% increments -- up to an additional 200%, thus enhancing both the volume and diversity of samples. Our approach is evaluated on two state-of-the-art object detection models, YOLO11(l) and RT-DETR(l), using the mAP50 metric to assess detection performance. We explore quantization strategies (FP16 and INT8) for both the generative inpainting and detection models to strike a balance between inference speed and accuracy. Deployment of the downstream models on the Jetson Orin Nano demonstrates the practical viability of our framework in resource-constrained environments, ultimately improving detection accuracy and computational efficiency in intelligent weed management systems.
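
The progressive augmentation schedule described above (10% increments up to an additional 200%) can be sketched as follows. The helper name `augmentation_schedule` and the dataset size of 1000 real images are assumptions made for illustration; the actual dataset size is not given in this summary.

```python
def augmentation_schedule(n_real, step_pct=10, max_pct=200):
    """List (percent, n_synthetic, n_total) for each augmentation level.

    n_real is the number of real training images; synthetic images are
    added as a percentage of that count.
    """
    schedule = []
    for pct in range(step_pct, max_pct + 1, step_pct):
        n_synth = n_real * pct // 100  # integer count of synthetic images
        schedule.append((pct, n_synth, n_real + n_synth))
    return schedule


# Hypothetical dataset size, for illustration only.
levels = augmentation_schedule(1000)
```

With these defaults the schedule has 20 levels, so each detector would be trained on 20 augmented datasets in addition to the unaugmented baseline.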

Paper Structure

This paper contains 18 sections, 11 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Workflow of the quantization process and training of downstream models. Image generation using Stable Diffusion models at different precision levels. Synthetic image augmentation (10%–200% of the original dataset). Two downstream models (YOLO11(l) and RT-DETR(l)) are trained per dataset and later quantized and evaluated on a fixed set of real validation and test data.
  • Figure 2: A representative sample of pseudo-RGB images highlighting sugar beet crops distributed with various weed species on the euro-pallets.
  • Figure 3: Overview of the proposed inpainting pipeline architecture. In (a), real-world images are manually annotated. In (b), the Segment Anything Model (SAM) converts user-specified bounding boxes into precise polygon masks, which facilitate object extraction in (c). In (d), a Stable Diffusion model is fine-tuned using the extracted plants and weeds. A novel inpainting method is then applied in (e), using a simple prompt (e.g., “A photo of HoPla Fallopia”) along with dynamically generated inpainting masks to indicate where new plant or weed elements should be inserted, while a fine-tuned object detector prevents overlap with existing objects. Finally, in (f), the same object detector is used to label the inpainted images.
  • Figure 4: A visual representation of image inpainting. The left side shows the original image, while the right side displays the synthetic image generated using a text prompt -- 'A photo of HoPla Convolvulus', where the highlighted regions indicate the inpainted areas.
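
Step (e) of Figure 3 places inpainting masks so that new plants do not overlap objects already found by the detector. One simple way to achieve this is rejection sampling over candidate boxes, sketched below. This is an assumption, not the authors' published procedure: the function names `boxes_overlap` and `sample_mask_box`, the box format, and the retry limit are all hypothetical.

```python
import random


def boxes_overlap(a, b):
    """Boxes are (x_min, y_min, x_max, y_max); touching edges do not overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def sample_mask_box(img_w, img_h, box_w, box_h, existing, max_tries=1000, rng=None):
    """Draw a random box that avoids all existing detections.

    Returns None if no free position is found within max_tries.
    """
    rng = rng or random.Random()
    for _ in range(max_tries):
        x = rng.randint(0, img_w - box_w)
        y = rng.randint(0, img_h - box_h)
        candidate = (x, y, x + box_w, y + box_h)
        if not any(boxes_overlap(candidate, b) for b in existing):
            return candidate
    return None


# Hypothetical detections on a 100x100 image, for illustration only.
detected = [(0, 0, 50, 50)]
mask_box = sample_mask_box(100, 100, 20, 20, detected, rng=random.Random(0))
```

Returning `None` when the image is too crowded lets the caller skip that image instead of forcing an overlapping insertion.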