DODA: Adapting Object Detectors to Dynamic Agricultural Environments in Real-Time with Diffusion

Shuai Xiang; Pieter M. Blok; James Burridge; Haozhou Wang; Wei Guo

TL;DR

DODA tackles the practical problem of domain shift in agricultural object detection by enabling fast adaptation to unseen environments without retraining the diffusion model. It decouples domain-specific features from the diffusion model through external domain embeddings and introduces LI2I, a layout-image-to-image approach that tightly controls layout, allowing high-quality, domain-consistent synthetic data generation. A two-stage training strategy further improves data quality by leveraging unlabeled target-domain images. Empirical results on the Global Wheat Head Detection (GWHD) dataset show consistent AP improvements across domains, with adaptation taking as little as 2 minutes on a consumer GPU, highlighting DODA’s potential for practical, scalable deployment in diverse agricultural settings.

Abstract

Object detection has wide applications in agriculture, but domain shifts across diverse environments limit the broader use of trained models. Existing domain adaptation methods usually require retraining the model for each new domain, which is impractical for agricultural applications because environments change constantly. In this paper, we propose DODA (Diffusion for Object-detection Domain Adaptation in Agriculture), a diffusion-based framework that can adapt the detector to a new domain in just 2 minutes. DODA incorporates external domain embeddings and an improved layout-to-image approach, allowing it to generate high-quality detection data for new domains without additional training. We demonstrate DODA's effectiveness on the Global Wheat Head Detection dataset, where fine-tuning detectors on DODA-generated data yields significant improvements across multiple domains. DODA provides a simple yet powerful solution for agricultural domain adaptation, reducing the barriers for growers to use detection in personalised environments. The code is available at https://github.com/UTokyo-FieldPhenomics-Lab/DODA.

Paper Structure (24 sections, 1 theorem, 15 equations, 7 figures, 12 tables, 1 algorithm)

Introduction
Related work
Method
Preliminaries
Problem Formulation
DODA
Incorporating Domain Embedding for Domain-Aware Image Generation
Encoding Layout Images with Vision Model for Simpler and Better Alignment
Unified Optimization Objective for Multi-Conditional Diffusion
Experiment
Main Results
Synthetic Data for Agricultural Object Detection Domain Adaptation
Comparisons with Previous Domain Adaption Methods
Comparisons with Previous Layout-to-image Methods
Ablation study
...and 9 more sections

Key Result

Proposition 1

The solution that minimizes $\mathbb{E}_{t \sim U(0, T)} \mathbb{E}_{\mathbf{x}_0,\mathbf{y}_1,\mathbf{y}_2 \sim p(\mathbf{x}_0,\mathbf{y}_1,\mathbf{y}_2)} \mathbb{E}_{\mathbf{x}_t \sim p(\mathbf{x}_t|\mathbf{x}_0)} [\lambda(t)\| s(\mathbf{x}_t,\mathbf{y}_1,\mathbf{y}_2,t;{\boldsymbol{\theta}}) - \ldots$
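
The statement above is cut off in this listing, so its exact form is not reproduced here. For orientation only, the textbook single-condition version of denoising score matching may help. It is an assumption about the setting, not the paper's own result. It uses one condition $\mathbf{y}$ in place of $\mathbf{y}_1,\mathbf{y}_2$ and assumes that $\mathbf{x}_t$ depends on $\mathbf{y}$ only through $\mathbf{x}_0$, that $\lambda(t) > 0$, and that the model has sufficient capacity:

```latex
\begin{aligned}
\boldsymbol{\theta}^{*}
  &= \arg\min_{\boldsymbol{\theta}}\;
     \mathbb{E}_{t \sim U(0, T)}\,
     \mathbb{E}_{\mathbf{x}_0,\mathbf{y} \sim p(\mathbf{x}_0,\mathbf{y})}\,
     \mathbb{E}_{\mathbf{x}_t \sim p(\mathbf{x}_t|\mathbf{x}_0)}
     \left[ \lambda(t) \left\| s(\mathbf{x}_t,\mathbf{y},t;\boldsymbol{\theta})
       - \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t|\mathbf{x}_0) \right\|_2^2 \right] \\
\Longrightarrow\quad
s(\mathbf{x}_t,\mathbf{y},t;\boldsymbol{\theta}^{*})
  &= \nabla_{\mathbf{x}_t} \log p(\mathbf{x}_t|\mathbf{y})
\end{aligned}
```

The second line follows because the squared-error minimizer is the conditional expectation of the regression target given $(\mathbf{x}_t, \mathbf{y})$, which equals the score of $p(\mathbf{x}_t|\mathbf{y})$ under the stated assumptions.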

Figures (7)

Figure 1: Overview. Left: DODA generates detection data for diverse agricultural domains; the context of the generated images matches the target domain, and their layout aligns with the input layout images. Right: fine-tuning a detector on DODA-generated data yields significant improvements across multiple domains.
Figure 2: (a) Visualization of image features from the GWHD training set. The features are extracted by MAE (He et al., 2022), and different subdomains are distinguishable by color. (b) Features in shallow layers are relatively noisy, while deeper layers progressively form a clearer layout of the image. (c) Pipelines of existing text-based L2I methods and our LI2I method. By simplifying existing methods, LI2I better retains spatial information and integrates layout features into the U-Net.
Figure 3: The architecture of DODA. The blue part is the domain encoder, which extracts domain embeddings from the reference image and guides the style of the generated image. The orange part is the layout encoder, which encodes the layout image into a feature map and guides the layout.
Figure 4: Ablations on the number of generated images. For most domains, 200 generated images are sufficient.
Figure 5: Examples of generated images in the "Ukyoto_1" domain. Left: many generated images have unnatural black edges. Middle: normal generated images, which are better aligned with the input layout (blue bounding boxes) than the images with black edges. Right: some real images used for pre-training also have black edges.
...and 2 more figures
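
The captions for Figures 2, 3 and 5 describe conditioning on a layout image built from bounding boxes. The following is a minimal sketch of how such a layout image could be rasterized. The function name `render_layout`, the box format `(x_min, y_min, x_max, y_max)` in pixels, the black background, and the blue outline color are all assumptions made for illustration; the actual encoding used in the DODA repository may differ.

```python
def render_layout(boxes, height, width, thickness=2, color=(0, 0, 255)):
    """Rasterize box outlines onto a black canvas (hypothetical layout encoding).

    boxes: iterable of (x_min, y_min, x_max, y_max) in pixel coordinates.
    Returns a height x width grid of RGB tuples.
    """
    canvas = [[(0, 0, 0) for _ in range(width)] for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip the box to the image bounds; skip boxes that end up empty.
        x0, x1 = max(0, int(x0)), min(width, int(x1))
        y0, y1 = max(0, int(y0)), min(height, int(y1))
        if x1 <= x0 or y1 <= y0:
            continue
        # The outline cannot be thicker than the box itself.
        t = min(thickness, x1 - x0, y1 - y0)
        for y in range(y0, y1):
            for x in range(x0, x1):
                on_border = (y - y0 < t or y1 - y <= t or
                             x - x0 < t or x1 - x <= t)
                if on_border:
                    canvas[y][x] = color
    return canvas


# Example: one box on a 10 x 10 canvas with a 1-pixel outline.
layout = render_layout([(2, 2, 8, 8)], height=10, width=10, thickness=1)
```

In a pipeline of the kind the captions describe, an image like `layout` would be passed to the layout encoder, while a separate reference image from the target domain would supply the domain embedding.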

Theorems & Definitions (2)

Proposition 1
proof
