Denoised Diffusion for Object-Focused Image Augmentation

Nisha Pillai, Aditi Virupakshaiah, Harrison W. Smith, Amanda J. Ashworth, Prasanna Gowda, Phillip R. Owens, Adam R. Rivers, Bindu Nanduri, Mahalingam Ramkumar

TL;DR

This paper tackles the challenge of limited, domain-specific data for UAV-based animal health monitoring by introducing an object-focused augmentation framework that combines segmentation, scene re-composition, and diffusion-based synthesis to create realistic, occlusion-aware animal scenes. The method uses CVAT for bounding-box annotation, SAM for precise animal masks, Albumentations for pose and lighting variations, and a denoised diffusion probabilistic model to generate synthetic animals that are composited into diverse backgrounds. Experimental results show that models trained on the augmented dataset outperform strong transfer-learning baselines on animal detection tasks, demonstrating the practical potential of domain-specific data generation in data-scarce agricultural settings. While effective, the study is limited by its narrow dataset (68 Angus-cross cattle in a single paddock), and future work should broaden species and farm conditions, as well as explore multimodal generative approaches for richer context-aware augmentations.

Abstract

Modern agricultural operations increasingly rely on integrated monitoring systems that combine multiple data sources for farm optimization. Aerial drone-based animal health monitoring serves as a key component but faces limited data availability, compounded by scene-specific issues such as small, occluded, or partially visible animals. Transfer learning approaches often fail to address this limitation due to the unavailability of large datasets that reflect specific farm conditions, including variations in animal breeds, environments, and behaviors. Therefore, there is a need for developing a problem-specific, animal-focused data augmentation strategy tailored to these unique challenges. To address this gap, we propose an object-focused data augmentation framework designed explicitly for animal health monitoring in constrained data settings. Our approach segments animals from backgrounds and augments them through transformations and diffusion-based synthesis to create realistic, diverse scenes that enhance animal detection and monitoring performance. Our initial experiments demonstrate that our augmented dataset yields superior performance compared to our baseline models on the animal detection task. By generating domain-specific data, our method empowers real-time animal health monitoring solutions even in data-scarce scenarios, bridging the gap between limited data and practical applicability.
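
The segment-then-recompose idea described above can be illustrated with a minimal sketch. It assumes a binary animal mask is already available (the paper obtains masks with SAM) and uses plain NumPy operations as stand-ins for the lighting and position variations (the paper uses Albumentations). The function name `composite` and its parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def composite(background, animal, mask, top, left, brightness=1.0, flip=False):
    """Paste a masked animal crop onto a copy of a background image.

    background: (H, W, 3) uint8 image.
    animal:     (h, w, 3) uint8 crop containing the animal.
    mask:       (h, w) boolean array, True where the animal is.
    Returns the composited image and a tight bounding box
    (x_min, y_min, x_max, y_max) around the pasted animal pixels.
    """
    h, w = mask.shape
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("crop does not fit inside the background")
    if not mask.any():
        raise ValueError("mask is empty")

    if flip:
        # Horizontal flip: a simple stand-in for a position/pose variation.
        animal, mask = animal[:, ::-1], mask[:, ::-1]

    # Brightness scaling: a simple stand-in for a lighting variation.
    animal = np.clip(animal.astype(np.float32) * brightness, 0, 255).astype(np.uint8)

    out = background.copy()
    region = out[top:top + h, left:left + w]  # view into `out`
    region[mask] = animal[mask]               # copy only the animal pixels

    # Tight box around the mask, shifted into background coordinates.
    ys, xs = np.nonzero(mask)
    box = (left + int(xs.min()), top + int(ys.min()),
           left + int(xs.max()) + 1, top + int(ys.max()) + 1)
    return out, box
```

Returning the bounding box alongside the image means each synthetic scene comes with a detection label, so no extra annotation pass is needed for the pasted animal. Because only masked pixels are copied, background content stays visible around the animal's outline.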

Paper Structure

This paper contains 12 sections, 2 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: The object-focused image augmentation approach creates high-quality synthetic images by extracting animals from backgrounds using bounding box annotations, generating diverse scene backgrounds, creating precise animal masks through segmentation, applying lighting and position variations, and using generative models to seamlessly integrate regenerated animals into synthetic scenes.
  • Figure 2: Bounding box annotations using Computer Vision Annotation Tool (CVAT) involve selecting each animal in the image with a rectangular box. This process provides the XY coordinates of the bounding boxes, accurately defining the spatial location of each animal within the image.
  • Figure 3: Segmentation results for animal images obtained using SAM v2.1, leveraging object detection weights pretrained on the COCO dataset.
  • Figure 4: The results of our augmentations effectively replicate real-world scenarios, including variations in weather, lighting, and animal positioning.
  • Figure 5: Examples of synthetically generated animal images using denoised diffusion probabilistic models. The animals display variations in orientation, shape, and partial occlusion, realistically simulating occluded animals.
  • ...and 1 more figure
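
For readers unfamiliar with the diffusion models mentioned in Figure 5, the sketch below shows the forward (noising) step of the standard diffusion probabilistic formulation, in which a clean image x_0 is mixed with Gaussian noise according to a variance schedule. The paper's own schedule and hyperparameters are not given in this summary, so the linear schedule, its endpoints, and the step count are assumptions borrowed from common practice. The function names are hypothetical.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed values, not from the paper)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Returns the noised sample and the noise that was added; a denoising
    network is typically trained to predict that noise from x_t and t.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return x_t, noise


betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_s) up to step t
```

Generation runs this process in reverse: starting from pure noise, a trained network removes the predicted noise step by step until an image emerges.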