Denoised Diffusion for Object-Focused Image Augmentation
Nisha Pillai, Aditi Virupakshaiah, Harrison W. Smith, Amanda J. Ashworth, Prasanna Gowda, Phillip R. Owens, Adam R. Rivers, Bindu Nanduri, Mahalingam Ramkumar
TL;DR
This paper tackles the challenge of limited, domain-specific data for UAV-based animal health monitoring by introducing an object-focused augmentation framework that combines segmentation, scene re-composition, and diffusion-based synthesis to create realistic, occlusion-aware animal scenes. The method uses CVAT for bounding-box annotation, SAM for precise animal masks, Albumentations for pose and lighting variations, and a denoised diffusion probabilistic model to generate synthetic animals that are composited into diverse backgrounds. Experimental results show that models trained on the augmented dataset outperform strong transfer-learning baselines on animal detection tasks, demonstrating the practical potential of domain-specific data generation in data-scarce agricultural settings. The study is limited by its narrow dataset (68 Angus-cross cattle in a single paddock); future work should cover more species and farm conditions and explore multimodal generative approaches for richer, context-aware augmentations.
Abstract
Modern agricultural operations increasingly rely on integrated monitoring systems that combine multiple data sources for farm optimization. Aerial drone-based animal health monitoring is a key component of these systems but faces limited data availability, compounded by scene-specific issues such as small, occluded, or partially visible animals. Transfer learning approaches often fail to address this limitation because large datasets that reflect specific farm conditions, including variations in animal breeds, environments, and behaviors, are unavailable. There is therefore a need for a problem-specific, animal-focused data augmentation strategy tailored to these challenges. To address this gap, we propose an object-focused data augmentation framework designed explicitly for animal health monitoring in constrained data settings. Our approach segments animals from backgrounds and augments them through transformations and diffusion-based synthesis to create realistic, diverse scenes that enhance animal detection and monitoring performance. Our initial experiments show that models trained on our augmented dataset outperform our baseline models on the animal detection task. By generating domain-specific data, our method enables real-time animal health monitoring even in data-scarce scenarios, bridging the gap between limited data and practical applicability.
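The segment-then-recompose idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a binary animal mask is already available (for example, from a segmentation model), uses a horizontal flip as a stand-in for richer transformations, and omits the diffusion-based synthesis step entirely. The function name `composite_object` and its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def composite_object(background, obj, mask, top, left, flip=False):
    """Paste a masked object crop onto a background (hypothetical helper).

    background: (H, W, 3) uint8 image
    obj:        (h, w, 3) uint8 crop containing the animal
    mask:       (h, w) boolean array, True where the animal is
    Returns the new scene and the pasted object's bounding box as
    (x_min, y_min, x_max, y_max), with exclusive max coordinates.
    """
    if flip:
        # Horizontal flip: a simple stand-in for pose variation.
        obj, mask = obj[:, ::-1], mask[:, ::-1]

    h, w = mask.shape
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("object crop must fit inside the background")
    if not mask.any():
        raise ValueError("mask contains no object pixels")

    scene = background.copy()
    region = scene[top:top + h, left:left + w]  # view into the copy
    region[mask] = obj[mask]  # overwrite only the animal pixels

    # Derive a detection label from the mask extent at the paste location.
    ys, xs = np.nonzero(mask)
    bbox = (
        int(left + xs.min()),
        int(top + ys.min()),
        int(left + xs.max()) + 1,
        int(top + ys.max()) + 1,
    )
    return scene, bbox


# Toy data: a black background and a white "animal" with an irregular mask.
background = np.zeros((6, 8, 3), dtype=np.uint8)
obj = np.full((2, 3, 3), 255, dtype=np.uint8)
mask = np.array([[True, True, False],
                 [False, True, True]])

scene, bbox = composite_object(background, obj, mask, top=1, left=2)
flipped_scene, flipped_bbox = composite_object(background, obj, mask, top=1, left=2, flip=True)
```

Because the bounding box is computed from the mask rather than from the crop rectangle, each composited scene comes with a detection label at no extra annotation cost, which is the practical appeal of object-focused augmentation when labeled data is scarce.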
