Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring
Günel Aghakishiyeva, Jiayi Zhou, Saagar Arya, Julian Dale, James David Poling, Holly R. Houliston, Jamie N. Womble, Gregory D. Larsen, David W. Johnston, Brinnae Bent
TL;DR
This work addresses explainability in ecological monitoring by introducing inpainting-guided, perturbation-based explanations that produce photorealistic, in-distribution edits. The approach combines object-level removal or replacement with background substitution, using YOLOv9 for detection, the Segment Anything Model (SAM) for mask refinement, and Stable Diffusion for inpainting; explanations are evaluated by detector re-scoring and expert review. Removing morphologically distinctive regions significantly disrupts detections, whereas background alterations largely preserve them, revealing that the detector relies on morphology- and texture-based cues and exposing its context dependencies. The method offers actionable insights for model validation and data curation in ecological applications, but its computational overhead calls for selective use and motivates efficiency improvements before broader deployment.
Abstract
Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment-Anything-Model-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement with original animals composited onto new scenes. Explanations are assessed by re-scoring perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
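To make the re-scoring step concrete, the following is a minimal sketch of how flip rate and confidence drop could be computed from paired detector confidences. The abstract names these two measures but does not define them, so the definitions here are assumptions: flip rate is taken as the fraction of originally detected objects whose confidence falls below the detection threshold after perturbation, and confidence drop as the mean difference between original and perturbed confidence over those same objects. The function name `rescore_summary`, the `RescoreSummary` container, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RescoreSummary:
    flip_rate: float             # share of original detections lost after the edit
    mean_confidence_drop: float  # mean of (original - perturbed) confidence
    n_objects: int               # number of original detections that were scored


def rescore_summary(
    original_conf: Sequence[float],
    perturbed_conf: Sequence[float],
    threshold: float = 0.5,  # assumed detection threshold, not from the paper
) -> RescoreSummary:
    """Summarize how a perturbation changed detector confidence.

    Each position pairs one object's confidence on the original image with
    the confidence for the same object (or location) on the perturbed image.
    Use 0.0 when the detector returns nothing for that object after the edit.
    """
    if len(original_conf) != len(perturbed_conf):
        raise ValueError("confidence sequences must have the same length")

    # Only objects detected in the original image can "flip".
    pairs = [
        (o, p) for o, p in zip(original_conf, perturbed_conf) if o >= threshold
    ]
    if not pairs:
        raise ValueError("no original detections at or above the threshold")

    flips = sum(1 for _, p in pairs if p < threshold)
    total_drop = sum(o - p for o, p in pairs)
    return RescoreSummary(
        flip_rate=flips / len(pairs),
        mean_confidence_drop=total_drop / len(pairs),
        n_objects=len(pairs),
    )


if __name__ == "__main__":
    # Illustrative numbers only; these are not results from the paper.
    summary = rescore_summary([0.9, 0.8, 0.7, 0.3], [0.1, 0.75, 0.4, 0.2])
    print(summary)
```

Under these assumed definitions, an object-removal edit that targets a diagnostic region would be expected to yield a high flip rate, while a background replacement that leaves the animal intact would yield a low one.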
