Photorealistic Inpainting for Perturbation-based Explanations in Ecological Monitoring

Günel Aghakishiyeva, Jiayi Zhou, Saagar Arya, Julian Dale, James David Poling, Holly R. Houliston, Jamie N. Womble, Gregory D. Larsen, David W. Johnston, Brinnae Bent

TL;DR

This work addresses explainability in ecological monitoring with inpainting-guided, perturbation-based explanations that produce photorealistic, in-distribution edits. The approach combines object-level removal/replacement with background substitution, using YOLOv9 for detection, SAM for mask refinement, and Stable Diffusion for inpainting, and evaluates the explanations by detector re-scoring and expert review. Removing morphologically distinctive regions disrupts detections, whereas background alterations largely preserve them, which exposes the morphology- and texture-based cues and the context dependencies the detector relies on. The method offers actionable insights for model validation and data curation in ecological applications, but its computational overhead calls for selective use and motivates future efficiency improvements before broader deployment.

Abstract

Ecological monitoring is increasingly automated by vision models, yet opaque predictions limit trust and field adoption. We present an inpainting-guided, perturbation-based explanation technique that produces photorealistic, mask-localized edits that preserve scene context. Unlike masking or blurring, these edits stay in-distribution and reveal which fine-grained morphological cues drive predictions in tasks such as species recognition and trait attribution. We demonstrate the approach on a YOLOv9 detector fine-tuned for harbor seal detection in Glacier Bay drone imagery, using Segment-Anything-Model-refined masks to support two interventions: (i) object removal/replacement (e.g., replacing seals with plausible ice/water or boats) and (ii) background replacement with original animals composited onto new scenes. Explanations are assessed by re-scoring perturbed images (flip rate, confidence drop) and by expert review for ecological plausibility and interpretability. The resulting explanations localize diagnostic structures, avoid deletion artifacts common to traditional perturbations, and yield domain-relevant insights that support expert validation and more trustworthy deployment of AI in ecology.
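
The re-scoring step can be illustrated with a minimal sketch. The abstract names flip rate and confidence drop but does not define them here, so the definitions below are assumptions: a detection "flips" when its confidence is at or above a threshold before the perturbation and below it afterwards, and the confidence drop is the mean of original minus perturbed confidence. The names `RescoredDetection`, `flip_rate`, `mean_confidence_drop`, and the 0.5 threshold are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RescoredDetection:
    """Detector confidence for one target before and after a perturbation."""
    conf_original: float
    conf_perturbed: float  # assumed to be 0.0 when the target is no longer detected


def flip_rate(pairs: Sequence[RescoredDetection], threshold: float = 0.5) -> float:
    """Fraction of originally detected targets that fall below the threshold.

    Assumed definition: only targets with conf_original >= threshold count
    as detected, and a flip means conf_perturbed < threshold.
    """
    detected = [p for p in pairs if p.conf_original >= threshold]
    if not detected:
        return 0.0
    flipped = sum(1 for p in detected if p.conf_perturbed < threshold)
    return flipped / len(detected)


def mean_confidence_drop(pairs: Sequence[RescoredDetection]) -> float:
    """Mean of (original - perturbed) confidence; negative values mean a rise."""
    if not pairs:
        return 0.0
    return sum(p.conf_original - p.conf_perturbed for p in pairs) / len(pairs)
```

Under these assumed definitions, a removal that eliminates a detection contributes a flip and a large positive drop, while a case such as Figure 1(B), where a neighbor's confidence rises, would contribute a negative drop for that neighbor.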

Paper Structure

This paper contains 25 sections, 2 equations, 4 figures.

Figures (4)

  • Figure 1: Inpainting-based perturbations for harbor seal detections. (A–D) Examples from the test set. Column 1 shows the original image with YOLOv9 detections, column 2 the SAM mask, and column 3 the perturbed image. (A) Removing a seal and filling with ice eliminates all detections. (B) Removing one seal lowers its detection but raises confidence for a neighbor. (C) Replacing a seal with a boat removes the detection. (D) A boat replacement is misclassified as a seal.
  • Figure 2: Background replacement for harbor seal detections. (A–D) Examples from the test set. Column 1 shows the original image with YOLOv9 detections, column 2 the SAM mask, and column 3 the perturbed image. (A–B) Seals remained detected at high confidence when placed into incongruent backgrounds such as winter landscapes and deserts. (C) Detection was occasionally suppressed in beach contexts. (D) Studio replacements produced spurious detections, with background objects misclassified as seals.
  • Figure 3: Pixel-level sensitivity in harbor seal detections. (A–B) Examples from the test set. Column 1 shows the original image with YOLOv9 detections, column 2 the SAM mask, and column 3 the perturbed image. (A) Removing seals eliminated detections. (B) A visually similar perturbation produced a spurious detection, showing detector sensitivity to pixel-level changes.
  • Figure 4: Runtime distribution for removal and replacement experiments using SAM-based segmentation and bounding box masking.