AGRI-Fidelity: Evaluating the Reliability of Listenable Explanations for Poultry Disease Detection

Sindhuja Madabushi, Arda Dogan, Jonathan Liu, Dian Chen, Dong S. Ha, Sook Shin, Sam H. Noh, Jin-Hee Cho

Abstract

Existing XAI metrics measure faithfulness for a single model, ignoring model multiplicity, in which near-optimal classifiers rely on different or spurious acoustic cues. In noisy farm environments, stationary artifacts such as ventilation noise can produce explanations that are faithful yet unreliable, because masking-based metrics fail to penalize redundant shortcuts. We propose AGRI-Fidelity, a reliability-oriented evaluation framework for listenable explanations in poultry disease detection without spatial ground truth. The method combines cross-model consensus with cyclic temporal permutation to construct null distributions and compute a False Discovery Rate (FDR), suppressing stationary artifacts while preserving time-localized bioacoustic markers. Across real and controlled datasets, AGRI-Fidelity provides reliability-aware discrimination for all data points, in contrast to masking-based metrics.

Paper Structure (19 sections, 2 theorems, 17 equations, 4 figures, 4 tables)

Key Result

Theorem 1

If the committee models converge on a feature that is strictly stationary (time-invariant), the Consensus algorithm will asymptotically assign a False Discovery Rate (FDR) of $1.0$, resulting in a Reliability Score of $0$.
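The intuition behind Theorem 1 can be illustrated with a small numerical sketch. It is not the paper's Consensus algorithm: the agreement statistic (`consensus_stat`), the empirical estimate in `permutation_fdr`, the independent per-model shifts, and the toy attribution arrays are all assumptions made here for illustration. The point it shows is that a time-invariant attribution is unchanged by any cyclic shift, so every null draw ties the observed statistic and the estimated FDR equals 1, whereas a time-localized attribution loses cross-model alignment under shifting and receives a small FDR.

```python
import numpy as np

def consensus_stat(attr):
    """Hypothetical agreement statistic: peak over time of the
    element-wise minimum attribution across models.
    attr has shape (n_models, n_frames)."""
    return float(np.min(attr, axis=0).max())

def permutation_fdr(attr, n_perm=200, seed=0):
    """Empirical FDR-style estimate from a cyclic-shift null (a sketch).
    Each model's attribution is rolled by an independent random offset,
    which breaks temporal alignment but preserves each model's values."""
    rng = np.random.default_rng(seed)
    n_models, n_frames = attr.shape
    observed = consensus_stat(attr)
    exceed = 0
    for _ in range(n_perm):
        shifts = rng.integers(0, n_frames, size=n_models)
        shifted = np.stack([np.roll(a, s) for a, s in zip(attr, shifts)])
        # Count null draws that match or beat the observed agreement.
        if consensus_stat(shifted) >= observed - 1e-12:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)

# Stationary artifact: every model attributes equally to all frames.
stationary = np.full((4, 100), 0.5)

# Time-localized marker: every model attributes to the same 5 frames.
localized = np.zeros((4, 100))
localized[:, 40:45] = 1.0

fdr_stationary = permutation_fdr(stationary)  # shifts change nothing, so FDR = 1.0
fdr_localized = permutation_fdr(localized)    # shifts misalign the peaks, so FDR is small

# Reliability taken here as 1 - FDR (an assumption consistent with Theorem 1).
reliability_stationary = 1.0 - fdr_stationary
reliability_localized = 1.0 - fdr_localized
```

Under these assumptions the stationary case yields a reliability of exactly 0, matching the statement of Theorem 1, while the localized case stays close to 1.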

Figures (4)

  • Figure 1: AGRI-Fidelity Framework for Reliability-Oriented Evaluation via Tiered Cross-Model Consensus and Permutation-Based FDR Validation.
  • Figure 2: Cross-Model Attribution Consistency: Integrated gradients for a healthy sample in the denoised poultry dataset across CNN, MLP, LSTM, and ResNet consensus models.
  • Figure 3: Fidelity–Reliability Quadrant Distributions Across Datasets. Each plot shows per-sample AGRI-Fidelity components using CoughLIME Explainer. Denoised datasets exhibit stronger concentration in the high-fidelity, high-reliability quadrant, whereas noisy datasets demonstrate reliability dispersion under spurious contamination.
  • Figure 4: Fidelity–Reliability Quadrant Analysis on the Controlled Spurious Dataset. Each point represents a sample plotted by its mean fidelity and reliability score. Dashed lines denote median splits, forming four interpretative quadrants.

Theorems & Definitions (4)

  • Theorem 1: Safety against Stationary Artifacts
  • proof
  • Theorem 2: Sensitivity to Sparse Signals
  • proof