FecalFed: Privacy-Preserving Poultry Disease Detection via Federated Learning

Tien-Yu Chi

Abstract

Early detection of highly pathogenic avian influenza (HPAI) and endemic poultry diseases is critical for global food security. While computer vision models excel at classifying diseases from fecal imaging, deploying these systems at scale is bottlenecked by farm data privacy concerns and institutional data silos. Furthermore, existing open-source agricultural datasets frequently suffer from severe, undocumented data contamination. In this paper, we introduce FecalFed, a privacy-preserving federated learning framework for poultry disease classification. We first curate and release poultry-fecal-fl, a rigorously deduplicated dataset of 8,770 unique images across four disease classes, revealing and eliminating a 46.89% duplication rate in popular public repositories. To simulate realistic agricultural environments, we evaluate FecalFed under highly heterogeneous, non-IID conditions (Dirichlet $\alpha=0.5$). While isolated single-farm training collapses under this data heterogeneity, yielding only 64.86% accuracy, our federated approach recovers performance without centralizing sensitive data. Specifically, utilizing server-side adaptive optimization (FedAdam) with a Swin-Small architecture achieves 90.31% accuracy, closely approaching the centralized upper bound of 95.10%. Furthermore, we demonstrate that an edge-optimized Swin-Tiny model maintains highly competitive performance at 89.74%, establishing a highly efficient, privacy-first blueprint for on-farm avian disease monitoring.
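As a rough illustration of the server-side adaptive optimization named above, the following sketch applies a FedAdam-style update to a flat list of parameters. It follows the commonly published form of FedAdam (the server treats the averaged client change as a pseudo-gradient and feeds it to Adam-style moment estimates). The class name `FedAdamServer` and the hyperparameter values (`lr`, `beta1`, `beta2`, `tau`) are illustrative assumptions, not settings reported in this paper.

```python
import math


class FedAdamServer:
    """Hypothetical server holding global weights as a flat list of floats."""

    def __init__(self, weights, lr=0.1, beta1=0.9, beta2=0.99, tau=1e-3):
        # All hyperparameter defaults here are assumptions for illustration.
        self.weights = list(weights)
        self.lr, self.beta1, self.beta2, self.tau = lr, beta1, beta2, tau
        self.m = [0.0] * len(self.weights)       # first-moment estimate
        self.v = [tau ** 2] * len(self.weights)  # second-moment estimate

    def step(self, client_weights, client_sizes):
        """Aggregate one round of client weights and update the global model."""
        total = sum(client_sizes)
        for i, w in enumerate(self.weights):
            # Pseudo-gradient: sample-weighted mean of (client weight - global weight).
            delta = sum(
                n * (cw[i] - w) for cw, n in zip(client_weights, client_sizes)
            ) / total
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * delta
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * delta * delta
            # Adam-style step in the direction of the averaged client change.
            self.weights[i] = w + self.lr * self.m[i] / (math.sqrt(self.v[i]) + self.tau)
        return self.weights


# Toy round: two equally sized clients that both moved the weights to [1.0, -1.0].
server = FedAdamServer([0.0, 0.0])
updated = server.step([[1.0, -1.0], [1.0, -1.0]], client_sizes=[50, 50])
```

Only weights cross the network in this scheme; the raw images that produced each client's update never leave the farm, which is the privacy property the framework relies on.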

Paper Structure

This paper contains 15 sections, 1 equation, 5 figures, 1 table.

Figures (5)

  • Figure 1: Left: The FecalFed cross-silo federated learning architecture. Raw images remain isolated on-farm, while only Swin-Tiny model weights are communicated to the central server for FedAdam aggregation. Right: Test accuracy across the 10 isolated farm partitions (gray) demonstrating extreme variance and collapse under non-IID conditions, compared to the stable performance recovery achieved by FecalFed (orange).
  • Figure 2: The FecalFed data curation pipeline. (Top) The dual-hash deduplication process eliminated 46.89% of the raw aggregated dataset. (Bottom) A visual example of severe cross-source contamination, where original field data was found repeatedly down-sampled and duplicated in synthetic open-source repositories.
  • Figure 3: Data distribution across 10 isolated farm clients under a Dirichlet ($\alpha=0.5$) non-IID partitioning strategy. This extreme skew mimics realistic agricultural settings where localized outbreaks cause specific edge devices to hold a vast majority of samples for a single disease.
  • Figure 4: Federated test accuracy versus model parameter footprint. The Swin-Tiny architecture (28M parameters) emerges as the optimal candidate for edge deployment, achieving highly competitive accuracy (89.74%) while drastically reducing the computational and memory requirements compared to larger models like ViT-B/16.
  • Figure 5: Ablation study on global communication rounds using the ViT-B/16 baseline under FedAvg. The top panel shows test accuracy convergence, while the bottom panel shows test loss decay. Extending the training duration to 20 rounds continuously mitigates the effects of non-IID data skew, yielding smooth, stable performance improvements without stalling.
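
The Dirichlet ($\alpha=0.5$) partitioning referenced in the Figure 3 caption can be sketched as follows. This is a minimal, standard-library version of the usual label-skew recipe: for each class, draw client proportions from a Dirichlet distribution and split that class's samples accordingly. The function name `dirichlet_partition`, the seed, and the synthetic labels are assumptions for illustration; the paper's actual partitioning code and per-class counts are not shown here.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients=10, alpha=0.5, seed=0):
    """Return one list of sample indices per client, skewed by class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet sample is a normalized vector of independent Gamma(alpha, 1) draws.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        # Turn the proportions into cumulative cut points over this class's samples.
        bounds, acc = [0], 0.0
        for g in gammas[:-1]:
            acc += g / total
            bounds.append(min(len(indices), int(round(acc * len(indices)))))
        bounds.append(len(indices))
        for client_id in range(num_clients):
            clients[client_id].extend(indices[bounds[client_id]:bounds[client_id + 1]])
    return clients


# Synthetic stand-in labels: four classes with 100 samples each (not the real dataset).
toy_labels = [i % 4 for i in range(400)]
partitions = dirichlet_partition(toy_labels, num_clients=10, alpha=0.5)
```

Smaller values of `alpha` concentrate each class on fewer clients, so a value of 0.5 tends to leave some clients dominated by a single class, which matches the skew the caption describes.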