
FedReplay: A Feature Replay Assisted Federated Transfer Learning Framework for Efficient and Privacy-Preserving Smart Agriculture

Long Li, Jiajia Li, Dong Chen, Lina Pu, Haibo Yao, Yanbo Huang

TL;DR

The paper tackles privacy and communication efficiency in federated learning for smart agriculture by introducing FedReplay, a framework that freezes a CLIP vision encoder and trains a small Transformer head within FL. To mitigate non-IID data across farms, it shares a tiny, non-reversible feature replay pool (1%) and employs warm-start, replay-assisted local training, and Row-Gated FedAvg for orderly integration of new clients. The approach yields 86.6% accuracy on a crop–weed dataset while cutting communication overhead by approximately 98% compared to full-model FL, and it remains robust to late-joining farms. Together, these contributions offer a practical, privacy-preserving path to scalable, high-performance agricultural intelligence in distributed settings.
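The replay idea can be illustrated with a minimal sketch, assuming that features are precomputed embeddings from the frozen encoder and that the shared pool keeps roughly 1% of them per class. How the paper actually assembles and distributes the pool is not specified here; the helper names `build_replay_pool` and `mixed_batch` and the `replay_ratio` of 0.25 are hypothetical choices for illustration only.

```python
import random
from collections import defaultdict


def build_replay_pool(features, labels, fraction=0.01, seed=0):
    """Sample about `fraction` of the feature vectors from every class.

    Hypothetical helper. `features` are assumed to be embeddings from a
    frozen image encoder, so raw images never enter the pool. At least
    one vector is kept per class so that no class is left unrepresented.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for feat, label in zip(features, labels):
        by_class[label].append(feat)
    pool = []
    for label in sorted(by_class):
        items = by_class[label]
        k = max(1, round(fraction * len(items)))
        pool.extend((feat, label) for feat in rng.sample(items, k))
    return pool


def mixed_batch(local, pool, batch_size, replay_ratio=0.25, seed=0):
    """Build one local batch mixing local (feature, label) pairs with replayed ones.

    `replay_ratio` is an assumed hyperparameter, not a value from the paper.
    """
    rng = random.Random(seed)
    n_replay = min(len(pool), int(batch_size * replay_ratio))
    n_local = min(len(local), batch_size - n_replay)
    batch = rng.sample(local, n_local) + rng.sample(pool, n_replay)
    rng.shuffle(batch)
    return batch
```

With three classes of 100 samples each, the pool holds one vector per class, and every local batch then contains a few replayed examples of classes a farm may never observe locally, which is the intuition behind using replay to counter non-IID drift.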

Abstract

Accurate classification plays a pivotal role in smart agriculture, enabling applications such as crop monitoring, fruit recognition, and pest detection. However, conventional centralized training often requires large-scale data collection, which raises privacy concerns, while standard federated learning struggles with non-independent and identically distributed (non-IID) data and incurs high communication costs. To address these challenges, we propose a federated learning framework that integrates a frozen Contrastive Language-Image Pre-training (CLIP) vision transformer (ViT) with a lightweight transformer classifier. By leveraging the strong feature extraction capability of the pre-trained CLIP ViT, the framework avoids training large-scale models from scratch and restricts federated updates to a compact classifier, thereby significantly reducing transmission overhead. Furthermore, to mitigate the performance degradation caused by non-IID data distribution, a small subset (1%) of CLIP-extracted feature representations from all classes is shared across clients. These shared features cannot be inverted back to raw images, which preserves privacy while aligning class representations across participants. Experimental results on agricultural classification tasks show that the proposed method achieves 86.6% accuracy, more than four times that of baseline federated learning approaches. This demonstrates the effectiveness and efficiency of combining vision-language model features with federated learning for privacy-preserving and scalable agricultural intelligence.
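Because the CLIP encoder is frozen and identical on every client, only the classifier head needs to be exchanged and averaged. The sketch below shows plain sample-weighted FedAvg restricted to head parameters, plus the arithmetic behind the upload saving. It is not the paper's Row-Gated FedAvg, whose details are not given here; the function names and the parameter counts in the usage note are hypothetical.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of values.
Params = Dict[str, List[float]]


def fedavg_head(client_heads: List[Params], num_samples: List[int]) -> Params:
    """Sample-weighted FedAvg over classifier-head parameters only.

    The frozen encoder is never uploaded or averaged; each client sends
    only its head, weighted by its number of local training samples.
    """
    total = sum(num_samples)
    averaged: Params = {}
    for name in client_heads[0]:
        acc = [0.0] * len(client_heads[0][name])
        for head, n in zip(client_heads, num_samples):
            for i, v in enumerate(head[name]):
                acc[i] += v * n / total
        averaged[name] = acc
    return averaged


def upload_reduction(head_params: int, encoder_params: int) -> float:
    """Fraction of per-round upload saved versus sending encoder + head."""
    return 1.0 - head_params / (head_params + encoder_params)
```

For example, with an illustrative 2M-parameter head on top of a 98M-parameter encoder (numbers chosen for the arithmetic, not taken from the paper), `upload_reduction(2_000_000, 98_000_000)` gives about 0.98, the same order of saving the summary reports against full-model FL.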

Paper Structure

This paper contains 32 sections, 6 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: Illustration of knowledge conflict caused by non-IID dataset distribution in an FL framework
  • Figure 2: Illustration of how non-IID data distribution degrades federated training performance.
  • Figure 3: Structure of the proposed transformer-based classification model with transfer learning
  • Figure 4: Workflow of training process in the proposed FL framework
  • Figure 5: Workflow of the late-joining client integration process in the proposed FL framework
  • ...and 5 more figures