FedReplay: A Feature Replay Assisted Federated Transfer Learning Framework for Efficient and Privacy-Preserving Smart Agriculture
Long Li, Jiajia Li, Dong Chen, Lina Pu, Haibo Yao, Yanbo Huang
TL;DR
The paper tackles privacy and communication efficiency in federated learning for smart agriculture by introducing FedReplay, a framework that freezes a CLIP vision encoder and trains a small Transformer head within FL. To mitigate non-IID data across farms, it shares a tiny, non-reversible feature replay pool (1%) and employs warm-start, replay-assisted local training, and Row-Gated FedAvg for orderly integration of new clients. The approach yields 86.6% accuracy on a crop–weed dataset while cutting communication overhead by approximately 98% compared to full-model FL, and it remains robust to late-joining farms. Together, these contributions offer a practical, privacy-preserving path to scalable, high-performance agricultural intelligence in distributed settings.
Abstract
Accurate classification plays a pivotal role in smart agriculture, enabling applications such as crop monitoring, fruit recognition, and pest detection. However, conventional centralized training often requires large-scale data collection, which raises privacy concerns, while standard federated learning struggles with non-independent and identically distributed (non-IID) data and incurs high communication costs. To address these challenges, we propose a federated learning framework that integrates a frozen Contrastive Language-Image Pre-training (CLIP) vision transformer (ViT) with a lightweight transformer classifier. By leveraging the strong feature extraction capability of the pre-trained CLIP ViT, the framework avoids training large-scale models from scratch and restricts federated updates to a compact classifier, thereby significantly reducing transmission overhead. Furthermore, to mitigate the performance degradation caused by non-IID data distributions, a small subset (1%) of CLIP-extracted feature representations from all classes is shared across clients. These shared features are non-reversible to raw images, ensuring privacy preservation while aligning class representations across participants. Experimental results on agricultural classification tasks show that the proposed method achieves 86.6% accuracy, more than four times the accuracy of baseline federated learning approaches. This demonstrates the effectiveness and efficiency of combining vision-language model features with federated learning for privacy-preserving and scalable agricultural intelligence.
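The two mechanisms described above, a small per-class pool of shared features and federated averaging restricted to the compact classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `build_replay_pool` and `fedavg`, the flat-list representation of classifier parameters, the rule of keeping at least one feature per class, and the sample-size weighting are all assumptions made for illustration. The feature vectors stand in for outputs of the frozen CLIP encoder, which is not modeled here.

```python
import random
from collections import defaultdict


def build_replay_pool(features, labels, fraction=0.01, seed=0):
    """Sample about `fraction` of the feature vectors from every class.

    Assumption: at least one vector per class is kept so that every class
    is represented in the shared pool.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for vec, label in zip(features, labels):
        by_class[label].append(vec)

    pool = []
    for label in sorted(by_class):
        vecs = by_class[label]
        k = max(1, round(fraction * len(vecs)))
        pool.extend((vec, label) for vec in rng.sample(vecs, k))
    return pool


def fedavg(client_weights, client_sizes):
    """Average flat classifier-parameter lists, weighted by local sample count.

    Only the compact classifier is averaged; the frozen encoder is never sent.
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical toy data: 300 two-dimensional "features" over 3 classes.
features = [[float(i), float(i % 3)] for i in range(300)]
labels = [i % 3 for i in range(300)]
pool = build_replay_pool(features, labels, fraction=0.01)

# Two hypothetical clients holding 1 and 3 local samples respectively.
global_head = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In this sketch each client would append `pool` to its local features before training its classifier, so that classes missing from a farm's own data are still seen during local updates.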
