Federated Learning and RAG Integration: A Scalable Approach for Medical Large Language Models
Jincheol Jung, Hongju Jeong, Eui-Nam Huh
TL;DR
The paper addresses privacy concerns in medical LLM deployment by comparing centralized and federated training, each with and without Retrieval-Augmented Generation (RAG). It introduces a federated LLM setup built on the Flower framework with client-specific RAG, evaluates it on PMC-derived data, and benchmarks four configurations. Results show that federated learning with RAG consistently outperforms the centralized setups across key metrics, and that performance improves as more clients participate. The findings point to a scalable, privacy-preserving path to high-quality domain-specific medical text generation and lay a foundation for extending RAG-FL to other sensitive domains.
Abstract
This study analyzes the performance of domain-specific Large Language Models (LLMs) for the medical field when Retrieval-Augmented Generation (RAG) systems are integrated within a federated learning framework. Building on the inherent advantages of federated learning, such as preserving data privacy and enabling distributed computation, it pairs RAG systems with models trained under varying client configurations to optimize performance. Experimental results demonstrate that the federated learning-based models integrated with RAG systems consistently outperform their non-integrated counterparts across all evaluation metrics. The study highlights the potential of combining federated learning and RAG systems to develop domain-specific LLMs in the medical field, offering a scalable and privacy-preserving way to enhance text generation capabilities.
