FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF

Flint Xiaofeng Fan; Cheston Tan; Yew-Soon Ong; Roger Wattenhofer; Wei-Tsang Ooi

TL;DR

FedRLHF tackles privacy and personalization challenges in RLHF by decentralizing the RLHF loop across K clients and aggregating only model updates via FedAvg. The framework allows client-specific reward shaping with human feedback, and the authors provide convergence guarantees and a sample-complexity bound that incorporate the influence of human feedback through λH_max. A formal personalization-performance trade-off is established, showing that stronger personalization (larger λ) improves client-specific rewards but modestly degrades global performance and increases sample requirements. Empirically, FedRLHF matches or surpasses centralized RLHF performance on MovieLens and IMDb while enhancing personalization, demonstrating practical privacy-preserving and scalable personalization for real-world systems.

Abstract

In the era of increasing privacy concerns and demand for personalized experiences, traditional Reinforcement Learning with Human Feedback (RLHF) frameworks face significant challenges due to their reliance on centralized data. We introduce Federated Reinforcement Learning with Human Feedback (FedRLHF), a novel framework that decentralizes the RLHF process. FedRLHF enables collaborative policy learning across multiple clients without necessitating the sharing of raw data or human feedback, thereby ensuring robust privacy preservation. Leveraging federated reinforcement learning, each client integrates human feedback locally into their reward functions and updates their policies through personalized RLHF processes. We establish rigorous theoretical foundations for FedRLHF, providing convergence guarantees, and deriving sample complexity bounds that scale efficiently with the number of clients. Empirical evaluations on the MovieLens and IMDb datasets demonstrate that FedRLHF not only preserves user privacy but also achieves performance on par with centralized RLHF, while enhancing personalization across diverse client environments.
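The federated loop described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes an additive shaped reward (base reward plus λ times a local human-feedback signal), and it replaces the policy-optimization step with a toy least-squares fit so the example stays short and runnable. The names `shaped_reward`, `local_update`, `fedavg`, and `run_rounds` are hypothetical. Only parameter vectors leave a client; the feedback arrays stay local.

```python
import numpy as np

def shaped_reward(base, feedback, lam):
    # Assumed additive shaping: base reward plus lam-weighted human feedback.
    return base + lam * feedback

def local_update(theta, X, targets, lr=0.1, steps=10):
    # Toy stand-in for a local RLHF policy update:
    # a few gradient steps on a least-squares fit to the shaped reward.
    theta = theta.copy()
    for _ in range(steps):
        grad = X.T @ (X @ theta - targets) / len(targets)
        theta -= lr * grad
    return theta

def fedavg(thetas, weights):
    # FedAvg: average client parameters, weighted by local sample counts.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wi * t for wi, t in zip(w, thetas))

def run_rounds(clients, dim, lam=0.5, rounds=20):
    # clients: list of (features, base_rewards, feedback) tuples, one per client.
    theta = np.zeros(dim)
    for _ in range(rounds):
        local_thetas = []
        for X, base, fb in clients:
            targets = shaped_reward(base, fb, lam)  # feedback is used only locally
            local_thetas.append(local_update(theta, X, targets))
        # The server sees parameter vectors and sample counts, never raw feedback.
        theta = fedavg(local_thetas, [len(X) for X, _, _ in clients])
    return theta
```

In this sketch, a larger `lam` pulls each client's local targets further toward its own feedback, which is the kind of personalization-versus-global-performance tension the framework analyzes.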
