FedRLHF: A Convergence-Guaranteed Federated Framework for Privacy-Preserving and Personalized RLHF

Flint Xiaofeng Fan, Cheston Tan, Yew-Soon Ong, Roger Wattenhofer, Wei-Tsang Ooi

TL;DR

FedRLHF tackles privacy and personalization challenges in RLHF by decentralizing the RLHF loop across K clients and aggregating only model updates via FedAvg. The framework allows client-specific reward shaping with human feedback, and the authors provide convergence guarantees and a sample-complexity bound that incorporate the influence of human feedback through λH_max. A formal personalization-performance trade-off is established, showing that stronger personalization (larger λ) improves client-specific rewards but modestly degrades global performance and increases sample requirements. Empirically, FedRLHF matches or surpasses centralized RLHF performance on MovieLens and IMDb while enhancing personalization, demonstrating practical privacy-preserving and scalable personalization for real-world systems.
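
The aggregation step described above can be illustrated with a minimal, framework-agnostic sketch. It makes three assumptions:

- each client's policy is a flat list of floats;
- local RLHF training is supplied as a callback;
- clients are weighted uniformly unless weights are given (for example, local sample counts, as in standard FedAvg).

The names `fedavg`, `federated_round`, and the callback interface are hypothetical and are not taken from the authors' code.

```python
from typing import Callable, List, Optional, Sequence

Params = List[float]  # flattened policy parameters (illustrative assumption)

def fedavg(client_params: Sequence[Params],
           weights: Optional[Sequence[float]] = None) -> Params:
    """Weighted coordinate-wise average of client parameter vectors."""
    if not client_params:
        raise ValueError("need at least one client")
    if weights is None:
        weights = [1.0] * len(client_params)  # uniform FedAvg
    if len(weights) != len(client_params):
        raise ValueError("one weight per client is required")
    total = float(sum(weights))
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) / total
            for i in range(dim)]

def federated_round(global_params: Params,
                    local_updates: Sequence[Callable[[Params], Params]],
                    weights: Optional[Sequence[float]] = None) -> Params:
    """One communication round: broadcast, local training (callback), aggregate.

    Each callback receives its own copy of the global model and returns updated
    parameters; only these parameters are passed to the server.
    """
    updated = [update(list(global_params)) for update in local_updates]
    return fedavg(updated, weights)
```

For example, `fedavg([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])` weights the second client three times as heavily and returns `[2.5, 3.5]`.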

Abstract

In the era of increasing privacy concerns and demand for personalized experiences, traditional Reinforcement Learning with Human Feedback (RLHF) frameworks face significant challenges due to their reliance on centralized data. We introduce Federated Reinforcement Learning with Human Feedback (FedRLHF), a novel framework that decentralizes the RLHF process. FedRLHF enables collaborative policy learning across multiple clients without necessitating the sharing of raw data or human feedback, thereby ensuring robust privacy preservation. Leveraging federated reinforcement learning, each client integrates human feedback locally into their reward functions and updates their policies through personalized RLHF processes. We establish rigorous theoretical foundations for FedRLHF, providing convergence guarantees, and deriving sample complexity bounds that scale efficiently with the number of clients. Empirical evaluations on the MovieLens and IMDb datasets demonstrate that FedRLHF not only preserves user privacy but also achieves performance on par with centralized RLHF, while enhancing personalization across diverse client environments.
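
Local reward shaping can be sketched as a client-side reward object. The sketch assumes an additive form, r_k(s, a) = r(s, a) + λ_k · h_k(s, a). Here r is a shared base (intrinsic) reward and h_k is the client's locally trained feedback model. This form is consistent with the intrinsic, sentiment, and combined rewards mentioned in the figure captions, but it is an assumption rather than the paper's stated formula. The class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Hashable

# r(s, a) -> float; states and actions are left abstract here.
RewardFn = Callable[[Hashable, Hashable], float]

@dataclass
class PersonalizedReward:
    """Client-side reward: base reward plus weighted local human feedback.

    ASSUMPTION: the additive shaping r + lam * h_k is illustrative only.
    """
    base_reward: RewardFn      # shared intrinsic reward r(s, a)
    feedback_model: RewardFn   # locally trained feedback model h_k(s, a)
    lam: float = 0.0           # personalization weight lambda_k

    def __post_init__(self) -> None:
        if self.lam < 0:
            raise ValueError("lam must be non-negative")

    def __call__(self, state: Hashable, action: Hashable) -> float:
        # Raw feedback never leaves the client; only this scalar is used locally.
        return self.base_reward(state, action) + self.lam * self.feedback_model(state, action)
```

Setting `lam = 0` recovers the shared base reward. Larger values let the client's own feedback dominate, which mirrors the personalization trade-off summarized in the TL;DR.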

Paper Structure

This paper contains 37 sections, 12 theorems, 28 equations, 5 figures, 1 algorithm.

Key Result

Lemma 4.1

Under the assumptions of L-smoothness, G-bounded gradients, and bounded variance, the following holds for any communication round $t$ and client $k$. The difference between the local model $\theta_t^k$ of client $k$ and the global model $\theta_t$ is bounded by a quantity that depends on four terms: the learning rate $\eta$, the number of local updates $\tau$, the gradient bound $G$, and the variance bound $\sigma^2$.
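
To make the quantity in Lemma 4.1 concrete, the sketch below runs $\tau$ local gradient steps from the global model and measures the Euclidean distance $\|\theta_t^k - \theta_t\|$. It assumes a toy client objective whose gradient norm is clipped to $G$; all function names are hypothetical.

In the noise-free case, each step moves the parameters by at most $\eta G$. The triangle inequality therefore gives $\|\theta_t^k - \theta_t\| \le \eta\tau G$. This is a simpler deterministic relative of the lemma, which also accounts for gradient noise through $\sigma^2$, and it is not the lemma's exact statement.

```python
import math
import random
from typing import Callable, List, Optional

Vector = List[float]

def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))

def clipped_quadratic_grad(center: Vector, G: float) -> Callable[[Vector], Vector]:
    """Hypothetical client objective: gradient of 0.5*||theta - center||^2,
    rescaled so its norm never exceeds G (the bounded-gradient assumption)."""
    def grad(theta: Vector) -> Vector:
        g = [t - c for t, c in zip(theta, center)]
        n = l2_norm(g)
        scale = 1.0 if n <= G else G / n
        return [scale * gi for gi in g]
    return grad

def local_sgd(theta_global: Vector, grad_fn: Callable[[Vector], Vector],
              eta: float, tau: int, noise_std: float = 0.0,
              rng: Optional[random.Random] = None) -> Vector:
    """Run tau local steps from a copy of the global model.
    noise_std > 0 adds Gaussian noise to each gradient coordinate."""
    rng = rng or random.Random(0)
    theta = list(theta_global)
    for _ in range(tau):
        g = grad_fn(theta)
        if noise_std > 0:
            g = [gi + rng.gauss(0.0, noise_std) for gi in g]
        theta = [t - eta * gi for t, gi in zip(theta, g)]
    return theta

def local_global_drift(theta_local: Vector, theta_global: Vector) -> float:
    return l2_norm([a - b for a, b in zip(theta_local, theta_global)])

# Example: noise-free local training, so the eta * tau * G bound applies.
theta_t = [0.0, 0.0]
eta, tau, G = 0.1, 5, 1.0
theta_k = local_sgd(theta_t, clipped_quadratic_grad([10.0, 10.0], G), eta, tau)
print(local_global_drift(theta_k, theta_t), "<=", eta * tau * G)
```

Here every step is clipped to norm $G$ and points the same way, so the measured drift reaches the bound of 0.5. Adding noise (`noise_std > 0`) can push individual gradients above $G$. That is the regime the lemma's $\sigma^2$ term is meant to cover.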

Figures (5)

  • Figure 1: Comparison of the FedRLHF framework to conventional RLHF methods. Top: Conventional RLHF requires centralized collection of user data and feedback, in order to train the policy model and feedback model respectively. Bottom: In FedRLHF, clients maintain local policy models trained on-device using RLHF with local data and feedback models. Only policy model updates are shared with a central server, which aggregates them to refine a global policy.
  • Figure 2: Learning curves on MovieLens: (top) Global vs. Client Accuracy, (bottom) Client Spearman correlation.
  • Figure 3: Distribution of $K=10$ client accuracies and Spearman rank correlations per round for the MovieLens task.
  • Figure 4: Performance evaluation of FedRLHF in comparison to centralized RLHF. (a) Tracks the rewards and losses of clients and the global performance over federation rounds. (b) Compares the sample efficiency of FedRLHF (K = 5) with centralized RLHF in terms of average rewards and losses over training samples.
  • Figure 5: Trends of intrinsic rewards, sentiment rewards, and combined rewards over communication rounds for each client. Each subplot corresponds to one client, illustrating personalization effects due to varying $\lambda_k$ values.
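
The captions report a per-client Spearman rank correlation on MovieLens. The sketch below computes it for one client, assuming it compares the model's predicted scores with held-out user ratings. The captions do not say exactly which quantities are ranked, so this pairing is an assumption. It uses the standard definition: the Pearson correlation of the two rank vectors, with tied values assigned their average rank.

```python
import math
from typing import List, Sequence

def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)

def spearman(predicted: Sequence[float], actual: Sequence[float]) -> float:
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(predicted), average_ranks(actual))
```

A value of 1.0 means the client's predicted ordering matches the user's ordering exactly, and -1.0 means it is fully reversed. Averaging this score over the $K$ clients, or plotting its distribution per round, gives per-client summaries like those described in the figure captions.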

Theorems & Definitions (19)

  • Remark
  • Remark
  • Lemma 4.1: Bounded Local-Global Difference
  • Remark
  • Lemma 4.2: One-Step Descent
  • Remark
  • Theorem 4.1: Convergence of FedRLHF
  • Theorem 4.2: Sample Complexity of FedRLHF
  • Definition 5.1: Maximum Reward
  • Definition 5.2: Personalization Score
  • ...and 9 more