PluralLLM: Pluralistic Alignment in LLMs via Federated Learning

Mahmoud Srewa, Tianyu Zhao, Salma Elmalaki

TL;DR

The paper tackles aligning large language models with diverse human values while preserving privacy and fairness. It proposes PluralLLM, a federated learning framework that trains a transformer-based group preference predictor (GPO) across user groups without sharing raw data, so that the predictor can serve as a lightweight reward model for alignment. Using FedAvg, it achieves 46% faster convergence and roughly 4% higher alignment scores than centralized training, with comparable group fairness, on a Q/A preference alignment task derived from Pew Global Attitudes Surveys. The work demonstrates that federated preference learning is a scalable, privacy-preserving alternative for pluralistic alignment, and it suggests extending the approach to other tasks and refining fairness across groups as future work.

Abstract

Ensuring Large Language Models (LLMs) align with diverse human preferences while preserving privacy and fairness remains a challenge. Existing methods, such as Reinforcement Learning from Human Feedback (RLHF), rely on centralized data collection, making them computationally expensive and privacy-invasive. We introduce PluralLLM, a federated learning-based approach that enables multiple user groups to collaboratively train a transformer-based preference predictor without sharing sensitive data; the trained predictor can also serve as a reward model for aligning LLMs. Our method leverages Federated Averaging (FedAvg) to aggregate preference updates efficiently, achieving 46% faster convergence, a 4% improvement in alignment scores, and nearly the same group fairness measure as in centralized training. Evaluated on a Q/A preference alignment task, PluralLLM demonstrates that federated preference learning offers a scalable and privacy-preserving alternative for aligning LLMs with diverse human values.
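
The FedAvg aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is the generic FedAvg server step, not the authors' implementation: the function name `fedavg`, the dict-of-lists weight layout, and the choice to weight each group by its number of local samples are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical weight layout: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Return the sample-size-weighted average of client parameters.

    Each client (here, a user group) trains locally and sends only its
    parameters; raw preference data never leaves the client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one positive sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    aggregated: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        # Weighted mean of each scalar parameter across clients.
        aggregated[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return aggregated


# Two hypothetical groups with 1 and 3 local samples respectively.
global_weights = fedavg(
    [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}],
    [1, 3],
)
print(global_weights)
```

In a full training loop the server would broadcast `global_weights` back to every group, each group would run a few local training steps on its own preference data, and the aggregation would repeat each round.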

Paper Structure

This paper contains 14 sections, 3 equations, 5 figures.

Figures (5)

  • Figure 1: PluralLLM: Pluralistic alignment in LLMs via Federated Learning.
  • Figure 2: Comparison of training loss curves for centralized learning GPO and PluralLLM. PluralLLM achieves a lower loss than centralized learning GPO.
  • Figure 3: Comparison of preference distributions across Ground Truth, Centralized Learning GPO, and PluralLLM for a given question.
  • Figure 4: Comparison of mean evaluation group alignment scores for centralized learning GPO and PluralLLM.
  • Figure 5: Comparison of Fairness Index between centralized learning and PluralLLM. Using PluralLLM to train the preference transformer does not result in significant disparities among groups.