Position: Federated Foundation Language Model Post-Training Should Focus on Open-Source Models

Nikita Agrawal; Simon Mertel; Ruben Mayer

TL;DR

This position paper argues that federated post-training of foundation language models should prioritize open-source (and open-weight) models to preserve privacy, autonomy, and transparency in federated learning (FL). It defines a four-way openness taxonomy and analyzes how openness enables or restricts post-training methods (FFT, LoRA, adapters, prompt/instruction tuning, RLHF), while also examining licensing and data considerations. The authors contend that open models align with FL principles; they present a privacy/security analysis and a model-selection guide, and outline significant risks associated with closed/black-box models. They conclude that a disciplined focus on open models yields more trustworthy, regulation-compatible, and controllable FL post-training outcomes.

Abstract

Post-training of foundation language models has emerged as a promising research domain in federated learning (FL), with the goal of enabling privacy-preserving model improvements and adaptations to users' downstream tasks. Recent advances in this area adopt centralized post-training approaches that build upon black-box foundation language models, which provide no access to model weights or architecture details. Although black-box models have been used successfully in centralized post-training, their blind replication in FL raises several concerns. Our position is that using black-box models in FL contradicts the core principles of federation, such as data privacy and autonomy. In this position paper, we critically analyze the use of black-box models in federated post-training and provide a detailed account of various aspects of openness and their implications for FL.
