Can Textual Gradient Work in Federated Learning?

Minghui Chen, Ruinan Jin, Wenlong Deng, Yuanyuan Chen, Zhi Huang, Han Yu, Xiaoxiao Li

TL;DR

This work investigates whether textual gradients can be used in federated learning to optimize prompts for large language models (LLMs). It presents FedTextGrad, a paradigm in which clients upload locally optimized prompts derived from TextGrad, and a server aggregates these prompts to form a global prompt. The paper identifies a key challenge, retaining essential information during prompt aggregation, and introduces a summarization method based on the Uniform Information Density (UID) principle to balance information density. Empirical results on BBH reasoning tasks and GSM8K show that simple concatenation leads to unmanageably long prompts and that standard summarization can hurt performance, while UID-based summarization improves accuracy and stability in federated settings. Overall, the work establishes a foundation for text-based federated prompt optimization of LLMs and outlines directions for privacy-preserving, scalable future research.

Abstract

Recent studies highlight the promise of LLM-based prompt optimization, especially with TextGrad, which automates "differentiation" via text and backpropagates textual feedback. This approach facilitates training in various real-world applications that do not support numerical gradient propagation or loss calculation. In this paper, we systematically explore the potential and challenges of incorporating textual gradients into Federated Learning (FL). Our contributions are fourfold. Firstly, we introduce a novel FL paradigm, Federated Textual Gradient (FedTextGrad), that allows clients to upload locally optimized prompts derived from textual gradients, while the server aggregates the received prompts. Unlike traditional FL frameworks, which are designed for numerical aggregation, FedTextGrad is specifically tailored for handling textual data, expanding the applicability of FL to a broader range of problems that lack well-defined numerical loss functions. Secondly, building on this design, we conduct extensive experiments to explore the feasibility of FedTextGrad. Our findings highlight the importance of properly tuning key factors (e.g., local steps) in FL training. Thirdly, we highlight a major challenge in FedTextGrad aggregation: retaining essential information from distributed prompt updates. Last but not least, in response to this issue, we improve the vanilla variant of FedTextGrad by leveraging the Uniform Information Density principle to give the LLM actionable guidance when it summarizes client prompts. Through this principled study, we enable the adoption of textual gradients in FL for optimizing LLMs, identify important issues, and pinpoint future directions, thereby opening up a new research area that warrants further investigation.

Paper Structure

This paper contains 54 sections, 2 equations, 9 figures, 4 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of FedTextGrad, where the upper part (blue boxes) and the lower part (green boxes) represent two different clients. Within each client, circles represent the prompts, and boxes represent the LLM. FedTextGrad consists of four steps for local updating, proceeding from left to right. In step-1 (Prompt), the client is tasked with answering the Query by initializing a Prompt to the LLM to obtain a response. Then, in step-2 (Response), the LLM performs multi-step reasoning (e.g., CoT) and generates a Response. In step-3 (Evaluation), the Response is evaluated against the ground truth by the LLM, and an Evaluation score is generated. Finally, in step-4 (Textual Grad), the Prompt is updated "backward" based on feedback from the LLM. After this, the client sends the Updated Prompt to the server. On the server side, the collected prompts from all clients are aggregated by the server, which acts as a trusted third party, and then sent back to the clients, as shown in step-5. Two aggregation strategies are available: simply concatenating the prompts or using the server-side LLM to summarize them. The system iteratively performs local updates (multiple local epochs of steps 1-4) and global aggregation (step-5) for optimization in the FL system.
  • Figure 2: Ablation study of the impact of three key FL hyper-parameters on FedTextGrad, evaluated across three datasets.
  • Figure 3: Comparison of the impact of different LLMs on (a) Centralized TextGrad and (b) FedTextGrad for BBH Object Counting tasks.
  • Figure 4: Increasing token length of concatenated prompts.
  • Figure 5: Illustration of the three types of prompt aggregation proposed in this paper: 1) Concatenation – where prompts from clients are directly concatenated; 2) Summarization – where a large language model (LLM) is employed to summarize the prompts provided by the clients; 3) Summarization with UID (SUM w/ UID) – where the summarization process is enhanced by applying uniform information density principles.
  • ...and 4 more figures
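
The round structure described in the Figure 1 and Figure 5 captions (local prompt updates on each client, followed by server-side aggregation through concatenation or summarization) can be sketched roughly as below. This is a minimal illustration, not the authors' implementation: `run_fedtextgrad`, `stub_local_update`, and `stub_summarize` are hypothetical names, and the two stubs stand in for the LLM-driven TextGrad step and the server-side LLM summarizer, neither of which is called here.

```python
from typing import Callable, List

# Hypothetical type aliases; in the paper both roles are played by LLM calls.
LocalUpdate = Callable[[str, List[str]], str]  # (prompt, client data) -> updated prompt
Aggregate = Callable[[List[str]], str]         # client prompts -> global prompt


def concatenate(prompts: List[str]) -> str:
    """Aggregation by direct concatenation of client prompts."""
    return "\n".join(prompts)


def stub_summarize(prompts: List[str]) -> str:
    """Stand-in for LLM summarization: keep each distinct line once, in order."""
    seen: List[str] = []
    for prompt in prompts:
        for line in prompt.split("\n"):
            if line not in seen:
                seen.append(line)
    return "\n".join(seen)


def stub_local_update(prompt: str, data: List[str]) -> str:
    """Stand-in for steps 1-4: a real client would query the LLM, evaluate the
    response, and rewrite the prompt from textual feedback."""
    return prompt + f"\nHint from {data[0]}"


def run_fedtextgrad(
    init_prompt: str,
    client_data: List[List[str]],
    local_update: LocalUpdate,
    aggregate: Aggregate,
    rounds: int,
    local_steps: int,
) -> str:
    """Alternate local prompt updates with server-side aggregation (step 5)."""
    global_prompt = init_prompt
    for _ in range(rounds):
        client_prompts = []
        for data in client_data:
            prompt = global_prompt  # each client starts from the global prompt
            for _ in range(local_steps):
                prompt = local_update(prompt, data)
            client_prompts.append(prompt)
        global_prompt = aggregate(client_prompts)
    return global_prompt


if __name__ == "__main__":
    clients = [["a"], ["b"]]
    for name, agg in [("concat", concatenate), ("summarize", stub_summarize)]:
        final = run_fedtextgrad("P", clients, stub_local_update, agg, rounds=3, local_steps=1)
        print(name, len(final.split("\n")), "lines")
```

Even with these toy stubs, the concatenated global prompt grows with every round because each client re-extends the full previous aggregate, which mirrors the token-length growth shown in Figure 4, while the summarizing aggregator keeps the prompt bounded. The sketch says nothing about summary quality, which is the issue the UID-based variant targets.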