Federated Reasoning Distillation Framework with Model Learnability-Aware Data Allocation

Wei Guo; Siyuan Lu; Xiangdong Ran; Yiqi Tong; Yikun Ban; Zelong Xu; Jing Fan; Zixuan Huang; Xiao Zhang; Zhaojun Hu; Fuzhen Zhuang

TL;DR

LaDa is a federated reasoning distillation framework with model learnability-aware data allocation. It adaptively allocates high-reward samples based on the learnability gap between each SLM and LLM pair, facilitating bidirectional knowledge transfer.

Abstract

Data allocation plays a critical role in federated reasoning collaboration between a large language model (LLM) and small language models (SLMs). Nevertheless, existing data allocation methods fail to address an under-explored challenge in this collaboration: the bidirectional model learnability gap, where client-side SLMs cannot identify high-reward samples that match their learnability constraints for effective knowledge transfer from LLMs, while LLMs struggle to select samples that contribute novel knowledge beyond their existing data. Furthermore, these collaboration frameworks face another key challenge: domain-agnostic reasoning transfer, where existing reasoning transfer methods fail to adapt flexibly to local domain data, preventing SLMs from effectively acquiring step-by-step reasoning abilities from a general LLM. To address these challenges, we propose LaDa, a federated reasoning distillation framework with model learnability-aware data allocation. It introduces a model learnability-aware data filter that adaptively allocates high-reward samples based on the learnability gap between each SLM and LLM pair, effectively facilitating bidirectional knowledge transfer. We further design a domain adaptive reasoning distillation method that aligns the joint probabilities of reasoning paths on the filtered high-reward samples through contrastive distillation learning between the SLM and the LLM, enabling the SLM to capture the underlying reasoning patterns under its local data distribution. LaDa operates as a plug-in module for existing collaboration frameworks, adapting knowledge transfer based on model learnability gaps.
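
As a rough illustration of the two ideas named in the abstract, the sketch below scores a reasoning path by its joint log-probability (the sum of per-token log-probabilities) and keeps samples whose LLM-versus-SLM gap falls inside a band. The abstract does not specify the actual filter, so the gap definition, the thresholds `low` and `high`, and the names `path_log_prob`, `learnability_gap`, and `allocate` are all assumptions made for this example, not LaDa's implementation.

```python
import math

def path_log_prob(token_probs):
    """Joint log-probability of a reasoning path: sum of per-token log-probs."""
    return sum(math.log(p) for p in token_probs)

def learnability_gap(slm_token_probs, llm_token_probs):
    """Hypothetical gap score: LLM path log-prob minus SLM path log-prob."""
    return path_log_prob(llm_token_probs) - path_log_prob(slm_token_probs)

def allocate(samples, low, high):
    """Keep sample ids whose gap lies in [low, high] (assumed selection rule).

    A gap below `low` suggests the SLM already handles the sample; a gap
    above `high` suggests the sample is beyond the SLM's current capacity.
    """
    return [
        s["id"]
        for s in samples
        if low <= learnability_gap(s["slm"], s["llm"]) <= high
    ]

# Toy per-token probabilities assigned to the same reasoning path.
samples = [
    {"id": "a", "slm": [0.5, 0.5], "llm": [0.9, 0.9]},    # moderate gap
    {"id": "b", "slm": [0.9, 0.9], "llm": [0.9, 0.9]},    # no gap
    {"id": "c", "slm": [0.01, 0.01], "llm": [0.9, 0.9]},  # very large gap
]
selected = allocate(samples, low=0.5, high=3.0)
```

With these toy numbers only sample `a` falls inside the band. A real filter would presumably adapt the band per SLM and LLM pair rather than fix it by hand.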

Paper Structure (22 sections, 20 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Federated Large-Small Model Collaboration
Data Allocation
Problem Formulation
Problem Verification
Optimization Objective
Methodology
Domain Adaptive Reasoning Distillation
Model Learnability-Aware Data Filter
Convergence Analysis
Experiments
Experimental Setup
Overall Performance
Reasoning Capacity Comparison
...and 7 more sections

Figures (6)

Figure 1: LaDa identifies an under-explored challenge in federated large-small model collaboration: the bidirectional model learnability gap, where small language models (SLMs) and the large language model (LLM) cannot identify their own high-reward samples matching their learnability constraints. Additionally, we address another key challenge: domain-agnostic reasoning transfer, where existing methods fail to adapt flexibly to local domain data.
Figure 2: Illustration of the bidirectional model learnability gap: (a) client SLMs exhibit different learning capacities when collaborating with server LLMs of varying sizes; (b) the server LLM also shows varying knowledge absorption from different client SLM configurations.
Figure 3: Overview of the LaDa framework, which comprises two components: a model learnability-aware data filter, which dynamically allocates high-reward samples based on the bidirectional learnability gaps of each SLM and LLM pair; and domain adaptive reasoning distillation, which aligns joint probabilities of reasoning paths through contrastive distillation learning, enabling domain adaptive reasoning capacity learning between each SLM and LLM pair.
Figure 4: Convergence analysis results of our proposed LaDa.
Figure 5: Communication cost comparison of our proposed LaDa.
...and 1 more figures
