DFAMS: Dynamic-flow guided Federated Alignment based Multi-prototype Search

Zhibang Yang, Xinke Jiang, Rihong Qiu, Ruiqing Li, Yihang Zhang, Yue Fang, Yongxin Xu, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang

TL;DR

DFAMS addresses the challenge of retrieving high-quality documents from distributed knowledge sources in federated settings by explicitly modeling Dynamic Information Flow (DIF) within LLMs. It identifies latent query intents and subdomain activations via gradient-based Shapley attribution, then aligns DIF embeddings to a multi-prototype knowledge space using inter- and intra-KB contrastive losses. The framework employs adaptive prototype-guided routing to allocate retrieval slots across knowledge bases, achieving higher knowledge classification accuracy (Cls Acc), retrieval recall, and QA performance while maintaining efficiency. This approach advances privacy-preserving federated retrieval by preserving source boundaries and enabling fine-grained semantic routing across heterogeneous knowledge bases.

Abstract

Federated Retrieval (FR) routes queries across multiple external knowledge sources to mitigate LLM hallucinations when the necessary external knowledge is distributed. However, existing methods struggle to retrieve high-quality, relevant documents for ambiguous queries, especially in cross-domain scenarios, which significantly limits their effectiveness in supporting downstream generation tasks. Inspired by Dynamic Information Flow (DIF), we propose DFAMS, a novel framework that leverages DIF to identify latent query intents and construct semantically aligned knowledge partitions for accurate retrieval across heterogeneous sources. Specifically, DFAMS probes the DIF in LLMs by leveraging gradient signals from a few annotated queries and employing Shapley value-based attribution to trace neuron activation paths associated with intent recognition and subdomain boundary detection. DFAMS then uses DIF to train an alignment module via multi-prototype contrastive learning, enabling fine-grained intra-source modeling and inter-source semantic alignment across knowledge bases. Experimental results across five benchmarks show that DFAMS outperforms advanced FR methods by up to 14.37% in knowledge classification accuracy, 5.38% in retrieval recall, and 6.45% in downstream QA accuracy, demonstrating its effectiveness in complex FR scenarios. Our code is anonymously available at https://anonymous.4open.science/r/DFAMS/
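
To make the idea of prototype-guided routing more concrete, the following is a minimal sketch of how retrieval slots might be split across knowledge bases once a query embedding and per-KB prototypes are available. It is not the authors' implementation: the function name `allocate_slots`, the parameters `top_m`, `total_slots`, and `temperature`, the cosine/softmax scoring, and the largest-remainder rounding are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def allocate_slots(query, prototypes, top_m=3, total_slots=10, temperature=0.1):
    """Split `total_slots` retrieval slots across knowledge bases.

    query      : list[float], the (hypothetical) aligned query embedding
    prototypes : dict mapping KB name -> list of prototype vectors
    All hyperparameters here are illustrative assumptions.
    """
    # Score every prototype against the query and keep the top_m matches.
    scored = [
        (cosine(query, proto), kb)
        for kb, protos in prototypes.items()
        for proto in protos
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:top_m]

    # Softmax over the selected prototypes, then sum the mass per KB.
    best = max(score for score, _ in top)
    weights = [(math.exp((score - best) / temperature), kb) for score, kb in top]
    total = sum(w for w, _ in weights)
    share = defaultdict(float)
    for w, kb in weights:
        share[kb] += w / total

    # Largest-remainder rounding so the slot counts sum to total_slots.
    raw = {kb: share[kb] * total_slots for kb in share}
    slots = {kb: math.floor(value) for kb, value in raw.items()}
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(raw, key=lambda kb: raw[kb] - slots[kb], reverse=True)
    for kb in by_remainder[:leftover]:
        slots[kb] += 1
    return slots
```

In this sketch, a knowledge base with several prototypes close to the query accumulates more softmax mass and therefore receives more slots, while a KB whose prototypes fall outside the top matches receives none. Keeping multiple prototypes per KB is what lets a single source cover several subdomains without collapsing them into one centroid.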

Paper Structure

This paper contains 64 sections, 6 equations, 4 figures, 10 tables, 3 algorithms.

Figures (4)

  • Figure 1: Hypothesized process of dynamic information flow (DIF) within LLMs for knowledge base selection. When a user asks “Can hypertensive patients use ibuprofen?”, the LLM first infers the latent intent—where a student seeks basic pharmacological understanding, while a doctor requires clinical evidence. The identified intent (e.g., as a doctor) triggers distinct neural and knowledge activations, forming DIF signals that guide retrieval: clinical pathways access ibuprofen records in EHR, whereas conceptual pathways retrieve NSAID-related information from PubMed.
  • Figure 2: DFAMS dynamically detects relevant information flow in LLMs and employs multi-prototype alignment and routing to accurately associate queries with domain-specific knowledge bases.
  • Figure 3: Hyperparameter Analysis of Prototypes per Class (Left) and Selected Prototypes for Routing (Right)
  • Figure 4: Heatmaps of Aggregated Shapley Values Across Layers and Neuron Groups for Qwen2.5-7B, Qwen2.5-3B, Qwen2.5-5B, and LLaMA3.1-8B Models