KVComm: Enabling Efficient LLM Communication through Selective KV Sharing

Xiangyu Shi, Marco Chiesa, Gerald Q. Maguire, Dejan Kostic

TL;DR

KVComm addresses the inefficiencies of inter-LLM communication by replacing natural-language and hidden-state channels with selective sharing of per-layer KV pairs. It combines attention-derived layer importance scores with a Gaussian prior to choose which KV layers to share (the chosen layers need not be contiguous), so the receiver can attend to both its own context and the sender's information. Across eight model pairs and diverse tasks, KVComm matches or outperforms baselines while cutting data transmission and computation several-fold, and on some datasets it even surpasses Skyline, the upper-bound setting in which the inputs are merged and given directly to one model. The approach generalizes across architectures and requires only a small calibration set, making it practical for scalable multi-agent LLM systems.

Abstract

Large Language Models (LLMs) are increasingly deployed in multi-agent systems, where effective inter-model communication is crucial. Existing communication protocols either rely on natural language, incurring high inference costs and information loss, or on hidden states, which suffer from information concentration bias and inefficiency. To address these limitations, we propose KVComm, a novel communication framework that enables efficient communication between LLMs through selective sharing of KV pairs. KVComm leverages the rich information encoded in the KV pairs while avoiding the pitfalls of hidden states. We introduce a KV layer-wise selection strategy based on attention importance scores with a Gaussian prior to identify the most informative KV pairs for communication. Extensive experiments across diverse tasks and model pairs demonstrate that KVComm achieves comparable performance to the upper-bound method, which directly merges inputs to one model without any communication, while transmitting as few as 30% of layers' KV pairs. Our study highlights the potential of KV pairs as an effective medium for inter-LLM communication, paving the way for scalable and efficient multi-agent systems.
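
The layer-wise selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring formula, so everything below is an assumption made for illustration: the convex combination of the two terms, the mixing weight `alpha`, the prior parameters `mu` and `sigma`, and the function names are all hypothetical and are not taken from the paper.

```python
import math

def gaussian_prior(num_layers, mu, sigma):
    """Unnormalized Gaussian weight per layer index (assumed form of the prior)."""
    return [math.exp(-((l - mu) ** 2) / (2 * sigma ** 2)) for l in range(num_layers)]

def normalize(xs):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(xs)
    if total <= 0:
        return [1.0 / len(xs)] * len(xs)
    return [x / total for x in xs]

def select_layers(attn_importance, ratio, mu, sigma, alpha=0.5):
    """Return sorted indices of the layers whose KV pairs would be shared.

    attn_importance: one non-negative score per layer, e.g. measured on a
                     small calibration set (how it is measured is not shown here).
    ratio:           fraction of layers to share, e.g. 0.3.
    alpha:           hypothetical weight between attention score and prior.
    """
    num_layers = len(attn_importance)
    attn = normalize(attn_importance)
    prior = normalize(gaussian_prior(num_layers, mu, sigma))
    # Assumed combination rule: a convex mix of the two normalized terms.
    scores = [alpha * a + (1 - alpha) * p for a, p in zip(attn, prior)]
    # Round before ceil to guard against floating-point noise in ratio * num_layers.
    k = max(1, math.ceil(round(ratio * num_layers, 9)))
    top = sorted(range(num_layers), key=lambda l: scores[l], reverse=True)[:k]
    return sorted(top)  # selected layers may be non-contiguous

if __name__ == "__main__":
    importance = [0.1, 0.9, 0.2, 0.8, 0.1, 0.7, 0.1, 0.1, 0.1, 0.1]
    print(select_layers(importance, ratio=0.3, mu=4.0, sigma=2.0))
```

Because the selection is a top-k over per-layer scores rather than a sliding window, nothing forces the chosen layers to be adjacent, which is the property the TL;DR refers to as non-contiguous selection.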

Paper Structure

This paper contains 38 sections, 8 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: KVComm framework for efficient LLM communication through selective KV sharing.
  • Figure 2: Compared to other token positions, the last token's hidden state is the most critical, especially in later layers.
  • Figure 3: Prepending hidden states is not effective unless hidden states are from and to the early layers.
  • Figure 4: Effective communication with limited hyperparameters.
  • Figure 5: KVComm achieves nearly the best performance, or even outperforms contiguous chunks.
  • ...and 7 more figures