CoLM: Collaborative Large Models via A Client-Server Paradigm

Siqi Huang, Sida Huang, Hongyuan Zhang

TL;DR

The paper tackles deployment realities where many clients rely on a finite set of server-side large models and proposes CoLM, a client-server collaboration framework that lets lightweight client models generate independent references while a centralized server synthesizes guidance to refine each client’s output. For language tasks, CoLM uses a three-stage pipeline: select top-k domain-specialized clients, collect their responses, and have the server produce guidance that clients then use to revise their answers; for vision-language models, outputs from multiple client VLMs are concatenated and refined through prompt-based collaboration. Extensive experiments on both LLM and VLM benchmarks show consistent improvements, with the strongest gains observed for weaker models, and ablations confirm the importance of model diversity, the number of collaborators, and collaboration rounds. The approach demonstrates a deployment-friendly alternative to server-to-server ensembles, enabling scalable, robust reasoning in real-world settings and extending to multimodal tasks with practical gains in accuracy and reliability.
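
The three-stage language pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Client` class, the `domain_scores` field, the prompt wording, and the functions `select_top_k` and `collaborate` are all hypothetical names introduced here, and models are stood in for by plain callables that map a prompt to text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: a "model" is any callable that maps a prompt string to a response string.
Model = Callable[[str], str]


@dataclass
class Client:
    name: str
    domain_scores: Dict[str, float]  # hypothetical per-domain competence scores
    model: Model


def select_top_k(clients: List[Client], domain: str, k: int) -> List[Client]:
    """Stage 1: pick the k clients with the highest score for the query's domain."""
    ranked = sorted(
        clients, key=lambda c: c.domain_scores.get(domain, 0.0), reverse=True
    )
    return ranked[:k]


def collaborate(
    query: str,
    domain: str,
    requester: Client,
    clients: List[Client],
    server: Model,
    k: int = 2,
) -> str:
    """Run one collaboration round and return the requester's revised answer."""
    # Stage 1: choose domain-specialized collaborators.
    helpers = select_top_k(clients, domain, k)

    # Stage 2: collect independent reference responses from the selected clients.
    references = [helper.model(query) for helper in helpers]

    # Stage 3: the server synthesizes guidance, and the requester revises with it.
    reference_block = "\n".join(f"- {r}" for r in references)
    guidance = server(
        f"Question: {query}\nReference answers:\n{reference_block}\n"
        "Summarize guidance for answering the question."
    )
    return requester.model(
        f"Question: {query}\nGuidance: {guidance}\nRevise your answer."
    )
```

Repeating `collaborate` with the revised answers would correspond to additional collaboration rounds, and `k` plays the role of the number of collaborators examined in the ablations; how the paper actually scores domain expertise and phrases its prompts is not specified in this summary.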

Abstract

Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively generate responses, effectively operating in a server-to-server paradigm. However, such approaches do not align well with practical deployment settings, where a limited number of server-side models are shared by many clients under modern internet architectures. In this paper, we introduce CoLM (Collaboration in Large-Models), a novel framework for collaborative reasoning that redefines cooperation among large models from a client-server perspective. Unlike traditional ensemble methods that rely on simultaneous inference from multiple models to produce a single output, CoLM allows the outputs of multiple models to be aggregated or shared, enabling each client model to independently refine and update its own generation based on these high-quality outputs. This design enables collaborative benefits by fully leveraging both client-side and shared server-side models. We further extend CoLM to vision-language models (VLMs), demonstrating its applicability beyond language tasks. Experimental results across multiple benchmarks show that CoLM consistently improves model performance on previously failed queries, highlighting the effectiveness of collaborative guidance in enhancing single-model capabilities.

Paper Structure

This paper contains 27 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: (a) Initial model performance across MME shows no single model excels universally. (b) Models enhanced by our method (marked with *) show consistent improvements across datasets.
  • Figure 2: Left: Traditional server-to-server collaboration paradigm, where multiple large models interact directly during inference. These approaches often rely on interactions among general-purpose models, lacking specialization structure. Right: Our proposed client-server collaboration paradigm, where lightweight client models receive guidance from shared server-side models. This design allows each client to maintain long-lived, domain-specific expertise while improving response quality through collaboration.
  • Figure 3: Examples of Janus-Pro-7B responses on VQA tasks. Our method enables the model to produce more accurate answers through collaborative inference.
  • Figure 4: Ablation study on the effect of collaborative user scale on LLM performance. Experiments conducted on three benchmarks show that increasing the number of collaborative clients leads to consistent performance improvement.
  • Figure 5: Performance improvements with increasing collaboration rounds across multiple datasets and models. Iterative interaction consistently enhances results, especially for models with weaker initial outputs, but shows diminishing returns after several rounds.
  • ...and 3 more figures