CoLM: Collaborative Large Models via A Client-Server Paradigm
Siqi Huang, Sida Huang, Hongyuan Zhang
TL;DR
In practical deployments, many clients share a limited set of server-side large models. The paper proposes CoLM, a client-server collaboration framework in which lightweight client models generate independent reference answers and a centralized server synthesizes guidance that each client uses to refine its own output. For language tasks, CoLM uses a three-stage pipeline: select the top-k domain-specialized clients, collect their responses, and have the server produce guidance that the clients then use to revise their answers. For vision-language models, the outputs of multiple client VLMs are concatenated and refined through prompt-based collaboration. Extensive experiments on both LLM and VLM benchmarks show consistent improvements, with the largest gains for weaker models, and ablations confirm the importance of model diversity, the number of collaborators, and the number of collaboration rounds. The approach offers a deployment-friendly alternative to server-to-server ensembles, supports scalable, robust reasoning in real-world settings, and extends to multimodal tasks with practical gains in accuracy and reliability.
Abstract
Large models have achieved remarkable performance across a range of reasoning and understanding tasks. Prior work often utilizes model ensembles or multi-agent systems to collaboratively generate responses, effectively operating in a server-to-server paradigm. However, such approaches do not align well with practical deployment settings, where a limited number of server-side models are shared by many clients under modern internet architectures. In this paper, we introduce CoLM (Collaboration in Large-Models), a novel framework for collaborative reasoning that redefines cooperation among large models from a client-server perspective. Unlike traditional ensemble methods that rely on simultaneous inference from multiple models to produce a single output, CoLM allows the outputs of multiple models to be aggregated or shared, enabling each client model to independently refine and update its own generation based on these high-quality outputs. This design enables collaborative benefits by fully leveraging both client-side and shared server-side models. We further extend CoLM to vision-language models (VLMs), demonstrating its applicability beyond language tasks. Experimental results across multiple benchmarks show that CoLM consistently improves model performance on previously failed queries, highlighting the effectiveness of collaborative guidance in enhancing single-model capabilities.
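
As a rough illustration of the three-stage language pipeline summarized in the TL;DR (select top-k clients, collect their responses, have the server produce guidance that the clients use to revise), the following Python sketch shows one collaboration round. It is not the authors' implementation: the `Client` class, the `domain_scores` field, the prompt wording, and the functions `select_top_k` and `colm_round` are all hypothetical, and each model is assumed to be a plain text-in, text-out callable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: any model (client or server) is a text-in, text-out callable.
Model = Callable[[str], str]


@dataclass
class Client:
    name: str
    domain_scores: Dict[str, float]  # hypothetical per-domain competence scores
    generate: Model


def select_top_k(clients: List[Client], domain: str, k: int) -> List[Client]:
    """Stage 1: keep the k clients with the highest score for this domain."""
    ranked = sorted(
        clients, key=lambda c: c.domain_scores.get(domain, 0.0), reverse=True
    )
    return ranked[:k]


def colm_round(
    query: str, domain: str, clients: List[Client], server: Model, k: int = 2
) -> Dict[str, str]:
    """Run one collaboration round and return each selected client's revision."""
    selected = select_top_k(clients, domain, k)

    # Stage 2: each selected client answers independently.
    drafts = {c.name: c.generate(query) for c in selected}

    # Stage 3a: the server turns the collected drafts into shared guidance.
    refs = "\n".join(f"[{name}] {text}" for name, text in drafts.items())
    guidance = server(f"Question: {query}\nReferences:\n{refs}\nGive guidance.")

    # Stage 3b: each client revises its own draft using that guidance.
    return {
        c.name: c.generate(
            f"Question: {query}\nYour draft: {drafts[c.name]}\n"
            f"Guidance: {guidance}\nRevise."
        )
        for c in selected
    }
```

Calling `colm_round` repeatedly, feeding each revision back in as the next draft, would correspond to the multiple collaboration rounds mentioned in the ablations; how the paper actually scores domains or words its prompts is not specified here.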
