TRINITY: An Evolved LLM Coordinator

Jinglue Xu, Qi Sun, Peter Schwendeman, Stefan Nielsen, Edoardo Cetin, Yujin Tang

TL;DR

This work introduces Trinity, a lightweight coordinator that orchestrates multiple diverse LLMs without weight merging by leveraging hidden-state signals from a compact 0.6B SLM and a 10K-parameter head. It employs tri-role coordination (Thinker, Worker, Verifier) over multi-turn interactions and trains the coordinator with separable-CMA-ES to exploit block-epsilon separability under tight evaluation budgets. Trinity achieves state-of-the-art results on LiveCodeBench and demonstrates strong zero-shot generalization to unseen tasks, supported by analyses of representation separability and objective separability. The approach suggests a scalable path for collaborative AI systems by engineering effective, budget-conscious model coordination rather than pursuing further monolithic scaling.

Abstract

Combining diverse foundation models is promising, but weight-merging is limited by mismatched architectures and closed APIs. Trinity addresses this with a lightweight coordinator that orchestrates collaboration among large language models (LLMs). The coordinator, comprising a compact language model (approximately $0.6$B parameters) and a lightweight head (approximately $10$K parameters), is optimized with an evolutionary strategy for efficient and adaptive delegation. Trinity processes queries over multiple turns, where at each turn the coordinator assigns one of three roles (Thinker, Worker, or Verifier) to a selected LLM, effectively offloading complex skill acquisition from the coordinator itself. Experiments show that Trinity consistently outperforms individual models and existing methods across coding, math, reasoning, and domain knowledge tasks, and generalizes robustly to out-of-distribution tasks. On standard benchmarks, Trinity achieves state-of-the-art results, including a score of 86.2% on LiveCodeBench. Theoretical and empirical analyses identify two main factors behind this performance: (1) the coordinator's hidden-state representations provide rich contextualization of inputs, and (2) under high dimensionality and strict budget constraints, the separable Covariance Matrix Adaptation Evolution Strategy offers advantages over reinforcement learning, imitation learning, and random search by exploiting potential block-epsilon-separability.
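
The multi-turn delegation described in the abstract can be sketched as a short loop. This is a minimal illustration, not the authors' implementation: the role prompts, the `coordinate` and `scripted_select` names, the stub models, and the stopping rule (stop once a Verifier reply begins with `ACCEPT`, or when the turn budget runs out) are all assumptions made for the example. In Trinity the selection step is performed by the learned coordinator; here it is a fixed script.

```python
from typing import Callable, List, Tuple

# Hypothetical role prompts; the paper's actual prompts are not given here.
ROLE_PROMPTS = {
    "Thinker": "Plan how to solve the task; do not solve it yet.",
    "Worker": "Carry out the next step and give a concrete answer.",
    "Verifier": "Check the latest answer. Reply ACCEPT or REJECT with reasons.",
}

LLM = Callable[[str], str]                        # prompt -> reply
Selector = Callable[[List[str]], Tuple[int, str]]  # transcript -> (llm index, role)


def coordinate(query: str, select: Selector, llms: List[LLM],
               max_turns: int = 5) -> List[str]:
    """Run up to max_turns turns; each turn picks one LLM and one role."""
    transcript = [f"User: {query}"]
    for _ in range(max_turns):
        llm_idx, role = select(transcript)
        # Message processing: append the role-specific prompt to the transcript.
        prompt = "\n".join(transcript) + f"\n[{role}] {ROLE_PROMPTS[role]}"
        reply = llms[llm_idx](prompt)
        transcript.append(f"{role} (model {llm_idx}): {reply}")
        # Assumed stopping rule: an accepting Verifier ends the episode.
        if role == "Verifier" and reply.startswith("ACCEPT"):
            break
    return transcript


def scripted_select(transcript: List[str]) -> Tuple[int, str]:
    """Stand-in for the learned coordinator: Thinker, then Worker, then Verifier."""
    plan = [(0, "Thinker"), (1, "Worker"), (0, "Verifier")]
    return plan[min(len(transcript) - 1, len(plan) - 1)]


def stub_model_a(prompt: str) -> str:
    """Toy model: accepts when asked to verify, otherwise proposes a plan."""
    return "ACCEPT" if "[Verifier]" in prompt else "Split into steps."


def stub_model_b(prompt: str) -> str:
    """Toy model that always returns the same answer."""
    return "Answer: 42"


transcript = coordinate("What is 6 * 7?", scripted_select,
                        [stub_model_a, stub_model_b])
for line in transcript:
    print(line)
```

Keeping the selector as a plain callable mirrors the separation the abstract describes: the coordinator only decides who acts and in which role, while the selected LLM does the actual problem solving.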

Paper Structure

This paper contains 40 sections, 2 theorems, 45 equations, 15 figures, 12 tables.

Key Result

Proposition 1

Fix $T\in[2,60]$ and let the CMA budget be $B_{\mathrm{env}}=m_{\mathrm{CMA}}\lambda T$. If the replication schedule ensures $\tilde{\rho}_{\mathrm{CMA}}/\tilde{\rho}_{\mathrm{RS}}\ge \eta\in(0,1]$ and the metric-alignment efficiency stays comparable across iterations (Assumption ass:chi-comp), then The inequality holds for oracle step-sizes and, up to a universal constant factor, for fixed step-s

Figures (15)

  • Figure 1: Overview and an example of our coordination method. Left: The cyclical coordination architecture. In each turn, the full conversation transcript is passed to a compact coordinator model. A lightweight head selects an LLM and assigns it one of three roles: Thinker (T), Worker (W), or Verifier (V). A message processing module injects a role-specific prompt before the request is sent to the chosen LLM. Right: An example of multi-turn coordination. To solve a complex depreciation problem, Trinity invokes a Thinker (Turn 1) to decompose the task, a Worker (Turn 2) to perform the calculation, and a Verifier (Turn 3) to validate the answer and identify edge cases.
  • Figure 2: Parametrization of the Trinity coordinator. A lightweight head (see the appendix on head design) operates in parallel to the base model's LM head. It takes the hidden state $h$ corresponding to the penultimate output token as its sole input. This head $f_\theta$ is responsible for all coordination decisions, producing two sets of logits: one to select an LLM from the pool of $L$ models, and another to assign one of three roles. As a secondary optimization, we also fine-tune the singular value scales of the parameter matrices in the SLM's layers, indicated by the red diagonal lines. In the figure, the hidden state at the position marked by "$<$Head Input$>$" is the input to the lightweight head. Note that the semantic correspondence of the decoded message "$<$BOS$>$ ..." to the hidden state is for illustrative purposes only, as the lightweight head operates on the internal hidden state from that position, not the final decoded text.
  • Figure 3: Trinity outperforms single- and multi-model baselines across four benchmarks. Our approach (boldface on the x-axis) achieves the highest performance across four tasks, surpassing the baseline methods. In Math500, MMLU and LiveCodeBench, our performance is close to "Per-Question-Best", representing an upper bound achieved by taking the union of all correct answers from the single LLMs.
  • Figure 4: LiveCodeBench Results. Top: Trinity achieves state-of-the-art. Bottom: Trinity benefits from increasing the maximum-turn budget.
  • Figure 5: Task type separability in extracted hidden states. Both plots use the penultimate-token hidden states that the SLM produces for the input sequence; the labels come from the task metadata.
  • ...and 10 more figures
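
The head described in the Figure 2 caption maps one hidden state to two groups of logits. The sketch below assumes the simplest form that fits that description: a single bias-free linear layer with `num_llms + 3` output rows, followed by an argmax over each group. The real head design is specified in the paper's appendix and may differ; the names `head_logits` and `decide`, the toy dimensions, and the greedy selection are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

ROLES = ("Thinker", "Worker", "Verifier")


def head_logits(h: Sequence[float], weights: Sequence[Sequence[float]],
                num_llms: int) -> Tuple[List[float], List[float]]:
    """Apply an assumed linear head; split the output into LLM and role logits."""
    logits = [sum(w * x for w, x in zip(row, h)) for row in weights]
    return logits[:num_llms], logits[num_llms:]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def decide(h: Sequence[float], weights: Sequence[Sequence[float]],
           num_llms: int) -> Tuple[int, str]:
    """Greedy coordination decision: (selected LLM index, assigned role)."""
    llm_logits, role_logits = head_logits(h, weights, num_llms)
    return argmax(llm_logits), ROLES[argmax(role_logits)]


# Toy setup: hidden size 4, pool of 2 LLMs, so the head has (2 + 3) * 4 weights.
hidden_state = [1.0, 0.0, 0.0, 0.0]
toy_weights = [
    [0.1, 0.0, 0.0, 0.0],  # LLM 0
    [0.9, 0.0, 0.0, 0.0],  # LLM 1
    [0.2, 0.0, 0.0, 0.0],  # Thinker
    [0.7, 0.0, 0.0, 0.0],  # Worker
    [0.3, 0.0, 0.0, 0.0],  # Verifier
]
num_params = len(toy_weights) * len(hidden_state)
decision = decide(hidden_state, toy_weights, num_llms=2)
print(decision, num_params)
```

Under this assumed linear form the parameter count is `(L + 3) * d` for a pool of `L` models and hidden size `d`, which shows why a head of this kind stays small relative to the base model.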

Theorems & Definitions (5)

  • Definition 1: Hessian-based block-$\varepsilon$ separability in $\mathcal{P}$
  • Definition 2: Metric–alignment factor
  • Definition 3: Rank attenuation under replication
  • Proposition 1
  • Proposition 2
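
The abstract credits sep-CMA-ES, which restricts the search covariance to a diagonal matrix, with exploiting block-$\varepsilon$ separability under a tight evaluation budget. The sketch below is not sep-CMA-ES itself: it is a simplified diagonal-covariance evolution strategy that omits evolution paths and step-size adaptation, shown only to make the per-coordinate variance idea and the evaluation accounting concrete. The function name `diagonal_es`, the learning rate `c_var`, and the toy objective are assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def diagonal_es(f: Callable[[Sequence[float]], float], x0: Sequence[float],
                sigma0: float = 1.0, popsize: int = 12, generations: int = 40,
                seed: int = 0) -> Tuple[List[float], float, int]:
    """Minimize f with Gaussian sampling that keeps one variance per coordinate."""
    rng = random.Random(seed)
    n = len(x0)
    mu = popsize // 2
    # Log-rank recombination weights, normalized to sum to 1.
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    weights = [r / sum(raw) for r in raw]
    c_var = 0.3  # hypothetical learning rate for the variance update
    mean = list(x0)
    var = [sigma0 ** 2] * n
    best_x, best_f = list(x0), f(x0)
    evals = 1
    for _ in range(generations):
        pop = []
        for _ in range(popsize):
            x = [m + math.sqrt(v) * rng.gauss(0.0, 1.0)
                 for m, v in zip(mean, var)]
            pop.append((f(x), x))
        evals += popsize
        pop.sort(key=lambda pair: pair[0])
        if pop[0][0] < best_f:
            best_f, best_x = pop[0]
        elite = [x for _, x in pop[:mu]]
        old_mean = mean
        # Weighted recombination of the best mu samples.
        mean = [sum(w * x[j] for w, x in zip(weights, elite)) for j in range(n)]
        # Rank-mu style update applied to each coordinate independently.
        var = [(1 - c_var) * var[j]
               + c_var * sum(w * (x[j] - old_mean[j]) ** 2
                             for w, x in zip(weights, elite))
               for j in range(n)]
    return best_x, best_f, evals


def scaled_quadratic(x: Sequence[float]) -> float:
    """Separable toy objective with a different curvature per coordinate."""
    return sum((j + 1) * v * v for j, v in enumerate(x))


start = [3.0] * 5
best_x, best_f, evals = diagonal_es(scaled_quadratic, start)
print(f"start={scaled_quadratic(start):.1f} best={best_f:.4f} evals={evals}")
```

The diagonal restriction means the strategy stores and updates `n` variances instead of an `n` by `n` covariance matrix, and the total budget is simply the population size times the number of generations (plus one evaluation of the starting point in this sketch).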