Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Junqi Gao, Zhichang Guo, Dazhi Zhang, Dong Li, Runze Liu, Pengfei Li, Kai Tian, Biqing Qi
TL;DR
Bohdi addresses the limited domain coverage and fixed data allocation in heterogeneous LLM fusion by introducing a synthetic-data-only framework that organizes knowledge into a hierarchical tree and dynamically expands domains using Sprout and Harvest. It casts domain expansion and data allocation as a Hierarchical Multi-Armed Bandit (HMAB) problem solved with DynaBranches, augmented by the Introspection-Rebirth (IR) mechanism and SWBLRT for online adaptation, and operationalized through Meditation (exploration) and Enlightenment (training). Empirical results on targets such as Llama3.2-3B-Instruct and Gemma2-9B-IT show consistent gains across diverse benchmarks and notable data efficiency (around 1.7K samples) while reducing capability imbalance compared to EF/IF baselines. The work introduces a principled, data-efficient pathway for multi-source LLM synergy and weak-to-strong supervision, with potential extensions in adaptive prompting and multi-agent collaboration.
Abstract
Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domains for knowledge fusion, which prevents the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, which cannot be adjusted dynamically to the target LLM's varying capabilities across domains and therefore lead to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. By organizing knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during the target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities. Our code is available at https://github.com/gjq100/Bohdi.git.
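To make the idea behind SWBLRT concrete, the following is a minimal sketch of a generic sliding-window binomial likelihood ratio test, not the implementation from the Bohdi repository. It assumes that feedback for one domain arrives as binary outcomes (1 if the target LLM answered acceptably, 0 otherwise), that the window is split into an older and a newer half, and that a shift is flagged when the statistic exceeds the 0.95 quantile of a chi-square distribution with one degree of freedom (3.841). The names `lr_statistic`, `SlidingWindowDetector`, `window`, and `threshold` are hypothetical and chosen for illustration only.

```python
import math
from collections import deque


def _binom_loglik(k, n, p):
    """Binomial log-likelihood of k successes in n trials at rate p.

    The binomial coefficient is omitted because it cancels in the ratio.
    Terms with a zero count are skipped (convention: 0 * log 0 = 0).
    """
    ll = 0.0
    if k > 0:
        ll += k * math.log(p)
    if n - k > 0:
        ll += (n - k) * math.log(1.0 - p)
    return ll


def lr_statistic(old, new):
    """Likelihood ratio statistic for 'old and new share one success rate'."""
    n1, n2 = len(old), len(new)
    k1, k2 = sum(old), sum(new)
    p1, p2 = k1 / n1, k2 / n2          # separate rates (alternative)
    p0 = (k1 + k2) / (n1 + n2)         # pooled rate (null)
    ll_alt = _binom_loglik(k1, n1, p1) + _binom_loglik(k2, n2, p2)
    ll_null = _binom_loglik(k1, n1, p0) + _binom_loglik(k2, n2, p0)
    return 2.0 * (ll_alt - ll_null)


class SlidingWindowDetector:
    """Flags a capability shift once the window is full and the test rejects."""

    def __init__(self, window=40, threshold=3.841):
        self.buf = deque(maxlen=window)  # keeps only the latest outcomes
        self.threshold = threshold       # assumed chi-square(1) cutoff

    def update(self, outcome):
        self.buf.append(int(outcome))
        if len(self.buf) < self.buf.maxlen:
            return False                 # not enough evidence yet
        items = list(self.buf)
        half = len(items) // 2
        return lr_statistic(items[:half], items[half:]) > self.threshold
```

In a setup like the one the abstract describes, a detector of this kind could be kept per domain and fed with each new feedback signal; a returned `True` would indicate that the stored statistics for that domain are stale and should be refreshed. How Bohdi actually resets or reweights a domain after a detected shift is defined by the IR mechanism and is not reproduced here.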
