HM3: Heterogeneous Multi-Class Model Merging
Stefan Hackmann
TL;DR
HM3 is a training-free approach for merging multiple text classifiers with heterogeneous output spaces: each model’s final layer is expanded with zeros and a group-wise softmax is applied, yielding a single multi-task model that preserves or improves task performance while reducing inference cost. The method builds on existing model-merging techniques (Model Soup, TIES, DARE) and adds a formal HM3 expansion and a model-search step (DARE-TIES) over task-vector densities. Case studies merging jailbreak, hate speech, phishing, and sentiment detectors show that merged models can match or exceed the F1-scores of the source models while reducing inference time by up to 44%. Self-merging experiments reveal density-dependent effects, raising questions about task-vector pruning. The results suggest practical benefits for deploying guardrail and auxiliary detectors in LLM pipelines, with potential extensions to other modalities and to techniques such as AdaMerging for further optimizing multi-task inference.
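The zero-expansion and group-wise softmax mentioned above can be sketched roughly as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names `expand_head` and `groupwise_softmax`, the contiguous row layout, and the toy shapes are all assumptions made for the example.

```python
import numpy as np

def expand_head(weight, bias, offset, total_classes):
    """Zero-pad a classifier head to `total_classes` output rows.

    weight: [k, d] final-layer weights, bias: [k].
    The original rows are placed at [offset, offset + k); all other rows are zero.
    (Hypothetical helper; the layout is an assumption for illustration.)
    """
    k, d = weight.shape
    w = np.zeros((total_classes, d), dtype=weight.dtype)
    b = np.zeros(total_classes, dtype=bias.dtype)
    w[offset:offset + k] = weight
    b[offset:offset + k] = bias
    return w, b

def groupwise_softmax(logits, group_sizes):
    """Apply a separate softmax to each contiguous group of logits."""
    parts, start = [], 0
    for k in group_sizes:
        g = logits[..., start:start + k]
        g = g - g.max(axis=-1, keepdims=True)  # subtract max for numerical stability
        e = np.exp(g)
        parts.append(e / e.sum(axis=-1, keepdims=True))
        start += k
    return np.concatenate(parts, axis=-1)

# Toy example: a 2-class head and a 3-class head over a 4-dim hidden state.
rng = np.random.default_rng(0)
w_a, b_a = expand_head(rng.normal(size=(2, 4)), np.zeros(2), offset=0, total_classes=5)
w_b, b_b = expand_head(rng.normal(size=(3, 4)), np.zeros(3), offset=2, total_classes=5)

# The expanded heads occupy disjoint rows, so they now share one output space.
hidden = rng.normal(size=(1, 4))
logits = hidden @ (w_a + w_b).T + (b_a + b_b)
probs = groupwise_softmax(logits, group_sizes=[2, 3])
```

Summing the two expanded heads here only shows that they no longer collide; in practice the merge itself (Model Soup, TIES, DARE) would be applied to the full set of model weights. Each group in `probs` sums to 1, so every task keeps its own probability distribution.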
Abstract
Foundation language model deployments often include auxiliary guardrail models to filter or classify text, detecting jailbreak attempts, biased or toxic output, or ensuring topic adherence. These additional models increase the complexity and cost of model inference, especially since many are also large language models. To address this issue, we explore training-free model merging techniques to consolidate these models into a single, multi-functional model. We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces. Unlike parameter-efficient fine-tuning techniques like LoRA, which require extensive training and add complexity during inference, recent advancements allow models to be merged in a training-free manner. We report promising results for merging BERT-based guard models, some of which attain an average F1-score higher than the source models while reducing the inference time by up to 44%. We introduce self-merging to assess the impact of reduced task-vector density, finding that the lower-performing hate speech classifier benefits from self-merging while higher-performing classifiers do not, which raises questions about using task-vector reduction for model tuning.
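To make "task-vector density" concrete, the sketch below shows a DARE-style drop-and-rescale step on a single weight tensor: the task vector (fine-tuned minus base weights) keeps a random fraction `density` of its entries, and the survivors are rescaled by `1 / density`. The function name `sparsify_task_vector` and the toy tensors are assumptions for illustration; the self-merging procedure used in the paper may differ in detail.

```python
import numpy as np

def sparsify_task_vector(base, finetuned, density, rng):
    """Randomly keep a `density` fraction of the task vector and rescale it.

    DARE-style sketch (hypothetical helper): dropped entries fall back to the
    base weights; kept entries are scaled by 1 / density so the expected
    update is unchanged.
    """
    if not 0.0 < density <= 1.0:
        raise ValueError("density must be in (0, 1]")
    delta = finetuned - base                   # task vector
    mask = rng.random(delta.shape) < density   # True where the entry is kept
    return base + np.where(mask, delta / density, 0.0)

# Toy tensors standing in for one layer of a base and a fine-tuned model.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
finetuned = base + 0.1 * rng.normal(size=(8, 8))
pruned = sparsify_task_vector(base, finetuned, density=0.5, rng=rng)
```

With `density=1.0` nothing is dropped and the fine-tuned weights are recovered; lower densities discard more of the task vector, which is the knob the self-merging experiments vary.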
