HM3: Heterogeneous Multi-Class Model Merging

Stefan Hackmann

TL;DR

HM3 is a training-free approach for merging multiple text classifiers with heterogeneous output spaces. Each model's final layer is expanded with zeros so that all classifiers share the same output structure, and class probabilities are computed with a group-wise softmax. The result is a single multi-task model that preserves or improves task performance while reducing inference cost. The method builds on existing model-merging techniques (Model Soup, TIES, DARE), defines the HM3 expansion formally, and adds a model-search step that samples the task-vector densities used by DARE-TIES. Case studies merging jailbreak, hate speech, phishing, and sentiment detectors show that merged models can reach comparable or higher F1-scores than the source models, with inference time reduced by up to 44%. Self-merging experiments reveal density-dependent effects, which raises questions about using task-vector pruning for model tuning. The results suggest practical benefits for deploying guardrail and auxiliary detectors in LLM pipelines, with possible extensions to other modalities and to methods such as AdaMerging.
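The zero-expansion and group-wise softmax described above can be illustrated with a minimal NumPy sketch. It assumes a 2-label and a 3-label classifier head on a shared encoder; the names `expand_head` and `groupwise_softmax`, the hidden size, and the random weights are hypothetical. Summing the expanded heads is only a stand-in for the actual merge algorithm, which this sketch does not reproduce.

```python
import numpy as np

def expand_head(weight, bias, offset, total_classes):
    """Embed a (k, hidden) head into a zero-filled (total_classes, hidden) head."""
    k, hidden = weight.shape
    w = np.zeros((total_classes, hidden), dtype=weight.dtype)
    b = np.zeros(total_classes, dtype=bias.dtype)
    w[offset:offset + k] = weight  # rows outside this slice stay zero
    b[offset:offset + k] = bias
    return w, b

def groupwise_softmax(logits, group_sizes):
    """Apply softmax separately to each consecutive group of logits."""
    probs = np.empty_like(logits, dtype=float)
    start = 0
    for size in group_sizes:
        chunk = logits[..., start:start + size]
        chunk = chunk - chunk.max(axis=-1, keepdims=True)  # numerical stability
        exp = np.exp(chunk)
        probs[..., start:start + size] = exp / exp.sum(axis=-1, keepdims=True)
        start += size
    return probs

rng = np.random.default_rng(0)
hidden = 8  # hypothetical hidden size, chosen only for illustration

# Hypothetical heads: 2 jailbreak labels and 3 hate speech labels.
w_j, b_j = rng.normal(size=(2, hidden)), rng.normal(size=2)
w_h, b_h = rng.normal(size=(3, hidden)), rng.normal(size=3)

# Expand both heads to the common 5-label output space.
w_j5, b_j5 = expand_head(w_j, b_j, offset=0, total_classes=5)
w_h5, b_h5 = expand_head(w_h, b_h, offset=2, total_classes=5)

# Assumption: a plain sum stands in for the real merge step.
merged_w, merged_b = w_j5 + w_h5, b_j5 + b_h5

features = rng.normal(size=(4, hidden))  # stand-in for encoder outputs
logits = features @ merged_w.T + merged_b
probs = groupwise_softmax(logits, group_sizes=(2, 3))
```

Because the padding rows are zero, the expanded heads occupy disjoint output rows, and each group of probabilities sums to one independently of the other group.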

Abstract

Foundation language model deployments often include auxiliary guard-rail models to filter or classify text, detecting jailbreak attempts, biased or toxic output, or ensuring topic adherence. These additional models increase the complexity and cost of model inference, especially since many are also large language models. To address this issue, we explore training-free model merging techniques to consolidate these models into a single, multi-functional model. We propose Heterogeneous Multi-Class Model Merging (HM3) as a simple technique for merging multi-class classifiers with heterogeneous label spaces. Unlike parameter-efficient fine-tuning techniques like LoRA, which require extensive training and add complexity during inference, recent advancements allow models to be merged in a training-free manner. We report promising results for merging BERT-based guard models, some of which attain an average F1-score higher than the source models while reducing the inference time by up to 44%. We introduce self-merging to assess the impact of reduced task-vector density, finding that the more poorly performing hate speech classifier benefits from self-merging while higher-performing classifiers do not, which raises questions about using task vector reduction for model tuning.
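The notion of task-vector density can be made concrete with a small sketch of DARE-style random drop-and-rescale applied to one model's task vector. This is an illustration under assumptions: the flat parameter vectors are synthetic, `drop_and_rescale` is a hypothetical helper, and the sketch does not claim to match the paper's self-merging procedure in detail.

```python
import numpy as np

def drop_and_rescale(task_vector, density, rng):
    """Keep each entry with probability `density` and rescale kept entries by 1/density."""
    if not 0.0 < density <= 1.0:
        raise ValueError("density must lie in (0, 1]")
    mask = rng.random(task_vector.shape) < density
    return np.where(mask, task_vector / density, 0.0)

rng = np.random.default_rng(0)

# Synthetic stand-ins for flattened base and fine-tuned parameters.
base = rng.normal(size=10_000)
finetuned = base + rng.normal(scale=0.01, size=base.shape)

# The task vector is the difference between fine-tuned and base weights.
task_vector = finetuned - base

# Reduce the task vector to roughly 30% density, then add it back to the base.
sparse = drop_and_rescale(task_vector, density=0.3, rng=rng)
reduced_model = base + sparse

kept_fraction = np.count_nonzero(sparse) / sparse.size
```

A density of 1 leaves the task vector unchanged, while a density approaching 0 removes almost all of the fine-tuning.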

Paper Structure (21 sections, 7 equations, 30 figures, 7 tables, 3 algorithms)


Introduction
Background
Model Merging
Related Work
Ensembling
Multi-Task Learning
LoRA
Heterogeneous Multi-Class Model Merging
Case Studies
LLM Moderation
Case Study 1
Case Study 2
Self-Merging
Limitations and Future Directions
Conclusion
...and 6 more sections

Figures (30)

Figure 1: Our proposed method HM3 transforms a constellation of text classifiers with heterogeneous output dimensions so that they have the same output classifier structure and can be merged. The output layer of the base model is replaced with a classifier that consists only of zeros. Class probabilities of the merged model should be computed group-wise: softmax is applied separately on the two groups (J1, J2) and (H1, H2, H3) so that for each group the probabilities sum up to one. See Section \ref{sec:our_method} for details.
Figure 2: For model search, we sample the task-vector densities that DARE-TIES uses after pre-processing with HM3. We sample from a Beta distribution that is skewed to the right in order to quickly explore the interesting cases with low but non-zero task-vector density. We select a distribution whose probability density is zero at a task-vector density of 0, because dropping all task-vector values is equivalent to undoing all of the fine-tuning. Similarly, there is no variability in outcomes when the task-vector density is 1 and the merge algorithm is deterministic.
Figure 3: Both plots show F1-scores resulting from 500 guard models. We merged a jailbreak classifier with 2 output labels, jackhhao/jailbreak-classifier, and a hate speech classifier with 3 output labels, Hate-speech-CNERG/bert-base-uncased-hatexplain, into a single classifier with 5 output labels using HM3 followed by DARE-TIES, where both input models share the same task-vector density, which varies from merge to merge. The horizontal blue lines show the average macro F1-score of the original models. We evaluated 3000 test examples per dataset to compute the baseline. The stars mark additional test results for new best merged models found during model search (see Algorithm \ref{alg:search}), and the bold dots are the corresponding validation results that triggered the additional test. For densities close to 1, merging generally improves the overall performance slightly. For densities close to 0, the merged models frequently fail to classify examples correctly. Surprisingly, the best models are generated with quite low task-vector densities.
Figure 4: Here we compute the F1-score of the original classifier and of the merged model resulting from model search for each dataset individually. The merged model is the model with the best validation score from model search. Note that we "cross-check" the merged model where possible. For instance, we test the jailbreak part of the merged model on the dataset with positive hate speech examples, because we want to verify that the merged model labels those as non-jailbreaks. Positive jailbreak examples often contain problematic language, so we omit testing the hate speech part with positive jailbreak examples. See Table \ref{table:bert-base-uncased-expected-labels-for-tests} for a list of expected labels, Appendix \ref{sub:accuracies} for a complete collection of model accuracy comparisons, and Appendix \ref{sub:testing} for more insights into our testing process.
Figure 5: Merge of the phishing detector ealvaradob/bert-finetuned-phishing with the sentiment classifier assemblyai/bert-large-uncased-sst2 using DARE-TIES. As in Figure \ref{fig:bert-base-uncased__search}, we find the best as well as the worst F1-scores for low densities. The best merged model has roughly the same F1-score as the individual models on average, see Table \ref{table:overview}.
...and 25 more figures
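The density sampling described in the Figure 2 caption can be sketched as follows. The shape parameters `a = 2` and `b = 5` are assumptions chosen only so that the Beta density vanishes at 0 and 1 and is skewed to the right; the values used in the paper are not given here. The function `toy_validate` is a hypothetical stand-in for merging with a given density and scoring the result on validation data, and the loop is a simplified illustration rather than the paper's search algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shape parameters: a > 1 and b > 1 give zero probability density
# at 0 and 1, and a < b places most mass on low task-vector densities.
a, b = 2.0, 5.0
densities = rng.beta(a, b, size=500)

def toy_validate(density):
    """Hypothetical validation score; a real run would merge and evaluate."""
    return -(density - 0.25) ** 2

def search(candidates, validate):
    """Return the candidate density with the highest validation score."""
    best_density, best_score = None, float("-inf")
    for density in candidates:
        score = validate(density)
        if score > best_score:  # a new best would trigger an extra test run
            best_density, best_score = density, score
    return best_density, best_score

best_density, best_score = search(densities, toy_validate)
```

With these assumed parameters the sampled densities have mean a / (a + b) = 2/7 in expectation, so most candidate merges use low but non-zero task-vector densities.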

Theorems & Definitions (1)

Definition 1
