A Novel Hierarchical Integration Method for Efficient Model Merging in Medical LLMs

Prakrit Timilsina, Anuj Nepal, Rajan Kadel, Robin Doss

TL;DR

This work addresses the challenge of efficiently consolidating medical expertise across distributed edge settings by evaluating six parameter-space merging techniques on architecturally compatible medical LLMs. It introduces a novel Hierarchical Cosine-OT-LERP method that combines task-vector similarity with selective attention-head alignment to mitigate permutation variance while preserving edge-deployment efficiency. Across five medical benchmarks, simple merging methods—especially Task Arithmetic and Linear Averaging—consistently outperform complex approaches, achieving up to 45.80% accuracy on MedQA and often surpassing the base model on QA tasks. The findings suggest a practical path for privacy-preserving, scalable medical AI in IoT-enabled environments, highlighting compatibility-aware design and favoring lightweight merging baselines over retraining in resource-constrained settings.

Abstract

Large Language Models (LLMs) face significant challenges in distributed healthcare, including consolidating specialized domain knowledge across institutions while maintaining privacy, reducing computational overhead, and preventing catastrophic forgetting during model updates. This paper presents a systematic evaluation of six parameter-space merging techniques applied to two architecturally compatible medical LLMs derived from the Mistral-7B base model. We introduce a novel hierarchical method that combines selective Optimal Transport (OT) alignment for attention layers with cosine similarity-weighted interpolation, designed to address permutation variance while minimizing computational overhead for edge deployment scenarios. Our study evaluates Task Arithmetic, Linear Averaging, DARE-TIES, DELLA, Breadcrumbs, and our Hierarchical approach across five medical benchmarks. Results demonstrate that architecturally compatible models benefit significantly from simple averaging methods, with Task Arithmetic achieving 45.80% accuracy on MedQA, outperforming complex pruning-based approaches. These findings offer critical insights for the deployment of distributed medical AI in resource-constrained IoT environments, where computational efficiency and model compatibility are paramount. Our work establishes that for architecturally compatible models, simple averaging provides a robust and computationally efficient baseline for knowledge consolidation, offering a pragmatic path forward for scalable medical AI systems.
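
To make the two simple baselines named in the abstract concrete, the following is a minimal sketch of Linear Averaging and Task Arithmetic in their commonly used textbook form, applied to toy parameter dictionaries. It is not the authors' implementation: the function names, the `alpha` and `scale` parameters, and the toy weights are illustrative assumptions, and the cosine helper only shows how similarity between task vectors can be measured, not how the paper's hierarchical method uses it.

```python
import numpy as np

def linear_average(theta_a, theta_b, alpha=0.5):
    """Interpolate two checkpoints key by key (alpha=0.5 is a plain average)."""
    return {k: (1 - alpha) * theta_a[k] + alpha * theta_b[k] for k in theta_a}

def task_arithmetic(base, finetuned, scale=1.0):
    """Add the scaled sum of task vectors (fine-tuned minus base) to the base."""
    merged = {}
    for k in base:
        delta = sum(ft[k] - base[k] for ft in finetuned)
        merged[k] = base[k] + scale * delta
    return merged

def task_vector_cosine(base, theta_a, theta_b):
    """Cosine similarity between the flattened task vectors of two checkpoints."""
    va = np.concatenate([(theta_a[k] - base[k]).ravel() for k in base])
    vb = np.concatenate([(theta_b[k] - base[k]).ravel() for k in base])
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))

# Toy stand-ins for a shared base model and two fine-tuned variants (assumed values).
base = {"w": np.array([1.0, 2.0])}
model_a = {"w": np.array([2.0, 2.0])}
model_b = {"w": np.array([1.0, 4.0])}

avg = linear_average(model_a, model_b)
ta = task_arithmetic(base, [model_a, model_b], scale=1.0)
cos = task_vector_cosine(base, model_a, model_b)
```

Both functions require that the checkpoints share parameter names and shapes, which is the sense in which the merged models must be architecturally compatible. With two models and `scale=0.5`, this form of Task Arithmetic gives the same result as the plain average.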

Paper Structure

This paper contains 25 sections, 3 equations, 4 figures, 4 tables, 2 algorithms.

Figures (4)

  • Figure 1: Three-stage workflow architecture for healthcare model merging.
  • Figure 2: Average accuracy ranking of all models across the five evaluated benchmarks.
  • Figure 3: Accuracy Point Difference vs. Base Model.
  • Figure 4: Normalized performance profile comparing key models (Best = 1.0 on each axis).