LEWIS (LayEr WIse Sparsity) -- A Training Free Guided Model Merging Approach
Hetarth Chopra, Vidhi Rambhia, Vikram Adve
TL;DR
LEWIS targets a weakness of data-free model merging by introducing activation-based, layer-wise sparsity guidance derived from a calibration dataset. By computing per-layer importance from activation norms and constraining each layer's pruning sparsity within bounds, LEWIS preserves essential task-specific knowledge while merging fine-tuned models with existing methods such as TIES and DARE. Experiments on code instruction-following and math-solving models show consistent improvements across merging frameworks, including gains of up to 11.3% flexible-extract (FE) and 11.2% strict-match (SM) on GSM8K-related tasks. The approach enables more targeted merging that improves task specialization without retraining, and it may extend to broader domains and architectures.
Abstract
As specialized large language models (LLMs) become increasingly prevalent, model merging methods are being used to combine them into a single multi-task model without requiring any additional data or training. However, these approaches fall short when the objective of merging is to increase the downstream model's performance on a particular task-specific benchmark. In this work, we propose LEWIS (Layer Wise Sparsity), a guided model-merging framework that uses activation-based layer importance to dynamically adjust the layer-wise task-vector sparsity used in the merge process. LEWIS uses a calibration dataset to prioritize critical layers during the task-vector pruning step that model merging requires. This approach guides existing merging methods by preserving essential layer-wise task-specific knowledge while ensuring that the merged model performs best on benchmarks resembling the calibration dataset. Our experiments demonstrate the effectiveness of LEWIS: it improves the performance of merged code instruction-following and math-solving models by up to 4 percent and 11.3 percent, respectively, outperforming unguided data-free model merging approaches that use uniform sparsity.
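The core idea of layer-wise sparsity guidance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the min-max normalization, the default bounds (0.5 and 0.9), and the use of a mean L2 activation norm as the importance score are all assumptions made for illustration. The sketch scores each layer from calibration activations, maps scores to a per-layer sparsity inside fixed bounds (more important layers are pruned less), and magnitude-prunes each layer's task vector accordingly.

```python
import math
from typing import Dict, List


def mean_activation_norm(activations: List[List[float]]) -> float:
    """Average L2 norm of one layer's activations over calibration samples.

    Assumption: this stands in for the paper's activation-based importance score.
    """
    norms = [math.sqrt(sum(x * x for x in sample)) for sample in activations]
    return sum(norms) / len(norms)


def layer_sparsities(
    importance: Dict[str, float], s_min: float = 0.5, s_max: float = 0.9
) -> Dict[str, float]:
    """Map importance scores to sparsities in [s_min, s_max].

    Assumption: linear min-max mapping; the most important layer gets s_min
    (least pruning) and the least important layer gets s_max.
    """
    lo, hi = min(importance.values()), max(importance.values())
    span = hi - lo
    result = {}
    for name, score in importance.items():
        # If all layers score equally, fall back to the midpoint of the bounds.
        normalized = (score - lo) / span if span > 0 else 0.5
        result[name] = s_max - normalized * (s_max - s_min)
    return result


def prune_by_magnitude(delta: List[float], sparsity: float) -> List[float]:
    """Zero the smallest-magnitude fraction `sparsity` of a task-vector slice."""
    n_drop = int(round(sparsity * len(delta)))
    order = sorted(range(len(delta)), key=lambda i: abs(delta[i]))
    dropped = set(order[:n_drop])
    return [0.0 if i in dropped else v for i, v in enumerate(delta)]


# Hypothetical per-layer importance scores and task vectors (fine-tuned minus base).
importance = {"layer0": 1.0, "layer1": 3.0, "layer2": 2.0}
task_vectors = {
    "layer0": [0.1, -2.0, 0.5, 3.0],
    "layer1": [0.4, -0.3, 0.2, 0.1],
    "layer2": [1.0, 0.0, -1.5, 0.2],
}
sparsities = layer_sparsities(importance)
pruned = {k: prune_by_magnitude(v, sparsities[k]) for k, v in task_vectors.items()}
```

The pruned per-layer task vectors would then be passed to an existing merging method such as TIES or DARE in place of uniformly pruned ones. A uniform-sparsity baseline corresponds to setting `s_min` equal to `s_max`.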
