Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion
Bowen Tian, Songning Lai, Yutao Yue
TL;DR
AutoFusion is an unsupervised, end-to-end framework for dynamic parameter fusion between identical-architecture models trained on disjoint tasks. It learns layer-wise parameter permutations via a differentiable Sinkhorn-based mechanism, guided by an alignment loss $\mathcal{L}_{align}$ and a retention loss $\mathcal{L}_{retain}$, with pseudo-label supervision to preserve multitask capabilities. Across MNIST, Fashion-MNIST, KMNIST, and CIFAR-10 with MLP and CNN backbones, AutoFusion surpasses Weight Interpolation, Git Re-Basin, and ZipIt in joint accuracy, and ablations confirm the necessity of normalization and joint optimization. The method demonstrates scalability and flexibility for integrating diverse task models without requiring pre-trained checkpoints or labeled data, paving the way for more versatile multi-task architectures.
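To make the Sinkhorn-based mechanism concrete, here is a minimal NumPy sketch of Sinkhorn normalization, the standard operator that relaxes a hard permutation into a doubly stochastic matrix (rows and columns each sum to 1). It is an illustration under assumptions, not the authors' implementation: the function name `sinkhorn`, the temperature `tau`, and the iteration count `n_iters` are hypothetical choices, and the alignment and retention losses are not modeled here.

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log(sum(exp(x))) along one axis, keeping dimensions.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def sinkhorn(scores, n_iters=100, tau=1.0):
    """Map a square score matrix to an (approximately) doubly stochastic matrix.

    Works in log space and alternates row and column normalization.
    Smaller tau pushes the result closer to a hard permutation matrix.
    (Names and defaults are illustrative assumptions.)
    """
    log_p = np.asarray(scores, dtype=float) / tau
    for _ in range(n_iters):
        log_p = log_p - _logsumexp(log_p, axis=1)  # each row sums to 1
        log_p = log_p - _logsumexp(log_p, axis=0)  # each column sums to 1
    return np.exp(log_p)

# Example: a small, arbitrary score matrix for a layer with three units.
scores = np.array([[2.0, 0.0, 1.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 0.0, 3.0]])
P = sinkhorn(scores)
```

Because every step is a composition of differentiable operations, the same computation written in an autodiff framework would let gradients flow from a loss back to `scores`, which is what makes a soft permutation trainable end to end.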
Abstract
In the rapidly evolving field of deep learning, specialized models have driven significant advancements in tasks such as computer vision and natural language processing. However, this specialization leads to a fragmented ecosystem in which models lack the adaptability needed for broader applications. To overcome this, we introduce AutoFusion, a framework that fuses the parameters of distinct models sharing the same architecture for multi-task learning, without pre-trained checkpoints. Using an unsupervised, end-to-end approach, AutoFusion dynamically permutes model parameters at each layer, optimizing the combination through a loss-minimization process that does not require labeled data. We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets, demonstrating superior performance over established methods such as Weight Interpolation, Git Re-Basin, and ZipIt. Our framework offers a scalable and flexible solution for model integration, positioning it as a useful tool for future research and practical applications.
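The idea of permuting parameters at each layer before fusing rests on a standard symmetry of neural networks: reordering the hidden units of a layer, together with the matching input columns of the next layer, leaves the network's function unchanged. The sketch below illustrates this for a two-layer ReLU MLP and then fuses two models by averaging after alignment. It is a simplified illustration, not AutoFusion itself: the permutation is given rather than learned, and the helpers `mlp`, `permute_hidden`, and `fuse` and the mixing weight `alpha` are hypothetical names.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    # Two-layer MLP with ReLU; W1 is (hidden, in), W2 is (out, hidden).
    h = np.maximum(x @ W1.T + b1, 0.0)
    return h @ W2.T + b2

def permute_hidden(W1, b1, W2, perm):
    # Reorder hidden units: rows of W1 and entries of b1,
    # plus the matching columns of W2. The output is unchanged.
    return W1[perm], b1[perm], W2[:, perm]

def fuse(params_a, params_b, perm, alpha=0.5):
    # Align model B's hidden units to model A with `perm`,
    # then take a weighted average of the aligned parameters.
    W1a, b1a, W2a, b2a = params_a
    W1b, b1b, W2b, b2b = params_b
    W1b, b1b, W2b = permute_hidden(W1b, b1b, W2b, perm)
    mix = lambda a, b: alpha * a + (1.0 - alpha) * b
    return mix(W1a, W1b), mix(b1a, b1b), mix(W2a, W2b), mix(b2a, b2b)
```

For example, if model B is just model A with its hidden units shuffled by `p`, then `np.argsort(p)` undoes the shuffle and `fuse` returns model A exactly, whereas naive averaging without alignment generally would not.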
