Wolf2Pack: The AutoFusion Framework for Dynamic Parameter Fusion

Bowen Tian; Songning Lai; Yutao Yue

TL;DR

AutoFusion is an unsupervised, end-to-end framework for dynamic parameter fusion between identical-architecture models trained on disjoint tasks. It learns layer-wise parameter permutations via a differentiable Sinkhorn-based mechanism, guided by an alignment loss $\mathcal{L}_{align}$ and a retention loss $\mathcal{L}_{retain}$, with pseudo-label supervision to preserve multi-task capabilities. Across MNIST, Fashion-MNIST, KMNIST, and CIFAR-10 with MLP and CNN backbones, AutoFusion surpasses Weight Interpolation, Git Re-Basin, and ZipIt in joint accuracy, and ablations confirm the necessity of normalization and joint optimization. The method demonstrates scalability and flexibility for integrating diverse task models without requiring pre-trained checkpoints or labeled data, paving the way for more versatile multi-task architectures.
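To make the "differentiable Sinkhorn-based mechanism" more concrete, the following is a minimal sketch of generic Sinkhorn normalization, not the authors' implementation. It relaxes a square score matrix into an approximately doubly stochastic matrix by alternating row and column normalization in log space. The function names `sinkhorn` and `logsumexp`, the temperature `tau`, the iteration count, and the example scores are all assumptions chosen for illustration; the source does not specify them.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn(scores, tau=1.0, n_iters=200):
    """Relax a square score matrix into an approximately doubly stochastic one.

    Hypothetical helper: `tau` (temperature) and `n_iters` are illustrative
    defaults, not values taken from the paper.
    """
    n = len(scores)
    # Work in log space so that large scores do not overflow exp().
    log_p = [[s / tau for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        for i in range(n):
            lse = logsumexp(log_p[i])
            log_p[i] = [v - lse for v in log_p[i]]
        # Column step: make every column sum to 1.
        for j in range(n):
            lse = logsumexp([log_p[i][j] for i in range(n)])
            for i in range(n):
                log_p[i][j] -= lse
    return [[math.exp(v) for v in row] for row in log_p]


# Illustrative scores only: larger values mean "these two units match better".
example_scores = [
    [0.1, 2.0, 0.3],
    [1.5, 0.2, 0.1],
    [0.4, 0.2, 3.0],
]
soft_perm = sinkhorn(example_scores)
# Rounding each row to its largest entry gives a hard unit assignment.
hard_assignment = [row.index(max(row)) for row in soft_perm]
```

Because every step is a smooth operation on the scores, gradients of a downstream loss can flow back into them, which is what makes this kind of relaxation usable inside end-to-end training. Lowering `tau` pushes the result closer to a hard permutation at the cost of slower convergence of the normalization.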

Abstract

In the rapidly evolving field of deep learning, specialized models have driven significant advancements in tasks such as computer vision and natural language processing. However, this specialization leads to a fragmented ecosystem in which models lack the adaptability needed for broader applications. To overcome this, we introduce AutoFusion, a framework that fuses the parameters of distinct models (with the same architecture) for multi-task learning without pre-trained checkpoints. Using an unsupervised, end-to-end approach, AutoFusion dynamically permutes model parameters at each layer, optimizing the combination through a loss-minimization process that does not require labeled data. We validate AutoFusion's effectiveness through experiments on commonly used benchmark datasets, demonstrating superior performance over established methods such as Weight Interpolation, Git Re-Basin, and ZipIt. Our framework offers a scalable and flexible solution for model integration, positioning it as a useful tool for future research and practical applications.
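The idea of permuting parameters at each layer before fusing can be illustrated with a small sketch. It assumes a toy two-layer ReLU MLP stored as nested Python lists, a hand-picked permutation (AutoFusion learns its permutations; this sketch does not), and plain weight interpolation as the fusion step, which is the baseline named above rather than the paper's fusion rule. All names (`mlp`, `permute_hidden`, `interpolate`) and all numeric values are hypothetical.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp(params, x):
    """Toy two-layer ReLU MLP: W2 @ relu(W1 @ x + b1)."""
    W1, b1, W2 = params
    hidden = [max(0.0, z + b) for z, b in zip(matvec(W1, x), b1)]
    return matvec(W2, hidden)


def permute_hidden(params, perm):
    """Reorder hidden units so that old unit perm[k] becomes new unit k.

    Rows of W1 and entries of b1 move together with the matching
    columns of W2, so the network computes the same function.
    """
    W1, b1, W2 = params
    return ([W1[p] for p in perm],
            [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])


def lerp(a, b, lam):
    """Element-wise lam * a + (1 - lam) * b over nested lists."""
    if isinstance(a, list):
        return [lerp(u, v, lam) for u, v in zip(a, b)]
    return lam * a + (1.0 - lam) * b


def interpolate(params_a, params_b, lam=0.5):
    """Plain weight interpolation between two parameter sets."""
    return tuple(lerp(a, b, lam) for a, b in zip(params_a, params_b))


# Illustrative weights: 2 inputs, 3 hidden units, 1 output.
model_b = ([[1.0, -2.0], [0.5, 0.5], [-1.0, 3.0]],
           [0.1, -0.2, 0.3],
           [[2.0, -1.0, 0.5]])
perm = [2, 0, 1]
# model_a is the same function as model_b with shuffled hidden units.
model_a = permute_hidden(model_b, perm)

# Invert the permutation to bring model_a back into model_b's ordering.
inverse = [0] * len(perm)
for k, p in enumerate(perm):
    inverse[p] = k

x = [1.0, 2.0]
naive = interpolate(model_b, model_a)                              # no alignment
aligned = interpolate(model_b, permute_hidden(model_a, inverse))   # align first
```

In this toy case `mlp(aligned, x)` matches `mlp(model_b, x)`, while `mlp(naive, x)` does not, which shows why hidden units should be matched before parameters are averaged. Real task models are not exact permutations of each other, so the alignment can only be approximate there.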

Paper Structure (36 sections, 29 equations, 9 figures, 9 tables)

Introduction
Preliminary
Weight Interpolation
Re-basin
Model Zip
AutoFusion
From Rule-based to End-to-end
Design of Optimization Targets
Results
Comparison with other Methods
Ablation Study And Optimization Strategies
Fusion of Task Models with Different Distributions
Visualization
Limitation
Conclusion
...and 21 more sections

Figures (9)

Figure 1: Hand-designed fusion algorithms tend to rely on a priori assumptions, which lowers flexibility and yields suboptimal results; our goal is instead to build a data-driven, learnable fusion algorithm that approximates the optimal solution.
Figure 2: Overview of the AutoFusion methodology; implementation details can be found in the AutoFusion section.
Figure 3: Interpolation tests of each model on task A and task B after parameter fusion using the permutation matrices learned from different optimization objectives.
Figure 4: The interpolation test on the joint dataset.
Figure 5: The exported trained permutation matrix compared with that of the Git Re-Basin method.
...and 4 more figures
