Task Groupings Regularization: Data-Free Meta-Learning with Heterogeneous Pre-trained Models

Yongxian Wei; Zixuan Hu; Li Shen; Zhenyi Wang; Yu Li; Chun Yuan; Dacheng Tao

TL;DR

This work addresses data-free meta-learning (DFML) when the pre-trained models are heterogeneous. It identifies a heterogeneity-homogeneity trade-off: diverse models can regularize the learning of shared representations, but they can also interfere with it through task conflicts. To exploit heterogeneity while mitigating these conflicts, the authors propose Task Groupings Regularization, which groups dissimilar pre-trained models using task-space embeddings computed from the Fisher Information Matrix (FIM) and applies implicit gradient regularization (IGR) within each group to align gradient directions across tasks. Experiments on CIFAR-FS, miniImageNet, and CUB, including multi-domain and multi-architecture settings, show that the approach consistently outperforms baselines, indicating robust generalization for data-free meta-learning with heterogeneous pre-trained models.
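As a rough illustration of the grouping step, the sketch below embeds each model by the diagonal of its Fisher Information Matrix (in the spirit of Task2Vec), measures pairwise cosine dissimilarity between these embeddings, and greedily places mutually dissimilar models in the same group. Several choices here are assumptions, not the paper's method: every model is a linear softmax classifier of the same shape, the Fisher is evaluated on a shared batch of probe inputs `X` (the real setting is data-free, so these inputs are purely hypothetical), dissimilarity is plain cosine distance, and `group_dissimilar` with a fixed `group_size` is a hypothetical greedy heuristic.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fisher_embedding(W, X):
    """Diagonal Fisher of a hypothetical linear softmax model p(y|x) = softmax(x @ W).

    F_jk = mean_x sum_c p(c|x) * (d log p(c|x) / d W_jk)^2, where the
    per-sample gradient is x_j * (1[c == k] - p(k|x)).
    """
    P = softmax(X @ W)                      # (n, C) predictive probabilities
    fisher = np.zeros_like(W)
    for c in range(P.shape[1]):
        G = -P.copy()                       # d log p(c|x) / d logits = onehot(c) - p
        G[:, c] += 1.0
        fisher += (X ** 2).T @ (P[:, [c]] * G ** 2)
    return (fisher / X.shape[0]).ravel()    # flatten into a task-space vector


def dissimilarity_matrix(embeddings):
    """Pairwise cosine distance between task embeddings."""
    E = np.stack(embeddings)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    return 1.0 - E @ E.T


def group_dissimilar(D, group_size):
    """Hypothetical greedy grouping: seed a group with the first unassigned model,
    then repeatedly add the unassigned model with the largest mean dissimilarity
    to the current members."""
    unassigned = list(range(D.shape[0]))
    groups = []
    while unassigned:
        group = [unassigned.pop(0)]
        while len(group) < group_size and unassigned:
            scores = [D[j, group].mean() for j in unassigned]
            group.append(unassigned.pop(int(np.argmax(scores))))
        groups.append(group)
    return groups


# Toy pool: models 0/1 are near-copies of each other, as are models 2/3.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))               # shared probe inputs (assumption)
A, B = rng.normal(size=(5, 3)), rng.normal(size=(5, 3))
pool = [A, A + 1e-3 * rng.normal(size=A.shape), B, B + 1e-3 * rng.normal(size=B.shape)]
D = dissimilarity_matrix([fisher_embedding(W, X) for W in pool])
groups = group_dissimilar(D, group_size=2)
print(groups)  # near-copies (0/1 and 2/3) are placed in different groups
```

Grouping dissimilar rather than similar models follows the trade-off described above: each group deliberately mixes heterogeneous models, and the within-group regularization is then responsible for resolving their conflicts.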

Abstract

Data-Free Meta-Learning (DFML) aims to derive knowledge from a collection of pre-trained models without accessing their original data, enabling rapid adaptation to new, unseen tasks. Current methods often overlook the heterogeneity among pre-trained models, which leads to performance degradation due to task conflicts. In this paper, we empirically and theoretically identify and analyze model heterogeneity in DFML. We find that model heterogeneity introduces a heterogeneity-homogeneity trade-off: homogeneous models reduce task conflicts but also increase the risk of overfitting. Balancing this trade-off is crucial for learning shared representations across tasks. Based on these findings, we propose Task Groupings Regularization, which benefits from model heterogeneity by grouping and aligning conflicting tasks. Specifically, we embed pre-trained models into a task space to compute their dissimilarity and group heterogeneous models together based on this measure. We then introduce implicit gradient regularization within each group to mitigate potential conflicts. By encouraging a gradient direction suitable for all tasks, the meta-model captures shared representations that generalize across tasks. Comprehensive experiments demonstrate the superiority of our approach on multiple benchmarks, effectively tackling model heterogeneity in challenging multi-domain and multi-architecture scenarios.
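The abstract describes aligning gradients across the tasks within a group. The source does not state the exact regularizer that implicit gradient regularization (IGR) minimizes, so the sketch below makes a regularizer explicit under assumptions of our own: each task is a toy least-squares regression problem, the regularizer is the hypothetical variance of per-task gradients around their mean, $R(\theta)=\frac{1}{T}\sum_t \lVert g_t(\theta)-\bar g(\theta)\rVert^2$, and it is added to the mean task loss with weight `lam`. This is an explicit stand-in for illustration, not the paper's implicit mechanism; `mean_pairwise_cosine` corresponds to the kind of gradient cosine similarity plotted in Figure 5.

```python
import numpy as np


def make_tasks(rng, n_tasks, n=50, d=4):
    """Toy least-squares tasks L_t(theta) = ||X_t theta - y_t||^2 / (2n)."""
    tasks = []
    for _ in range(n_tasks):
        X = rng.normal(size=(n, d))
        y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
        tasks.append((X, y))
    return tasks


def task_grads(theta, tasks):
    """Per-task gradients g_t = X_t^T (X_t theta - y_t) / n, stacked to shape (T, d)."""
    return np.stack([X.T @ (X @ theta - y) / len(y) for X, y in tasks])


def mean_pairwise_cosine(G):
    """Average cosine similarity over distinct pairs of task gradients."""
    U = G / np.linalg.norm(G, axis=1, keepdims=True)
    S = U @ U.T
    T = len(G)
    return (S.sum() - np.trace(S)) / (T * (T - 1))


def grad_regularizer(theta, tasks):
    """Hypothetical gradient-discrepancy regularizer R = mean_t ||g_t - g_bar||^2."""
    G = task_grads(theta, tasks)
    return np.mean(np.sum((G - G.mean(axis=0)) ** 2, axis=1))


def grad_regularizer_grad(theta, tasks):
    """Exact dR/dtheta for quadratic tasks, using per-task Hessians H_t = X_t^T X_t / n."""
    G = task_grads(theta, tasks)
    H = np.stack([X.T @ X / len(y) for X, y in tasks])
    dG, dH = G - G.mean(axis=0), H - H.mean(axis=0)
    return 2.0 * np.einsum("tij,tj->i", dH, dG) / len(tasks)


def train(theta, tasks, lam, lr=0.05, steps=200):
    """Gradient descent on mean task loss + lam * R (explicit surrogate, not IGR itself)."""
    for _ in range(steps):
        step = task_grads(theta, tasks).mean(axis=0) + lam * grad_regularizer_grad(theta, tasks)
        theta = theta - lr * step
    return theta


rng = np.random.default_rng(0)
tasks = make_tasks(rng, n_tasks=3)
theta0 = np.zeros(4)
theta = train(theta0, tasks, lam=1.0)
print("R before/after:", grad_regularizer(theta0, tasks), grad_regularizer(theta, tasks))
print("grad cosine at init:", mean_pairwise_cosine(task_grads(theta0, tasks)))
```

Computing the regularizer's gradient explicitly requires per-task Hessians, which is cheap only in this quadratic toy; avoiding that second-order cost is one practical motivation for regularizing gradients implicitly rather than explicitly.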

Paper Structure (27 sections, 4 theorems, 20 equations, 6 figures, 8 tables, 2 algorithms)

Introduction
Related Work
Revisit DFML
Problem Setup of DFML
Rethinking Model Heterogeneity in DFML
Methodology
Heterogeneous Pre-trained Models Grouping
Conflicting Task Regularization
Experiments
Experimental Setup
Main Results
Comparisons with baselines.
Multi-domain scenario.
Multi-architecture scenario.
Ablation Studies
...and 12 more sections

Key Result

Theorem 3.2

Assume $M_{meta}(\cdot;\boldsymbol{\theta})$ is probably approximately correct (PAC), i.e., there exists $\zeta(N,\delta)\geq0$ monotonically decreasing with $N$, and the loss function $\ell(\cdot)$ is $K$-Lipschitz continuous. Then, with probability at least $1-2 \delta$ the following bounds hold: where $E=\sum_{t=\mathrm{bas}}^{\mathrm{aux}}\mathbb{E}_{\mathcal{P}_t}[\ell\left(M_{meta}(x,\bolds

Figures (6)

Figure 1: Similarity heatmaps of pre-trained models measured by CKA. We compare (a) pre-trained models from different datasets and (b) pre-trained models with different architectures. The coordinate axes indicate model indices; 100 pre-trained models are involved in total. Best viewed when zoomed in.
Figure 2: Relationship between model heterogeneity and Accuracy Gain (AG). We select a pre-trained Conv4 as the base model and combine it with an auxiliary pre-trained model to assess the AG.
Figure 3: DFML training pipeline. We embed pre-trained models $\mathcal{M}_{pool}$ into a task space to compute dissimilarity, and divide heterogeneous models into task groups. Then, conflicting task regularization is introduced within each group to train the meta-model for new tasks.
Figure 4: Model dissimilarity measured by the FIM accurately reflects model heterogeneity. We assess model heterogeneity in terms of overlapping training classes and distinct model architectures.
Figure 5: Gradient discrepancy across tasks. We plot the progression of the gradient regularizer loss and the gradient cosine similarity during training, confirming that IGR implicitly minimizes the gradient regularizer and aligns gradient directions.
...and 1 more figure
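Figure 1 compares pre-trained models with CKA (centered kernel alignment). As a minimal sketch, assuming the linear variant of CKA and assuming each model's features are collected on the same n probe inputs (the source specifies neither the kernel nor the inputs), a pairwise similarity heatmap of the kind shown in Figure 1 can be computed as below; `cka_heatmap` and the toy feature matrices are hypothetical.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between feature matrices X (n x p) and Y (n x q) from the same n inputs."""
    X = X - X.mean(axis=0, keepdims=True)   # center each feature column
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))


def cka_heatmap(features):
    """Symmetric matrix of pairwise linear CKA; feature widths may differ across models."""
    m = len(features)
    S = np.ones((m, m))                     # CKA of a representation with itself is 1
    for i in range(m):
        for j in range(i + 1, m):
            S[i, j] = S[j, i] = linear_cka(features[i], features[j])
    return S


rng = np.random.default_rng(0)
F0 = rng.normal(size=(100, 8))              # features of a toy "model 0"
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
F1 = 2.5 * F0 @ Q                           # rotated and rescaled copy of model 0
F2 = rng.normal(size=(100, 6))              # unrelated toy "model 2" with a different width
print(np.round(cka_heatmap([F0, F1, F2]), 3))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so F0 and F1 score close to 1 while the unrelated F2 scores much lower. Because it only compares n x n similarity structure, it also accepts feature matrices of different widths, which is what allows models with different architectures to appear in the same heatmap.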

Theorems & Definitions (7)

Definition 3.1
Theorem 3.2
Theorem 4.1
Theorem 3.1
Proof
Theorem 3.2
Proof
