
FREE: Faster and Better Data-Free Meta-Learning

Yongxian Wei, Zixuan Hu, Zhenyi Wang, Li Shen, Chun Yuan, Dacheng Tao

TL;DR

The paper addresses data-free meta-learning, where the original training data is unavailable, focusing on efficiency and model heterogeneity. It proposes FREE, which combines a meta-generator (FIve) that adapts to each pre-trained model in just $k$ steps ($k = 5$) and a gradient-aligned meta-learner (BelL) that uses implicit gradient alignment and cross-task distillation to generalize to unseen tasks. Empirical results on mini-ImageNet, CIFAR-FS, and CUB show a ~20× speed-up in data recovery and consistent accuracy gains of approximately $1.42\%$–$4.78\%$ over the state of the art, including robust performance in multi-domain and multi-architecture settings. The approach advances privacy-preserving meta-learning by enabling fast reconstruction of task distributions across heterogeneous model pools and by learning task-invariant representations for unseen tasks.

Abstract

Data-Free Meta-Learning (DFML) aims to extract knowledge from a collection of pre-trained models without requiring the original data, which offers practical benefits in settings constrained by data-privacy concerns. Current DFML methods focus primarily on recovering data from these pre-trained models. However, they suffer from slow recovery and overlook the gaps inherent in heterogeneous pre-trained models. To address these challenges, we introduce the Faster and Better Data-Free Meta-Learning (FREE) framework, which contains: (i) a meta-generator for rapidly recovering training tasks from pre-trained models; and (ii) a meta-learner for generalizing to new, unseen tasks. Specifically, within the module Faster Inversion via Meta-Generator, each pre-trained model is treated as a distinct task. The meta-generator can adapt to a specific task in just five steps, significantly accelerating data recovery. Furthermore, we propose Better Generalization via Meta-Learner and introduce an implicit gradient alignment algorithm to optimize the meta-learner. Aligning gradient directions alleviates potential conflicts among tasks recovered from heterogeneous pre-trained models. Experiments on multiple benchmarks confirm the superiority of our approach, with a 20$\times$ speed-up and a 1.42%–4.78% performance improvement over the state of the art.
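The abstract does not say how the meta-generator is meta-trained, so what follows is only a minimal sketch of the general idea under explicitly stated assumptions: a Reptile-style first-order outer loop (our choice, not confirmed by the source), a "generator" reduced to a single parameter vector, and each pre-trained model's inversion loss replaced by a hypothetical quadratic surrogate whose optimum is a model-specific target. It illustrates why a meta-learned initialization can reach a low loss within five inner steps where a naive initialization cannot.

```python
import numpy as np

# Hypothetical toy stand-in: each "pre-trained model" i defines a surrogate
# inversion loss 0.5 * ||theta - target_i||^2 over the generator parameters theta.
def toy_loss(theta, target):
    return 0.5 * float(np.sum((theta - target) ** 2))

def inner_adapt(theta, target, alpha=0.1, k=5):
    """Run k plain gradient-descent steps on the toy loss, starting from theta."""
    phi = theta.copy()
    for _ in range(k):
        grad = phi - target  # gradient of 0.5 * ||phi - target||^2
        phi = phi - alpha * grad
    return phi

def reptile_meta_train(targets, dim, meta_lr=0.5, epochs=200, alpha=0.1, k=5, seed=0):
    """Reptile-style outer loop (assumption): pull the shared init toward each task's adapted parameters."""
    rng = np.random.default_rng(seed)
    theta0 = rng.normal(size=dim)
    for _ in range(epochs):
        for i in rng.permutation(len(targets)):
            phi = inner_adapt(theta0, targets[i], alpha, k)
            theta0 = theta0 + meta_lr * (phi - theta0)
    return theta0

def mean_adapted_loss(theta0, targets, alpha=0.1, k=5):
    """Average toy loss over all tasks after k adaptation steps from theta0."""
    return float(np.mean([toy_loss(inner_adapt(theta0, t, alpha, k), t) for t in targets]))

rng = np.random.default_rng(1)
targets = [rng.normal(loc=3.0, scale=0.5, size=8) for _ in range(10)]  # one per hypothetical model
meta_init = reptile_meta_train(targets, dim=8)
print("meta-learned init:", mean_adapted_loss(meta_init, targets))
print("zero init:        ", mean_adapted_loss(np.zeros(8), targets))
```

In the real setting the inner loss would be the model inversion loss $\mathcal{L}_G$ evaluated through each pre-trained model, and the five adaptation steps would update generator weights rather than a toy vector.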

Paper Structure

This paper contains 13 sections, 2 theorems, 6 equations, 7 figures, 5 tables, and 3 algorithms.

Key Result

Lemma 1

If $\mathcal{L}_{KD}$ has a Lipschitz Hessian, then [equation not recovered], where $\alpha$ is the step size of the inner loop.
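The exact equation of Lemma 1 is not reproduced here. As a hedged illustration of the general mechanism behind implicit gradient alignment (the first-order meta-learning analysis popularized by Reptile, not necessarily the paper's own statement), the sketch below checks numerically that, on two hypothetical quadratic tasks, averaging both orderings of a sequential two-step SGD update gives exactly the gradient of $L_1 + L_2 - \frac{\alpha}{2}\langle \nabla L_1, \nabla L_2 \rangle$. Descending along it therefore also rewards a positive inner product between task gradients. For non-quadratic losses the identity holds only up to $O(\alpha^2)$ terms, which is where a Lipschitz-Hessian assumption typically enters.

```python
import numpy as np

def make_quadratic(rng, dim):
    """Hypothetical convex task: L(theta) = 0.5 * (theta - c)^T A (theta - c)."""
    m = rng.normal(size=(dim, dim))
    A = m @ m.T + dim * np.eye(dim)  # symmetric positive-definite Hessian
    c = rng.normal(size=dim)
    return A, c

def grad(task, theta):
    A, c = task
    return A @ (theta - c)

def sequential_update(theta, first, second, alpha):
    """One SGD step on `first`, then one on `second`; return (theta - phi) / alpha."""
    phi = theta - alpha * grad(first, theta)
    phi = phi - alpha * grad(second, phi)
    return (theta - phi) / alpha

def aligned_objective_grad(theta, t1, t2, alpha):
    """Gradient of L1 + L2 - (alpha / 2) * <g1, g2>; for quadratics grad<g1, g2> = A1 g2 + A2 g1."""
    (A1, _), (A2, _) = t1, t2
    g1, g2 = grad(t1, theta), grad(t2, theta)
    return g1 + g2 - 0.5 * alpha * (A1 @ g2 + A2 @ g1)

rng = np.random.default_rng(0)
t1, t2 = make_quadratic(rng, 4), make_quadratic(rng, 4)
theta, alpha = rng.normal(size=4), 0.01
avg = 0.5 * (sequential_update(theta, t1, t2, alpha) + sequential_update(theta, t2, t1, alpha))
print(np.allclose(avg, aligned_objective_grad(theta, t1, t2, alpha)))  # True
```

The averaging over both orderings matters: a single ordering contributes only one of the two cross terms ($A_2 g_1$ or $A_1 g_2$), and only their sum is the gradient of the inner product.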

Figures (7)

  • Figure 1: Faster Inversion via Meta-Generator significantly enhances the efficiency of task generation. Tasks recovered from pre-trained models are used for training in the data-free setting. For each task, prior works need to train a specific generator with hundreds of generate-forward-backward iterations, while we only need a 5-step adaptation of a single meta-generator.
  • Figure 2: Pre-trained models from different domains inherently exhibit distribution differences. Pre-trained models, even when trained on different classes of the same dataset, vary in performance quality. As a result, their recovered tasks naturally present a distribution gap. Overlooking such model heterogeneity causes the meta-learner to become biased toward specific tasks, leading to local optima. Our proposed BelL optimizes the meta-learner by encouraging a positive inner product of gradients across tasks, thus enhancing its generalization ability.
  • Figure 3: An illustration of the proposed DFML framework. The framework consists of multiple pre-trained models ($\mathcal{M}_{pool}$), a meta-generator and a meta-learner. The model inversion loss ($\mathcal{L}_G$) optimizes the meta-generator, while the knowledge distillation loss ($\mathcal{L}_{KD}$) optimizes the meta-learner. After adapting to pre-trained models, the meta-generator recovers specific tasks. The meta-learner learns from recovered tasks and their respective pre-trained models by multi-task knowledge distillation.
  • Figure 4: $t$-SNE visualization of the same samples fed into different pre-trained models.
  • Figure 5: The gradient inner product across tasks from different pre-trained models.
  • ...and 2 more figures
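Figure 3 states that the meta-learner is trained by multi-task knowledge distillation, each recovered task being distilled from its own pre-trained model, but the exact form of $\mathcal{L}_{KD}$ is not given here. A minimal sketch, assuming the common Hinton-style temperature-scaled KL divergence and uniform weighting across tasks (both assumptions), with hypothetical linear models standing in for the pre-trained teachers and the meta-learner:

```python
import numpy as np

def log_softmax(logits, T):
    """Numerically stable log-softmax of logits / T along the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, T=4.0):
    """Assumed Hinton-style KD: T^2 * KL(teacher || student) on softened outputs, averaged over samples."""
    log_p_t = log_softmax(teacher_logits, T)
    log_p_s = log_softmax(student_logits, T)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1)
    return float(T * T * kl.mean())

def multi_task_kd_loss(student_fn, teachers, recovered_tasks, T=4.0):
    """Uniform average of KD losses; task i pairs its recovered inputs with pre-trained model i."""
    losses = [kd_loss(student_fn(x), teacher(x), T) for teacher, x in zip(teachers, recovered_tasks)]
    return float(np.mean(losses))

# Hypothetical usage: three 5-way linear "pre-trained models" and random stand-in recovered inputs.
rng = np.random.default_rng(0)
# The default argument binds a distinct random weight matrix to each teacher.
teachers = [lambda x, W=rng.normal(size=(16, 5)): x @ W for _ in range(3)]
recovered_tasks = [rng.normal(size=(32, 16)) for _ in range(3)]
W_student = rng.normal(size=(16, 5))
print(multi_task_kd_loss(lambda x: x @ W_student, teachers, recovered_tasks))
```

Because each task is distilled only from the model it was recovered from, the shared meta-learner receives one gradient per task; these per-task gradients are presumably the ones whose inner products Figure 5 visualizes.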

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1