Architecture, Dataset and Model-Scale Agnostic Data-free Meta-Learning

Zixuan Hu; Li Shen; Zhenyi Wang; Tongliang Liu; Chun Yuan; Dacheng Tao

TL;DR

PURER tackles data-free meta-learning (DFML) by exploiting the data knowledge that can be distilled from pre-trained models, rather than working only in parameter space. It introduces Episode Curriculum Inversion (ECI), which progressively generates harder pseudo episodes during meta training, and Inversion Calibration following Inner Loop (ICFIL), which narrows the gap between the meta-training and meta-testing task distributions. Empirical results across the SS, SH, and MH scenarios on CIFAR-FS and MiniImageNet show substantial gains over baselines and robustness to heterogeneous architectures and datasets. By making DFML architecture-, dataset-, and model-scale-agnostic, PURER broadens its practical applicability without requiring real training data.

Abstract

The goal of data-free meta-learning is to learn useful prior knowledge from a collection of pre-trained models without accessing their training data. However, existing works solve the problem only in parameter space, which (i) ignores the fruitful data knowledge contained in the pre-trained models; (ii) cannot scale to large-scale pre-trained models; and (iii) can only meta-learn pre-trained models that share the same network architecture. To address these issues, we propose a unified framework, dubbed PURER, which contains: (1) ePisode cUrriculum inveRsion (ECI) during data-free meta training; and (2) invErsion calibRation following inner loop (ICFIL) during meta testing. During meta training, ECI performs pseudo episode training so that the meta model learns to adapt quickly to new, unseen tasks. Specifically, we progressively synthesize a sequence of pseudo episodes by distilling the training data from each pre-trained model. ECI adaptively increases the difficulty of the pseudo episodes according to real-time feedback from the meta model. We formulate meta training with ECI as an adversarial optimization problem solved in an end-to-end manner. During meta testing, we further propose ICFIL, a simple plug-and-play supplement used only at meta-test time, to narrow the gap between the meta-training and meta-testing task distributions. Extensive experiments in various real-world scenarios demonstrate the superior performance of our method.
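Pseudo episode training follows the usual episodic recipe: each pseudo episode drawn from the dynamic dataset is split into a support set (used by the inner loop) and a query set (used by the outer loop). The Python sketch below illustrates only that sampling step, under stated assumptions rather than as the authors' code: the nested-dict layout of the dynamic dataset and the function name `sample_pseudo_episode` are hypothetical, and it assumes each pre-trained model covers a single N-way task, so all classes of an episode come from one model. Strings stand in for synthesized images.

```python
import random


def sample_pseudo_episode(dynamic_dataset, k_shot, q_query, rng):
    """Sample one pseudo episode and split it into support and query sets.

    dynamic_dataset: {model_id: {class_id: [pseudo examples]}} (hypothetical
    layout). Assumption: each pre-trained model covers one N-way task, so an
    episode draws all of its classes from a single model.
    """
    model_id = rng.choice(sorted(dynamic_dataset))
    per_class = dynamic_dataset[model_id]
    support, query = [], []
    # Re-index classes 0..N-1 inside the episode, as in standard episodic training.
    for episode_label, cls in enumerate(sorted(per_class)):
        examples = rng.sample(per_class[cls], k_shot + q_query)
        support += [(x, episode_label) for x in examples[:k_shot]]  # inner loop
        query += [(x, episode_label) for x in examples[k_shot:]]    # outer loop
    return model_id, support, query


# Usage: two hypothetical pre-trained models, each with 5 classes of 20 pseudo images.
rng = random.Random(0)
dataset = {m: {c: [f"{m}/class{c}/img{i}" for i in range(20)] for c in range(5)}
           for m in ("conv4_model", "resnet18_model")}
model_id, support, query = sample_pseudo_episode(dataset, k_shot=1, q_query=15, rng=rng)
print(model_id, len(support), len(query))  # 5 support and 75 query examples
```

Sampling without replacement per class keeps the support and query sets disjoint, so the outer loss measures generalization after adaptation rather than memorization of the support examples.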

Paper Structure (18 sections, 7 equations, 5 figures, 8 tables, 2 algorithms)

Introduction
Related Works
Problem Setup
Data-free Meta Learning Setup
Meta Testing
Methodology
Preliminary: Episode Training
Episode Curriculum Inversion (ECI)
Inversion Calibration following Inner Loop
Experiments
Experiments Setup
Experiments of DFML in SS
Experiments of DFML in SH
Experiments of DFML in MH
Ablation Study
...and 3 more sections

Figures (5)

Figure 1: Episode Curriculum Inversion can improve the efficiency of pseudo episode training. At each episode, EI may repeatedly synthesize the tasks already learned well, while ECI only synthesizes harder tasks not learned yet.
Figure 2: Task-distribution shift between meta training and testing. The pseudo data distilled from pre-trained models (as visualized by Fang et al., 2021) contain only part of the semantic information learned by the pre-trained models.
Figure 3: The overall pipeline of our proposed PURER, consisting of ECI and ICFIL. For each episode during meta training, a pseudo episode is sampled from the dynamic dataset. The pseudo episode is split into a support set and a query set, which are used for the inner loop and the outer loop of meta-learning, respectively. Real-time feedback from the meta model controls the Gradient Switch. When the feedback is positive, the dynamic dataset is updated to synthesize harder tasks with a larger outer loss for the next iteration, by minimizing the reversed outer loss through gradient descent. During meta testing, the base model adapted by the inner loop is calibrated via ICFIL. For brevity, we omit the calibration of the linear classifier head.
Figure 4: Effect of the curriculum mechanism on test performance on CIFAR-FS in the SS scenario. Solid curve: smoothed performance curve. Transparent curve: original performance curve.
Figure 5: (left) pseudo images synthesized from Conv4. (right) pseudo images synthesized from ResNet-18. Each column corresponds to one class.
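To make the adversarial loop in Figure 3 concrete, here is a toy sketch, not the authors' implementation. It replaces images with scalar pseudo inputs, the meta model with a scalar linear model y = w·x, and each pre-trained model with a hypothetical teacher y = slope·x that labels the pseudo inputs, so no real training data is ever used. The meta model descends on the outer loss; when the feedback is positive, the pseudo inputs descend on the reversed outer loss (equivalently, ascend on the outer loss), yielding harder episodes. The feedback rule (the episode's loss dropped after the meta update), the box constraint on pseudo inputs, and the use of finite differences instead of autodiff are all assumptions made to keep the example short and stdlib-only.

```python
import random


def mse(w, xs, ys):
    """Mean squared error of the scalar linear model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad_w(w, xs, ys):
    """Analytic derivative of mse(w, xs, ys) with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def episode_loss(w, slope, xs, inner_lr):
    """Outer (query) loss after one inner-loop step on the support set.

    First half of xs = pseudo support set, second half = pseudo query set;
    labels come from a toy 'pre-trained' teacher y = slope * x (hypothetical).
    """
    half = len(xs) // 2
    x_sup, x_qry = xs[:half], xs[half:]
    y_sup = [slope * x for x in x_sup]
    y_qry = [slope * x for x in x_qry]
    w_adapted = w - inner_lr * grad_w(w, x_sup, y_sup)  # inner loop
    return mse(w_adapted, x_qry, y_qry)                 # outer loss


def central_diff(f, v, eps=1e-5):
    """Central finite-difference derivative of a scalar function f at v."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)


def data_grad(w, slope, xs, inner_lr, eps=1e-5):
    """Partial derivatives of the outer loss with respect to each pseudo input."""
    grads = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        diff = episode_loss(w, slope, up, inner_lr) - episode_loss(w, slope, down, inner_lr)
        grads.append(diff / (2 * eps))
    return grads


def train(teachers, steps=200, meta_lr=0.05, data_lr=0.05, inner_lr=0.1,
          bound=2.0, seed=0):
    rng = random.Random(seed)
    w = 0.0  # meta-learned initialization
    # Hypothetical dynamic dataset: 4 pseudo inputs (2 support + 2 query) per teacher.
    pseudo = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in teachers]
    switches = 0
    for _ in range(steps):
        k = rng.randrange(len(teachers))
        slope, xs = teachers[k], pseudo[k]
        before = episode_loss(w, slope, xs, inner_lr)
        # Meta model: gradient descent on the outer loss, differentiating
        # through the inner-loop step.
        w -= meta_lr * central_diff(lambda v: episode_loss(v, slope, xs, inner_lr), w)
        after = episode_loss(w, slope, xs, inner_lr)
        # Gradient switch (assumed rule): positive feedback means the meta model
        # just reduced its loss on this pseudo episode, so make the episode harder.
        if after < before:
            switches += 1
            g = data_grad(w, slope, xs, inner_lr)
            # Descend on the reversed loss -L (i.e. ascend on L), then clip to a
            # hypothetical box; without a bound the pseudo inputs would diverge.
            pseudo[k] = [max(-bound, min(bound, x + data_lr * gi)) for x, gi in zip(xs, g)]
    return w, pseudo, switches


w, pseudo, switches = train(teachers=[1.0, 3.0])
print(f"meta init w={w:.3f}, gradient-switch activations={switches}")
```

In this toy, ascending the outer loss pushes the query inputs outward while shrinking the support inputs toward zero, which makes the single inner step less informative. Both changes raise the outer loss, which is the toy analogue of synthesizing harder tasks.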
