PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors

Brian B. Moser, Shalini Sarode, Federico Raue, Stanislav Frolov, Krzysztof Adamkiewicz, Arundhati Shanbhag, Joachim Folz, Tobias C. Nauen, Andreas Dengel

TL;DR

PRISM tackles the lack of intra-class diversity in dataset distillation by decoupling architectural priors: logit supervision comes from a primary teacher, while BN alignment is supervised by a diverse set of BN teachers. This multi-teacher alignment injects multiple architectural "world views" into synthesis, yielding richer, more generalizable synthetic data on ImageNet-1K and achieving state-of-the-art results at higher IPCs (images per class). The work shows that diversifying architectural priors improves both performance and diversity, and it contributes scalable batch-formation and teacher-selection strategies, along with thorough ablations on the recovery and post-recovery steps. Collectively, PRISM establishes architectural decoupling as an orthogonal, scalable axis for advancing dataset distillation toward robust, privacy-preserving, large-scale applications.
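
The decoupled objective described above can be sketched as a loss function. This is a minimal, framework-free illustration under our own assumptions, not the authors' code: the BN term follows the SRe2L-style formulation (L2 distance between batch statistics and a teacher's running statistics, summed over BN layers), a fresh random subset of k BN teachers is drawn on every call and their losses are averaged, and the names BNTeacher, prism_loss, k, and alpha are hypothetical. It only evaluates the objective; in practice it would be written in an autodiff framework and backpropagated into the synthetic pixels.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


@dataclass
class BNStats:
    """Running statistics stored in one BN layer of a pretrained teacher."""
    running_mean: Vector
    running_var: Vector


@dataclass
class BNTeacher:
    """Hypothetical teacher interface: per-layer activations plus matching BN stats."""
    # features(batch) -> one list of activation vectors per BN layer
    features: Callable[[Sequence[Vector]], List[List[Vector]]]
    bn_stats: List[BNStats]


def batch_stats(acts: Sequence[Vector]) -> Tuple[Vector, Vector]:
    """Per-channel mean and biased variance over a list of activation vectors."""
    n, channels = len(acts), len(acts[0])
    mean = [sum(a[c] for a in acts) / n for c in range(channels)]
    var = [sum((a[c] - mean[c]) ** 2 for a in acts) / n for c in range(channels)]
    return mean, var


def l2(u: Vector, v: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def bn_alignment_loss(teacher: BNTeacher, batch: Sequence[Vector]) -> float:
    """SRe2L-style regularizer: sum over BN layers of ||mu - mu_BN|| + ||var - var_BN||."""
    loss = 0.0
    for acts, stats in zip(teacher.features(batch), teacher.bn_stats):
        mean, var = batch_stats(acts)
        loss += l2(mean, stats.running_mean) + l2(var, stats.running_var)
    return loss


def cross_entropy(logits: Vector, label: int) -> float:
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def prism_loss(
    batch: Sequence[Vector],
    labels: Sequence[int],
    primary_logits: Callable[[Vector], Vector],
    bn_teachers: List[BNTeacher],
    k: int,
    alpha: float,
    rng: random.Random,
) -> float:
    """Logits come only from the primary teacher; BN alignment only from k sampled BN teachers."""
    ce = sum(cross_entropy(primary_logits(x), y) for x, y in zip(batch, labels)) / len(batch)
    subset = rng.sample(bn_teachers, k)  # new random subset per call (assumption)
    bn = sum(bn_alignment_loss(t, batch) for t in subset) / k
    return ce + alpha * bn
```

Keeping the two terms in separate functions makes the decoupling explicit: swapping the primary teacher changes only the cross-entropy term, while adding or removing BN teachers changes only the regularizer.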

Abstract

Dataset distillation (DD) promises compact yet faithful synthetic data, but existing approaches often inherit the inductive bias of a single teacher model. As dataset size increases, this bias drives generation toward overly smooth, homogeneous samples, reducing intra-class diversity and limiting generalization. We present PRISM (PRIors from diverse Source Models), a framework that disentangles architectural priors during synthesis. PRISM decouples the logit-matching and regularization objectives, supervising them with different teacher architectures: a primary model for logits and a stochastic subset for batch-normalization (BN) alignment. On ImageNet-1K, PRISM consistently and reproducibly outperforms single-teacher methods (e.g., SRe2L) and recent multi-teacher variants (e.g., G-VBSM) in low- and mid-IPC (images-per-class) regimes. The generated data also show significantly richer intra-class diversity, as reflected by a notable drop in cosine similarity between features. We further analyze teacher selection strategies (pre- vs. intra-distillation) and introduce a scalable cross-class batch formation scheme for fast parallel synthesis. Code will be released after the review period.
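
The cross-class batch formation for parallel synthesis can be sketched as a scheduling step, following the per-IPC-index processing shown for PRISM and SRe2L in Figure 3. This is a minimal sketch under our own assumptions, not the authors' scheduler: each job holds one synthetic-image slot per class for a single IPC index, large class sets are split into chunks of a hypothetical max_batch_size (e.g., to fit GPU memory), and jobs are spread round-robin over workers. The helper names are hypothetical.

```python
from typing import List, Tuple

Slot = Tuple[int, int]  # (class_id, ipc_index) of one synthetic image


def cross_class_batches(num_classes: int, ipc: int, max_batch_size: int) -> List[List[Slot]]:
    """Group synthetic-image slots by IPC index; each job spans several classes.

    Jobs never mix IPC indices, so they share no optimization state.
    Chunking the class set by max_batch_size is an assumption, not a
    detail taken from the paper.
    """
    jobs: List[List[Slot]] = []
    for ipc_index in range(ipc):
        slots = [(c, ipc_index) for c in range(num_classes)]
        for start in range(0, num_classes, max_batch_size):
            jobs.append(slots[start:start + max_batch_size])
    return jobs


def assign_round_robin(jobs: List[List[Slot]], num_workers: int) -> List[List[List[Slot]]]:
    """Distribute independent jobs over workers (e.g., one process per GPU)."""
    buckets: List[List[List[Slot]]] = [[] for _ in range(num_workers)]
    for i, job in enumerate(jobs):
        buckets[i % num_workers].append(job)
    return buckets
```

For example, cross_class_batches(1000, 10, 100) yields 100 jobs of 100 images each. Because no job depends on another, each worker can run its share of the synthesis loop concurrently.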

Paper Structure

This paper contains 27 sections, 9 equations, 5 figures, 9 tables.

Figures (5)

  • Figure 1: UMAP visualization of synthetic ImageNet-1K images (10 classes), comparing SRe2L with our proposed multi-teacher alignment. Our approach, PRISM, generates significantly greater intra-class diversity, in contrast to the overly uniform clusters of SRe2L, which make models more prone to overfitting.
  • Figure 2: The core idea behind PRISM (PRIors from diverse Source Models): instead of a single model, as in SRe2L and related work, use multiple diverse models to decouple logit maximization from regularization through BN alignment.
  • Figure 3: Batch formation and optimization strategies. (Left) Methods such as G-VBSM, EDC, and DELT optimize over all classes jointly. (Right) Methods such as our PRISM and SRe2L process each IPC index independently.
  • Figure 4: Intra-class semantic cosine similarity of the respective distilled ImageNet-1K images, computed with a pretrained ResNet-18. Lower mean values and higher variance indicate higher diversity.
  • Figure 5: Qualitative comparison of synthetic images from ImageNet-1K generated by SRe2L and PRISM. Both methods start from the exact same initial real images to ensure a fair comparison. The images generated by SRe2L (left) exhibit significant homogeneity, with samples within each class (goldfish, rooster, shark, frog) converging to similar colors and textures. In contrast, PRISM (right) produces a wider variety of contexts and colorations.
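
The intra-class cosine similarity behind Figure 4 can be approximated as follows. This is a minimal sketch, assuming features have already been extracted for each distilled image (e.g., penultimate-layer embeddings of a pretrained ResNet-18, as the caption states) and that similarities are pooled over all unordered within-class pairs; the exact feature layer and aggregation used in the paper are not specified here, and the function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pvariance
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def intra_class_similarities(features_by_class: Dict[int, List[Vector]]) -> List[float]:
    """Cosine similarity of every unordered pair of images within the same class.

    Classes with a single image contribute no pairs.
    """
    sims: List[float] = []
    for feats in features_by_class.values():
        sims.extend(cosine(u, v) for u, v in combinations(feats, 2))
    return sims


def summarize(sims: List[float]) -> Tuple[float, float]:
    """Mean and population variance of the pooled similarities (needs at least one pair)."""
    return mean(sims), pvariance(sims)
```

Under this reading, a homogeneous synthetic set pushes the mean toward 1 with little spread, whereas a more diverse set lowers the mean and widens the distribution.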