BACON: Bayesian Optimal Condensation Framework for Dataset Distillation

Zheng Zhou; Hongbo Zhao; Guangliang Cheng; Xiangtai Li; Shuchang Lyu; Wenquan Feng; Qi Zhao

TL;DR

Dataset distillation (DD) seeks to compress large training sets into small synthetic sets without sacrificing test accuracy. This paper introduces BACON, a Bayesian optimal condensation framework that casts DD as minimizing an expected risk $R(\phi)$ over joint output distributions and derives a numerically feasible lower bound via a spherical-integral formulation. An approximate solution is obtained with Monte Carlo sampling, a Gaussian likelihood, and a TV-CLIP prior, giving an overall loss $\mathcal{L}_{TOTAL}$ with a tunable weight $\lambda$ and a plug-in procedure (Algorithm 1). Experiments on datasets ranging from MNIST to TinyImageNet show BACON consistently outperforming state-of-the-art methods such as IDM and DM across images-per-class (IPC) settings, supporting both the theory and its practical effectiveness, and they identify scaling to high-resolution data as future work. Overall, BACON gives DD a principled Bayesian foundation and integrates with existing methods to improve their distillation performance.
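As a rough illustration of how a loss of the shape described above could be assembled, the sketch below combines a Monte Carlo estimate of a Gaussian negative log-likelihood with a prior made of a total-variation term and a clipping penalty, weighted by $\lambda$. This summary does not give the paper's exact formulas, so every function name, the form of the TV and clip terms, and the defaults for `sigma` and `lam` are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the exact BACON loss is not reproduced in this
# summary. All names and term definitions below are assumptions.

def gaussian_nll(z_syn, z_real_samples, sigma=1.0):
    """Monte Carlo estimate of a Gaussian negative log-likelihood (up to a
    constant): mean squared distance between one synthetic embedding and
    sampled real embeddings, scaled by 1 / (2 * sigma^2)."""
    total = 0.0
    for z_real in z_real_samples:
        total += sum((a - b) ** 2 for a, b in zip(z_syn, z_real))
    return total / len(z_real_samples) / (2.0 * sigma ** 2)


def total_variation(img):
    """Anisotropic total variation of a 2-D image given as a list of rows."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for i in range(h):
        for j in range(w):
            if i + 1 < h:
                tv += abs(img[i + 1][j] - img[i][j])
            if j + 1 < w:
                tv += abs(img[i][j + 1] - img[i][j])
    return tv


def clip_penalty(img, lo=0.0, hi=1.0):
    """Squared distance of each pixel from the valid range [lo, hi]."""
    return sum((p - min(max(p, lo), hi)) ** 2 for row in img for p in row)


def total_loss(z_syn, z_real_samples, img, lam=0.5, sigma=1.0):
    """Assumed overall loss: likelihood term + lam * (TV + clip) prior term."""
    prior = total_variation(img) + clip_penalty(img)
    return gaussian_nll(z_syn, z_real_samples, sigma) + lam * prior


if __name__ == "__main__":
    z_syn = [0.0, 0.0]
    z_real_samples = [[1.0, 0.0], [0.0, 1.0]]  # two Monte Carlo samples
    img = [[0.0, 1.0], [1.0, 0.0]]
    print(total_loss(z_syn, z_real_samples, img, lam=0.5))
```

Setting `lam=0.0` in this sketch leaves only the likelihood term, which is one way to picture the role of the $\lambda$ weight in trading data fit against the image prior.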

Abstract

Dataset Distillation (DD) aims to distill knowledge from extensive datasets into more compact ones while preserving performance on the test set, thereby reducing storage costs and training expenses. However, existing methods often suffer from computational intensity, particularly exhibiting suboptimal performance with large dataset sizes due to the lack of a robust theoretical framework for analyzing the DD problem. To address these challenges, we propose the BAyesian optimal CONdensation framework (BACON), which is the first work to introduce the Bayesian theoretical framework to the literature of DD. This framework provides theoretical support for enhancing the performance of DD. Furthermore, BACON formulates the DD problem as the minimization of the expected risk function in joint probability distributions using the Bayesian framework. Additionally, by analyzing the expected risk function for optimal condensation, we derive a numerically feasible lower bound based on specific assumptions, providing an approximate solution for BACON. We validate BACON across several datasets, demonstrating its superior performance compared to existing state-of-the-art methods. For instance, under the IPC-10 setting, BACON achieves a 3.46% accuracy gain over the IDM method on the CIFAR-10 dataset and a 3.10% gain on the TinyImageNet dataset. Our extensive experiments confirm the effectiveness of BACON and its seamless integration with existing methods, thereby enhancing their performance for the DD task. Code and distilled datasets are available at BACON.
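To make the IPC (images per class) settings concrete, the short calculation below shows how small the distilled sets are relative to the originals. The class counts and training-set sizes used here (CIFAR-10: 10 classes, 50,000 training images; TinyImageNet: 200 classes, 100,000 training images) are the standard figures for those benchmarks rather than numbers taken from this page, and the helper names are hypothetical.

```python
# Worked arithmetic for IPC settings; helper names are illustrative.

def synthetic_set_size(num_classes, ipc):
    """Number of synthetic images kept at a given images-per-class budget."""
    return num_classes * ipc


def compression_ratio(num_classes, ipc, train_size):
    """Fraction of the original training set that the synthetic set occupies."""
    return synthetic_set_size(num_classes, ipc) / train_size


if __name__ == "__main__":
    # (name, number of classes, training-set size): standard benchmark figures
    benchmarks = [("CIFAR-10", 10, 50_000), ("TinyImageNet", 200, 100_000)]
    for name, num_classes, train_size in benchmarks:
        size = synthetic_set_size(num_classes, 10)
        ratio = compression_ratio(num_classes, 10, train_size)
        print(f"{name} at IPC-10: {size} images ({ratio:.2%} of training set)")
```

Under these figures, IPC-10 keeps 100 of CIFAR-10's training images and 2,000 of TinyImageNet's, which is the regime in which the accuracy gains quoted above are reported.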

Paper Structure (44 sections, 4 theorems, 24 equations, 10 figures, 3 tables, 1 algorithm)

Introduction
Related Work
Dataset Pruning
Dataset Distillation (DD)
Bayesian Framework for Matching Gradients Method
Bayesian Optimal Condensation Framework
Motivation
Expected Risk Function in Joint Probability Distribution
Bayesian Optimal Condensation Risk Function
Approximating the Optimal Solution for Bayesian Condensation
Overall Loss Function and Pseudocode
Experimental Evaluation
Experiment Setup
Comparison to the State-of-the-art Methods
Analysis
...and 29 more sections

Key Result

Theorem 3.4

The expected risk function in a joint probability distribution can also be calculated as follows (proof in the appendix):

Figures (10)

Figure 1: Comparison between our method and previous methods: (a) Existing state-of-the-art DD methods typically follow a common paradigm: aligning the gradients [b28] or distributions [b11, b56] that neural networks compute on the original and synthetic datasets. (b) In contrast, our BACON method recasts the DD task as a Bayesian optimization problem and generates synthetic images by assessing likelihood and prior probabilities.
Figure 2: The framework of the proposed BACON: The neural networks output the distribution after processing both synthetic and real datasets. Subsequently, we formulate the distribution between the synthetic and real datasets as the Bayesian optimal condensation risk function (see Section 3.2). The optimal solution of the risk function is derived using the Bayesian formula (see Section 3.3). To obtain the approximate solution of BACON, we introduce two assumptions (see Section 3.4) and outline the entire algorithm of BACON in Algorithm 1 (see Section 3.5).
Figure 3: Performance comparison of BACON, IDM, and DM across training steps on the CIFAR-10/100 datasets: The blue line with white circles represents our proposed BACON, the orange line with white circles represents IDM, and the green line with white circles represents DM. All synthetic images are generated from the CIFAR-10/100 datasets over training steps 0 to 20000 with IPC-1, IPC-10, and IPC-50, respectively.
Figure 4: Ablation study of the hyperparameter $\lambda$: Effect of values sampled from $\lambda \in [0.0, 1.0]$ on test accuracy for the CIFAR-10 dataset with 50 images per class (IPC-50).
Figure 5: Visualization of diverse hyperparameters: Synthetic images generated from CIFAR-10 with diverse hyperparameters, using 50 images per class (IPC-50). The left image of each pair represents an airplane, and the right one represents an automobile.
...and 5 more figures

Theorems & Definitions (16)

Definition 3.1: Similarity Indicator of $\epsilon$-neighborhood
Definition 3.2: Expected Risk Function
Definition 3.3: Sphere Integral Function
Theorem 3.4
Remark 3.5
Theorem 3.6
Remark 3.7
Definition A.1: Similarity Indicator of $\epsilon$-neighborhood
Definition A.2: Expected Risk Function
Definition A.3: Sphere Integral Function
...and 6 more
