Distilling Dataset into Neural Field

Donghyeok Shin, HeeSun Bae, Gyuwon Sim, Wanmo Kang, Il-Chul Moon

TL;DR

DDiF tackles the computation and storage cost of large-scale datasets by distilling them into synthetic neural fields, which are decoded by querying continuous coordinates. Because decoding depends only on the coordinates supplied, the same distilled data can be generated at different resolutions and across grid-based modalities. The paper formalizes this coordinate-based parameterization, shows theoretically that it is more expressive than some prior parameterizations under the same per-instance budget, and reports state-of-the-art performance on image, video, audio, and 3D voxel datasets under tight storage budgets. The distilled data also show strong cross-architecture generalization and robust cross-resolution performance. Code is available for reproduction.

Abstract

Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution that compresses the large-scale dataset into a smaller synthetic dataset retaining the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Because a neural field takes coordinates as input and outputs the corresponding quantity, DDiF effectively preserves the information and easily generates data of various shapes. We theoretically confirm that DDiF exhibits greater expressiveness than some previous methods when the budget utilized for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel data. We release the code at https://github.com/aailab-kaist/DDiF.
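
To make the coordinate-in, quantity-out idea concrete, the following is a minimal sketch of decoding one synthetic instance from a small neural field at two different resolutions. It is not the authors' implementation: the two-layer network, the sine activation, the weight initialization, the normalization of coordinates to [-1, 1], and all function names (`make_grid`, `init_field`, `field`, `decode`, `num_params`) are assumptions chosen for illustration, and no distillation loss or optimization is shown.

```python
import math
import random


def make_grid(height, width):
    """Return (y, x) coordinates normalized to [-1, 1] for a height x width grid."""
    ys = [(-1.0 + 2.0 * i / (height - 1)) if height > 1 else 0.0 for i in range(height)]
    xs = [(-1.0 + 2.0 * j / (width - 1)) if width > 1 else 0.0 for j in range(width)]
    return [(y, x) for y in ys for x in xs]


def init_field(in_dim=2, hidden=16, out_dim=1, seed=0):
    """Create the parameters of a tiny MLP (hypothetical architecture)."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        bound = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        return weights, biases

    return [layer(in_dim, hidden), layer(hidden, out_dim)]


def field(psi, coord):
    """Evaluate the field at one coordinate; sine activation is an assumption."""
    h = list(coord)
    for idx, (weights, biases) in enumerate(psi):
        h = [sum(w * x for w, x in zip(row, h)) + b for row, b in zip(weights, biases)]
        if idx < len(psi) - 1:  # no activation on the output layer
            h = [math.sin(v) for v in h]
    return h


def decode(psi, height, width):
    """Decode a single-channel height x width instance by querying every grid coordinate."""
    values = [field(psi, c)[0] for c in make_grid(height, width)]
    return [values[r * width:(r + 1) * width] for r in range(height)]


def num_params(psi):
    """Count stored parameters; the coordinate set itself is never stored."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in psi)


psi = init_field()
low = decode(psi, 8, 8)     # resolution used here as a stand-in for distillation
high = decode(psi, 32, 32)  # a different resolution, same parameters
print(len(low), len(low[0]), len(high), len(high[0]), num_params(psi))
```

In this sketch the stored budget is only the parameter count of the field, which does not change when the output grid grows from 8×8 to 32×32; the grid is regenerated on demand rather than stored.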

Paper Structure

This paper contains 66 sections, 7 theorems, 13 equations, 18 figures, 20 tables, 2 algorithms.

Key Result

Proposition 3.0

Consider two functions $g_1, g_2$ where $g_i:\mathcal{Z}_i\rightarrow\mathbb{R}^{D}$ for $i=1,2$. Also, consider two matrix variables $Z_i\coloneqq[z_{i1},...,z_{iM}]$ whose columns satisfy $z_{ij}\in\mathcal{Z}_i$ for $i=1,2$ and $j=1,...,M$. We denote $\widehat{g_i}(Z_i)\coloneqq[g_i(z_{i1}),...,g_

Figures (18)

  • Figure 1: Overview of DDiF. Each decoded synthetic instance is constructed by the output of each synthetic neural field $F_\psi$ by inputting coordinate set $\mathcal{C}$ (left). DDiF optimizes only the parameters $\psi$, as coordinate set $\mathcal{C}$ does not require optimization or storage. Also, DDiF is capable of encoding grid-based data from various modalities. In the evaluation stage (right), DDiF can decode the data with sizes that were not encountered during the distillation stage by adjusting the input coordinates.
  • Figure 2: Illustration of structural difference between conventional decoding function and neural field.
  • Figure 3: Performance curve on (a) reconstruction task and (b) dataset distillation under the same utilized budget. Pearson correlation coefficient of reconstruction error and test accuracy is $-0.89$.
  • Figure 4: Test accuracies (%) on the video domain. Each black line denotes the same number of decoded instances per class, 1 and 5, respectively.
  • Figure 5: (a) Test accuracies (%) with different image resolutions. The original resolution is $128\times128$. (b) Test accuracy gap (%) from original resolution. We use bilinear interpolation for previous studies.
  • ...and 13 more figures

Theorems & Definitions (10)

  • Proposition 3.0
  • Theorem 3.1
  • Proposition A.0
  • proof
  • Corollary A.0
  • Theorem A.1
  • Lemma A.2
  • proof
  • Theorem A.2
  • proof