Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation

Chenyang Jiang, Zhengcen Li, Hang Zhao, Qiben Shan, Shaocong Wu, Jingyong Su

TL;DR

This work introduces Gaussian Splatting Dataset Distillation (GSDD), a sparse, 2D Gaussian parameterization for distilled images paired with a CUDA-accelerated differentiable rasterizer. By representing each synthetic image with a small set of Gaussians and optimizing region-level parameters, GSDD achieves higher diversity and better scalability under fixed storage budgets than dense pixel or INR-based methods. The approach delivers state-of-the-art results on CIFAR-10/100 and ImageNet subsets, while offering substantial efficiency gains in encoding/decoding, rendering, and memory usage. GSDD is plug-and-play with existing distillation algorithms and demonstrates strong cross-architecture generalization, making large-scale dataset distillation more practical and scalable.

Abstract

Dataset distillation has emerged as a promising paradigm that synthesizes compact, informative datasets capable of retaining the knowledge of large-scale counterparts, thereby addressing the substantial computational and storage burdens of modern model training. Conventional approaches typically rely on dense pixel-level representations, which introduce redundancy and are difficult to scale up. In this work, we propose GSDD, a novel and efficient sparse representation for dataset distillation based on 2D Gaussians. Instead of representing all pixels equally, GSDD encodes critical discriminative information in a distilled image using only a small number of Gaussian primitives. This sparse representation can improve dataset diversity under the same storage budget, enhancing coverage of difficult samples and boosting distillation performance. To ensure both efficiency and scalability, we adapt CUDA-based splatting operators for parallel inference and training, enabling high-quality rendering with minimal computational and memory overhead. Our method is simple yet effective, broadly applicable to different distillation pipelines, and highly scalable. Experiments show that GSDD achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and ImageNet subsets, while keeping encoding and decoding costs low. Our code is available at https://github.com/j-cyoung/GSDatasetDistillation.
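
To make the idea of encoding an image with a small number of Gaussian primitives concrete, the following is a minimal NumPy sketch, not the paper's CUDA rasterizer. The function name `render_gaussians`, the nine-value layout (center, scales, rotation, color, opacity), and the simple additive accumulation are all assumptions chosen for illustration; the actual parameterization and blending rule used by GSDD may differ.

```python
import numpy as np

def render_gaussians(params, height, width):
    """Render an RGB image from N 2D Gaussians by additive accumulation.

    params: (N, 9) array. ASSUMED layout per row (illustrative only):
        [cx, cy, sx, sy, theta, r, g, b, opacity]
    with centers and scales given in pixel units.
    """
    params = np.asarray(params, dtype=np.float32)
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    image = np.zeros((height, width, 3), dtype=np.float32)
    for cx, cy, sx, sy, theta, r, g, b, opacity in params:
        dx, dy = xs - cx, ys - cy
        cos_t, sin_t = np.cos(theta), np.sin(theta)
        # Rotate offsets into the Gaussian's local frame, then normalize by scale.
        u = (cos_t * dx + sin_t * dy) / sx
        v = (-sin_t * dx + cos_t * dy) / sy
        weight = opacity * np.exp(-0.5 * (u ** 2 + v ** 2))
        image += weight[..., None] * np.array([r, g, b], dtype=np.float32)
    return np.clip(image, 0.0, 1.0)

# One red, isotropic Gaussian centered in a 16x16 image.
image = render_gaussians([[8, 8, 2, 2, 0.0, 1.0, 0.0, 0.0, 1.0]], 16, 16)
```

Every pixel is touched by every Gaussian here, which is exactly the cost a tiled CUDA rasterizer is meant to avoid; the sketch only shows how a handful of parameters can describe a whole image region.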

Paper Structure

This paper contains 42 sections, 17 equations, 14 figures, 19 tables.

Figures (14)

  • Figure 1: Comparison of Different Distilled Image Representations
  • Figure 2: Overview of the proposed framework. Each Gaussian is parameterized by a total of 9 floating-point values. A single distilled image is represented by a set of Gaussians. During training, we first quantize the parameters to bf16 precision to obtain quantized Gaussian primitives. These are then rendered into distilled images using a customized rasterizer. During the rasterization process, we concatenate all primitives from the distilled dataset and feed them into a single batched rasterization kernel. This design enables efficient rendering and facilitates a compact data structure, as the entire distilled dataset can be managed by initializing only one object instance. To further improve rendering quality when Gaussians are sparse, we incorporate prefiltering and SuperSampling-based Anti-Aliasing techniques. These enhancements enable more accurate estimation of RGB values at each pixel. The rendered distilled images are then aligned with the original images for information matching, and the resulting gradients are backpropagated to update the Gaussian parameters.
  • Figure 3: (a) Distillation performance under different Gaussian pruning strategies as a function of the remaining Gaussian ratio; (b) Test accuracy across training epochs with different GPC (Gaussian Images Per Class) under the same storage budget; (c) Test accuracy of the distilled dataset on samples of varying difficulty under the same storage budget; (d) Relationship between prediction accuracy on samples of different difficulty and GPC under the same storage budget. For fair comparison, TM is initialized with real images and serves as a baseline that represents pixel-based distillation.
  • Figure 4: (a) Loss landscape of GSDD; (b) Loss landscape of pixel-based representation.
  • Figure 5: Performance comparison between GSDD and DDiF under the same per-image storage budget (abbreviated as st (floats)). Top row: Forward and forward+backward execution time and memory usage across varying image resolutions (with fixed batch size = 32). Bottom row: Same metrics across varying batch sizes (with fixed resolution = 128). GSDD consistently achieves lower latency and memory consumption, especially under high-resolution and large-batch scenarios.
  • ...and 9 more figures
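
The Figure 2 caption states that each Gaussian uses 9 floating-point values and that parameters are quantized to bf16 during training. The sketch below works through the storage trade-off this implies and emulates bf16 in NumPy. The helper names, the 3072-float budget (one 32×32 RGB image), and the choice of 8 images are illustrative assumptions, and the truncation used here is a simplification: real bf16 conversion typically rounds to nearest.

```python
import numpy as np

FLOATS_PER_GAUSSIAN = 9  # stated in the Figure 2 caption

def gaussians_per_image(budget_floats, num_images):
    """Gaussians available per image when a float budget is split evenly."""
    return budget_floats // (num_images * FLOATS_PER_GAUSSIAN)

def to_bf16(x):
    """Emulate bf16 by keeping only the upper 16 bits of each float32.

    ASSUMPTION: truncation (round toward zero) is used for simplicity.
    """
    values = np.atleast_1d(np.asarray(x, dtype=np.float32))
    bits = values.view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

# Illustrative budget: the floats of one 32x32 RGB pixel image.
budget = 32 * 32 * 3
per_image = gaussians_per_image(budget, num_images=8)
quantized = to_bf16([1.0 + 2 ** -10, 1.5])
```

Under these assumed numbers, the storage of a single pixel image can instead hold 8 Gaussian images of 42 primitives each, which is the kind of exchange between per-image detail and dataset diversity that the GPC experiments in Figure 3 examine.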