Contrastive Learning-Enhanced Trajectory Matching for Small-Scale Dataset Distillation

Wenmin Li, Shunsuke Sakai, Tatsuhito Hasegawa

TL;DR

This work tackles dataset distillation under extreme data scarcity by augmenting trajectory matching with supervised contrastive learning. The proposed DATM-CLR integrates a SimCLR-style contrastive loss into the inner optimization loop, producing more informative and discriminative synthetic samples and better aligning training dynamics with those on real data. By jointly optimizing the trajectory matching and contrastive objectives through its Contrastive Fusion and Contrastive Update strategies, the method achieves state-of-the-art performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet, especially at IPC=1 (one synthetic image per class), and generalizes robustly across architectures. The approach is of practical value for deploying compact, high-quality synthetic datasets in edge or rapid-prototyping scenarios; its trade-offs are additional computational cost and extra hyperparameters to tune.

Abstract

Deploying machine learning models in resource-constrained environments, such as edge devices or rapid prototyping scenarios, increasingly demands distillation of large datasets into significantly smaller yet informative synthetic datasets. Current dataset distillation techniques, particularly Trajectory Matching methods, optimize synthetic data so that the model's training trajectory on synthetic samples mirrors that on real data. While effective on medium-scale synthetic datasets, these methods fail to adequately preserve semantic richness under extreme sample scarcity. To address this limitation, we propose a novel dataset distillation method that integrates contrastive learning during image synthesis. By explicitly maximizing instance-level feature discrimination, our approach produces more informative and diverse synthetic samples, even when dataset sizes are significantly constrained. Experimental results demonstrate that incorporating contrastive learning substantially enhances the performance of models trained on very small-scale synthetic datasets. This integration not only guides more effective feature representation but also significantly improves the visual fidelity of the synthesized images. Our method achieves notable performance improvements over existing distillation techniques, especially in scenarios with extremely limited synthetic data.
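
As a rough illustration of what "instance-level feature discrimination" can look like, the sketch below computes an NT-Xent loss, the objective used by SimCLR. It assumes the standard NT-Xent form: the paper's exact loss, temperature, augmentations, and feature extractor are not given in this summary. The function names, the temperature default, and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss over 2N embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of the
    same image (a positive pair); every other embedding is a negative.
    The temperature value is an assumed default, not the paper's setting.
    """
    z = list(view_a) + list(view_b)
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the positive partner of z[i]
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp over all non-self similarities.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - pos_logit
    return total / (2 * n)


# Toy embeddings: two "images", each seen under two augmentations.
view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned_loss = nt_xent(view_a, view_b)
# Swapping the second view pairs each image with the wrong partner.
mismatched_loss = nt_xent(view_a, view_b[::-1])
```

The loss is lower when the two views of the same image sit close together and away from other images, which is the property the abstract relies on to keep a very small synthetic set discriminative.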

Paper Structure

This paper contains 15 sections, 4 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: (a) The teacher model transfers knowledge to the student model through a distillation algorithm, achieving performance parity. (b) Multiple models produce outputs from the same data, and these outputs are aggregated to refine the data itself. The distilled data retains the same input appearance but is enriched with aggregated label or prediction information, making it more informative for training. (c) The synthetic dataset is optimized to reproduce the training trajectory or generalization behavior of real data, enabling the student model to achieve performance comparable to the teacher model.
  • Figure 2: Illustration of the synthetic data optimization process integrating Trajectory Matching and Contrastive Learning. (a) Trajectory Matching aligns the student model parameter trajectory trained on synthetic data with the teacher model parameter trajectory obtained from real data. (b) Contrastive Learning generates positive pairs via augmentation from synthetic images and distinguishes them from negative pairs in the feature space, encouraging intra-class similarity and inter-class dissimilarity. In addition, the Contrastive Update strategy leverages the contrastive loss to directly update the student model during its inner-loop optimization, enabling synthetic data to better capture discriminative representations. (c) Synthetic images are optimized simultaneously by minimizing trajectory matching loss and contrastive learning loss.
  • Figure 3: Effect of varying hyperparameters under the different strategies.
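
The optimization described in the Figure 2 caption can be sketched in a few lines. The trajectory matching term below uses the normalized parameter distance that is standard in trajectory matching methods (student endpoint versus a later teacher checkpoint, scaled by how far the teacher itself moved). Combining the two losses as a weighted sum is an assumption made for illustration: this summary only says both losses are minimized simultaneously. The function names, the `weight` parameter, and all numeric values are hypothetical.

```python
def trajectory_matching_loss(student_end, teacher_start, teacher_target):
    """Normalized squared distance between flattened parameter vectors.

    student_end:    student parameters after training on synthetic data,
                    starting from teacher_start
    teacher_start:  teacher checkpoint the student was initialized from
    teacher_target: later teacher checkpoint obtained on real data
    """
    num = sum((s - t) ** 2 for s, t in zip(student_end, teacher_target))
    den = sum((a - t) ** 2 for a, t in zip(teacher_start, teacher_target))
    return num / den


def fused_objective(tm_loss, contrastive_loss, weight=0.1):
    """Assumed fusion: trajectory matching loss plus a weighted
    contrastive loss. The weighting scheme is a guess, not the paper's."""
    return tm_loss + weight * contrastive_loss


# Toy flattened parameters (hypothetical values).
teacher_start = [0.0, 0.0]
teacher_target = [1.0, 2.0]
student_end = [0.8, 1.9]

tm = trajectory_matching_loss(student_end, teacher_start, teacher_target)
total = fused_objective(tm, contrastive_loss=2.0, weight=0.1)
```

A trajectory matching value of 0 means the student landed exactly on the teacher's later checkpoint, and a value of 1 means it did not move from the starting checkpoint. In the actual method the gradient of the combined loss would flow back to the synthetic images through the unrolled student updates, which this scalar sketch does not model.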