Enhancing Dataset Distillation via Label Inconsistency Elimination and Learning Pattern Refinement

Chuhao Zhou, Chenxi Jiang, Yi Xie, Haozhi Cao, Jianfei Yang

TL;DR

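The solution, Modified Difficulty-Aligned Trajectory Matching (M-DATM), makes two key modifications to the original state-of-the-art method DATM: (1) the soft labels learned by DATM do not correspond one-to-one with their counterparts generated by the official evaluation script, so the soft-label technique is removed to eliminate this inconsistency; (2) because removing soft labels makes it harder for the synthetic dataset to learn late trajectory information, the matching range is reduced so that the synthetic data concentrates on easier patterns.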

Abstract

Dataset Distillation (DD) seeks to create a condensed dataset that, when used to train a model, enables the model to achieve performance similar to that of a model trained on the entire original dataset. It frees model training from processing massive amounts of data and thus reduces computation, storage, and time costs. This paper presents our solution, which ranked 1st in the ECCV-2024 Data Distillation Challenge (track 1). Our solution, Modified Difficulty-Aligned Trajectory Matching (M-DATM), introduces two key modifications to the original state-of-the-art method DATM: (1) the soft labels learned by DATM do not correspond one-to-one with their counterparts generated by the official evaluation script, so we remove the soft-label technique to eliminate this inconsistency; (2) since removing soft labels makes it harder for the synthetic dataset to learn late trajectory information, particularly on Tiny ImageNet, we reduce the matching range, allowing the synthetic data to concentrate on easier patterns. In the final evaluation, M-DATM achieved accuracies of 0.4061 and 0.1831 on the CIFAR-100 and Tiny ImageNet datasets, respectively, ranking 1st in the Fixed Images Per Class (IPC) Track.
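
The two modifications can be illustrated with a toy sketch. It is not the authors' implementation: it assumes the normalized parameter-distance form of the trajectory-matching objective, and every name in it (`hard_labels`, `sample_start_epoch`, `matching_loss`, the toy trajectory, and all numeric values) is hypothetical. Fixed one-hot labels stand in for the removed soft labels, and the `lower`/`upper` bounds stand in for the reduced matching range.

```python
import random


def hard_labels(num_classes, ipc):
    """Fixed one-hot labels, ipc rows per class; they are never optimized."""
    return [[1.0 if c == k else 0.0 for k in range(num_classes)]
            for c in range(num_classes) for _ in range(ipc)]


def sample_start_epoch(rng, lower, upper):
    """Pick the expert epoch to start matching from, inclusive on both ends.

    A smaller `upper` restricts matching to earlier (easier) patterns.
    """
    return rng.randint(lower, upper)


def matching_loss(student_end, expert_start, expert_target):
    """Squared distance to the expert target, normalized by the expert's own move."""
    num = sum((s - e) ** 2 for s, e in zip(student_end, expert_target))
    den = sum((a - e) ** 2 for a, e in zip(expert_start, expert_target))
    return num / den


rng = random.Random(0)

# Toy expert trajectory: 41 checkpoints of an 8-dimensional parameter vector.
expert = [[0.0] * 8]
for _ in range(40):
    expert.append([w + rng.gauss(0.0, 1.0) for w in expert[-1]])

labels = hard_labels(num_classes=4, ipc=2)             # toy sizes, not the challenge's
start = sample_start_epoch(rng, lower=0, upper=20)     # reduced range (assumed values)
target = expert[start + 2]                             # expert parameters 2 epochs later

# Stand-in for a student trained on synthetic data: it lands halfway to the target.
student_end = [0.5 * (a + b) for a, b in zip(expert[start], target)]
loss = matching_loss(student_end, expert[start], target)
print(len(labels), start, round(loss, 4))
```

In an actual distillation run, `student_end` would come from training a network on the synthetic images for a few steps, and the loss would be backpropagated to the image pixels only, since the labels stay fixed.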

Paper Structure

This paper contains 10 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The goal of the dataset distillation (DD) challenge. In the challenge, the large original CIFAR-100 (Tiny ImageNet) dataset of 50K (100K) images is distilled into a small synthetic dataset of 500 (1,000) images, and a 'ConvNet' trained on the synthetic dataset is expected to perform comparably to one trained on the original dataset. Classification accuracy serves as the evaluation metric in the challenge.
  • Figure 2: The insights behind M-DATM. (a) The two key modifications in M-DATM: removing the soft-label technique and adjusting the matching range. (b) The inconsistency between the soft labels learned by DATM and their counterparts generated by the official evaluation script. (c) DATM could not effectively capture discriminative information through distillation.
  • Figure 3: Performance of M-DATM across different matching ranges on CIFAR-100 and Tiny ImageNet.
  • Figure 4: Visualization of the distilled images across different matching ranges on CIFAR-100 and Tiny ImageNet.