Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data

Ziyuan Yang, Ming Yan, Yi Zhang, Joey Tianyi Zhou

TL;DR

The paper addresses a practical security risk in dataset distillation by showing that third-party attackers can inject backdoors into distilled datasets without access to raw data. The attack reconstructs class-level archetypes in latent space via Concept Reconstruction Blocks and embeds backdoors using a hybrid loss that balances malicious objectives with trajectory consistency, formalized as $L_{hybrid} = \alpha L_{BA} + (1-\alpha) L_{tr}$. Empirically, the method achieves high attack success rates across diverse datasets, DD methods, and training strategies, while keeping benign performance degradation minimal and requiring less than a minute to synthesize malicious distilled data. This work exposes a fundamental vulnerability in DD and motivates the development of defenses to ensure the integrity of distilled data in real-world data-sharing pipelines.

Abstract

Dataset distillation (DD) enhances training efficiency and reduces bandwidth by condensing large datasets into smaller synthetic ones. It enables models to achieve performance comparable to those trained on the raw full dataset and has become a widely adopted method for data sharing. However, security concerns in DD remain underexplored. Existing studies typically assume that malicious behavior originates from dataset owners during the initial distillation process, where backdoors are injected into raw datasets. In contrast, this work is the first to address a more realistic and concerning threat: attackers may intercept the dataset distribution process, inject backdoors into the distilled datasets, and redistribute them to users. While distilled datasets were previously considered resistant to backdoor attacks, we demonstrate that they remain vulnerable to such attacks. Furthermore, we show that attackers do not even require access to any raw data to inject the backdoors successfully. Specifically, our approach reconstructs conceptual archetypes for each class from the model trained on the distilled dataset. Backdoors are then injected into these archetypes to update the distilled dataset. Moreover, we ensure the updated dataset not only retains the backdoor but also preserves the original optimization trajectory, thus maintaining the knowledge of the raw dataset. To achieve this, a hybrid loss is designed to integrate backdoor information along the benign optimization trajectory, ensuring that previously learned information is not forgotten. Extensive experiments demonstrate that distilled datasets are highly vulnerable to backdoor attacks, with risks pervasive across various raw datasets, distillation methods, and downstream training strategies. Moreover, our attack method is efficient, capable of synthesizing a malicious distilled dataset in under one minute in certain cases.
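
The hybrid loss summarized in the TL;DR, $L_{hybrid} = \alpha L_{BA} + (1-\alpha) L_{tr}$, is a weighted combination of a backdoor term and a trajectory-consistency term. The sketch below only illustrates that weighting on scalar loss values. The function name `hybrid_loss`, its parameter names, and the restriction of `alpha` to the interval [0, 1] are assumptions made for illustration; the page does not state how the two terms are computed or what range the paper uses for the weight.

```python
def hybrid_loss(loss_ba: float, loss_tr: float, alpha: float) -> float:
    """Weighted combination alpha * L_BA + (1 - alpha) * L_tr.

    loss_ba: value of the backdoor (malicious) objective, assumed precomputed.
    loss_tr: value of the trajectory-consistency objective, assumed precomputed.
    alpha:   trade-off weight; the [0, 1] range is an assumption of this sketch.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return alpha * loss_ba + (1.0 - alpha) * loss_tr


if __name__ == "__main__":
    # Sweep the weight to see how the balance shifts between the two terms.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, hybrid_loss(2.0, 4.0, alpha))
```

Under this reading, `alpha = 1` optimizes only the backdoor objective and `alpha = 0` optimizes only trajectory consistency; intermediate values trade attack strength against preservation of the benign optimization trajectory.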

Paper Structure

This paper contains 15 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Illustration of the threat models. (a) Previous works assume that the data owner may be malicious and inject backdoors into the distilled dataset before distributing it to users. (b) In contrast, our threat model is more practical. We assume the data owner is benign. However, third parties, such as hackers or malicious users, may act maliciously. They could attack the system by hijacking the dataset distribution, injecting backdoors into the distilled dataset, and redistributing it to users.
  • Figure 2: Overview of the proposed method.
  • Figure 3: t-SNE visualization of the feature space. "Stars" and "Circles" represent the concept archetypes and real images, respectively. The reconstructed archetypes align closely with the deep feature representations of real images, effectively bridging the gap between the distilled data and real images.
  • Figure 4: The performances of different user-side models under different training strategies. Our attack consistently poses a significant threat across different user-side models and training strategies.
  • Figure 5: Visualization of benign and malicious distilled data. Without a direct comparison, users may struggle to sense the subtle differences due to the inherent abstraction of distilled datasets.
  • ...and 1 more figure