Dark Distillation: Backdooring Distilled Datasets without Accessing Raw Data

Ziyuan Yang; Ming Yan; Yi Zhang; Joey Tianyi Zhou

TL;DR

The paper addresses a practical security risk in dataset distillation (DD) by showing that third-party attackers can inject backdoors into distilled datasets without access to raw data. The attack reconstructs class-level archetypes in latent space via Concept Reconstruction Blocks and embeds backdoors using a hybrid loss that balances the malicious objective against trajectory consistency, formalized as $L_{hybrid} = \alpha L_{BA} + (1-\alpha) L_{tr}$. Empirically, the method achieves high attack success rates across diverse datasets, DD methods, and training strategies, while keeping benign performance degradation minimal and, in certain cases, requiring less than a minute to synthesize malicious distilled data. This work exposes a fundamental vulnerability in DD and motivates the development of defenses to ensure the integrity of distilled data in real-world data-sharing pipelines.
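The weighting in the hybrid loss can be illustrated with a minimal Python sketch. It only shows how the two terms are combined, using plain scalars in place of real loss values. The function name `hybrid_loss`, its parameter names, and the restriction of alpha to [0, 1] are assumptions made for illustration; the summary does not say how the two loss terms are computed or which values of alpha are used.

```python
def hybrid_loss(loss_ba: float, loss_tr: float, alpha: float) -> float:
    """Combine two loss terms as alpha * L_BA + (1 - alpha) * L_tr.

    loss_ba: value of the backdoor (malicious) objective, L_BA.
    loss_tr: value of the trajectory-consistency term, L_tr.
    alpha:   weight on the backdoor term; restricting it to [0, 1]
             is an assumption of this sketch, not stated in the source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * loss_ba + (1.0 - alpha) * loss_tr


# Hypothetical values: a larger alpha puts more weight on the backdoor term.
print(hybrid_loss(loss_ba=2.0, loss_tr=0.5, alpha=0.25))  # 0.875
```

With alpha = 1 only the backdoor term remains, and with alpha = 0 only the trajectory-consistency term remains, so alpha controls the balance between the two.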

Abstract

Dataset distillation (DD) enhances training efficiency and reduces bandwidth by condensing large datasets into smaller synthetic ones. It enables models to achieve performance comparable to those trained on the raw full dataset and has become a widely adopted method for data sharing. However, security concerns in DD remain underexplored. Existing studies typically assume that malicious behavior originates from dataset owners during the initial distillation process, where backdoors are injected into raw datasets. In contrast, this work is the first to address a more realistic and concerning threat: attackers may intercept the dataset distribution process, inject backdoors into the distilled datasets, and redistribute them to users. While distilled datasets were previously considered resistant to backdoor attacks, we demonstrate that they remain vulnerable to such attacks. Furthermore, we show that attackers do not even require access to any raw data to inject the backdoors successfully. Specifically, our approach reconstructs conceptual archetypes for each class from the model trained on the distilled dataset. Backdoors are then injected into these archetypes to update the distilled dataset. Moreover, we ensure the updated dataset not only retains the backdoor but also preserves the original optimization trajectory, thus maintaining the knowledge of the raw dataset. To achieve this, a hybrid loss is designed to integrate backdoor information along the benign optimization trajectory, ensuring that previously learned information is not forgotten. Extensive experiments demonstrate that distilled datasets are highly vulnerable to backdoor attacks, with risks pervasive across various raw datasets, distillation methods, and downstream training strategies. Moreover, our attack method is efficient, capable of synthesizing a malicious distilled dataset in under one minute in certain cases.
