Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models

Charalampos Koilakos; Ioannis Mouratidis; Ilias Georgakopoulos-Soares

Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models

Charalampos Koilakos, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

Abstract

Genomic foundation models trained on DNA sequences have demonstrated remarkable capabilities across diverse biological tasks, from variant effect prediction to genome design. These models are typically trained on massive, publicly sourced genomic datasets comprising trillions of nucleotide tokens, which renders them intrinsically susceptible to errors, artifacts, and adversarial issues embedded in the training data. Unlike natural language, DNA sequences lack the semantic transparency that might allow model makers to filter out corrupted entries, making genomic training corpora particularly susceptible to undetected manipulation. While training data poisoning has been established as a credible threat to large language models, its implications for genomic foundation models remain unexplored. Here, we present the first systematic investigation of training data poisoning in genomic language models. We demonstrate two complementary attack vectors. First, we show that adversarially crafted sequences can selectively degrade generative behavior on targeted genomic contexts, with backdoor activation following a sigmoidal dose-response relationship and full implantation achieved at 1 percent cumulative poison exposure. Second, targeted label corruption of downstream training data can selectively compromise clinically relevant variant classification, demonstrated using BRCA1 variant effect prediction. Our results reveal that genomic foundation models are vulnerable to targeted data poisoning attacks, underscoring the need for data provenance tracking, integrity verification, and adversarial robustness evaluation in the genomic foundation model development pipeline.

Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models

Abstract

Poisoning the Genome: Targeted Backdoor Attacks on DNA Foundation Models

Abstract

Paper Structure

Table of Contents

Figures (4)