
Open-weight genome language model safeguards: Assessing robustness via adversarial fine-tuning

James R. M. Black, Moritz S. Hanke, Aaron Maiwald, Tina Hernandez-Boussard, Oliver M. Crook, Jaspreet Pannu

TL;DR

This work investigates whether data-exclusion strategies for open-source genomic language models (gLMs) robustly prevent misuse. By adversarially fine-tuning Evo 2 on sequences from harmful human-infecting viruses, the authors demonstrate that misuse-relevant capabilities can be rescued, evidenced by reduced perplexity on viral sequences and partial recovery of immune-escape predictive signals for an unseen pathogen. Although the rescued signals fall short of those produced by narrow, purpose-built tools, the findings imply that data exclusion raises the bar but is not a fail-safe solution. The study underscores the urgent need for comprehensive safety frameworks and diverse mitigation strategies to govern the development and deployment of gLMs in genomics.

Abstract

Novel deep learning architectures are increasingly being applied to biological data, including genetic sequences. These models, referred to as genomic language models (gLMs), have demonstrated impressive predictive and generative capabilities, raising concerns that such models may also enable misuse, for instance via the generation of genomes for human-infecting viruses. These concerns have catalyzed calls for risk mitigation measures. The de facto mitigation of choice is filtering of pretraining data (i.e., removing viral genomic sequences from training datasets) in order to limit gLM performance on virus-related tasks. However, it is not currently known how robust this approach is for securing open-source models that can be fine-tuned using sensitive pathogen data. Here, we evaluate a state-of-the-art gLM, Evo 2, and perform fine-tuning using sequences from 110 harmful human-infecting viruses to assess the rescue of misuse-relevant predictive capabilities. The fine-tuned model exhibited reduced perplexity on unseen viral sequences relative to 1) the pretrained model and 2) a version fine-tuned on bacteriophage sequences. The model fine-tuned on human-infecting viruses also identified immune escape variants from SARS-CoV-2 (achieving an AUROC of 0.6), despite having no exposure to SARS-CoV-2 sequences during fine-tuning. This work demonstrates that data exclusion might be circumvented by fine-tuning approaches that can, to some degree, rescue misuse-relevant capabilities of gLMs. We highlight the need for safety frameworks for gLMs and outline further work needed on evaluations and mitigation measures to enable the safe deployment of gLMs.
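The perplexity comparison in the abstract can be made concrete with a small sketch. It assumes perplexity is the exponential of the mean per-token negative log-likelihood (natural log), with one token per nucleotide. This section does not specify how the paper aggregates over long genomes (for example, windowing or per-sequence averaging). The helper `sequence_log_probs` and its per-position `predicted` distributions are hypothetical stand-ins for a model's outputs, not the Evo 2 API.

```python
import math
from typing import Mapping, Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(mean negative log-likelihood) over tokens (natural log).

    Lower is better; the minimum value of 1 means every observed token
    was predicted with probability 1.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def sequence_log_probs(
    sequence: str, predicted: Sequence[Mapping[str, float]]
) -> list[float]:
    """Log-probability assigned to each observed nucleotide.

    Hypothetical input format: predicted[i] maps each of 'A', 'C', 'G', 'T'
    to the probability the model gave that base at position i, conditioned
    on the preceding bases.
    """
    if len(sequence) != len(predicted):
        raise ValueError("one distribution per position is required")
    return [math.log(dist[base]) for base, dist in zip(sequence, predicted)]


if __name__ == "__main__":
    # A model that spreads probability uniformly over the four bases.
    uniform = [{b: 0.25 for b in "ACGT"}] * 8
    print(perplexity(sequence_log_probs("ACGTACGT", uniform)))

    # A model that puts 0.9 on the base that actually occurs at each position.
    confident = [{b: (0.9 if b == s else 0.1 / 3) for b in "ACGT"} for s in "ACGT"]
    print(perplexity(sequence_log_probs("ACGT", confident)))
```

A uniform model over A, C, G and T has a perplexity of 4. The value falls toward 1 as the model concentrates probability on the observed nucleotides, which is why lower perplexity on held-out viral genomes is read as better modelling of those sequences.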

Paper Structure

This paper contains 19 sections and 4 figures.

Figures (4)

  • Figure 1: Composition of the dataset of viral genomes, by type of viral sequence. 110 viral sequences were used for fine-tuning, and 12 sequences were held out for downstream evaluation.
  • Figure 2: Scatterplot showing perplexity and sequence length for 110 harmful human-infecting viruses that the fine-tuned model was trained on (n=97, r=0.034). Perplexity measures how well a gLM predicts the next token in a sequence, with lower values indicating better predictive performance and thus a better understanding of the underlying genomic data. Each point on the plot corresponds to a single virus used for fine-tuning.
  • Figure 3: Boxplot showing perplexity on training (n=110) and test (n=12) sequences across the three versions of Evo 2: pretrained; FT-bacteriophages; FT-harmful.
  • Figure 4: Receiver operating characteristic (ROC) curves comparing methods for classifying SARS-CoV-2 Spike mutations as leading to immune escape or not. Three versions of Evo 2 were compared: pretrained, fine-tuned on bacteriophages, and fine-tuned on harmful human-infecting viruses. BLOSUM-62 scores were used to evaluate whether evolutionary conservation alone would confer predictive power. EVEscape, a deep learning model leveraging fitness predictions and structural information, was included as an example of a model specialized for this exact task.
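The AUROC of 0.6 reported in the abstract can be interpreted with a short sketch. It computes AUROC as the probability that a randomly chosen immune-escape mutation receives a higher score than a randomly chosen non-escape mutation, with ties counted as one half (the Mann-Whitney formulation). The scores and labels are hypothetical inputs. This section does not describe how per-mutation scores are derived from Evo 2, BLOSUM-62 or EVEscape.

```python
from itertools import product
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for immune escape, 0 for no immune escape (hypothetical encoding).
    A higher score is assumed to mean "more likely to escape".
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0  # escape mutation correctly ranked above non-escape
        elif p == n:
            wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up scores for four mutations, two escape (1) and two non-escape (0).
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

Under this reading, an AUROC of 0.6 means an escape mutation outranks a non-escape mutation in about 60% of pairs, compared with 50% for an uninformative score and 100% for perfect separation. The pairwise loop is quadratic in the number of mutations. A rank-based computation gives the same value more efficiently for large sets.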