Optimizing Gene-Based Testing for Antibiotic Resistance Prediction

David Hagerman, Anna Johnning, Roman Naeem, Fredrik Kahl, Erik Kristiansson, Lennart Svensson

TL;DR

Antibiotic resistance prediction requires rapid, cost-effective diagnostics. The authors propose GenoARM, a framework that combines reinforcement learning with a transformer-based AR predictor to optimally select a small subset of PCR gene tests, leveraging metadata to boost accuracy. Empirical results across five pathogens show that incorporating metadata substantially improves performance, with GenoARM often achieving the best predictions under a 5-gene test budget; RandEvolve remains a strong baseline. The work demonstrates that near-full-genome predictive power can be achieved with a carefully chosen, small gene panel, offering practical implications for clinical diagnostics and resource-constrained settings.

Abstract

Antibiotic Resistance (AR) is a critical global health challenge that necessitates the development of cost-effective, efficient, and accurate diagnostic tools. Given the genetic basis of AR, techniques such as Polymerase Chain Reaction (PCR) that target specific resistance genes offer a promising approach for predictive diagnostics using a limited set of key genes. This study introduces GenoARM, a novel framework that integrates reinforcement learning (RL) with transformer-based models to optimize the selection of PCR gene tests and improve AR predictions, leveraging observed metadata for improved accuracy. In our evaluation, we developed several high-performing baselines and compared them using publicly available datasets derived from real-world bacterial samples representing multiple clinically relevant pathogens. The results show that all evaluated methods achieve strong and reliable performance when metadata is not utilized. When metadata is introduced and the number of selected genes increases, GenoARM demonstrates superior performance due to its capacity to approximate rewards for unseen and sparse combinations. Overall, our framework represents a major advancement in optimizing diagnostic tools for AR in clinical settings.
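To make the underlying selection problem concrete, here is a minimal sketch of choosing a gene panel under a fixed test budget. It is not GenoARM: the greedy search and the majority-vote lookup stand in for the RL policy and the transformer-based predictor described above, and every name in it (`pattern_accuracy`, `greedy_select`, `geneA`, and the toy data) is hypothetical.

```python
from collections import Counter, defaultdict


def pattern_accuracy(samples, labels, genes):
    """In-sample accuracy of a majority-vote lookup keyed on the chosen genes.

    Samples with the same test results on `genes` are grouped, and each
    group is predicted as its most common label.
    """
    groups = defaultdict(Counter)
    for sample, label in zip(samples, labels):
        key = tuple(sample[g] for g in genes)
        groups[key][label] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(labels)


def greedy_select(samples, labels, candidate_genes, budget):
    """Add one gene per step, keeping the gene that raises accuracy the most."""
    chosen = []
    for _ in range(budget):
        remaining = [g for g in candidate_genes if g not in chosen]
        if not remaining:
            break
        best = max(
            remaining,
            key=lambda g: pattern_accuracy(samples, labels, chosen + [g]),
        )
        chosen.append(best)
    return chosen


# Hypothetical toy data: 1 = gene detected; label 1 = resistant.
samples = [
    {"geneA": 1, "geneB": 0, "geneC": 1},
    {"geneA": 1, "geneB": 1, "geneC": 0},
    {"geneA": 0, "geneB": 1, "geneC": 1},
    {"geneA": 0, "geneB": 0, "geneC": 0},
]
labels = [1, 1, 0, 0]

panel = greedy_select(samples, labels, ["geneA", "geneB", "geneC"], budget=1)
print(panel)
```

In this toy data only `geneA` separates resistant from susceptible samples, so a one-test budget selects it. A greedy search like this scores each candidate panel directly on the data it has seen; the abstract attributes GenoARM's advantage to approximating rewards for unseen and sparse combinations, which this sketch does not attempt.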

Paper Structure

This paper contains 20 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An overview of the full GenoARM framework as used during inference.
  • Figure 2: Detailed AR prediction model architecture. Gene test names and metadata are shown un-tokenized but would be tokenized prior to being embedded.
  • Figure 3: An overview of the training framework for the policy network.
  • Figure 4: Accuracy on E. coli of all methods for an increasing number of genes. "GenAR - All genes" and "GenARM - All genes" utilize all genes in the dataset as inputs.