Accelerating Antibiotic Discovery with Large Language Models and Knowledge Graphs

Maxime Delmas, Magdalena Wysocka, Danilo Gusicuma, André Freitas

TL;DR

Antibiotic discovery is costly and slow, and the rediscovery of known compounds is a major risk. The authors present an LLM-based alarm system, backed by a Knowledge Graph (KG), that systematically detects prior evidence of antibiotic activity in organism literature (OL) and chemical literature (CL) while resolving taxonomy and synonyms. The pipeline is demonstrated on a private set of 73 organisms, 12 of which are disclosed as negative hits for evaluation. It achieves broad coverage of OL- and CL-evidence and ranks alerts as Strong, Medium, or Weak to guide review; a public release of the associated KG and UI is planned. The work highlights gaps in public literature coverage, demonstrates scalable semi-automatic literature review, and offers a reusable framework for evidence-driven target prioritization in antibiotic discovery.

Abstract

The discovery of novel antibiotics is critical to address the growing antimicrobial resistance (AMR). However, pharmaceutical industries face high costs (over $1 billion), long timelines, and a high failure rate, worsened by the rediscovery of known compounds. We propose an LLM-based pipeline that acts as an alarm system, detecting prior evidence of antibiotic activity to prevent costly rediscoveries. The system integrates organism and chemical literature into a Knowledge Graph (KG), ensuring taxonomic resolution, synonym handling, and multi-level evidence classification. We tested the pipeline on a private list of 73 potential antibiotic-producing organisms, disclosing 12 negative hits for evaluation. The results highlight the effectiveness of the pipeline for evidence reviewing, reducing false negatives, and accelerating decision-making. The KG for negative hits and the user interface for interactive exploration will be made publicly available.

Paper Structure

This paper contains 19 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: An overview of the early phase of antibiotic development and the cost attached to lead compound identification.
  • Figure 2: An illustration of the proposed pipeline, step-by-step, from the initial list of organism identifications to the extraction of AA evidence alerts in 3 levels. Intermediary annotations (in purple) describe the flow of literature, relations, and evidence that have been processed.
  • Figure 3: A snapshot of the built KG around the natural product relation between Cephalosporium acremonium and Cephalosporin C. Taxonomic and nomenclature relationships are represented between Organism nodes in green. Relation nodes $(r_1, r_2, r_3)$ describe relations between organisms and the isolated natural product Cephalosporin C from different sources: the LOTUS database (LOTUSNPR), an abstract (TiabNPR), and a paragraph (ChunkNPR). Text nodes connected to relation nodes $(r_2, r_3)$ refer to the text from which the relation was extracted. The evidence node $e_1$ is an example of OL-evidence associated with a Medium alert. The node $e_2$ is a CL-evidence associated with a Strong alert. Literature nodes connected to relation and evidence nodes link to the original reference in PubMed (or to the DOI when no PubMed record is available, as in the case of LOTUS annotations).
  • Figure 4: Panel A shows the distribution of publication years for literature references annotated in the LOTUS database. Panel B shows the distribution of references indexed in PubMed for natural product relations annotated in LOTUS.
  • Figure 5: Distribution of all reported alerts per class (Strong, Medium and Weak) and categories for CL (left) and OL (center) evidence for the 12 discarded organisms. The right panel describes the reported evidence only using the LOTUS available natural products relations.
  • ...and 4 more figures
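
The graph structure described in the Figure 3 caption can be sketched as a small data model. This is an illustrative sketch only, not the authors' implementation. Only the node kinds, the three relation sources (LOTUSNPR, TiabNPR, ChunkNPR), and the alert levels come from the caption. The class names, the `strongest_alert` helper, the rule of taking the maximum alert, and the `ref-...` reference placeholders are assumptions, as is the choice to attach OL-evidence to the organism and CL-evidence to the compound.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class Alert(IntEnum):
    """Alert levels named in the caption; the numeric order is an assumption."""
    WEAK = 1
    MEDIUM = 2
    STRONG = 3


@dataclass(frozen=True)
class Relation:
    """Organism -> natural product relation node (r_1, r_2, r_3 in Figure 3)."""
    organism: str
    compound: str
    source: str      # "LOTUSNPR", "TiabNPR", or "ChunkNPR"
    reference: str   # hypothetical placeholder for a PubMed ID or DOI


@dataclass(frozen=True)
class Evidence:
    """Evidence node (e_1, e_2 in Figure 3)."""
    kind: str        # "OL" or "CL"
    alert: Alert
    reference: str   # hypothetical placeholder for a PubMed ID or DOI


@dataclass
class KnowledgeGraph:
    relations: List[Relation] = field(default_factory=list)
    # Evidence keyed by the organism or compound name it is attached to.
    evidence: Dict[str, List[Evidence]] = field(default_factory=dict)

    def add_relation(self, relation: Relation) -> None:
        self.relations.append(relation)

    def add_evidence(self, subject: str, item: Evidence) -> None:
        self.evidence.setdefault(subject, []).append(item)

    def strongest_alert(self, organism: str) -> Optional[Alert]:
        """Highest alert on the organism or any compound related to it.

        Taking the maximum is an assumed aggregation rule for illustration.
        """
        subjects = {organism}
        subjects.update(r.compound for r in self.relations if r.organism == organism)
        alerts = [e.alert for s in subjects for e in self.evidence.get(s, [])]
        return max(alerts) if alerts else None


# Rebuild the Figure 3 example with placeholder references.
kg = KnowledgeGraph()
for source in ("LOTUSNPR", "TiabNPR", "ChunkNPR"):
    kg.add_relation(
        Relation("Cephalosporium acremonium", "Cephalosporin C", source, f"ref-{source}")
    )
kg.add_evidence("Cephalosporium acremonium", Evidence("OL", Alert.MEDIUM, "ref-e1"))
kg.add_evidence("Cephalosporin C", Evidence("CL", Alert.STRONG, "ref-e2"))

top = kg.strongest_alert("Cephalosporium acremonium")
print(top.name if top else "no alert")  # prints "STRONG"
```

Keeping each relation's source and reference on the node itself mirrors the caption's point that every relation and evidence item can be traced back to its originating text.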