Toward Quantum-Enabled Biomarker Discovery: An Outlook from Q4Bio

Dhirpal Shah, Mariesa Teo, Ryan A. Robinett, Sophia Madejski, Zachary Morrell, Siddhi Ramesh, Colin Campbell, Bharath Thotakura, Victory Omole, Ben Hall, Aram W. Harrow, Teague Tomesh, Alexander T. Pearson, Frederic T. Chong, Samantha J. Riesenfeld

TL;DR

The paper investigates empirical quantum advantage (EQA) in a clinically relevant biomarker-discovery task by building a hybrid quantum-classical pipeline for multimodal cancer data. It formulates feature selection as a higher-order polynomial constrained binary optimization (PCBO) problem and introduces hyper-RQAOA (HRQAOA) with parameter transfer to reduce quantum resource requirements while preserving solution quality. Through simulations and hardware experiments on heavy-hex IBM devices, it demonstrates sparsification, error mitigation, and estimator-based workflows that improve edge-fixing reliability, outlining a realistic near- to intermediate-term path to EQA. The work highlights co-design across data preprocessing, problem encoding, algorithm selection, and hardware mapping, showing potential for compact, interpretable biomarker panels and broader biomedical applications beyond oncology.

Abstract

We present a case study and forward-looking perspective on co-design for hybrid quantum-classical algorithms, centered on the goal of empirical quantum advantage (EQA), which we define as a measurable performance gain using quantum hardware over state-of-the-art classical methods on the same task. Because classical algorithms continue to improve, the EQA crossover point is a moving target; nevertheless, we argue that a persistent advantage is possible for our application class even if the crossover point shifts. Specifically, our team examines the task of biomarker discovery in precision oncology. We push the best classical algorithms to their limits, improving them as far as we can, and then augment them with a quantum subroutine for the task where we are most likely to see performance gains. We discuss the implementation of a quantum subroutine for feature selection on current devices, where hardware constraints necessitate further co-design between algorithm and physical device capabilities. Looking ahead, we perform resource analysis to explore a plausible EQA region on near/intermediate-term hardware, considering the impacts of advances in classical and quantum computing on this regime. Finally, we outline potential clinical impact and broader applications of this hybrid pipeline beyond oncology.

Paper Structure

This paper contains 45 sections, 25 equations, 19 figures, 1 table.

Figures (19)

  • Figure 1: Overview of a generic data analysis pipeline. Our team explored potential applications of quantum computers within each step, but ultimately focused efforts on developing hybrid feature-selection algorithms.
  • Figure 2: End-to-end overview of the hybrid data processing pipeline developed by our team for biomarker discovery and scaling towards EQA for clinical applications.
  • Figure 3: Example of the mRNA discretization for the DHX8 gene feature in the Pancan data. The continuous expression values (top) were each assigned one of five categorical values (bottom) according to percentile bins defined by multiples of 20 percent.
  • Figure 4: UMAP of continuous slide-level features (left) and discretized slide-level features (right) in the pan-cancer TCGA dataset, using the Prov-GigaPath feature extractor.
  • Figure 5: Evaluation of classical (dashed lines) and mRmR-based (solid lines) feature selection algorithms on a dataset containing 32,024 total features with three separate class labels. In this early dataset, the 13,525 mRNA variables were transformed and discretized so as to function as distractor variables. Each algorithm produces a feature set of a specific size (x-axis) which is used to train a logistic regression classifier. Five-fold cross-validation is performed to produce the reported F1 score (y-axis, higher is better).
  • ...and 14 more figures
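
The percentile binning described in the Figure 3 caption (five categories, bin edges at multiples of 20 percent) can be illustrated with a minimal sketch. This is not the authors' code: the function name `discretize_quintiles`, the sample values, and the choice of the "inclusive" quantile method are all assumptions made for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def discretize_quintiles(values):
    """Map each continuous value to a category in 0..4 using 20-percent percentile bins."""
    # Four cut points at the 20th, 40th, 60th and 80th percentiles.
    # The "inclusive" interpolation method is an assumption; the source does not specify one.
    cuts = quantiles(values, n=5, method="inclusive")
    # bisect_right counts how many cut points are <= v, which is the bin index.
    return [bisect_right(cuts, v) for v in values]


# Hypothetical expression values, for illustration only (not taken from the Pancan data).
expression = [0.3, 1.2, 2.5, 2.9, 3.1, 4.8, 5.0, 6.7, 8.4, 9.9]
bins = discretize_quintiles(expression)
print(bins)
```

Applied independently to each gene feature, a mapping of this kind turns a continuous expression profile into a five-level categorical variable, which is the form shown in the bottom panel of Figure 3.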