Quantum Hamiltonian Learning using Time-Resolved Measurement Data and its Application to Gene Regulatory Network Inference

Mohammad Aamir Sohail, Ranga R. Sudharshan, S. Sandeep Pradhan, Arvind Rao

TL;DR

The paper introduces the quantum Hamiltonian-based gene-expression model (QHGM), in which gene interactions are encoded as a parameterized Hamiltonian that governs gene-expression evolution over pseudotime. The framework opens new directions for applying quantum-like modeling to biological systems beyond the limits of classical inference.

Abstract

We present a new Hamiltonian-learning framework based on time-resolved measurement data from a fixed local IC-POVM and its application to inferring gene regulatory networks. We introduce the quantum Hamiltonian-based gene-expression model (QHGM), in which gene interactions are encoded as a parameterized Hamiltonian that governs gene expression evolution over pseudotime. We derive finite-sample recovery guarantees and establish upper bounds on the number of time and measurement samples required for accurate parameter estimation with high probability, scaling polynomially with system size. To recover the QHGM parameters, we develop a scalable variational learning algorithm based on empirical risk minimization. Our method recovers network structure efficiently on synthetic benchmarks and reveals novel, biologically plausible regulatory connections in Glioblastoma single-cell RNA sequencing data, highlighting its potential in cancer research. This framework opens new directions for applying quantum-like modeling to biological systems beyond the limits of classical inference.

Paper Structure

This paper contains 26 sections, 11 theorems, 123 equations, 6 figures.

Key Result

Theorem 1

For the Hamiltonian learning problem described above, fix a confidence $\delta>0$ and an empirical SC tolerance $\varepsilon>0$. Under the assumptions stated above, if the number of sampled times $\mathsf{N}_t$ and the number of measurement outcomes per time $\mathsf{N}_c$ are chosen such that where $\tilde{O}$ hides logarithmic factors in $c$ and $1/\varepsilon$, as well as fixed constants $\mu_

Figures (6)

  • Figure 1: Overview of the quantum Hamiltonian-based gene-expression model (QHGM). A gene regulatory network (GRN) is mapped to a parameterized Hamiltonian $\mathrm{H}(\mathbf{w})$, where the presence of gene $\mathsf{g}_i$ induces the action of a Pauli-$\mathrm{Y}$ operator on gene $\mathsf{g}_j$ with regulatory weight $w_{ij}$. The model begins from an initial separable state, representing independent gene states. As the system evolves along pseudotime, correlations between genes are gradually introduced, resulting in an entangled quantum state. At each pseudotime point, this state is measured using a fixed single-qubit IC-POVM, producing a probability distribution $\phi(\mathbf{m}|t,\mathbf{w})$ over measurement outcomes. Collecting repeated measurement outcomes at each pseudotime point yields discretized gene-expression profiles, denoted as $\mathcal{G}$, with dimension $(\mathsf{N}_t,\mathsf{N}_c,n)$, which serve as the observable data for inference. Here, $\mathsf{N}_t$ is the number of pseudotime bins, $\mathsf{N}_c$ is the number of independently measured cells per bin, and $n$ is the number of genes in the network.
  • Figure 2: VQ-Net. (A) Raw scRNA-seq data are preprocessed, normalized, and assigned pseudotime values, providing a temporal ordering of cells along a developmental trajectory. (B) The normalized pseudotime-ordered scRNA-seq data are converted into four discrete values, denoted as $\mathcal{G}$, where $\mathsf{N}_t$ denotes the number of pseudotime bins, $\mathsf{N}_c$ denotes the number of independently measured cells per bin, and $n$ is the number of genes in the network. (C) A separable initial state is prepared and evolved under the parameterized Hamiltonian $\mathrm{H}(\mathbf{w}) = \sum_{(i,j)} w_{ij}\,\tfrac{1}{2}\bigl(\mathrm{I} - \mathrm{Z}_i\bigr)\otimes\mathrm{Y}_j$, which encodes directed regulatory interactions. (D) For each pseudotime bin $t_i$, the single-qubit IC-POVM is applied as the measurement on the entangled evolved state, with the discretized scRNA-seq data $\mathbf{m}_{(i,k)}$ as input. Here, $k$ is the index of the cell in the corresponding pseudotime bin. (E) The parameters $(\boldsymbol{\theta},\boldsymbol{\phi},\mathbf{w})$ are optimized by minimizing the mini-batch empirical loss using a classical optimizer. (F) The learned weights $\mathbf{w}$ are visualized as a signed, asymmetric weight matrix, from which the GRN is inferred.
  • Figure 3: Performance of VQ-Net on synthetic data generated using QHGM. (A) Maximum absolute weight error versus epochs for different numbers of sampled times $\mathsf{N}_t$ and measurements per time $\mathsf{N}_c$. (B) Percentage of recovered weights within $10\%$ error, illustrating the tradeoff between empirical identifiability (controlled by $\mathsf{N}_t$) and sampling variance (controlled by $\mathsf{N}_c$). (C) Batch empirical loss during training, computed using mini-batches of size $20$ and $200$ for $\mathsf{N}_t = 45$ and $\mathsf{N}_t = 5$, respectively. The results indicate convergence to the theoretical optimum for $\mathsf{N}_t = 45$. (D) Learned weights compared to ground truth, demonstrating strong recovery for $\mathsf{N}_t = 45$ and dispersion for $\mathsf{N}_t = 5$. (E--F) Recovered initial-state parameters $(\boldsymbol{\theta},\boldsymbol{\phi})$ with relative errors of $(0.0098, 0.0194)$ for $(\mathsf{N}_t,\mathsf{N}_c) = (45, 10^3)$ and $(0.1638, 0.180)$ for $(\mathsf{N}_t,\mathsf{N}_c) = (5, 50\times 10^3)$, respectively.
  • Figure 4: Inferred GRN from QHGM on GBMap scRNA-seq data. (A) Cells are shown at increasing annotation granularity (Level 2 and Level 3) in UMAP coordinates. Trajectory inference is set with an OPC-like cell as the root node and progresses toward AC-like and MES-like cell types specified as terminal endpoints. The streamplot overlaid on the UMAP embedding visualizes this progression, showing the dominant probabilistic flow of cells along pseudotime, where higher pseudotime values correspond to more differentiated states. (B) Heatmap of median learned weights across 10 simulations, summarizing their central tendency. (C) Gene-wise classification of regulatory behavior based on the coefficient of variation (CV) of positive and negative weights across simulations. The joint CV analysis distinguishes genes with selective, stable activation or repression from those exhibiting high variability, providing a quantitative measure of regulatory consistency and context dependence. (D) The GRN is inferred from the learned weights by selecting the top 15th percentile of positive (activation) and negative (repression) Hamiltonian weights separately, visualized as a directed network. Green and blue edges denote activating and repressing interactions, respectively.
  • Figure 5: Performance of state-of-the-art classical inference methods on QHGM-generated data. (A) Network edge recovery. (B) Weight sign recovery. (C) Sparsity-edge recovery tradeoff.
  • ...and 1 more figure
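
As a rough illustration of the data-generating process described in Figures 1 and 2, the sketch below builds $\mathrm{H}(\mathbf{w})$ for a small network, evolves a separable initial state, and samples four-valued outcomes per gene to form an array $\mathcal{G}$ of shape $(\mathsf{N}_t,\mathsf{N}_c,n)$. Several choices here are assumptions and are not taken from the paper: the tetrahedral SIC-POVM as the fixed single-qubit IC-POVM, the Bloch-angle parameterization of each initial qubit by $(\theta,\phi)$, the evolution convention $e^{-i\mathrm{H}t}$, and all function names, weights, and pseudotime values.

```python
from functools import reduce
from itertools import product

import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
P1 = 0.5 * (I2 - Z)  # (I - Z)/2: projector onto |1>, i.e. gene i "present"


def kron_all(ops):
    """Tensor product of a list of arrays (qubit 0 is the leftmost factor)."""
    return reduce(np.kron, ops)


def hamiltonian(w):
    """H(w) = sum over i != j of w[i, j] * (I - Z_i)/2 (x) Y_j, for real w."""
    n = w.shape[0]
    H = np.zeros((2**n, 2**n), dtype=complex)
    for i, j in product(range(n), repeat=2):
        if i != j and w[i, j] != 0.0:
            ops = [I2] * n
            ops[i], ops[j] = P1, Y
            H += w[i, j] * kron_all(ops)
    return H


def initial_state(theta, phi):
    """Separable state; each qubit is cos(t/2)|0> + e^{ip} sin(t/2)|1> (assumed form)."""
    qubits = [np.array([np.cos(t / 2), np.exp(1j * p) * np.sin(t / 2)])
              for t, p in zip(theta, phi)]
    return kron_all(qubits)


def evolve(H, psi0, t):
    """Apply exp(-i H t) using the eigendecomposition of the Hermitian H."""
    evals, evecs = np.linalg.eigh(H)
    return evecs @ (np.exp(-1j * evals * t) * (evecs.conj().T @ psi0))


def tetrahedral_povm():
    """Four-outcome single-qubit SIC-POVM, E_k = (I + s_k . sigma) / 4 (assumed choice)."""
    s = np.array([[0, 0, 1],
                  [2 * np.sqrt(2) / 3, 0, -1 / 3],
                  [-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3],
                  [-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3]])
    return [0.25 * (I2 + sx * X + sy * Y + sz * Z) for sx, sy, sz in s]


def outcome_distribution(psi, povm):
    """Probability of every joint outcome m in {0,1,2,3}^n under the local POVM."""
    n = psi.size.bit_length() - 1
    outcomes = np.array(list(product(range(4), repeat=n)))
    probs = np.array([np.vdot(psi, kron_all([povm[m] for m in o]) @ psi).real
                      for o in outcomes])
    return outcomes, probs


def simulate(w, theta, phi, times, n_cells, seed=0):
    """Sample n_cells outcome vectors at each time; returns shape (N_t, N_c, n)."""
    rng = np.random.default_rng(seed)
    H, psi0, povm = hamiltonian(w), initial_state(theta, phi), tetrahedral_povm()
    G = np.empty((len(times), n_cells, w.shape[0]), dtype=int)
    for k, t in enumerate(times):
        outcomes, probs = outcome_distribution(evolve(H, psi0, t), povm)
        probs = np.clip(probs, 0.0, None)  # guard against tiny negative round-off
        idx = rng.choice(len(outcomes), size=n_cells, p=probs / probs.sum())
        G[k] = outcomes[idx]
    return G


# Hypothetical 3-gene network: w[i, j] is the weight of gene i acting on gene j.
w = np.array([[0.0, 0.8, 0.0],
              [0.0, 0.0, -0.5],
              [0.3, 0.0, 0.0]])
theta = np.array([0.4, 1.1, 2.0])
phi = np.array([0.0, 0.5, 1.0])
times = np.linspace(0.1, 2.0, 5)

H = hamiltonian(w)
povm = tetrahedral_povm()
_, probs = outcome_distribution(evolve(H, initial_state(theta, phi), times[-1]), povm)
G = simulate(w, theta, phi, times, n_cells=100)
print(G.shape)  # (5, 100, 3)
```

The joint outcome distribution is enumerated exactly over all $4^n$ outcomes, so this approach is only practical for a handful of genes; it is meant to make the shapes and the sampling step concrete, not to reproduce the paper's variational training procedure.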

Theorems & Definitions (21)

  • Definition 1: Statistical Model
  • Theorem 1
  • Theorem 2
  • Definition 2: IC-POVM [medlock2020informationally]
  • Lemma 1: Matrix Bernstein-type inequality [tropp2015introduction]
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Proposition 1
  • ...and 11 more