QHap: Quantum-Inspired Haplotype Phasing

Rui Zhang, Xian-Zhe Tao, Yibo Chen, Jiawei Zhang, Lei He, Dongming Fang, Lin Yang, Yuhui Sun, Qinyuan Zheng, Xinmeng Shi, Yang Zhou, Wanyi Chen, Chentao Yang, Man-Hong Yung, Jun-Han Huang

Abstract

Haplotype phasing, the process of resolving parental allele inheritance patterns in diploid genomes, is critical for precision medicine and population genetics, yet the underlying optimization is NP-hard, posing a scalability challenge. To address this, we introduce QHap, a haplotype phasing tool that leverages quantum-inspired optimization. By reformulating haplotype phasing as a Max-Cut problem and deploying a GPU-accelerated ballistic simulated bifurcation solver, QHap accelerates phasing while maintaining accuracy comparable to established phasing tools. On the highly polymorphic human major histocompatibility complex region, QHap demonstrates 4- to 20-fold acceleration with zero switch error across multiple long read sequencing platforms. The framework implements two strategies: a read-based method for regional phasing, and a single nucleotide polymorphism-based method that, through quality-weighted probabilistic edge construction, efficiently scales to chromosome-scale tasks. Integration of chromatin conformation capture data extends phase block contiguity by up to 15-fold, enabling near-chromosome-spanning haplotype reconstruction. QHap demonstrates that quantum-inspired algorithms operating on classical hardware offer a promising approach to addressing the growing computational demands of sequencing data, establishing a new paradigm for applying physics-inspired optimization to fundamental challenges in computational genomics.
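As an illustration of the Max-Cut reformulation and the ballistic simulated bifurcation (bSB) solver mentioned in the abstract, the following is a minimal single-trajectory CPU sketch in NumPy. It is not QHap's GPU implementation. The function names (`cut_value`, `bsb_max_cut`), the step size, the linear pump ramp, and the coupling-scale heuristic are all assumptions chosen for illustration.

```python
import numpy as np

def cut_value(W, s):
    """Total weight of edges whose endpoints carry opposite spins.

    W is a symmetric weight matrix with zero diagonal; s is a vector of +1/-1.
    """
    return 0.25 * np.sum(W * (1 - np.outer(s, s)))

def bsb_max_cut(W, n_steps=500, dt=0.5, seed=0):
    """Hypothetical single-sample bSB heuristic for Max-Cut (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    J = -W  # maximizing the cut corresponds to antiferromagnetic Ising couplings
    # Assumed heuristic: normalize the coupling strength by sqrt(n) * RMS(J).
    rms = np.sqrt(np.sum(J ** 2) / max(n * (n - 1), 1))
    c0 = 0.5 / (np.sqrt(n) * rms) if rms > 0 else 1.0
    x = rng.uniform(-0.1, 0.1, n)  # positions (soft spins)
    y = rng.uniform(-0.1, 0.1, n)  # momenta
    best_s, best_cut = None, -np.inf
    for step in range(n_steps):
        a = step / n_steps  # pump amplitude ramped linearly from 0 towards 1
        y += dt * (-(1.0 - a) * x + c0 * (J @ x))
        x += dt * y
        # Inelastic walls: clamp positions to [-1, 1] and stop those particles.
        hit = np.abs(x) > 1.0
        x[hit] = np.sign(x[hit])
        y[hit] = 0.0
        s = np.where(x >= 0, 1, -1)
        c = cut_value(W, s)
        if c > best_cut:
            best_s, best_cut = s.copy(), c
    return best_s, best_cut

# Toy example: an unweighted 4-cycle.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
s, c = bsb_max_cut(W)
print(s, c)
```

The two spin values play the role of the two haplotype phases. In practice many independent trajectories are run in parallel and the best cut is kept; this sketch tracks only one.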

Paper Structure

This paper contains 30 sections, 16 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Schematic overview of QHap (read-based) algorithm. Panel (a) shows the initial genomic sequence with two haplotypes (H1 and H2). After sequencing, alignment, and variant calling, (b) a base matrix is constructed where rows correspond to sequencing reads and columns to SNP loci, with entries indicating the observed alleles or missing values. In (c), the base matrix is transformed into a ternary matrix via numerical encoding: 1 indicates the read allele matches the initial haplotype H1, -1 indicates a match with H2, and 0 represents missing values or alleles matching neither H1 nor H2. Panel (d) illustrates the construction of a weighted undirected graph where reads are represented as nodes, and edge weights are determined by the degree of allelic conflicts between paired reads. In (e), the graph is decomposed into connected components for independent processing. Panel (f) demonstrates the Max-Cut algorithm applied within each block to partition reads into two sets corresponding to the two haplotype phases, minimizing conflicts between reads. Panel (g) illustrates the consensus voting step applied to the partitioned results within each block. Finally, (h) displays the reconstructed complete phased haplotypes obtained by combining the partitioned blocks across all SNP loci.
  • Figure 2: Schematic overview of QHap (SNP-based) algorithm. Panel (a) shows the initial genomic sequence with two haplotypes (H1 and H2). After sequencing, alignment, and variant calling, (b) a base matrix is constructed where rows correspond to sequencing reads and columns to SNP loci, with entries indicating the observed alleles or missing values. In (c), the base matrix is transformed into a weighted matrix by incorporating Phred quality scores, where entries encode both allelic assignments and measurement confidence. Panel (d) illustrates the construction of a weighted undirected graph where SNPs are represented as nodes and edges between nodes are weighted according to the log-likelihood ratio of haplotype relationships. In (e), the graph is decomposed into connected components for independent processing. Panel (f) shows the solution to the Max-Cut problem and the reconstruction of the final haplotypes. The "Sample" section illustrates the iterative refinement process: edge weights crossing the partition are inverted and re-solved until convergence, with final haplotype assignments determined by the parity of accumulated site counters.
  • Figure 3: Performance comparison of quantum-inspired optimization algorithms on QHap-constructed graphs. Panel (a) corresponds to a connected component derived from QHap (read-based) formulation with non-negative integer-valued edge weights (8,742 nodes, 259,417 edges). Panel (b) corresponds to a connected component derived from QHap (SNP-based) formulation with mixed-sign floating-point edge weights (7,768 nodes, 3,303,740 edges). Solid lines represent mean cut values averaged over 100 independent runs, and shaded regions indicate standard deviation. The dashed line indicates the best value achieved during runtime. All algorithms were executed on an Intel(R) Core(TM) i7-14650HX processor (2.20 GHz).
  • Figure 4: Comparison of Ising energy evolution for quantum-inspired optimization algorithms and classical SA. Panel (a) corresponds to a connected component from QHap (read-based) formulation with non-negative integer-valued edge weights (8,742 nodes, 259,417 edges). Panel (b) corresponds to a connected component from QHap (SNP-based) formulation with mixed-sign floating-point edge weights (7,768 nodes, 3,303,740 edges). At each time step, the plotted energy value corresponds to the minimum across 200 samples. GPU-accelerated algorithms were executed on an NVIDIA GeForce RTX 5060 Laptop GPU, while CPU-based execution used an Intel(R) Core(TM) i7-14650HX processor (2.20 GHz).
  • Figure 5: The impact of SNP linkage depth on computational metrics during the SNP-based graph construction of QHap. The x-axis represents the linkage depth proportion (ranging from 1/10 to 10/10), indicating the fraction of downstream SNP loci with which each SNP forms pairwise connections during graph construction. The left y-axis (blue) denotes the running time in seconds (measuring only the core phasing algorithm execution, excluding data loading and preprocessing), while the right y-axis (red) represents the switch error rate (SE, %). The analysis was performed on CycloneSEQ data of the HG002 sample in the MHC region. The vertical dashed line indicates the linkage depth parameter adopted in this paper. The bSB parameters were adaptively configured based on graph size (500 iterations for graphs with < 5,000 nodes and 1,000 iterations otherwise, both utilizing 100 samples), with a maximum convergence count of 5. All measurements represent the mean values obtained from 10 independent runs of the algorithm. Experiments were performed on a system with an Intel(R) Core(TM) i7-14650HX processor (2.20 GHz) and an NVIDIA GeForce RTX 5060 Laptop GPU.
  • ...and 2 more figures
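
The read-based steps described in the Figure 1 caption (ternary matrix, conflict-weighted read graph, connected components, consensus voting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the edge weight is taken to be the count of SNP loci at which two reads carry opposite non-zero alleles, and two reads are assumed to belong to the same component when they share at least one covered SNP.

```python
import numpy as np

def conflict_weights(M):
    """Return (W, overlap) for a ternary reads-by-SNPs matrix M in {1, -1, 0}.

    W[i, j] counts loci where reads i and j both have a call and disagree.
    overlap[i, j] counts loci where both reads have a call.
    """
    M = np.asarray(M, dtype=int)
    product = M @ M.T                     # agreements minus conflicts
    overlap = np.abs(M) @ np.abs(M).T     # agreements plus conflicts
    W = (overlap - product) // 2          # conflicts only
    np.fill_diagonal(W, 0)
    return W, overlap

def connected_components(adj):
    """Label nodes by connected component using an iterative depth-first search."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(adj[u]):
                if labels[v] < 0:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels

def consensus_haplotype(M, s):
    """Vote per SNP: reads in partition -1 have their alleles flipped before summing.

    Returns +1/-1 per SNP for haplotype 1, or 0 where there is no coverage or a tie.
    """
    votes = np.asarray(s) @ np.asarray(M, dtype=int)
    return np.sign(votes)

# Toy ternary matrix: 4 reads (rows) by 4 SNPs (columns).
M = np.array([[ 1,  1, 0,  0],
              [-1, -1, 0,  0],
              [ 0,  1, 1,  0],
              [ 0,  0, 0, -1]])
W, overlap = conflict_weights(M)
labels = connected_components(overlap > 0)
# Assumed partition of the first block, e.g. as returned by a Max-Cut solver.
hap = consensus_haplotype(M[:3], [1, -1, 1])
print(W, labels, hap)
```

Here reads 0 to 2 form one block and read 3 is isolated. Within the block, the partition `[1, -1, 1]` cuts every conflicting edge, and the vote yields alleles for the three covered SNPs.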