Stratified Sampling for Quasi-Probability Decompositions

Joshua W. Dai; Bálint Koczor

TL;DR

This work addresses the extra variance introduced by product-form quasi-probability decompositions (QPDs) in quantum algorithms. It frames circuit-level randomisation as a classical sampling problem and proves that stratified sampling with ideal proportional quotas yields unbiased estimators that are never worse than naïve sampling, with potential constant-factor improvements. The authors introduce counts-vector stratification, a permutation-invariant statistic, and a dynamic-programming backbone that exactly computes stratum masses and enables conditional sampling, achieving substantial variance reductions in both PAI and PEC benchmarks, especially in oracle-like regimes. The approach requires only classical pre- and post-processing and preserves existing QPD resources, offering a practical, scalable method to reduce sampling costs in near-term and early fault-tolerant quantum protocols. Potential extensions include coarsening, adaptive pilot allocations, and hierarchical stratification to broaden applicability to larger, more complex QPDs.

Abstract

Quasi-probability decompositions (QPDs) have proven essential in many quantum algorithms and protocols: one replaces a "difficult" quantum circuit with an ensemble of "easier" circuit variants whose weighted outcomes reproduce any target observable. This, however, inevitably adds a configuration variance on top of Born-rule shot noise. We develop a broad framework for accounting for and reducing this variance and prove that stratified sampling, under ideal proportional allocation, results in an unbiased estimator whose variance is never worse than that of naïve sampling (with equality only in degenerate cases). Furthermore, we provide a classical dynamic programme to enable stratification on arbitrary product-form QPDs. Numerical simulations of typical QPDs, such as Probabilistic Error Cancellation (PEC) and Probabilistic Angle Interpolation (PAI), demonstrate constant-factor reductions in overall variance (up to $\sim 60$--$80\%$ in an oracle model) and robust $\sim 10\%$ savings in the pessimistic single-shot regime. Our results can be applied immediately to reduce the net sampling cost of practically relevant QPDs that are commonly used in near-term and early fault-tolerant algorithms, without requiring additional quantum resources.

Paper Structure (98 sections, 7 theorems, 125 equations, 10 figures, 5 algorithms)


Introduction
Product-form QPDs: variance decomposition and stratified sampling
Local and circuit-level QPDs
Per-sample estimator, configuration samples, and repetitions per configuration
Variance decomposition and naïve design
Stratified sampling over QPD configurations
Remark (QPD overhead unchanged).
Counts-vector stratification for QPDs
Padding to a common width.
Counts-vector strata.
DP preprocessing and conditional sampling.
Complexity caveat and routes to scalability.
Generality and problem-agnosticism.
Practical stratified sampling recipe
Numerical Results & Discussion
...and 83 more sections

Key Result

Theorem 1

Consider a QPD estimator $\widehat{Y}_K$ with a total budget of $K$ configurations. Let $w_s$ be the probability of a configuration falling into stratum $s$, and let $\mu_s$ and $\sigma_s^2$ denote the mean and variance of the estimator within that stratum, respectively. Under an idealized proportional allocation [...] Consequently, proportional stratification provides a strictly lower variance whenever the stratum means [...]
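The comparison behind Theorem 1 can be illustrated with the textbook law-of-total-variance identity. The sketch below is not the paper's own statement or code: it assumes the naïve estimator averages $K$ i.i.d. draws, the stratified estimator is $\sum_s w_s \bar{Y}_s$ with idealized quotas $K_s = w_s K$ (treated as exact, ignoring rounding), and all function names and numbers are hypothetical.

```python
def naive_variance(w, mu, sigma2, K):
    """Variance of the mean of K i.i.d. draws from the mixture over strata."""
    grand_mean = sum(ws * ms for ws, ms in zip(w, mu))
    within = sum(ws * s2 for ws, s2 in zip(w, sigma2))
    between = sum(ws * (ms - grand_mean) ** 2 for ws, ms in zip(w, mu))
    return (within + between) / K


def proportional_variance(w, sigma2, K):
    """Variance of sum_s w_s * mean_s with assumed quotas K_s = w_s * K."""
    # sum_s w_s^2 * sigma_s^2 / (w_s * K) simplifies to (1/K) * sum_s w_s * sigma_s^2
    return sum(ws * s2 for ws, s2 in zip(w, sigma2)) / K


# Hypothetical stratum masses, means, and variances (illustrative only).
w = [0.5, 0.3, 0.2]
mu = [1.0, -0.5, 2.0]
sigma2 = [0.4, 0.9, 0.1]
K = 1000

v_naive = naive_variance(w, mu, sigma2, K)
v_strat = proportional_variance(w, sigma2, K)
gap = v_naive - v_strat  # equals the between-stratum term divided by K
```

In this toy model the gap is exactly the between-stratum term $\frac{1}{K}\sum_s w_s(\mu_s-\mu)^2$, which vanishes only when all stratum means coincide.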

Figures (10)

Figure 1: Schematic of product-form QPD sampling and counts-vector stratification on a toy three-gate circuit. (a) Each ideal gate admits a local QPD over implementable primitives, so a circuit configuration is an index string $\underline\ell$. (b) Naïve QPD sampling draws $\underline\ell$ directly from $p(\underline\ell)\propto |g(\underline\ell)|$. (c) Counts-vector stratification groups configurations by the number of occurrences of each local primitive, yielding strata indexed by a counts vector $\mathbf M$ and a marginal distribution over strata obtained by aggregating the configuration masses.
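The grouping described in the Figure 1 caption can be sketched by brute force on a toy three-gate circuit. This is only an illustration: the local coefficients below are invented, every location is assumed to have the same number of primitives $d=3$, and explicit enumeration is feasible only for tiny circuits.

```python
from collections import defaultdict
from itertools import product

# Hypothetical local QPD coefficients g_k(l) for three gates, d = 3 primitives each.
local_coeffs = [
    [1.2, -0.3, 0.1],
    [1.1, -0.2, 0.1],
    [1.3, -0.4, 0.1],
]
d = 3


def counts_vector(config, d):
    """Number of occurrences of each local primitive index in a configuration."""
    counts = [0] * d
    for ell in config:
        counts[ell] += 1
    return tuple(counts)


# Product form: the circuit 1-norm is the product of the local 1-norms.
norm = 1.0
for row in local_coeffs:
    norm *= sum(abs(c) for c in row)

# Aggregate configuration masses p(l) = |g(l)| / ||g||_1 into counts-vector strata.
stratum_mass = defaultdict(float)
for config in product(range(d), repeat=len(local_coeffs)):
    weight = 1.0
    for row, ell in zip(local_coeffs, config):
        weight *= abs(row[ell])
    stratum_mass[counts_vector(config, d)] += weight / norm
```

Here the 27 configurations collapse into 10 strata, and because the counts vector ignores gate order, permuted index strings land in the same stratum.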
Figure 2: PAI on TFIM Trotter circuits ($n=6$, $h=0.6$, $J=0.7$, $t=1.0$) estimating $\langle X_1\rangle$. For each depth $L$ we draw $K=8192$ configurations and report empirical variance ratios under naïve and counts-vector proportional designs in the oracle (red) and single-shot (blue) models; the green curve shows $R=64$. Bands denote bootstrap confidence intervals. The dashed orange curve shows the circuit QPD 1-norm $\|g\|_1$ on a secondary axis. The circuit depths are given by $\nu = 12L$.
Figure 3: PEC on the same TFIM Trotter circuits as in Fig. 2, with gate-independent single-qubit depolarising noise ($p=0.01$) after each unitary. We draw $K=8192$ configurations per $L$ and report empirical variance ratios with bootstrap confidence bands. The dashed orange curve shows the circuit QPD 1-norm $\|g\|_1$ on a secondary axis. The circuit depths are given by $\nu = 18L$.
Figure 4: State-space scaling of the cached counts-vector DP. Shown is the number of reachable DP states $\sum_{i=0}^{\nu}\binom{i+d-1}{d-1}$, i.e. the size of the cached forward table required to compute exact counts-marginals and to enable exact backward conditional sampling. As a simple architecture-agnostic proxy, we annotate 1 GB/1 TB reference lines assuming one float64 (8 bytes) stored per state. Real implementations typically incur constant-factor overheads (data type, number of cached layers, metadata, etc.), or benefit from sparse/compressed representations, but the state count captures the fundamental scaling with $(\nu,d)$.
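The state count in the Figure 4 caption is straightforward to evaluate. The sketch below uses hypothetical function names and the caption's proxy of one float64 (8 bytes) per state; the values of $\nu$ and $d$ in the usage lines are illustrative and not taken from the paper.

```python
from math import comb


def dp_state_count(nu, d):
    """Number of reachable DP states: sum over layers i of C(i + d - 1, d - 1)."""
    return sum(comb(i + d - 1, d - 1) for i in range(nu + 1))


def table_bytes(nu, d, bytes_per_state=8):
    """Memory proxy assuming one float64 stored per cached state."""
    return dp_state_count(nu, d) * bytes_per_state


# Illustrative parameters (assumptions, not the paper's benchmark settings).
nu, d = 120, 3
states = dp_state_count(nu, d)
memory_mb = table_bytes(nu, d) / 1e6
```

By the hockey-stick identity the sum equals $\binom{\nu+d}{d}$, so for fixed $d$ the cached table grows polynomially in $\nu$ with degree $d$.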
Figure 5: Counts-vector stratum mass concentration for TFIM--PAI. For each Trotter depth $L$, we compute the counts-vector stratum weights $w_{\mathbf m}=\Pr(\mathbf M=\mathbf m)$ (Poisson--multinomial marginals) and sort strata by decreasing mass $w_{(1)}\ge w_{(2)}\ge\cdots$. Shown is the cumulative mass $\sum_{i\le t} w_{(i)}$ versus sorted index $t$ (log-scale). The horizontal dashed line marks 0.99, and the vertical dashed line marks the smallest index $t_{0.99}$ such that $\sum_{i\le t_{0.99}} w_{(i)}\ge 0.99$. Each panel reports the threshold $t_{0.99}$ and the ratio $t_{0.99}/|\mathcal{S}|$ (labelled 'Thresh' and 'Ratio' respectively), where $|\mathcal{S}|$ is the number of reachable DP states (counts-vector strata). In this TFIM--PAI instance ($d=3$ at every location), $|\mathcal{S}|=\binom{\nu+d-1}{d-1}$. Across depths, $99\%$ of the stratum mass is supported on only $\sim 2\%$--$9\%$ of strata, illustrating a pronounced head--tail structure and motivating truncated/coarsened variants of the exact DP.
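The threshold $t_{0.99}$ from the Figure 5 caption can be computed with a short sort-and-accumulate routine. To keep the sketch self-contained it makes a simplifying assumption: all $\nu$ locations share one local distribution with $d=3$, so the Poisson--multinomial marginals reduce to an ordinary multinomial. The probabilities and function names are hypothetical.

```python
from math import factorial


def multinomial_weights(nu, probs):
    """Counts-vector masses when all nu locations share one local distribution (d = 3)."""
    p0, p1, p2 = probs
    weights = {}
    for m0 in range(nu + 1):
        for m1 in range(nu - m0 + 1):
            m2 = nu - m0 - m1
            coef = factorial(nu) // (factorial(m0) * factorial(m1) * factorial(m2))
            weights[(m0, m1, m2)] = coef * p0**m0 * p1**m1 * p2**m2
    return weights


def mass_threshold(weights, level=0.99):
    """Smallest t such that the t heaviest strata carry at least `level` total mass."""
    total = 0.0
    for t, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total >= level:
            return t
    return len(weights)


# Illustrative parameters (assumptions): 24 locations, one dominant primitive.
w = multinomial_weights(24, (0.8, 0.15, 0.05))
t99 = mass_threshold(w.values())
ratio = t99 / len(w)  # fraction of strata needed to cover 99% of the mass
```

A small `ratio` indicates the head--tail structure that the caption cites as motivation for truncated or coarsened variants.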
...and 5 more figures

Theorems & Definitions (12)

Theorem 1: Proportional Stratification is Never Worse than Naïve Sampling
proof
Corollary 1: Sample complexity under proportional stratification
Proposition 1: Exact unbiasedness with a residual stratum
proof
Lemma 1: Borrowing cannot exhaust donors
proof
Proposition 2: Forward DP correctness
proof
Proposition 3: Backward sampler correctness
...and 2 more
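The forward DP and backward sampler named in Propositions 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: locations are already padded to a common width $d$, `local_probs[i][j]` is the normalised local probability $|g_i(j)|/\|g_i\|_1$, and all names and numbers are hypothetical.

```python
import random
from collections import defaultdict


def forward_tables(local_probs):
    """tables[i][m] = probability that the first i locations produce counts vector m."""
    d = len(local_probs[0])
    tables = [{(0,) * d: 1.0}]
    for probs in local_probs:
        nxt = defaultdict(float)
        for counts, mass in tables[-1].items():
            for j, p in enumerate(probs):
                if p > 0.0:
                    new = counts[:j] + (counts[j] + 1,) + counts[j + 1:]
                    nxt[new] += mass * p
        tables.append(dict(nxt))
    return tables


def sample_given_counts(local_probs, tables, target, rng):
    """Draw a configuration conditioned on its counts vector equalling `target`."""
    counts = tuple(target)
    if counts not in tables[-1]:
        raise ValueError("target counts vector is not reachable")
    config = [0] * len(local_probs)
    for i in range(len(local_probs), 0, -1):
        probs = local_probs[i - 1]
        weights = []
        for j, p in enumerate(probs):
            if counts[j] == 0:
                weights.append(0.0)
                continue
            prev = counts[:j] + (counts[j] - 1,) + counts[j + 1:]
            # Mass of reaching `prev` after i - 1 locations, then choosing j at location i.
            weights.append(tables[i - 1].get(prev, 0.0) * p)
        j = rng.choices(range(len(probs)), weights=weights)[0]
        config[i - 1] = j
        counts = counts[:j] + (counts[j] - 1,) + counts[j + 1:]
    return config


# Illustrative local distributions for four locations with d = 3 (assumptions).
local_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
    [0.5, 0.25, 0.25],
]
tables = forward_tables(local_probs)
rng = random.Random(0)
sample = sample_given_counts(local_probs, tables, (2, 1, 1), rng)
```

The final table `tables[-1]` holds the stratum masses $w_{\mathbf m}$, and each backward step picks the last unassigned index with probability proportional to the cached mass of the remaining prefix, which is what makes the conditional draw exact.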
