On the connection between least squares, regularization, and classical shadows

Zhihui Zhu, Joseph M. Lukens, Brian T. Kirby

TL;DR

The paper unifies LS, RLS, and classical shadows under a shared shadow framework, revealing that LS shadows form the baseline while CS acts as a channel-inversion regularizer and RLS introduces an $\ell_2$ stabilizer. It shows CS shadows are unbiased but can exhibit high variance, whereas RLS shadows reduce variance at the cost of bias and are more robust to distribution mismatches; multishot analysis indicates CS performs best with single-shot measurements, while RLS displays stronger sensitivity to the number of shots. The results provide both conceptual insight and practical guidance on when to prefer CS or RLS for quantum state estimation, highlighting CS’s computational efficiency and scalability in large systems. Collectively, this work clarifies the connections between LS, RLS, and CS, offering a unified perspective and concrete tradeoffs for quantum shadow-based estimation.

Abstract

Classical shadows (CS) offer a resource-efficient means to estimate quantum observables, circumventing the need for exhaustive state tomography. Here, we clarify and explore the connection between CS techniques and least squares (LS) and regularized least squares (RLS) methods commonly used in machine learning and data analysis. By formal identification of LS and RLS "shadows" completely analogous to those in CS -- namely, point estimators calculated from the empirical frequencies of single measurements -- we show that both RLS and CS can be viewed as regularizers for the underdetermined regime, replacing the pseudoinverse with invertible alternatives. Through numerical simulations, we evaluate RLS and CS from three distinct angles: the tradeoff in bias and variance, mismatch between the expected and actual measurement distributions, and the interplay between the number of measurements and number of shots per measurement. Compared to CS, RLS attains lower variance at the expense of bias, is robust to distribution mismatch, and is more sensitive to the number of shots for a fixed number of state copies -- differences that can be understood from the distinct approaches taken to regularization. Conceptually, our integration of LS, RLS, and CS under a unifying "shadow" umbrella aids in advancing the overall picture of CS techniques, while practically our results highlight the tradeoffs intrinsic to these measurement approaches, illuminating the circumstances under which either RLS or CS would be preferred, such as unverified randomness for the former or unbiased estimation for the latter.

Paper Structure

This paper contains 13 sections, 2 theorems, 23 equations, 7 figures, and 1 table.

Key Result

Lemma 1

The LS estimator $\widehat{\boldsymbol{\rho}}$ is always Hermitian. Moreover, if $\widehat{\boldsymbol{p}}$ lies in the range space of $\mathcal{A}$, then the LS estimator $\widehat{\boldsymbol{\rho}}$ also has trace $1$.
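
Lemma 1 can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes rank-one projective measurements in Haar-random bases, takes the LS estimator to be the minimum-norm solution $\mathcal{A}^{+}\widehat{\boldsymbol{p}}$, and uses hypothetical helper names (`haar_unitary`, `povm_matrix`, `ls_estimate`).

```python
import numpy as np

def haar_unitary(D, rng):
    """Draw a D x D Haar-random unitary via QR with phase correction."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))  # rescale each column by a unit-modulus phase

def povm_matrix(U):
    """Matrix form of one projective measurement (assumed POVM model).

    Row k is conj(u_k) kron u_k, so povm_matrix(U) @ rho.reshape(-1)
    returns the outcome probabilities <u_k| rho |u_k>.
    """
    return np.stack([np.kron(U[:, k].conj(), U[:, k]) for k in range(U.shape[1])])

def ls_estimate(unitaries, p_hat):
    """Minimum-norm least-squares estimate from stacked frequencies."""
    D = unitaries[0].shape[0]
    A = np.vstack([povm_matrix(U) for U in unitaries])
    # Explicit cutoff so numerically zero singular values are discarded.
    vec_rho = np.linalg.pinv(A, rcond=1e-10) @ np.concatenate(p_hat)
    return vec_rho.reshape(D, D)

rng = np.random.default_rng(0)
D, M = 3, 2  # underdetermined: 2 bases cannot fix all D**2 parameters
bases = [haar_unitary(D, rng) for _ in range(M)]
# Arbitrary frequency vectors, each summing to 1 (stand-ins for observed data).
freqs = [rng.dirichlet(np.ones(D)) for _ in range(M)]
rho_hat = ls_estimate(bases, freqs)
print("Hermitian:", np.allclose(rho_hat, rho_hat.conj().T, atol=1e-8))
print("trace:", np.trace(rho_hat).real)
```

In this underdetermined setting, normalized frequency vectors generically lie in the range of $\mathcal{A}$, so the estimate should come out both Hermitian and unit-trace. The lemma makes no claim about positivity, and the estimate may still have negative eigenvalues.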

Figures (7)

  • Figure 1: Shadow picture of quantum estimation. POVMs $\{\mathcal{A}_1,\mathcal{A}_2,\ldots,\mathcal{A}_M\}\equiv\mathcal{A}$ are measured via repeated preparation of a ground truth quantum state $\boldsymbol{\rho}$. The observed frequencies for each POVM produce a single shadow state $\widehat{\boldsymbol{\rho}}_m = \mathcal{S}\left(\mathcal{A}_m^\dagger(\widehat{\boldsymbol{p}}_m)\right)$, and the shadow states are averaged to form the final estimate $\widehat{\boldsymbol{\rho}}$. The techniques differ only in the specific shadow operation $\mathcal{S}$ chosen: (i) least squares (LS) performs the (pseudo)inverse on the POVMs directly; (ii) regularized least squares (RLS) ensures invertibility through the addition of a term proportional to the identity; and (iii) classical shadows (CS) inverts according to a simulated channel $\mathcal{M}$ defined in expectation over all possible measurement settings.
  • Figure 2: Illustration of the performance of the LS shadow for estimating the state $\boldsymbol{\rho}$ and the linear observables $\lambda_i = \operatorname{tr}(\boldsymbol{\Lambda}_i \boldsymbol{\rho})$ with $\boldsymbol{\Lambda}_i = \boldsymbol{\phi}_i \boldsymbol{\phi}_i^\dagger$, where $\boldsymbol{\phi}_0 = \boldsymbol{e}_0, \boldsymbol{\phi}_1 = \frac{1}{\sqrt{2}}\boldsymbol{e}_0 + \frac{1}{\sqrt{2(D-1)}} \sum_{j=1}^{D-1}\boldsymbol{e}_j, \boldsymbol{\phi}_2 = \boldsymbol{e}_1$: (a) the sum of the positive eigenvalues and the sum of the negative eigenvalues of $\widehat{\boldsymbol{\rho}}$, and $\|\widehat{\boldsymbol{\rho}} - \boldsymbol{\rho}\|_F$; (b)-(d) $\widehat{\lambda}_i$ (estimator for $\lambda_i$) from 50 independent trials, and the corresponding MSE $(\widehat{\lambda}_i - \lambda_i)^2$ averaged over the 50 trials.
  • Figure 3: Illustration of the performance of the RLS shadow with different values of the regularization parameter $\mu$ for estimating the state $\boldsymbol{\rho}$ and the three linear observables as in Fig. 2.
  • Figure 4: Illustration of the performance of CS and RLS for estimating the state $\boldsymbol{\rho}$ and the linear observables $\lambda_i = \operatorname{tr}(\boldsymbol{\Lambda}_i \boldsymbol{\rho})$ with $\boldsymbol{\Lambda}_i = \boldsymbol{\phi}_i \boldsymbol{\phi}_i^\dagger$.
  • Figure 5: (a) Illustration of the performance of the RLS and CS shadows for estimating $50$ linear observables $\lambda = \operatorname{tr}(\boldsymbol{\Lambda} \boldsymbol{\rho}),\boldsymbol{\Lambda} = \boldsymbol{\phi} \boldsymbol{\phi}^\dagger$, where $\boldsymbol{\phi}$ is drawn uniformly at random from the unit sphere. (b) The probability density function (PDF) $P(\lambda) = (D-1)(1-\lambda)^{D-2}$ of such a random linear observable $\lambda$.
  • ...and 2 more figures
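
The three shadow operations described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: measurements are rank-one projective measurements in Haar-random bases; the LS and RLS operations act through the empirical average of $\mathcal{A}_m^\dagger\mathcal{A}_m$; and the CS inverse channel is taken to be $\mathcal{M}^{-1}(\boldsymbol{X}) = (D+1)\boldsymbol{X} - \operatorname{tr}(\boldsymbol{X})\,\boldsymbol{I}$, the standard form for Haar-random bases. All function names and the default value of `mu` are hypothetical.

```python
import numpy as np

def haar_unitary(D, rng):
    """Draw a D x D Haar-random unitary via QR with phase correction."""
    Z = (rng.standard_normal((D, D)) + 1j * rng.standard_normal((D, D))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))

def povm_matrix(U):
    """Row k maps vec(rho) (row-major) to the probability <u_k| rho |u_k>."""
    return np.stack([np.kron(U[:, k].conj(), U[:, k]) for k in range(U.shape[1])])

def shadow_estimate(unitaries, p_hat, method, mu=0.1):
    """Average of per-measurement shadows S(A_m^dagger(p_hat_m)).

    method: "LS" (pseudoinverse), "RLS" (adds mu * identity),
            or "CS" (assumed closed-form inverse channel).
    """
    D = unitaries[0].shape[0]
    A = [povm_matrix(U) for U in unitaries]
    # Back-projection A_m^dagger(p_hat_m) for each measurement setting.
    back = [Am.conj().T @ pm for Am, pm in zip(A, p_hat)]
    if method == "CS":
        shadows = []
        for b in back:
            X = b.reshape(D, D)
            shadows.append((D + 1) * X - np.trace(X) * np.eye(D))
    else:
        # Empirical average of A_m^dagger A_m over the settings actually used.
        G = sum(Am.conj().T @ Am for Am in A) / len(A)
        if method == "LS":
            S = np.linalg.pinv(G, rcond=1e-10)
        elif method == "RLS":
            S = np.linalg.inv(G + mu * np.eye(D * D))
        else:
            raise ValueError("method must be 'LS', 'RLS', or 'CS'")
        shadows = [(S @ b).reshape(D, D) for b in back]
    return sum(shadows) / len(shadows)

def exact_probabilities(unitaries, rho):
    """Noise-free outcome probabilities, used here in place of sampled data."""
    return [np.real(povm_matrix(U) @ rho.reshape(-1)) for U in unitaries]
```

For example, with `bases = [haar_unitary(2, rng) for _ in range(20)]` and `p = exact_probabilities(bases, rho)`, calling `shadow_estimate(bases, p, "LS")` should recover `rho` up to numerical error, since 20 random bases are generically informationally complete for a qubit. The `"RLS"` estimate is shrunk toward zero by `mu`, and the `"CS"` estimate has unit trace but fluctuates around `rho` at finite `M`. In this sketch the only difference between the branches is which operator replaces the pseudoinverse, which mirrors the caption's description.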

Theorems & Definitions (4)

  • Lemma 1
  • proof
  • Theorem 1
  • proof (of the multishot MSE result)