Amplitude-based Input Attribution in Quantum Learning via Integrated Gradients

Nicholas S. DiBrita, Jason Han, Younghyun Cho, Hengrui Luo, Tirthak Patel

TL;DR

HATTRIQ addresses the interpretability gap in quantum machine learning (QML) for amplitude-encoded inputs with a hardware-friendly, gradient-based input attribution framework. It models the circuit output as $F(\boldsymbol{x}; \boldsymbol{\theta}) = \langle x| U^{\dagger}(\boldsymbol{\theta}) O U(\boldsymbol{\theta}) |x\rangle$ and computes attributions with integrated gradients, $IG_i(x) = (x_i - x'_i) \int_0^1 \partial F(x' + \alpha (x - x'))/\partial x_i \, d\alpha$, where $x'$ is a baseline input. It extends this to amplitude embedding with explicit formulas for the gradients $\partial F/\partial c_k$ and $\partial F/\partial d_k$ with respect to the real and imaginary parts of each amplitude $x_k = c_k + \mathbf{i}\,d_k$. A Hadamard-test-based circuit construction enables exact gradient computation directly on quantum hardware, and a parallelization strategy uses $m$ ancilla qubits to evaluate $2^m - 1$ gradient components simultaneously. Experiments on Bars and Stripes, MNIST, and FashionMNIST show faithful attributions that are robust to shot noise. The result is a general, scalable, hardware-compatible approach to input attribution for amplitude-encoded QML, with open-source code and potential extensions to parameter/layer attribution and mid-circuit measurements.
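To make the integrated gradients expression concrete, here is a minimal classical sketch in Python. It is not HATTRIQ's implementation: it assumes real amplitudes, uses a random real symmetric matrix `M` as a stand-in for $U^{\dagger}(\boldsymbol{\theta}) O U(\boldsymbol{\theta})$, and approximates the path integral with a midpoint Riemann sum. The function names and the zero baseline are illustrative choices.

```python
import numpy as np

def model(x, M):
    """Toy model F(x) = <x|M|x> for a real vector x and a real symmetric M."""
    return float(x @ M @ x)

def model_grad(x, M):
    """Analytic input gradient of the toy model: dF/dx = 2 M x (symmetric M)."""
    return 2.0 * (M @ x)

def integrated_gradients(x, baseline, M, steps=50):
    """Midpoint Riemann-sum approximation of
    IG_i = (x_i - x'_i) * integral_0^1 dF/dx_i(x' + a (x - x')) da."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([model_grad(baseline + a * (x - baseline), M) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = (A + A.T) / 2            # assumed stand-in for U^dagger O U
x = rng.normal(size=4)
x /= np.linalg.norm(x)       # amplitude-encoded inputs are normalized
baseline = np.zeros(4)       # assumed all-zero baseline x'

ig = integrated_gradients(x, baseline, M)
print("attributions:", ig)
print("sum of attributions:", ig.sum())
print("F(x) - F(x'):", model(x, M) - model(baseline, M))
```

The last two printed values should agree: integrated gradients satisfies completeness, so the attributions sum to the change in model output between the baseline and the input.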

Abstract

Quantum machine learning (QML) algorithms have demonstrated early promise across hardware platforms, but remain difficult to interpret due to the inherent opacity of quantum state evolution. Widely used classical interpretability methods, such as integrated gradients and surrogate-based sensitivity analysis, are not directly compatible with quantum circuits due to measurement collapse and the exponential complexity of simulating state evolution. In this work, we introduce HATTRIQ, a general-purpose framework to compute amplitude-based input attribution scores in circuit-based QML models. HATTRIQ supports the widely used amplitude embedding scheme for encoding input features and uses a Hadamard test-based construction to compute input gradients directly on quantum hardware, generating provably faithful attributions. We validate HATTRIQ on classification tasks across several datasets (Bars and Stripes, MNIST, and FashionMNIST).
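For readers unfamiliar with the Hadamard test, the following Python sketch simulates the textbook version with NumPy state vectors. It is an illustration only and not HATTRIQ's gradient circuit: the random state `psi`, the random unitary `U`, and the function names are assumptions made for the example. The ancilla measures $|0\rangle$ with probability $(1 + \mathrm{Re}\,\langle\psi|U|\psi\rangle)/2$, so the real part can be estimated from a finite number of shots.

```python
import numpy as np

def hadamard_test_p0(psi, U):
    """Probability that the ancilla reads 0 after H, controlled-U, H."""
    dim = len(psi)
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    I = np.eye(dim)
    Z = np.zeros((dim, dim))
    # The ancilla is the first (most significant) tensor factor, starting in |0>.
    state = np.kron(np.array([1.0, 0.0]), psi).astype(complex)
    controlled_U = np.block([[I, Z], [Z, U]])
    state = np.kron(H, I) @ state
    state = controlled_U @ state
    state = np.kron(H, I) @ state
    # The first `dim` entries belong to ancilla outcome 0.
    return float(np.sum(np.abs(state[:dim]) ** 2))

def estimate_real_part(psi, U, shots, rng):
    """Shot-based estimate of Re<psi|U|psi> from simulated ancilla outcomes."""
    p0 = np.clip(hadamard_test_p0(psi, U), 0.0, 1.0)
    zeros = rng.binomial(shots, p0)
    return 2.0 * zeros / shots - 1.0

rng = np.random.default_rng(1)
psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)
# Random unitary from the QR decomposition of a complex Gaussian matrix.
U, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))

exact = np.vdot(psi, U @ psi).real
p0 = hadamard_test_p0(psi, U)
estimate = estimate_real_part(psi, U, shots=100_000, rng=rng)
print("exact Re<psi|U|psi>:", exact)
print("from ancilla probability:", 2 * p0 - 1)
print("from 100000 shots:", estimate)
```

The statistical error of the shot-based estimate shrinks roughly as one over the square root of the number of shots, which is the kind of noise the shot-count experiments probe.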

Paper Structure

This paper contains 25 sections, 2 theorems, 16 equations, 8 figures, 1 table.

Key Result

Lemma 3.1

For the general case, assume the amplitudes of an amplitude-encoded input are complex-valued, so that each $x_k = c_k + \mathbf{i}\,d_k$. Then the input gradients of the model function $F(\boldsymbol{x}; \boldsymbol{\theta})$ are given separately for the real components $c_k$ and the imaginary components $d_k$, as $\partial F/\partial c_k$ and $\partial F/\partial d_k$.
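As a hedged illustration of what such gradients look like, the following short derivation treats $c_k$ and $d_k$ as independent real variables and ignores the normalization constraint on $\boldsymbol{x}$. The shorthand $M$ is introduced here for convenience, and the form stated in the paper's lemma may differ. Let $M = U^{\dagger}(\boldsymbol{\theta}) O U(\boldsymbol{\theta})$, which is Hermitian because $O$ is an observable, so that $F = \sum_{j,l} x_j^{*} M_{jl} x_l$. Using $\partial x_l/\partial c_k = \delta_{lk}$, $\partial x_j^{*}/\partial c_k = \delta_{jk}$, $\partial x_l/\partial d_k = \mathbf{i}\,\delta_{lk}$, and $\partial x_j^{*}/\partial d_k = -\mathbf{i}\,\delta_{jk}$:

$$
\frac{\partial F}{\partial c_k} = (M\boldsymbol{x})_k + (M\boldsymbol{x})_k^{*} = 2\,\mathrm{Re}\big[(M\boldsymbol{x})_k\big],
\qquad
\frac{\partial F}{\partial d_k} = -\mathbf{i}\,(M\boldsymbol{x})_k + \mathbf{i}\,(M\boldsymbol{x})_k^{*} = 2\,\mathrm{Im}\big[(M\boldsymbol{x})_k\big].
$$

Both components reduce to the real or imaginary part of a single complex quantity, $(M\boldsymbol{x})_k = \langle k| U^{\dagger}(\boldsymbol{\theta}) O U(\boldsymbol{\theta}) |x\rangle$, which is the kind of overlap a Hadamard test can estimate.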

Figures (8)

  • Figure 1: Sample images and the accompanying integrated gradients attributions from the MNIST dataset. Quantum models were trained for various binary classification tasks. Blue indicates positive attribution, red indicates negative attribution, and white indicates neutral attribution. We see patches and patterns of strong attributions for the trained classifier models.
  • Figure 2: Sample images and gradient attribution for the amplitude-embedded MNIST dataset. We have merged the attributions to show positive and negative attributions in the same image.
  • Figure 3: Sample images and attributions for the FashionMNIST dataset using amplitude encoding. We have merged the attributions to show positive and negative attributions in the same image.
  • Figure 4: Sample images and the accompanying integrated gradients attribution for the Bars and Stripes dataset. Quantum models using (a) angle encoding and (b) amplitude encoding were trained.
  • Figure 5: Integrated gradients computed using varying numbers of measurement shots (samples). In (b), (c), and (d), gradient components are computed using our circuit-based approach, with 10, 100, and 1000 samples used to estimate each component. Overall, we see almost no degradation in the attribution scores compared to the results given by exact simulation (e).
  • ...and 3 more figures

Theorems & Definitions (8)

  • Remark 2.1
  • Definition 2.2: Attribution Score
  • Lemma 3.1: Input Gradient
  • proof
  • Remark 3.2
  • Definition 4.1: Hadamard Test
  • Theorem 4.2
  • proof