Amplitude-based Input Attribution in Quantum Learning via Integrated Gradients
Nicholas S. DiBrita, Jason Han, Younghyun Cho, Hengrui Luo, Tirthak Patel
TL;DR
HATTRIQ addresses the interpretability gap in quantum machine learning (QML) for amplitude-encoded inputs with a hardware-compatible, gradient-based input attribution framework. The model output is written as $F(\boldsymbol{x}; \boldsymbol{\theta}) = \langle x| U^{\dagger}(\boldsymbol{\theta}) O U(\boldsymbol{\theta}) |x\rangle$, and the attribution of input component $i$ relative to a baseline $\boldsymbol{x}'$ is the integrated gradients expression $IG_i(\boldsymbol{x}) = (x_i - x'_i) \int_0^1 \frac{\partial F(\boldsymbol{x}' + \alpha (\boldsymbol{x} - \boldsymbol{x}'))}{\partial x_i} \, d\alpha$. The framework extends this to amplitude embedding with explicit gradient formulas for $\partial F/\partial c_k$ and $\partial F/\partial d_k$. A Hadamard-test-based circuit construction computes these gradients directly on quantum hardware, and a parallelization strategy uses $m$ ancilla qubits to evaluate $2^m - 1$ gradient components simultaneously. Experiments on Bars and Stripes, MNIST, and FashionMNIST show faithful attributions that remain robust to shot noise. The result is a general, scalable, hardware-compatible approach to input attribution for amplitude-encoded QML, released with open-source code and with potential extensions to parameter and layer attribution and to mid-circuit measurements.
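To make the integrated gradients expression above concrete, the following is a minimal classical NumPy sketch, not the HATTRIQ hardware procedure. It rests on several assumptions that are not taken from the paper: the input vector is real, $F$ is treated as a quadratic form in the raw (unnormalized) amplitudes so that an all-zero baseline is admissible, the unitary is random, and the observable is Pauli-Z on the first qubit. All function names are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def model(x, M):
    """F(x) = <x| M |x> with M = U^dagger O U (Hermitian), so F is real."""
    return float(np.real(np.vdot(x, M @ x)))

def input_gradient(x, M):
    """For real x, F(x) = x^T Re(M) x, hence dF/dx = 2 Re(M) x."""
    return 2.0 * np.real(M) @ x

def integrated_gradients(x, baseline, M, steps=32):
    """Midpoint Riemann sum of the path integral from baseline to x."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = [input_gradient(baseline + a * (x - baseline), M) for a in alphas]
    return (x - baseline) * np.mean(grads, axis=0)

# Assumed toy setup: 3 qubits, random U, O = Z on the first qubit.
rng = np.random.default_rng(0)
n_qubits = 3
dim = 2 ** n_qubits
U = random_unitary(dim, rng)
O = np.kron(np.diag([1.0, -1.0]), np.eye(dim // 2))
M = U.conj().T @ O @ U

x = rng.normal(size=dim)
x /= np.linalg.norm(x)      # amplitude-encoded input (unit norm)
baseline = np.zeros(dim)    # assumption: zero baseline on raw amplitudes

attributions = integrated_gradients(x, baseline, M)
print("attributions:", attributions)
print("sum of attributions:", attributions.sum())
print("F(x) - F(baseline): ", model(x, M) - model(baseline, M))
```

The last two printed values should agree, which is the completeness property of integrated gradients: the attributions sum to the change in model output between the baseline and the input. Because the gradient of a quadratic form is linear along the path, the midpoint rule is exact here up to floating-point error; a general model would need more integration steps.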
Abstract
Quantum machine learning (QML) algorithms have demonstrated early promise across hardware platforms, but remain difficult to interpret due to the inherent opacity of quantum state evolution. Widely used classical interpretability methods, such as integrated gradients and surrogate-based sensitivity analysis, are not directly compatible with quantum circuits due to measurement collapse and the exponential complexity of simulating state evolution. In this work, we introduce HATTRIQ, a general-purpose framework to compute amplitude-based input attribution scores in circuit-based QML models. HATTRIQ supports amplitude embedding, a widely used input feature-encoding scheme, and uses a Hadamard-test-based construction to compute input gradients directly on quantum hardware, yielding provably faithful attributions. We validate HATTRIQ on classification tasks across several datasets (Bars and Stripes, MNIST, and FashionMNIST).
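For readers unfamiliar with the Hadamard test, the sketch below simulates the generic textbook version classically with NumPy. It is not the specific HATTRIQ circuit, whose details are not given here. An ancilla qubit controls a unitary $V$ applied to a state $|\psi\rangle$, and the ancilla's $Z$ expectation equals $\mathrm{Re}\,\langle\psi|V|\psi\rangle$. The random state, the random unitary, the shot count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def random_unitary(dim, rng):
    """Random unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def hadamard_test_p0(psi, V):
    """Probability of measuring the ancilla in |0> after H, controlled-V, H.

    The |0> branch of the final state is (psi + V psi) / 2, so
    P(0) = (1 + Re<psi|V|psi>) / 2.
    """
    branch0 = 0.5 * (psi + V @ psi)
    return float(np.vdot(branch0, branch0).real)

def estimate_real_overlap(psi, V, shots, rng):
    """Shot-noise estimate of Re<psi|V|psi> from sampled ancilla outcomes."""
    p0 = min(max(hadamard_test_p0(psi, V), 0.0), 1.0)  # guard against rounding
    zeros = rng.binomial(shots, p0)                    # number of |0> outcomes
    return 2.0 * zeros / shots - 1.0                   # <Z> = P(0) - P(1)

# Assumed toy setup: 2-qubit system register, random state and unitary.
rng = np.random.default_rng(1)
dim = 4
psi = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi /= np.linalg.norm(psi)
V = random_unitary(dim, rng)

exact = float(np.vdot(psi, V @ psi).real)
sampled = estimate_real_overlap(psi, V, shots=200_000, rng=rng)
print("exact Re<psi|V|psi>:", exact)
print("sampled estimate:   ", sampled)
```

The sampled estimate fluctuates around the exact value with a standard deviation of at most $1/\sqrt{N}$ for $N$ shots, which is the kind of shot noise that hardware-based gradient estimates must tolerate.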
