Quantum Gradient Class Activation Map for Model Interpretability
Hsin-Yi Lin, Huan-Hsin Tseng, Samuel Yen-Chi Chen, Shinjae Yoo
TL;DR
This work addresses interpretability in quantum machine learning by introducing QGrad-CAM, a framework that uses a Variational Quantum Circuit to assign importance to CNN activation maps through gradient-based localization. It gives an explicit activation-map weighting formula $w_k^{\ell} = \frac{1}{WH}\sum_{i,j} \frac{\partial f^{\ell}(A_{ij}^k)}{\partial A_{ij}^k}$, derived via density-matrix expansion and Lie brackets, which enables Grad-CAM-style explanations for quantum components. The method is validated on image datasets (MNIST, Dogs vs Cats) and a speech corpus (TIMIT), producing class-specific localization maps and, in speech, revealing when the model attends to background regions to detect noise. The results suggest that quantum-classical hybrids can offer transparent, computable explanations and motivate future work on leveraging quantum techniques for interpretability.
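As a rough illustration of the weighting formula $w_k^{\ell}$, the sketch below averages the gradient of a class score over the spatial positions of each activation map and then combines the maps. It is not the authors' implementation: the class score is a hypothetical linear readout (`toy_score`) standing in for the VQC output $f^{\ell}$, the gradient is taken by finite differences rather than from a quantum circuit, and the ReLU-weighted sum used to form the final map is assumed from classical Grad-CAM. All function names are illustrative.

```python
import numpy as np


def gradcam_weights(grads):
    """Average gradients over the spatial axes.

    grads has shape (K, H, W) and holds d f^l / d A^k_ij.
    Returns w_k = (1 / (W * H)) * sum_ij grads[k, i, j], shape (K,).
    """
    return grads.mean(axis=(1, 2))


def localization_map(activations, weights):
    """Grad-CAM-style map ReLU(sum_k w_k A^k), shape (H, W).

    The ReLU and weighted sum follow classical Grad-CAM (an assumption here).
    """
    cam = np.tensordot(weights, activations, axes=1)
    return np.maximum(cam, 0.0)


def numerical_grads(f, activations, eps=1e-5):
    """Central finite differences of a scalar f w.r.t. every activation entry."""
    grads = np.zeros_like(activations)
    for idx in np.ndindex(activations.shape):
        plus = activations.copy()
        minus = activations.copy()
        plus[idx] += eps
        minus[idx] -= eps
        grads[idx] = (f(plus) - f(minus)) / (2 * eps)
    return grads


# Toy data: K = 3 activation maps of size 4 x 4.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4, 4))
V = rng.normal(size=(3, 4, 4))  # hypothetical readout coefficients


def toy_score(a):
    """Hypothetical stand-in for the class score f^l (linear in the activations)."""
    return float(np.sum(V * a))


grads = numerical_grads(toy_score, A)
w = gradcam_weights(grads)
cam = localization_map(A, w)
```

Because `toy_score` is linear, its gradient with respect to `A` is exactly `V`, so each weight reduces to the spatial mean of the corresponding slice of `V`. In practice the gradient would come from differentiating through the actual model rather than from finite differences.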
Abstract
Quantum machine learning (QML) has recently advanced significantly across a range of tasks. Despite these successes, the safety and interpretability of QML applications have not been thoroughly investigated. This work proposes using Variational Quantum Circuits (VQCs) for activation mapping to enhance model transparency, introducing the Quantum Gradient Class Activation Map (QGrad-CAM). This hybrid quantum-classical framework leverages both quantum and classical strengths and admits an explicit formula for feature-map importance. Experiments on both image and speech datasets show that it generates fine-grained, class-discriminative visual explanations.
