Quantum Masked Autoencoders for Vision Learning

Emma Andrews, Prabhat Mishra

TL;DR

The paper addresses learning from partially observed data in a quantum setting by introducing Quantum Masked Autoencoders (QMAEs) that embed images into quantum states, use a learnable mask token, and employ a fidelity-based loss guided by a SWAP test. On MNIST, QMAE achieves higher fidelity reconstructions than quantum autoencoders (QAEs) and yields substantial gains in downstream classification accuracy (e.g., 65.06% vs 52.20%), with best reconstructions at a 25% mask. This work demonstrates a viable path for quantum feature learning under masking, delivering both improved quantum-state fidelity and practical improvements for quantum image processing tasks. The results suggest potential advantages in quantum data representations and masked information processing on near-term quantum hardware.

Abstract

Classical autoencoders are widely used to learn features of input data. To improve feature learning, classical masked autoencoders extend classical autoencoders to learn the features of the original input sample in the presence of masked-out data. While quantum autoencoders exist, there is no existing design or implementation of quantum masked autoencoders that can leverage the benefits of quantum computing and quantum autoencoders. In this paper, we propose quantum masked autoencoders (QMAEs) that can effectively learn missing features of a data sample within quantum states instead of classical embeddings. We show that our QMAE architecture can learn the masked features of an image and can reconstruct the masked input image with improved visual fidelity on MNIST images. Experimental evaluation shows that, in the presence of masks, QMAE significantly outperforms state-of-the-art quantum autoencoders in classification accuracy (by 12.86% on average).

Paper Structure

This paper contains 17 sections, 9 equations, 8 figures, 1 table, 1 algorithm.

Figures (8)

  • Figure 1: Classical autoencoder architecture.
  • Figure 2: Quantum autoencoder architecture.
  • Figure 3: Two-qubit interaction circuit, originally proposed for image compression QAEs by Wang et al. [wang2024quantum]. The circuit consists of 9 parameterized $R_Z$ gates, 6 parameterized $R_Y$ gates, and 3 CNOT gates.
  • Figure 4: QMAE architecture. The original image is masked and embedded as input to the encoder $U(\theta)$. The decoder takes the compressed representation and reconstructs the image using the learned features. A SWAP test is then performed between the reconstructed image and the original input image to obtain the fidelity.
  • Figure 5: SWAP test to measure fidelity of two states located on qubit wires 1 and 2.
  • ...and 3 more figures
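
The SWAP test named in Figures 4 and 5 can be illustrated with a small classical calculation. For two pure states, the ancilla of a SWAP test is measured in $|0\rangle$ with probability $P(0) = \tfrac{1}{2} + \tfrac{1}{2}|\langle\psi|\phi\rangle|^2$, so the fidelity is $2P(0) - 1$. The sketch below is not the authors' implementation: it computes the ancilla's $|0\rangle$ branch directly instead of applying gates, the function names are hypothetical, and the loss form `1 - fidelity` is an assumption, since this summary only says the loss is fidelity-based.

```python
import math


def kron(a, b):
    """Tensor product of two state vectors given as lists of amplitudes."""
    return [x * y for x in a for y in b]


def swap_test_p0(psi, phi):
    """Probability of measuring the SWAP-test ancilla in |0>.

    After H, controlled-SWAP, and H on the ancilla, the |0> branch of the
    joint state is (|psi>|phi> + |phi>|psi>) / 2; its squared norm is P(0).
    """
    direct = kron(psi, phi)
    swapped = kron(phi, psi)
    branch0 = [(x + y) / 2 for x, y in zip(direct, swapped)]
    return sum(abs(amp) ** 2 for amp in branch0)


def fidelity_from_swap_test(psi, phi):
    """Recover |<psi|phi>|^2 from the ancilla statistics."""
    return 2.0 * swap_test_p0(psi, phi) - 1.0


def fidelity_loss(psi, phi):
    """Assumed loss form: 1 - fidelity (not confirmed by the source)."""
    return 1.0 - fidelity_from_swap_test(psi, phi)


if __name__ == "__main__":
    zero = [1.0, 0.0]
    plus = [1.0 / math.sqrt(2), 1.0 / math.sqrt(2)]
    print(swap_test_p0(zero, plus))
    print(fidelity_loss(zero, plus))
```

On hardware, $P(0)$ would be estimated from repeated ancilla measurements rather than computed exactly, so the recovered fidelity carries sampling noise.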