Quantum Masked Autoencoders for Vision Learning

Emma Andrews; Prabhat Mishra

TL;DR

The paper addresses learning from partially observed data in a quantum setting by introducing Quantum Masked Autoencoders (QMAEs), which embed images into quantum states, use a learnable mask token, and train with a fidelity-based loss guided by a SWAP test. On MNIST, QMAE produces higher-fidelity reconstructions than quantum autoencoders (QAEs) and improves downstream classification accuracy substantially (e.g., 65.06% vs. 52.20%), with the best reconstructions at a 25% mask. The work demonstrates a viable path for quantum feature learning under masking, improving both quantum-state fidelity and downstream performance on quantum image processing tasks. The results suggest potential advantages for quantum data representations and masked information processing on near-term quantum hardware.
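To make the fidelity-based loss more concrete, here is a minimal classical sketch of what an ideal SWAP test measures. It is not the authors' implementation. It assumes amplitude embedding (the summary only says images are embedded into quantum states), it simulates the ideal ancilla probability P(0) = 1/2 + 1/2 |<psi|phi>|^2 instead of running a circuit, and it zeroes a pixel in place of the paper's learnable mask token. The helper names `amplitude_embed`, `swap_test_p0`, and `fidelity_loss` are hypothetical.

```python
import math


def amplitude_embed(pixels):
    """Pad to a power-of-two length and L2-normalise (assumed embedding)."""
    n = 1
    while n < len(pixels):
        n *= 2
    padded = [float(p) for p in pixels] + [0.0] * (n - len(pixels))
    norm = math.sqrt(sum(p * p for p in padded))
    if norm == 0.0:
        raise ValueError("cannot embed an all-zero image")
    return [p / norm for p in padded]


def swap_test_p0(psi, phi):
    """Ideal probability of reading 0 on the SWAP-test ancilla qubit."""
    if len(psi) != len(phi):
        raise ValueError("states must have the same dimension")
    overlap = sum(complex(a).conjugate() * complex(b) for a, b in zip(psi, phi))
    return 0.5 + 0.5 * abs(overlap) ** 2


def fidelity_loss(psi, phi):
    """1 - |<psi|phi>|^2, with the fidelity recovered from P(0)."""
    fidelity = 2.0 * swap_test_p0(psi, phi) - 1.0
    return 1.0 - fidelity


# Toy 4-pixel "image"; one pixel (25%) is zeroed as a stand-in for masking.
image = [0.0, 0.5, 1.0, 0.5]
masked = [0.0, 0.5, 0.0, 0.5]

target = amplitude_embed(image)
degraded = amplitude_embed(masked)

print(fidelity_loss(target, target))    # close to 0 for identical states
print(fidelity_loss(target, degraded))  # larger when the states differ
```

On hardware, P(0) would be estimated from repeated measurements of the ancilla, so the loss would carry shot noise that this ideal calculation ignores.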

Abstract

Classical autoencoders are widely used to learn features of input data. To improve feature learning, classical masked autoencoders extend classical autoencoders to learn the features of the original input sample when parts of the data are masked out. While quantum autoencoders exist, there is no design or implementation of quantum masked autoencoders that can leverage the benefits of quantum computing and quantum autoencoders. In this paper, we propose quantum masked autoencoders (QMAEs) that can effectively learn missing features of a data sample within quantum states instead of classical embeddings. We show that our QMAE architecture can learn the masked features of an image and reconstruct the masked input image with improved visual fidelity on MNIST images. Experimental evaluation shows that, in the presence of masks, QMAE significantly outperforms state-of-the-art quantum autoencoders in classification accuracy (by 12.86% on average).
