QClusformer: A Quantum Transformer-based Framework for Unsupervised Visual Clustering
Xuan-Bac Nguyen, Hoang-Quan Nguyen, Samuel Yen-Chi Chen, Samee U. Khan, Hugh Churchill, Khoa Luu
TL;DR
This work tackles the heavy computation involved in unsupervised visual clustering on large unlabeled datasets by introducing QClusformer, a quantum transformer-based framework that employs parameterized quantum circuits for self-attention and quantum feature encoding. It presents a complete end-to-end pipeline including amplitude encoding, cosine similarity-based sequence representation, and a clustering-oriented loss to identify hard samples within clusters. Empirical results on MS-Celeb-1M show clear improvements over classical baselines, while DeepFashion results remain competitive, demonstrating the practical viability of quantum-assisted clustering for large-scale vision tasks. The study highlights the potential of quantum transformer architectures to enhance unsupervised visual clustering and motivates further exploration of quantum resources in vision analytics.
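As a rough illustration of two steps named above, the sketch below shows amplitude encoding of a feature vector and a cosine-similarity-ordered neighbor sequence, both simulated classically with NumPy. The function names `amplitude_encode` and `neighbor_sequence`, the zero-padding to a power-of-two length, and the choice of `k` nearest neighbors are assumptions made for illustration; they are not taken from the QClusformer implementation.

```python
import numpy as np


def amplitude_encode(x):
    """Return (state, n_qubits) for a real feature vector.

    The vector is zero-padded to the next power-of-two length and
    L2-normalized, so its entries can be read as the amplitudes of an
    n_qubits-qubit state (hypothetical helper, classical simulation only).
    """
    x = np.asarray(x, dtype=float)
    n_qubits = max(1, (x.size - 1).bit_length())
    padded = np.zeros(2 ** n_qubits)
    padded[: x.size] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot amplitude-encode the zero vector")
    return padded / norm, n_qubits


def neighbor_sequence(features, center, k):
    """Return (indices, sims): the center sample followed by its k most
    cosine-similar samples, plus the similarity of every sample to the center.
    """
    features = np.asarray(features, dtype=float)
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = unit @ unit[center]
    order = np.argsort(-sims, kind="stable")  # most similar first
    order = order[order != center][:k]        # drop the center itself
    return np.concatenate(([center], order)), sims


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
    seq, _ = neighbor_sequence(feats, center=0, k=2)
    states = [amplitude_encode(feats[i])[0] for i in seq]
    print(seq, states)
```

In this sketch each sample in the sequence is encoded independently; how the framework feeds such a sequence into its quantum self-attention module is not specified here.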
Abstract
Unsupervised vision clustering, a cornerstone of computer vision, has been studied for decades and has yielded significant results across numerous vision tasks. However, these algorithms incur substantial computational cost when confronted with vast amounts of unlabeled data. Quantum computing, by contrast, holds promise for accelerating unsupervised algorithms on large-scale databases. In this study, we introduce QClusformer, a Transformer-based framework that leverages quantum machines to tackle unsupervised vision clustering. Specifically, we design the Transformer architecture, including the self-attention module and transformer blocks, from a quantum perspective to enable execution on quantum hardware. We then tailor this architecture to unsupervised vision clustering tasks, which yields the QClusformer variant. By integrating these elements into an end-to-end framework, QClusformer consistently outperforms previous methods running on classical computers. Empirical evaluations across diverse benchmarks, including MS-Celeb-1M and DeepFashion, underscore the superior performance of QClusformer compared to state-of-the-art methods.
