
A Resource-Aligned Hybrid Quantum-Classical Framework for Multimodal Face Anti-Spoofing

Wanqi Sun, Jungang Xu, Chenghua Duan

Abstract

Embedding high-dimensional data into resource-limited quantum devices remains a significant challenge for practical quantum machine learning. In multimodal face anti-spoofing, linear compression methods such as principal component analysis can reduce dimensionality to fit limited quantum budgets, but they often discard structural information and thereby lose critical high-order cross-modal correlations. To address this, we propose a hybrid Matrix Product State (MPS)-Variational Quantum Circuit (VQC) framework, where the MPS serves as a structured, differentiable pre-quantum compression and fusion module, and the VQC acts as the quantum classifier. Built upon the low-rank structure controlled by the virtual bond dimension and integrated with a configurable nonlinear enhancement mechanism, this MPS module explicitly models long-range cross-modal correlations while compressing multimodal data into a compact representation matching the quantum budget and improving numerical stability under extreme compression. Experiments on the CASIA-SURF benchmark demonstrate that MPS-VQC achieves accuracy comparable to strong classical neural network baselines with fewer than 0.25M parameters, highlighting the parameter efficiency of tensor-network representations for high-dimensional multimodal data under tight resource budgets. Leveraging the intrinsic compatibility between MPS structures and quantum circuit topology, this framework not only provides a viable technological pathway for efficient multimodal anti-spoofing on NISQ devices but also serves as a stepping stone toward fully quantum implementations of such tasks in the future.


Paper Structure

This paper contains 16 sections, 19 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: Overview of the proposed hybrid quantum--classical multimodal face anti-spoofing framework. RGB, Depth, and IR inputs are processed by modality-specific feature extractors with a Residual Network backbone and Squeeze-and-Excitation blocks to prepare unimodal embeddings. The embeddings are concatenated and fed into an MPS-based fusion and compression module, which performs structured fusion and differentiable dimensionality reduction and outputs a compact representation for quantum state encoding. A variational quantum circuit then performs the final classification between bona fide (live) and attack (spoof) samples.
  • Figure 2: Local SVD update for shifting the orthogonality center in a mixed-canonical MPS. At site $n_c$, the center tensor is reshaped into a matrix $M^{[n_c]}$ and factorized as $M^{[n_c]}=U\,\Gamma\,V$. After truncating the singular spectrum to a target virtual bond dimension, $U$ is absorbed into the left-canonical part, while the product $V$ is contracted into the adjacent right tensor. This operation transfers the orthogonality center from site $n_c$ to $n_c+1$ and updates the virtual bond dimension.
  • Figure 3: Chain-like and brick-wall structure circuit diagrams. The blocks $U^{[i,j]}$ represent two-qubit gates acting on qubits $i$ and $j$.
  • Figure 4: Training loss stability comparison. (a) Training loss for models with nominal bond dimensions $\chi$ that are truncated during training to a maximum virtual bond dimension $\chi_{\max}=4$. (b) Training loss for the same nominal $\chi$ values without any bond dimension truncation. In both panels, dashed lines denote Standard MPS and solid lines denote Activated MPS; Activated MPS shows smoother convergence.
  • Figure 5: ACER performance comparison across different MPS-compressed feature dimensions and training data ratios. From left to right, the subplots show ACER evaluated for models trained with MPS-compressed features with dimension $D_{\mathrm{fused}}=4,8,16,32$, respectively. Each colored line corresponds to one model: the classical MLP baseline or a VQC with $N_q=4,6,8,12$ qubits. The training data ratio takes the values 0.1, 0.3, 0.5, 0.7, and 1.0, corresponding to using 10%, 30%, 50%, 70%, and 100% of the training set, respectively.
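The local SVD update described in the Figure 2 caption can be sketched in a few lines of numpy. This is an illustrative implementation only, not the authors' code: the tensor shapes, the function name `shift_center_right`, and the index conventions (left bond, physical index, right bond) are assumptions made for this example. It reshapes the center tensor $M^{[n_c]}$ into a matrix, factorizes it as $U\,\Gamma\,V$, truncates the singular spectrum to `chi_max`, keeps $U$ as the new left-canonical tensor at site $n_c$, and contracts $\Gamma V$ into the neighboring tensor at site $n_c+1$.

```python
import numpy as np

def shift_center_right(center, right, chi_max):
    """Move the orthogonality center from site n_c to n_c+1 via a truncated SVD.

    center : (chi_l, d, chi_r) tensor at the current orthogonality center
    right  : (chi_r, d, chi_rr) neighboring tensor to the right
    chi_max: maximum virtual bond dimension kept after truncation
    """
    chi_l, d, chi_r = center.shape
    # Reshape the center tensor into the matrix M^{[n_c]} and factorize.
    M = center.reshape(chi_l * d, chi_r)
    U, S, Vh = np.linalg.svd(M, full_matrices=False)
    # Truncate the singular spectrum to the target bond dimension.
    k = min(chi_max, S.size)
    U, S, Vh = U[:, :k], S[:k], Vh[:k, :]
    # U becomes the left-canonical tensor at site n_c.
    left = U.reshape(chi_l, d, k)
    # Gamma * V is absorbed into the adjacent right tensor (new center).
    new_right = np.tensordot(np.diag(S) @ Vh, right, axes=([1], [0]))
    return left, new_right
```

After the update, the tensor at site $n_c$ satisfies the left-canonical condition (its columns are orthonormal when reshaped to a matrix), and the new virtual bond between the two sites is at most `chi_max`, which is exactly the truncation step the paper uses to control the compression budget.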