A Tensor Residual Circuit Neural Network Factorized with Matrix Product Operation
Andi Chen
TL;DR
This paper tackles the challenge of reducing neural network complexity without sacrificing generalization and robustness. It introduces a Tensor Circuit Neural Network (TCNN) that blends Matrix Product Operator (MPO) factorization with a residually connected, complex-valued circuit architecture, augmented by an information fusion layer that merges real and imaginary features. The approach achieves competitive parameter efficiency while delivering improved generalization and robustness on standard image datasets, including resilience to noise and parameter attacks, and is supported by ablation studies validating the architecture. The work suggests TCNN as a practical, hardware-friendly option for robust, efficient image recognition in noisy industrial settings.
Abstract
Reducing the complexity of neural networks while maintaining their generalization ability and robustness is challenging, especially in practical applications. Conventional solutions to this problem include quantum-inspired neural networks built on Kronecker products and hybrid tensor neural networks that combine matrix product operator (MPO) factorization with fully-connected layers. However, the generalization power and robustness of fully-connected layers fall short of those of circuit models in quantum computing. In this paper, we propose a novel tensor circuit neural network (TCNN) that combines the characteristics of tensor neural networks and residual circuit models to achieve generalization ability and robustness with low complexity. The proposed activation operation and the parallelism of the circuit in the complex number field improve its non-linearity and efficiency for feature learning. Moreover, since feature information in TCNN is stored in both the real and imaginary parts of the parameters, an information fusion layer is proposed to merge the features stored in those parameters and enhance the generalization capability. Experimental results confirm that TCNN shows better generalization and robustness, with average accuracies on various datasets 2%-3% higher than those of the state-of-the-art models compared. More significantly, while other models fail to learn features under noise attacks on the parameters, TCNN still shows prominent learning capability owing to its ability to prevent gradient explosion. Furthermore, it is comparable to the compared models in the number of trainable parameters and CPU running time. An ablation study also indicates the advantage of the activation operation, the parallel architecture, and the information fusion layer.
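To make the complexity argument concrete, the following is a minimal sketch of how MPO factorization can shrink a dense weight matrix. It is not the TCNN implementation: the function names `mpo_cores` and `mpo_to_matrix`, the 784-to-256 layer size, the index split, and the bond dimension of 4 are all assumptions chosen for illustration.

```python
import numpy as np


def mpo_cores(in_dims, out_dims, bond_dim, rng):
    """Random MPO cores; core k has shape (r_left, i_k, j_k, r_right).

    Hypothetical helper: the boundary bonds are fixed to 1 and every
    interior bond uses the same (assumed) bond dimension.
    """
    n = len(in_dims)
    ranks = [1] + [bond_dim] * (n - 1) + [1]
    return [
        rng.standard_normal((ranks[k], in_dims[k], out_dims[k], ranks[k + 1]))
        for k in range(n)
    ]


def mpo_to_matrix(cores):
    """Contract the bond indices to recover the dense weight matrix."""
    result = cores[0]  # shape (1, i_1, j_1, r_1)
    for core in cores[1:]:
        # Contract the shared bond b, then merge the input and output indices.
        result = np.einsum("aijb,bklc->aikjlc", result, core)
        a, i, k, j, l, c = result.shape
        result = result.reshape(a, i * k, j * l, c)
    return result[0, :, :, 0]  # drop the two boundary bonds of size 1


rng = np.random.default_rng(0)
in_dims = (4, 7, 7, 4)   # assumed split of 784 input features
out_dims = (4, 4, 4, 4)  # assumed split of 256 output features
cores = mpo_cores(in_dims, out_dims, bond_dim=4, rng=rng)

W = mpo_to_matrix(cores)
mpo_params = sum(core.size for core in cores)
dense_params = W.size
print(W.shape, mpo_params, dense_params)
```

Under these assumed shapes, the four cores hold 1,024 parameters in place of the 200,704 entries of the equivalent dense matrix. The bond dimension controls the trade-off: a larger value lets the cores represent a wider set of matrices at the cost of more parameters.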
