A Lightweight Complex-Valued Deformable CNN for High-Quality Computer-Generated Holography
Shuyang Xie, Jie Zhou, Bo Xu, Jun Wang, Renjing Xu
TL;DR
This study tackles the challenge of a limited effective receptive field (ERF) in computer-generated holography (CGH) by proposing a lightweight complex-valued deformable CNN (DeNet) that uses deformable convolutions to adapt its receptive fields to global diffraction effects. The model takes as input the complex amplitude propagated with the angular spectrum method (ASM) and processes it with a complex-valued U-Net containing deformable layers. It achieves higher PSNR than prior open-source methods with far fewer parameters: at $1920\times1072$, its PSNR is $2.04$, $5.31$, and $9.71$ dB higher than that of CCNN-CGH, HoloNet, and Holo-encoder, respectively. Simulations and optical experiments validate the approach, showing superior reconstruction fidelity with efficient inference suitable for real-time CGH. This offers a practical, resource-efficient pathway toward high-quality holographic displays for AR/VR.
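The network input described above is an ASM-propagated complex amplitude. As a hedged illustration, the NumPy sketch below implements the textbook angular spectrum method, multiplying the field's spectrum by the transfer function $H(f_x, f_y) = \exp\left(i 2\pi z \sqrt{1/\lambda^2 - f_x^2 - f_y^2}\right)$ and discarding evanescent components. The helper name `asm_propagate`, the wavelength, pixel pitch, and distance, and the choice to back-propagate a zero-phase target amplitude are all illustrative assumptions, not details confirmed by the source.

```python
import numpy as np


def asm_propagate(field, wavelength, pitch, z):
    """Propagate a sampled complex field by distance z (metres) with the
    angular spectrum method. Negative z back-propagates.

    field:      (H, W) complex array
    wavelength: illumination wavelength in metres
    pitch:      sampling pitch (e.g. SLM pixel size) in metres
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=pitch)  # spatial frequencies, cycles per metre
    fy = np.fft.fftfreq(ny, d=pitch)
    FX, FY = np.meshgrid(fx, fy)  # both have shape (ny, nx)
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0  # evanescent components are discarded
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


# Hypothetical usage: back-propagate a zero-phase target amplitude by 20 mm
rng = np.random.default_rng(0)
target = rng.random((64, 64))  # stand-in for a target image amplitude
slm_field = asm_propagate(target.astype(complex), 520e-9, 8e-6, -0.02)
```

A complex-valued network can consume `slm_field` directly. Padding to suppress the circular wrap-around of the FFT and band-limiting of the transfer function are omitted for brevity; a practical pipeline would likely need them.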
Abstract
Holographic displays have significant potential in virtual reality and augmented reality owing to their ability to provide all depth cues. Deep learning-based methods play an important role in computer-generated holography (CGH). During diffraction, each pixel influences the reconstructed image globally rather than only locally. However, previous works struggle to capture enough information to model this process accurately, primarily because their effective receptive field (ERF) is inadequate. Here, we design a complex-valued deformable convolution and integrate it into the network. It adjusts the shape of the convolution kernel dynamically, which makes the ERF more flexible and improves feature extraction. With a single model, this approach achieves state-of-the-art performance in both simulated and optical reconstructions, surpassing existing open-source models. Specifically, at a resolution of 1920$\times$1072, our method achieves a peak signal-to-noise ratio (PSNR) 2.04 dB, 5.31 dB, and 9.71 dB higher than that of CCNN-CGH, HoloNet, and Holo-encoder, respectively. Our model also has only about one-eighth as many parameters as CCNN-CGH.
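The abstract's central idea is a complex-valued deformable convolution whose sampling positions shift with learned offsets. The NumPy sketch below shows one way such a layer could work, following the general deformable-convolution formulation: each kernel tap samples the input at its regular grid position plus a per-pixel offset, using bilinear interpolation. Here the inputs and weights are complex. This is not the authors' implementation. The single-channel layout, zero padding, and the names `bilinear_sample` and `complex_deform_conv2d` are hypothetical. The offsets are also passed in directly, whereas a trained network would predict them with another layer.

```python
import numpy as np


def bilinear_sample(x, py, px):
    """Sample 2-D array x (real or complex) at fractional positions (py, px).

    Positions outside the image read as zero (zero padding).
    """
    h, w = x.shape
    y0 = np.floor(py).astype(int)
    x0 = np.floor(px).astype(int)
    dy, dx = py - y0, px - x0
    out = np.zeros(py.shape, dtype=complex)
    # Accumulate the four neighbouring pixels weighted by bilinear coefficients
    for oy, wy in ((0, 1 - dy), (1, dy)):
        for ox, wx in ((0, 1 - dx), (1, dx)):
            yy, xx = y0 + oy, x0 + ox
            valid = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
            vals = np.zeros(py.shape, dtype=complex)
            vals[valid] = x[yy[valid], xx[valid]]
            out += wy * wx * vals
    return out


def complex_deform_conv2d(x, weight, offsets):
    """Single-channel complex deformable convolution, stride 1, 'same' size.

    x:       (H, W) complex input field
    weight:  (K, K) complex kernel (K odd)
    offsets: (K*K, 2, H, W) real offsets (dy, dx) for every tap and pixel
    """
    k = weight.shape[0]
    h, w = x.shape
    r = k // 2
    gy, gx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    out = np.zeros((h, w), dtype=complex)
    for t in range(k * k):
        i, j = divmod(t, k)
        # Regular grid position of tap (i, j) shifted by its offset
        py = gy + (i - r) + offsets[t, 0]
        px = gx + (j - r) + offsets[t, 1]
        out += weight[i, j] * bilinear_sample(x, py, px)
    return out


# Hypothetical usage with random offsets (a real network would predict them)
rng = np.random.default_rng(0)
field = rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16))
kernel = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
offs = rng.normal(scale=2.0, size=(9, 2, 16, 16))
y = complex_deform_conv2d(field, kernel, offs)
```

Bilinear interpolation is linear, so interpolating a complex field gives the same result as interpolating its real and imaginary parts separately. The deformable sampling step therefore carries over to complex amplitudes unchanged. With all offsets set to zero, the layer reduces to an ordinary $K\times K$ complex convolution (cross-correlation). Nonzero offsets let each tap reach pixels outside the fixed grid, which is the mechanism the abstract credits with making the ERF more flexible.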
