CSANet: Channel Spatial Attention Network for Robust 3D Face Alignment and Reconstruction
Yilin Liu, Xuezhou Guo, Xinqi Wang, Fangzhou Du
TL;DR
CSANet addresses robust 3D face alignment and reconstruction from a single 2D image, focusing on the challenges posed by occlusion and lighting. It augments a lightweight bottleneck backbone with Spatial Group-wise Enhancement and Coordinate Attention to refine features, and uses a joint Wing Loss and WPDC objective to stabilize 3DMM parameter learning. The approach improves accuracy over the baselines, especially under large poses, and trains faster than baselines such as 3DDFA, with competitive reconstruction quality on AFLW and AFLW2000-3D. This work provides a practical, attention-based framework for efficient, robust 3D face modeling suitable for real-world applications.
Abstract
We propose an end-to-end network for 3D face alignment and reconstruction. The backbone is built from bottleneck blocks that use depthwise separable convolutions. We integrate Coordinate Attention and Spatial Group-wise Enhancement to extract more representative features. For a more stable training process and better convergence, we jointly use Wing Loss and the Weighted Parameter Distance Cost (WPDC) to learn the 3D Morphable Model (3DMM) parameters and the 3D vertices. Our model outperforms all baseline models both quantitatively and qualitatively.
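
As an illustration of the joint objective, the following is a minimal sketch rather than the authors' implementation. It assumes the standard Wing Loss form with width `w` and curvature `eps`, a WPDC term written as a weighted squared distance between predicted and ground-truth 3DMM parameters with precomputed per-parameter weights, and a hypothetical mixing factor `lam`; the abstract does not specify how the two terms are balanced or how the weights are obtained.

```python
import math


def wing_loss(pred, target, w=10.0, eps=2.0):
    """Mean Wing Loss over paired coordinates (e.g. flattened 3D vertices).

    Logarithmic for small errors (|d| < w), linear for large ones.
    The offset c makes the two pieces meet continuously at |d| = w.
    """
    c = w - w * math.log(1.0 + w / eps)
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += w * math.log(1.0 + d / eps) if d < w else d - c
    return total / len(pred)


def wpdc_loss(params, params_gt, weights):
    """Weighted squared distance between 3DMM parameter vectors.

    `weights` is assumed to be precomputed (one importance value per
    parameter); how it is derived is outside this sketch.
    """
    return sum(wi * (p - g) ** 2 for wi, p, g in zip(weights, params, params_gt))


def joint_loss(vertices, vertices_gt, params, params_gt, weights, lam=0.5):
    """Hypothetical convex combination of the two terms.

    `lam` is an illustrative assumption, not a value from the paper.
    """
    return lam * wing_loss(vertices, vertices_gt) + (1.0 - lam) * wpdc_loss(
        params, params_gt, weights
    )


if __name__ == "__main__":
    # Toy values only; real inputs would be network outputs and labels.
    loss = joint_loss(
        vertices=[0.5, 1.0, 12.0],
        vertices_gt=[0.0, 1.0, 0.0],
        params=[1.0, 2.0],
        params_gt=[0.0, 0.0],
        weights=[1.0, 0.5],
    )
    print(f"joint loss: {loss:.4f}")
```

In this sketch the Wing term is applied to vertex coordinates and the WPDC term to the 3DMM parameters, matching the split described in the abstract; a tensor-based version for training would follow the same structure.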
