DiffVein: A Unified Diffusion Network for Finger Vein Segmentation and Authentication
Yanjun Liu, Wenming Yang, Qingmin Liao
TL;DR
DiffVein introduces a unified diffusion-model framework for finger vein segmentation and authentication, coupling a segmentation branch with a denoising diffusion branch so that the two can exchange information. It adds a mask condition to guide denoising and a Semantic Difference Transformer (SD-Former) that fuses diffusion-derived category embeddings into segmentation, and it trains the denoising network with a Fourier-space Structural Similarity (FourierSIM) loss. On the USM and THU-MVFV3V datasets, DiffVein achieves state-of-the-art performance in both verification (EER as low as $0.089\%$) and identification (ACC up to $99.79\%$) while better preserving vein topology in segmentation, as reflected in high clDice scores. These results demonstrate the practical potential of cross-task, diffusion-driven biometric recognition. Ablation studies confirm that the diffusion branch, the mask condition, FourierSIM, and SD-Former each contribute to the overall gains.
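For readers unfamiliar with the verification metric quoted above, the equal error rate (EER) is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). The following is a minimal sketch of how an EER can be approximated from matching scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that higher scores mean greater similarity, and the toy scores are all assumptions made for illustration.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores (assumed: higher = more similar).

    genuine  -- scores for same-finger pairs (hypothetical input)
    impostor -- scores for different-finger pairs (hypothetical input)
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(genuine) | set(impostor))

    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # FAR: fraction of impostor pairs accepted at threshold t.
        far = sum(s >= t for s in impostor) / len(impostor)
        # FRR: fraction of genuine pairs rejected at threshold t.
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where FAR and FRR are closest and
            # report their mean as the EER estimate.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores, invented purely for illustration.
genuine_scores = [0.9, 0.8, 0.7, 0.6]
impostor_scores = [0.1, 0.2, 0.3, 0.65]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With only a handful of scores the estimate is coarse. Evaluations on real datasets typically use many more pairs and may interpolate between thresholds, so an EER of $0.089\%$ corresponds to FAR and FRR both being close to $0.00089$ at the chosen threshold.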
Abstract
Finger vein authentication, recognized for its high security and specificity, has become a focal point in biometric research. Traditional methods predominantly concentrate on vein feature extraction for discriminative modeling, with limited exploration of generative approaches. Existing methods are also prone to verification failure and often fail to obtain authentic vein patterns through segmentation. To fill this gap, we introduce DiffVein, a unified diffusion-model-based framework that simultaneously addresses the vein segmentation and authentication tasks. DiffVein is composed of two dedicated branches: one for segmentation and the other for denoising. To improve feature interaction between the two branches and their collective performance, we introduce two specialized modules. The first, a mask condition module, incorporates the semantic information of vein patterns from the segmentation branch into the denoising process. The second, a Semantic Difference Transformer (SD-Former), employs Fourier-space self-attention and cross-attention modules to extract category embeddings before feeding them to the segmentation task. In this way, our framework allows a dynamic interplay between diffusion and segmentation embeddings, so that the vein segmentation and authentication tasks can inform and enhance each other during joint training. To further optimize our model, we introduce a Fourier-space Structural Similarity (FourierSIM) loss function, which is tailored to improve the denoising network's learning efficacy. Extensive experiments on the USM and THU-MVFV3V datasets substantiate DiffVein's superior performance, setting new benchmarks in both vein segmentation and authentication tasks.
