TopoFR: A Closer Look at Topology Alignment on Face Recognition

Jun Dan, Yang Liu, Jiankang Deng, Haoyu Xie, Siyuan Li, Baigui Sun, Shan Luo

TL;DR

The paper addresses how to preserve the topological structure of large-scale face data in FR models. It introduces PTSA, a perturbation-based topological structure alignment method, and SDE, a Gaussian–Uniform Mixture–driven hard-sample mining strategy, to prevent structure collapse and improve generalization. By leveraging persistent homology and a topology-aware loss, TopoFR achieves state-of-the-art or competitive results across multiple benchmarks and backbones, with public code available. This topology-guided framework offers a principled way to harness data structure in FR, potentially enhancing robustness on diverse and challenging facial variations.

Abstract

The field of face recognition (FR) has undergone significant advancements with the rise of deep learning. Recently, the success of unsupervised learning and graph neural networks has demonstrated the effectiveness of data structure information. Considering that the FR task can leverage large-scale training data, which intrinsically contains significant structure information, we aim to investigate how to encode such critical structure information into the latent space. As our observations reveal, directly aligning the structure information between the input and latent spaces inevitably suffers from an overfitting problem, leading to a structure collapse phenomenon in the latent space. To address this problem, we propose TopoFR, a novel FR model that leverages a topological structure alignment strategy called PTSA and a hard sample mining strategy named SDE. Concretely, PTSA uses persistent homology to align the topological structures of the input and latent spaces, effectively preserving the structure information and improving the generalization performance of the FR model. To mitigate the impact of hard samples on the latent space structure, SDE accurately identifies hard samples by automatically computing a structure damage score (SDS) for each sample, and directs the model to prioritize optimizing these samples. Experimental results on popular face benchmarks demonstrate the superiority of our TopoFR over the state-of-the-art methods. Code and models are available at: https://github.com/modelscope/facechain/tree/main/face_module/TopoFR.
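
To make the notion of a topological structure discrepancy between two spaces more concrete, the following is a minimal sketch restricted to 0-dimensional persistent homology (connected components) and pure Python. It is not the authors' PTSA implementation. The function names `h0_death_times` and `h0_discrepancy` are hypothetical, and the L1 gap between sorted death times is an assumed stand-in for whatever distance the paper actually uses. It relies on the standard fact that the finite $H_{0}$ death times of a Vietoris-Rips filtration are the edge lengths of the minimum spanning tree.

```python
import math
from itertools import combinations


def h0_death_times(points):
    """Sorted death times of H0 classes (connected components) in the
    Vietoris-Rips filtration of `points`.

    Each finite death time is the length of a minimum-spanning-tree edge,
    found here with Kruskal's algorithm and a union-find structure.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, sorted by Euclidean length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj       # two components merge, so one class dies
            deaths.append(dist)
    return deaths                 # n - 1 values in ascending order


def h0_discrepancy(space_a, space_b):
    """Assumed discrepancy: L1 gap between the two sorted death-time lists."""
    a = h0_death_times(space_a)
    b = h0_death_times(space_b)
    return sum(abs(x - y) for x, y in zip(a, b))


# Toy "input" and "latent" point clouds with the same number of samples.
square = [(0, 0), (1, 0), (0, 1), (1, 1)]
stretched = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(h0_death_times(square))             # [1.0, 1.0, 1.0]
print(h0_discrepancy(square, stretched))  # 3.0
```

Higher-dimensional homology ($H_{1}$ and above), which the paper's figures also consider, requires a dedicated persistent homology library and is outside the scope of this sketch.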

Paper Structure

This paper contains 20 sections, 12 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: We sample 1000 (a), 5000 (b), 10000 (c), and 100000 (d) face images from the MS1MV2 dataset and compute their persistence diagrams using persistent homology (PH), where $H_{j}$ denotes the $j$-th dimension homology. A persistence diagram [mileyko2011probability] is a mathematical tool for describing the topological structure of a space, in which the $j$-th dimension homology $H_{j}$ represents the $j$-dimensional holes in the space. In topology theory, the more high-dimensional holes a space contains, the more complex its underlying topological structure [zomorodian2004computing]. As shown in Figure 1(a)-1(d), as the amount of face data increases, the persistence diagram of the input space contains more and more high-dimensional holes (e.g., $H_{3}$ and $H_{4}$). This phenomenon demonstrates a growing complexity in the topological structure of the input space.
  • Figure 2: (a): We investigate the relationship between the amount of data and the topological structure discrepancy by employing a ResNet-50 ArcFace model [deng2019arcface] to perform inference on the MS1MV2 training set. Inference is conducted for 1000 iterations with batch sizes of 256, 1024, and 2048, respectively. Histograms are used to approximate the resulting discrepancy distributions. (b): We investigate the relationship between the network depth and the topological structure discrepancy by performing inference on the MS1MV2 training set (batch size=128) using ArcFace models with different backbones. (c): We investigate the trend of the topological structure discrepancy during training (batch size=128) and find that i) directly using PH to align the topological structures causes the discrepancy to drop to 0 dramatically, whereas ii) using our PTSA strategy promotes a smooth convergence of the structure discrepancy. (d): Aligning the topological structures directly using PH leads to a significant discrepancy when evaluating on the IJB-C benchmark. Our PTSA strategy effectively mitigates this overfitting issue, resulting in a smaller structure discrepancy during evaluation.
  • Figure 3: Global overview of our proposed TopoFR. $\bigotimes$ represents the multiplication operation. $\xi$ denotes the probability of applying RSP to each training sample.
  • Figure 4: The estimated Gaussian density (blue curve) w.r.t. the entropy of the classification prediction. Green markers $\star$ and black markers $\times$ represent the entropy of correctly classified samples and misclassified samples, respectively.
  • Figure 5: The topological structure discrepancy of TopoFR and its variant TopoFR-A under different backbones and training datasets (i.e., [Backbone, Training dataset]). The variant TopoFR-A directly utilizes PH to align the topological structures of the two spaces. Notably, our TopoFR models trained on the Glint360K dataset almost perfectly align the topological structures of the input space and the latent space on the IJB-C benchmark (i.e., the blue histogram almost converges to a straight line).
  • ...and 3 more figures
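
The quantity on the horizontal axis of Figure 4, the entropy of a classification prediction, and a Gaussian density estimated over it can be illustrated with a short sketch. This is only an illustration under stated assumptions: it fits a single Gaussian by maximum likelihood and does not reproduce the Gaussian-Uniform Mixture used by SDE or the SDS computation. The helper names and the toy probability vectors are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation of a 1-D sample."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(var)


def gaussian_density(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Toy softmax outputs: two confident predictions and one ambiguous one.
batch = [
    [0.97, 0.01, 0.01, 0.01],
    [0.90, 0.05, 0.03, 0.02],
    [0.25, 0.25, 0.25, 0.25],
]
entropies = [prediction_entropy(p) for p in batch]
mean, std = fit_gaussian(entropies)
densities = [gaussian_density(h, mean, std) for h in entropies]

for h, d in zip(entropies, densities):
    print(f"entropy={h:.3f}  density={d:.3f}")
```

In this toy batch the uniform prediction has the highest entropy, which matches the intuition that ambiguous predictions are the candidates for hard samples.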