Cranio-ID: Graph-Based Craniofacial Identification via Automatic Landmark Annotation in 2D Multi-View X-rays

Ravi Shankar Prasad, Nandani Sharma, Dinesh Singh

TL;DR

Cranio-ID tackles forensic craniofacial identification by automating landmark annotation on 2D skull X-rays and enabling robust cross-modal skull–face and sketch–face matching. It extracts patches around landmarks and treats them as graph nodes, encodes the graphs with a GCN alongside global ViT features, and aligns the skull and face modalities through cross-attention and entropy-regularized optimal transport within a triplet-learning framework. The approach is evaluated on S2F and CUHK, showing significant gains in recall and mAP over baselines and revealing a domain gap that makes sketch–face matching easier than skull–face matching. The results demonstrate the method's potential for fast, reliable landmark annotation and cross-domain identification in forensic contexts, with implications for both practical identification and future reconstruction tasks.

Abstract

In forensic craniofacial identification and in many biomedical applications, craniometric landmarks are important. Traditional methods for locating landmarks are time-consuming and require specialized knowledge and expertise. Current approaches rely on superimposition and on deep learning-based methods that annotate landmarks automatically. However, these methods are not yet reliable because they lack large-scale validation studies. In this paper, we propose a novel framework, Cranio-ID, with two components. First, landmarks are annotated automatically on 2D skull images (X-ray scans of faces) and on their corresponding optical images using our trained YOLO-pose models. Second, cross-modal matching is performed by formulating these landmarks as graph representations and then finding the semantic correspondence between the graphs of the two modalities using cross-attention and an optimal transport framework. The proposed framework is validated on the S2F and CUHK datasets (the CUHK dataset resembles the S2F dataset). Extensive experiments demonstrate significant improvements in both reliability and accuracy, as well as the framework's effectiveness in cross-domain skull-to-face and sketch-to-face matching in forensic science.
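
To make the optimal transport step more concrete, the following is a minimal NumPy sketch of entropy-regularized optimal transport solved with Sinkhorn iterations. It is an illustration under stated assumptions, not the authors' implementation: the random node features, the cosine-distance cost, uniform node weights, the regularization strength `eps`, and the iteration count are all hypothetical choices made for the example.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Return an entropy-regularized transport plan between weights a and b."""
    K = np.exp(-cost / eps)              # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                  # rescale rows toward marginal a
        v = b / (K.T @ u)                # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]   # plan = diag(u) K diag(v)

# Hypothetical inputs: 18 landmark-node embeddings per modality, 32 dimensions each.
rng = np.random.default_rng(0)
skull_nodes = rng.normal(size=(18, 32))
face_nodes = rng.normal(size=(18, 32))

# Assumed cost: cosine distance between skull and face node embeddings.
s = skull_nodes / np.linalg.norm(skull_nodes, axis=1, keepdims=True)
f = face_nodes / np.linalg.norm(face_nodes, axis=1, keepdims=True)
cost = 1.0 - s @ f.T

# Assumed uniform mass on every node.
a = np.full(18, 1.0 / 18)
b = np.full(18, 1.0 / 18)

plan = sinkhorn(cost, a, b)
# plan[i, j] is the soft correspondence weight between skull node i and face node j.
```

A smaller `eps` gives a sharper, closer-to-one-to-one plan but needs more iterations to converge, and very small values usually call for a log-domain variant for numerical stability.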

Paper Structure

This paper contains 21 sections, 20 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Sample images showing landmark localization on the face (a) and skull (b), where (b) and (c) show the semantic correspondence between skull landmarks and graph skeletons from two views: front and side. A total of 18 landmarks are localized in the front view and 13 in the side view, for both the face and skull images.
  • Figure 2: Sample images from the X-ray dataset, where the soft-tissue-eliminated images are shown below their respective raw images.
  • Figure 3: The proposed framework consists of five stages. In the first stage, the face and skull regions are detected, and keypoints are localized using a YOLO pose-based model. In the second stage, patches are extracted around the corresponding keypoints, and feature representations are obtained from these patches. In the third stage, each patch is treated as a node, and the connections between the nearest nodes are defined as edges, forming a graph representation. In the fourth stage, a GCN is employed to extract high-level features from the respective graphs, and the resulting graph features of the skull ($z_s$) and face ($z_f$) are concatenated with the global features of the skull ($f_{s}$) and face ($f_{f}$). In the fifth stage, the concatenated features of the skull ($F_{gs}$) and face ($F_{gf}$) are refined through cross-attention and optimal transport modules, which establish the semantic correspondence between the two modalities and yield the optimal mapping $T^{*}$. (Best viewed in color.)
  • Figure 4: Landmark localization predictions on test samples from the S2F and CUHK datasets. Red marks the predictions and blue marks the ground truth.
  • Figure 5: Top-10 retrieval results for a given query in the S2F and CUHK datasets. A green box marks a correct match for the given query image (i.e., skull or sketch).
  • ...and 1 more figure
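
The third and fourth stages described in the Figure 3 caption (patches as nodes, nearest nodes joined by edges, then a GCN) can be sketched as follows. This is a hedged illustration only: the k-nearest-neighbour rule with `k=3`, the random landmark coordinates and patch features, the single layer with symmetric normalization, and the mean-pool readout are assumptions introduced for the example and are not taken from the paper.

```python
import numpy as np

def knn_adjacency(points, k=3):
    """Symmetric k-nearest-neighbour adjacency over 2D landmark coordinates."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)                 # a node is not its own neighbour
    nearest = np.argsort(dist, axis=1)[:, :k]      # indices of the k closest nodes
    adj = np.zeros((n, n))
    adj[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(adj, adj.T)                  # make the edges undirected

def gcn_layer(adj, feats, weight):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(len(adj))                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ feats @ weight, 0.0)

# Hypothetical inputs: 18 front-view landmarks with 64-d patch features.
rng = np.random.default_rng(0)
landmarks = rng.uniform(0, 256, size=(18, 2))      # pixel coordinates (assumed)
patch_feats = rng.normal(size=(18, 64))            # stand-in for patch features
weight = rng.normal(scale=0.1, size=(64, 32))      # untrained layer weights

adj = knn_adjacency(landmarks, k=3)
node_emb = gcn_layer(adj, patch_feats, weight)     # per-node graph features
graph_emb = node_emb.mean(axis=0)                  # assumed mean-pool readout
```

In a trained model, `weight` would be learned and the pooled graph feature would play the role of $z_s$ or $z_f$ before concatenation with the global features.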