State-of-the-Art Periorbital Distance Prediction and Disease Classification Using Periorbital Features

George R. Nahass; Sasha Hubschman; Jeffrey C. Peterson; Ghasem Yazdanpanah; Nicholas Tomaras; Madison Cheung; Alex Palacios; Kevin Heinze; Chad A. Purnell; Pete Setabutr; Ann Q. Tran; Darvin Yi

TL;DR

This study targets automated, anatomically grounded prediction of periorbital distances and their use as robust features for disease classification under real-world imaging variability. It develops a DeepLabV3-based segmentation pipeline and benchmarks it against SAM and PeriorbitAI across diverse healthy and diseased cohorts, achieving state-of-the-art distance accuracy that often falls within intergrader variability. Beyond segmentation, the work shows that periorbital distances enable competitive in-distribution (ID) disease classification and, crucially, generalize better under distribution shift than CNNs trained only on images: XGBoost and Lasso maintain strong out-of-distribution (OOD) performance, while fusion models offer the highest ID accuracy. The findings support deploying anatomy-based AI pipelines for accessible oculoplastic and craniofacial care, and point to fusion architectures and semi-supervised learning as directions for further improving robustness and clinical utility.

Abstract

Periorbital distances are critical markers for diagnosing and monitoring a range of oculoplastic and craniofacial conditions. Manual measurement, however, is subjective and prone to intergrader variability. Automated methods have been developed but remain limited by standardized imaging requirements, small datasets, and a narrow focus on individual measurements. We developed a segmentation pipeline trained on a domain-specific dataset of healthy eyes and compared its performance against the Segment Anything Model (SAM) and the prior benchmark, PeriorbitAI. Segmentation accuracy was evaluated across multiple disease classes and imaging conditions. We further investigated the use of predicted periorbital distances as features for disease classification under in-distribution (ID) and out-of-distribution (OOD) settings, comparing shallow classifiers, CNNs, and fusion models. Our segmentation model achieved state-of-the-art accuracy across all datasets, with error rates within intergrader variability and superior performance relative to SAM and PeriorbitAI. In classification tasks, models trained on periorbital distances matched CNN performance on ID data (77–78% accuracy) and substantially outperformed CNNs under OOD conditions (63–68% accuracy vs. 14%). Fusion models achieved the highest ID accuracy (80%) but were sensitive to degraded CNN features under OOD shifts. Segmentation-derived periorbital distances provide robust, explainable features for disease classification and generalize better under domain shift than CNN image classifiers. These results establish a new benchmark for periorbital distance prediction and highlight the potential of anatomy-based AI pipelines for real-world deployment in oculoplastic and craniofacial care.
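The abstract summarizes segmentation accuracy and distance error; Figures 1 and 5 name the underlying quantities: the Dice score between predicted and ground-truth masks, the mean absolute error (MAE) between predicted and annotated distances, pixel-to-millimetre conversion using an 11.71 mm iris diameter, and scleral area as a sclera-to-iris mask ratio. The following is a minimal pure-Python sketch of those computations under stated assumptions, not the authors' code: masks are assumed to be nested lists of 0/1 values, the iris diameter in pixels is assumed to be the horizontal extent of the iris mask, and every function name is hypothetical.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask: 1 = structure present, 0 = background

IRIS_DIAMETER_MM = 11.71  # standard iris diameter used for pixel-to-mm conversion


def _check_same_shape(a: Mask, b: Mask) -> None:
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("masks must have the same shape")


def dice_score(pred: Mask, truth: Mask) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|); defined as 1.0 when both masks are empty."""
    _check_same_shape(pred, truth)
    inter = sum(1 for rp, rt in zip(pred, truth) for p, t in zip(rp, rt) if p and t)
    total = sum(map(sum, pred)) + sum(map(sum, truth))
    return 1.0 if total == 0 else 2.0 * inter / total


def iris_diameter_px(iris_mask: Mask) -> int:
    """Hypothetical helper (assumption): iris diameter = horizontal extent of the mask."""
    cols = [j for row in iris_mask for j, v in enumerate(row) if v]
    if not cols:
        raise ValueError("iris mask is empty")
    return max(cols) - min(cols) + 1


def px_to_mm(distance_px: float, iris_mask: Mask) -> float:
    """Scale a pixel distance so that the iris spans IRIS_DIAMETER_MM."""
    return distance_px * IRIS_DIAMETER_MM / iris_diameter_px(iris_mask)


def scleral_area_ratio(sclera_mask: Mask, iris_mask: Mask) -> float:
    """Scleral area as the ratio of sclera-mask to iris-mask pixel counts."""
    iris_px = sum(map(sum, iris_mask))
    if iris_px == 0:
        raise ValueError("iris mask is empty")
    return sum(map(sum, sclera_mask)) / iris_px


def mean_absolute_error(pred_mm: Sequence[float], true_mm: Sequence[float]) -> float:
    """MAE between predicted and human-annotated distances (same units, e.g. mm)."""
    if len(pred_mm) != len(true_mm) or not pred_mm:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - t) for p, t in zip(pred_mm, true_mm)) / len(pred_mm)
```

In this sketch, a derived measure such as VPF would simply be the sum of the converted MRD 1 and MRD 2 values, and the per-measurement MAE could then be compared against the spread between human graders.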

Paper Structure (28 sections, 2 equations, 12 figures, 16 tables)

Introduction
Methods
Datasets
Segmentation and Periorbital Distance Datasets
Classification Datasets
Periorbital Distance Prediction Pipelines
DeepLabV3
Segment Anything Model
Calculation of Anatomical Relationships
Comparison to PeriorbitAI
Classification Pipelines
CNN Baselines
Shallow Models with Periorbital Distances
Fusion Models
Hardware and Statistical Analysis
...and 13 more sections

Figures (12)

Figure 1: Graphical schematic of the segmentation models and distance prediction pipelines. A) Cropped images of the eyes are segmented using one of two deep learning models (DeepLabV3 [DLV3] or SAM). Following segmentation, the Dice score is calculated, and the segmentation masks are used to predict periorbital distances, which are compared with distances obtained from human annotations using the mean absolute error (MAE). B) Training procedure for the DLV3 model: input images are cropped at the midline, and both eyes are resized to 256×256. The network is then trained for 500 steps; details can be found in the Methods. C) Graphical schematic of the Segment Anything Model (SAM). The cropped image is used as input, and bounding-box prompts are derived from MediaPipe Face Mesh coordinates (Kartynnik et al.).
Figure 2: Graphical overview of the classification pipeline used in this study. Models were trained on ID data and tested on both ID and OOD datasets. XGBoost and Lasso models were trained on periorbital distances extracted via an intermediate segmentation step with DeepLabV3 (bottom). A ResNet-18 was trained for classification on cropped images. OOD-ness was quantified (top) by inspecting the embeddings produced by the trained ResNet.
Figure 3: Qualitative evaluation of periorbital distance prediction by all three models. Predicted distances from DeepLabV3 and SAM (left to right columns) are shown for various disease states. Brightness has been increased on some images for presentation purposes only. Colors denote: red dashes, IPD; teal, HPF; light blue, MRD 1; orange, MRD 2; green, OCD; yellow, ICD; purple, inferior brow height; black, superior brow height.
Figure 4: A) t-SNE plot of embeddings of ID test and OOD samples from the fine-tuned ResNet-18. X's represent OOD samples and O's represent ID train samples. B) Grad-CAM visualizations of a CNN classifier trained on ID data for randomly sampled images of the CAP (Crouzon-Apert-Pfeiffer) and Healthy classes from the ID and OOD datasets. CAP had the smallest difference between the Wasserstein distance from OOD to ID train and that from ID test to ID train, and Healthy had the largest.
Figure 5: Representative example of ground truth segmentation masks and periorbital distance prediction. A) Ground truth masks of the anatomical regions used for evaluating segmentation results and deriving ground truth distance measurements on the face. B) Distance measurements calculated from (A). Pixels were converted to mm using 11.71 mm as the standard iris diameter. Scleral area was calculated as the ratio of the sclera mask to the iris mask, and 4th-degree polynomials were fit to the superior and inferior scleral margins. Abbreviations: VD, vertical dystopia; IPD, inner pupil distance; OCD, outer canthal distance; ICD, inner canthal distance; HPF, horizontal palpebral fissure; MRD, margin to reflex distance; ISS, inferior scleral show. Other measurements not shown are VPF (sum of MRD 1 and 2), canthal height (distance between the inner pupillary line and the medial/lateral canthus), canthal tilt, and scleral area.
...and 7 more figures
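Figure 4 ranks classes by how much farther the OOD embeddings lie from the ID training embeddings than the ID test embeddings do, measured with the Wasserstein distance. The captions do not say how the high-dimensional ResNet-18 embeddings are reduced before that comparison, so this sketch assumes one simple choice: summarize each embedding by its Euclidean distance to the ID-train centroid, then compute the 1-D Wasserstein-1 distance between the resulting empirical distributions. This is an illustration, not the authors' procedure, and the function names are hypothetical.

```python
import bisect
import math
from typing import List, Sequence

Vector = Sequence[float]


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """W1 between two 1-D empirical distributions: integral of |F_u(x) - F_v(x)| dx."""
    if not u or not v:
        raise ValueError("both samples must be non-empty")
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Both empirical CDFs are constant on [left, right).
        f_u = bisect.bisect_right(u_sorted, left) / len(u_sorted)
        f_v = bisect.bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(f_u - f_v) * (right - left)
    return total


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of equally sized embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distances_to(vectors: Sequence[Vector], center: Vector) -> List[float]:
    """Assumed 1-D summary: Euclidean distance of each embedding to `center`."""
    return [math.dist(v, center) for v in vectors]


def ood_gap(id_train: Sequence[Vector], id_test: Sequence[Vector],
            ood: Sequence[Vector]) -> float:
    """Hypothetical OOD-ness score: W(OOD, ID train) - W(ID test, ID train)."""
    center = centroid(id_train)
    ref = distances_to(id_train, center)
    return (wasserstein_1d(distances_to(ood, center), ref)
            - wasserstein_1d(distances_to(id_test, center), ref))
```

A larger gap means a class's OOD embeddings drift further from the training distribution than held-out ID data does; in Figure 4, CAP showed the smallest such difference and Healthy the largest. If SciPy is available, `scipy.stats.wasserstein_distance` computes the same 1-D quantity and could replace the hand-written helper.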
