Iris-SAM: Iris Segmentation Using a Foundation Model

Parisa Farmanifard, Arun Ross

TL;DR

This work adapts the Segment Anything Model (SAM) to iris segmentation by fine-tuning it with Focal Loss, which addresses the extreme class imbalance between iris and non-iris pixels. Two design choices stand out: during training, bounding-box prompts are generated from ground-truth masks; at inference, bounding boxes are generated automatically and the model outputs a single iris mask, optimizing for Intersection over Union (IoU). Iris-SAM achieves near-perfect IoU on ND-IRIS-0405 (99.58%) and high IoU on CASIA-Iris-Interval-v3 (96.94%), with strong cross-dataset generalization (e.g., 93.75–95.26% on unseen datasets) and low performance variance across tests. The results demonstrate that foundation-model-based segmentation is viable in specialized biometric domains, and they lay the groundwork for extending the approach to off-axis and multi-spectral iris imaging in future work.
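Since the headline numbers above are IoU scores, a minimal sketch of the metric may help. It assumes binary masks stored as nested lists of 0/1 values; the function name `mask_iou` and the convention of returning 1.0 for two empty masks are illustrative choices, not taken from the paper.

```python
def mask_iou(pred, truth):
    """Intersection over Union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (an assumption made for this sketch).
    """
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1  # pixel is iris in both masks
            if p or t:
                union += 1         # pixel is iris in at least one mask
    # Two empty masks agree perfectly; this convention is an assumption.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
print(mask_iou(pred, truth))  # 2 shared pixels out of 4 in the union
```

An IoU of 1.0 means the predicted mask and the ground truth coincide pixel for pixel, so a reported 99.58% leaves very little room for boundary error.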

Abstract

Iris segmentation is a critical component of an iris biometric system and it involves extracting the annular iris region from an ocular image. In this work, we develop a pixel-level iris segmentation model from a foundational model, viz., Segment Anything Model (SAM), that has been successfully used for segmenting arbitrary objects. The primary contribution of this work lies in the integration of different loss functions during the fine-tuning of SAM on ocular images. In particular, the importance of Focal Loss is borne out in the fine-tuning process since it strategically addresses the class imbalance problem (i.e., iris versus non-iris pixels). Experiments on ND-IRIS-0405, CASIA-Iris-Interval-v3, and IIT-Delhi-Iris datasets convey the efficacy of the trained model for the task of iris segmentation. For instance, on the ND-IRIS-0405 dataset, an average segmentation accuracy of 99.58% was achieved, compared to the best baseline performance of 89.75%.
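To make the role of Focal Loss concrete, here is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name and the default values `gamma=2.0` and `alpha=0.25` are assumptions for illustration (they are common defaults for this loss, not confirmed settings of Iris-SAM), and the inputs are flat lists of per-pixel probabilities and labels rather than image tensors.

```python
import math


def binary_focal_loss(probs, targets, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss over per-pixel probabilities and 0/1 labels.

    gamma and alpha defaults are illustrative assumptions, not the paper's values.
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)             # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p              # probability of the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)^gamma shrinks the loss of pixels that are already easy.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confidently correct pixel contributes far less than a misclassified one.
easy = binary_focal_loss([0.95], [1])
hard = binary_focal_loss([0.05], [1])
print(easy < hard)
```

With `gamma=0` the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy; larger `gamma` shifts the training signal toward hard pixels, which is why the loss suits imbalanced iris versus non-iris data.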

Paper Structure

This paper contains 12 sections, 4 equations, 13 figures, 2 tables.

Figures (13)

  • Figure 1: Iris anatomy in the near-infrared spectrum.
  • Figure 2: Proposed network (Iris-SAM) using the Segment Anything Model (SAM) (Kirillov et al., 2023). (a) Training and (b) Inference/Testing. During training, prompts (bounding boxes) are generated from ground-truth masks to guide the model. For inference/testing, the model automatically generates bounding boxes (visualized in green) from the input image, allowing it to predict the iris masks (depicted in blue) without needing explicit bounding-box inputs.
  • Figure 3: (a) Training loss over 100 epochs for Dice, Triplet, and Focal Loss functions on the CASIA-Iris-Interval-v3 dataset. Focal Loss converges rapidly to a lower value, indicating efficient learning, while Dice and Triplet losses exhibit higher variability and slower convergence. (b) Precision-Recall curve of our method on different test datasets when the Focal Loss was used during training.
  • Figure 4: (a) FineTuning + Focal Loss (Iris-SAM) using the default pre-trained model "ViT_h" with different $\gamma$ values on CASIA-Iris-Interval-v3 dataset. (b) FineTuning + Focal Loss (Iris-SAM) using the default pre-trained model "ViT_h" on three different datasets.
  • Figure 5: FineTuning (FT) sample results (CASIA-Iris-Interval-v3).
  • ...and 8 more figures
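
The Figure 2 caption notes that training prompts (bounding boxes) are generated from ground-truth masks. One plausible way to do this is to take the tight box around the mask's foreground pixels, sketched below. The function name `bbox_from_mask`, the nested-list mask representation, and the `(x_min, y_min, x_max, y_max)` output order are assumptions for illustration; the paper's actual procedure (for example, any padding or jitter applied to the box) is not described in this summary.

```python
def bbox_from_mask(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) around nonzero pixels.

    Returns None for an empty mask. The mask is a nested list of 0/1 values
    (an assumption made for this sketch).
    """
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None  # no foreground pixels, so there is no box to prompt with
    cols = [c for row in mask for c, value in enumerate(row) if value]
    return (min(cols), min(rows), max(cols), max(rows))


ground_truth = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(bbox_from_mask(ground_truth))  # (1, 1, 3, 2)
```

The corner-coordinate order here was chosen to resemble the box prompts that SAM-style predictors typically accept, but that match should be checked against the interface actually in use.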