CondSeg: Ellipse Estimation of Pupil and Iris via Conditioned Segmentation

Zhuang Jia; Jiangfan Deng; Liying Chi; Xiang Long; Daniel K. Du

TL;DR

The paper addresses the challenge of estimating full pupil and iris ellipses for gaze estimation without explicit ellipse annotations. It introduces CondSeg, a conditioned segmentation framework that predicts an eye-region mask and full ellipse parameters, using a differentiable Ellp2Mask to turn 5D ellipse parameters $(x_0,y_0,a,b,\theta)$ into soft segmentation maps. Through iris-region focus and a two-stage training regime on OpenEDS datasets, CondSeg achieves competitive IoU on visible eye parts while providing accurate ellipse centers directly, enabling downstream eye-tracking tasks without heavy ellipse labeling. This approach reduces annotation burden and yields ellipses suitable for AR/VR gaze estimation in real-world applications.
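
The Ellp2Mask step can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' implementation. It assumes the distmap of Figure 4 is the homogeneous quadratic form $D = \bm{x}^{\mathrm{T}} \bm{M} \bm{x}$ with $\bm{x} = (x, y, 1)^{\mathrm{T}}$, normalized so that $D = -1$ at the center, $D = 0$ on the boundary and $D > 0$ outside, and that the soft segmap is $\sigma(-\tau D)$. The names `conic_matrix`, `distmap_value`, `stable_sigmoid` and `ellp2mask` are hypothetical.

```python
import math
from typing import List, Tuple

Ellipse = Tuple[float, float, float, float, float]  # (x0, y0, a, b, theta), theta in radians


def conic_matrix(ellipse: Ellipse) -> List[List[float]]:
    """Symmetric 3x3 M such that [x, y, 1] M [x, y, 1]^T = u^2/a^2 + v^2/b^2 - 1,
    where (u, v) are the point's coordinates in the ellipse's centered, rotated frame."""
    x0, y0, a, b, theta = ellipse
    c, s = math.cos(theta), math.sin(theta)
    qa = c * c / a**2 + s * s / b**2    # coefficient of dx^2
    qb = c * s * (1 / a**2 - 1 / b**2)  # half the coefficient of dx*dy
    qc = s * s / a**2 + c * c / b**2    # coefficient of dy^2
    m02 = -(qa * x0 + qb * y0)
    m12 = -(qb * x0 + qc * y0)
    m22 = qa * x0**2 + 2 * qb * x0 * y0 + qc * y0**2 - 1
    return [[qa, qb, m02], [qb, qc, m12], [m02, m12, m22]]


def distmap_value(m: List[List[float]], x: float, y: float) -> float:
    """D = x^T M x in homogeneous coordinates: -1 at the center, 0 on the boundary, > 0 outside."""
    p = (x, y, 1.0)
    return sum(p[i] * m[i][j] * p[j] for i in range(3) for j in range(3))


def stable_sigmoid(z: float) -> float:
    # Branch on the sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ellp2mask(ellipse: Ellipse, height: int, width: int, tau: float) -> List[List[float]]:
    """Soft segmap sigma(-tau * D) sampled at integer pixel coordinates (row = y, column = x)."""
    m = conic_matrix(ellipse)
    return [[stable_sigmoid(-tau * distmap_value(m, float(x), float(y)))
             for x in range(width)]
            for y in range(height)]


# Example: one ellipse rendered at the three sharpness values listed for Figure 4.
ellipse = (20.0, 16.0, 10.0, 6.0, math.radians(30))
for tau in (50.0, 200.0, 1000.0):
    mask = ellp2mask(ellipse, height=32, width=40, tau=tau)
    soft_area = sum(map(sum, mask))
    print(f"tau={tau:g}: soft area {soft_area:.1f} px (analytic pi*a*b = {math.pi * 10 * 6:.1f})")
```

Every operation above is smooth in $(x_0, y_0, a, b, \theta)$, so the same computation written with tensors in an autodiff framework would let a segmentation loss back-propagate into the ellipse parameters. A larger $\tau$ gives a sharper boundary, but the gradient $-\tau\sigma(1-\sigma)$ then concentrates in a thinner band around $D = 0$.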

Abstract

Parsing of eye components (i.e., pupil, iris, and sclera) is fundamental to eye tracking and gaze estimation for AR/VR products. Mainstream approaches treat this problem as a multi-class segmentation task, which provides only the visible parts of the pupil and iris; other methods regress elliptical parameters using human-annotated full pupil/iris ellipses. In this paper, we consider two priors: the projected full pupil/iris circle can be modelled as an ellipse (ellipse prior), and the visibility of the pupil/iris is controlled by the openness of the eye region (condition prior). Based on these priors, we design a novel method, CondSeg, that estimates the elliptical parameters of the pupil/iris directly from segmentation labels, without explicitly annotating full ellipses, and uses the eye-region mask to control the visibility of the estimated pupil/iris ellipses. A conditioned segmentation loss optimizes the parameters by transforming the parameterized ellipses into pixel-wise soft masks in a differentiable way. Our method is evaluated on public datasets (OpenEDS-2019/-2020); it achieves competitive results on segmentation metrics while simultaneously providing accurate elliptical parameters for further eye-tracking applications.
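
The condition prior can be illustrated with a short sketch. It rests on assumptions the abstract does not spell out: the visible part of an ellipse is taken to be its soft mask multiplied pixel-wise by the predicted eye-region probability, and the loss is a mean binary cross-entropy against the visible-part label. The paper's actual loss, whether the pupil is additionally gated by the iris, and whether gradients also flow into the eye-region branch may differ. `condition_on_eye_region` and `bce_loss` are hypothetical names.

```python
import math
from typing import List

Mask = List[List[float]]  # row-major H x W probabilities


def condition_on_eye_region(ellipse_mask: Mask, eye_region: Mask) -> Mask:
    """Visible part = full-ellipse soft mask gated pixel-wise by the eye-region probability.

    Because of the product, the gradient w.r.t. ellipse_mask is scaled by eye_region,
    so pixels hidden by the eyelid (eye_region ~ 0) give the ellipse almost no signal."""
    return [[e * r for e, r in zip(e_row, r_row)]
            for e_row, r_row in zip(ellipse_mask, eye_region)]


def bce_loss(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a soft prediction and a {0, 1} label mask."""
    total, count = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp so both logs stay finite
            total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    return total / count


# Toy 4x4 example: a full iris disk whose top row lies under the closed upper eyelid.
full_iris = [[0, 1, 1, 0],
             [1, 1, 1, 1],
             [1, 1, 1, 1],
             [0, 1, 1, 0]]
eye_region = [[0, 0, 0, 0],
              [1, 1, 1, 1],
              [1, 1, 1, 1],
              [1, 1, 1, 1]]
visible_label = [[0, 0, 0, 0],
                 [1, 1, 1, 1],
                 [1, 1, 1, 1],
                 [0, 1, 1, 0]]

loss_plain = bce_loss(full_iris, visible_label)
loss_cond = bce_loss(condition_on_eye_region(full_iris, eye_region), visible_label)
print(f"unconditioned: {loss_plain:.3f}, conditioned: {loss_cond:.6f}")
```

In the toy example, the correct full-iris ellipse is penalized for the two occluded pixels when compared directly with the visible-part label, but matches it once gated by the eye region. Under this formulation the occluded portion of the ellipse is not pushed to shrink, which is what allows full ellipses to be learned from visible-part segmentation labels.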

Paper Structure

This paper contains 13 sections, 9 equations, 6 figures, and 4 tables.

Introduction
Related Works
Methodology
Overall Pipeline and Network Architecture
Estimators for Full Iris and Pupil Ellipses
From Elliptical Parameter to Segmentation Mask
Experiments
Datasets and Evaluation Metrics
Implementation Details
Comparison of Eye Parsing on Visible Parts
Analysis of Full Ellipses for Full Pupil and Iris
Iris Region RoI Focus and Augmentations
Conclusion

Figures (6)

Figure 1: Eye-region appearance in common eye images can be decoupled along two dimensions: the iris/pupil position, which is controlled by eyeball movement and gaze direction, and the eyelid openness, which is related to the elevation of the upper eyelid controlled by a voluntary muscle (i.e., the levator palpebrae superioris). (Synthetic images on the right are from the NVGaze dataset kim2019nvgaze.)
Figure 2: Comparison of different schemes for full ellipse estimation of the pupil/iris: (a) first trains a multi-class segmentation network and fits ellipse parameters to the predicted dense mask as post-processing; (b) first generates elliptical parameters for each sample as ground truth and trains a regression network to predict them. Our proposed strategy, illustrated in (c), directly predicts the elliptical parameters without explicit ellipse annotations.
Figure 3: Network architecture of the proposed method. A dense-block-based encoder-decoder network extracts image features and predicts the eye-region segmentation mask. The encoded features are also used to estimate the iris elliptical parameters, and the full-pupil ellipse is predicted from the cropped full-iris RoI. All elliptical parameters are converted to soft segmentation masks (conditioned by the eye region), from which the loss that optimizes the elliptical parameters is computed.
Figure 4: (a) Ellipse drawn directly from its parameters; (b) the distmap $\bm{D}$, computed as $\bm{x}^{\mathrm{T}} \bm{M} \bm{x}$; (c)-(e) segmaps with $\tau = 50$, $200$, and $1000$.
Figure 5: Eye-parsing performance on the public datasets OpenEDS-2019 and OpenEDS-2020, comparing CondSeg outputs with ground-truth segmentation masks. Note that CondSeg still provides reasonable pupil and iris masks even when they are heavily occluded by the eyelid.
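
Figure 3 states that the full-pupil ellipse is predicted from the cropped full-iris RoI. How the crop is performed (on the image or on feature maps, with what margin or resizing) is not described here, so the sketch below only computes a candidate RoI: the tight axis-aligned bounding box of the predicted iris ellipse, enlarged by a hypothetical relative `margin` and clipped to the image bounds $[0, W] \times [0, H]$. The half-extents $\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}$ (horizontal) and $\sqrt{a^2\sin^2\theta + b^2\cos^2\theta}$ (vertical) follow from maximizing the coordinates of the parametric point $(x_0 + a\cos t\cos\theta - b\sin t\sin\theta,\; y_0 + a\cos t\sin\theta + b\sin t\cos\theta)$ over $t$. The helpers `ellipse_bbox` and `iris_roi` are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def ellipse_bbox(x0: float, y0: float, a: float, b: float, theta: float) -> Box:
    """Tight axis-aligned bounding box of an ellipse with semi-axes a, b rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    half_w = math.hypot(a * c, b * s)  # max |x - x0| over the ellipse
    half_h = math.hypot(a * s, b * c)  # max |y - y0| over the ellipse
    return x0 - half_w, y0 - half_h, x0 + half_w, y0 + half_h


def iris_roi(iris: Tuple[float, float, float, float, float],
             img_w: int, img_h: int, margin: float = 0.1) -> Box:
    """Hypothetical RoI helper: pad the iris box by `margin` of its size, then clip to the image."""
    x_min, y_min, x_max, y_max = ellipse_bbox(*iris)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)
    return (max(0.0, x_min - pad_x), max(0.0, y_min - pad_y),
            min(float(img_w), x_max + pad_x), min(float(img_h), y_max + pad_y))


# Example: an iris ellipse near the right image border gets clipped on that side.
print(iris_roi((50.0, 40.0, 20.0, 10.0, 0.0), img_w=64, img_h=64))
```

Because this box is derived from the full iris ellipse rather than from the visible iris pixels, it still covers the whole pupil when the eyelid hides part of the iris.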
