CondSeg: Ellipse Estimation of Pupil and Iris via Conditioned Segmentation
Zhuang Jia, Jiangfan Deng, Liying Chi, Xiang Long, Daniel K. Du
TL;DR
The paper addresses the challenge of estimating full pupil and iris ellipses for gaze estimation without explicit ellipse annotations. It introduces CondSeg, a conditioned segmentation framework that predicts an eye-region mask and full ellipse parameters, using a differentiable Ellp2Mask to turn 5D ellipse parameters $(x_0,y_0,a,b,\theta)$ into soft segmentation maps. Through iris-region focus and a two-stage training regime on OpenEDS datasets, CondSeg achieves competitive IoU on visible eye parts while providing accurate ellipse centers directly, enabling downstream eye-tracking tasks without heavy ellipse labeling. This approach reduces annotation burden and yields ellipses suitable for AR/VR gaze estimation in real-world applications.
Abstract
Parsing eye components (i.e., pupil, iris, and sclera) is fundamental to eye tracking and gaze estimation in AR/VR products. Mainstream approaches treat this problem as a multi-class segmentation task, which yields only the visible part of the pupil/iris; other methods regress elliptical parameters but require human-annotated full pupil/iris parameters. In this paper, we consider two priors: the projected full pupil/iris circle can be modelled as an ellipse (ellipse prior), and the visibility of the pupil/iris is controlled by the openness of the eye region (condition prior). Based on these priors, we design a novel method, CondSeg, that estimates the elliptical parameters of the pupil/iris directly from segmentation labels, without explicitly annotated full ellipses, and uses the eye-region mask to control the visibility of the estimated pupil/iris ellipses. A conditioned segmentation loss optimizes the parameters by transforming the parameterized ellipses into pixel-wise soft masks in a differentiable way. Our method is tested on public datasets (OpenEDS-2019/-2020); it shows competitive results on segmentation metrics while simultaneously providing accurate elliptical parameters for downstream eye-tracking applications.
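To make the ellipse-to-mask idea concrete, the following is a minimal NumPy sketch of how 5D ellipse parameters $(x_0,y_0,a,b,\theta)$ could be rendered as a soft mask and then gated by an eye-region mask. It is an illustration under stated assumptions, not the paper's implementation: the function names `ellipse_to_soft_mask` and `conditioned_mask`, the sigmoid-of-normalized-distance form, and the `sharpness` parameter are all hypothetical choices made here. The expression uses only smooth operations, so the same formula written in an autodiff framework would pass gradients back to the ellipse parameters.

```python
import numpy as np


def ellipse_to_soft_mask(params, height, width, sharpness=10.0):
    """Render (x0, y0, a, b, theta) as a soft mask with values in (0, 1).

    Assumed formulation (not taken from the paper): a sigmoid applied to
    1 - d, where d is the normalized ellipse distance (d < 1 inside,
    d = 1 on the boundary, d > 1 outside).
    """
    x0, y0, a, b, theta = params
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx, dy = xs - x0, ys - y0

    # Rotate pixel offsets into the ellipse's own axis-aligned frame.
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    u = dx * cos_t + dy * sin_t
    v = -dx * sin_t + dy * cos_t

    dist = (u / a) ** 2 + (v / b) ** 2

    # Numerically stable sigmoid: sigmoid(z) = 0.5 * (1 + tanh(z / 2)).
    z = sharpness * (1.0 - dist)
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def conditioned_mask(ellipse_mask, eye_region_mask):
    """Gate the full ellipse by the eye-region mask (condition prior)."""
    return ellipse_mask * eye_region_mask


# Usage: a full iris ellipse, with only the lower half of the image "open".
iris_full = ellipse_to_soft_mask((32.0, 32.0, 10.0, 5.0, 0.0), 64, 64)
eye_region = np.zeros((64, 64))
eye_region[32:, :] = 1.0
iris_visible = conditioned_mask(iris_full, eye_region)
```

The gated mask `iris_visible` is what would be compared against the visible-iris segmentation label, while `iris_full` keeps the complete ellipse even where the eyelid occludes it.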
