PupilSense: A Novel Application for Webcam-Based Pupil Diameter Estimation

Vijul Shah, Ko Watanabe, Brian B. Moser, Andreas Dengel

TL;DR

The paper tackles the lack of accessible, webcam-based pupil diameter estimation by collecting a large open dataset that pairs RGB webcam eye images with precise pupil measurements from a Tobii ground-truth system. It establishes a preprocessing workflow (alignment, eye cropping, and depth map generation) and reports baseline predictions under leave-one-participant-out cross-validation (LOPOCV) with ResNet18 and ResNet50, where ResNet50 achieves a mean absolute percentage error (MAPE) of about 3.23–3.64%, outperforming the smaller-model baselines. The authors release the dataset openly and deploy PupilSense, a web application that provides class activation map (CAM) visualizations, frame-level diameter estimates, and blink-aware analysis, making pupillometry more transparent and usable in natural settings. They highlight the practical impact for human behavior research and healthcare, acknowledge limitations in real-time applicability and data diversity, and outline future work on real-time processing, privacy-preserving deployment, and validation on a broader range of cameras.
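
The evaluation terms used above can be made concrete with a short sketch. It shows the standard definition of MAPE and one way to build leave-one-participant-out folds. The function names, the participant labels, and the example diameters are illustrative assumptions and are not taken from the paper's code.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent.

    Assumes every ground-truth value is non-zero, which holds for
    pupil diameters measured in millimetres.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


def leave_one_participant_out(participant_ids):
    """Yield (held_out, train_indices, test_indices) for each participant.

    Every sample of the held-out participant goes to the test fold, so the
    model is never evaluated on a person it saw during training.
    """
    for held_out in sorted(set(participant_ids)):
        train = [i for i, pid in enumerate(participant_ids) if pid != held_out]
        test = [i for i, pid in enumerate(participant_ids) if pid == held_out]
        yield held_out, train, test


# Hypothetical example: two diameters (mm), each predicted 5% off.
error = mape([4.0, 5.0], [3.8, 5.25])

# Hypothetical participant labels for four samples.
folds = list(leave_one_participant_out(["a", "a", "b", "c"]))
```

In a full evaluation, a model would be trained on each fold's training indices and the per-fold MAPE values would then be aggregated; how the paper aggregates them is not stated in this summary.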

Abstract

Measuring pupil diameter is vital for gaining insights into physiological and psychological states, but it has traditionally required expensive, specialized equipment such as Tobii eye-trackers and Pupil Labs glasses. This paper presents a novel application that enables pupil diameter estimation using standard webcams, making the process accessible in everyday environments without specialized equipment. Our app estimates pupil diameters from videos and offers detailed analysis, including class activation maps, graphs of predicted left and right pupil diameters, and eye aspect ratios during blinks. This tool expands the accessibility of pupil diameter measurement, particularly in everyday settings, benefiting fields like human behavior research and healthcare. Additionally, we present a new open-source dataset for webcam-based pupil diameter estimation, containing cropped eye images and the corresponding pupil diameter measurements.

Paper Structure

This paper contains 11 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: PupilSense: A web app for estimating and analyzing pupil diameters from everyday images and videos. [A]: Options to select either the left or right pupil for analysis (in blue) and to choose the classification models (in pink). [B]: Visualization of the input and output media, including CAM and estimated pupil diameters. [C]: Estimated pupil diameter values for each frame, analyzed by selected pupil type(s). [D]: EAR values for blink detection, with thresholds for acceptance of open eyes (in green) and rejection (in red). [E]: Consolidated data view showing pupil diameter values, EARs, and differences in pupil diameters, with a downloaded CSV file.
  • Figure 2: Overview of the data recording and preprocessing (alignment) flow. A Tobii eye-tracker records pupil diameter, and ChameleonView captures facial recordings using a webcam. Facial recordings start when the participant clicks the button in the center. The start and end timestamps of the recording are collected in order to synchronize the data with the eye-tracker. To synchronize the 90 frames with the 270 Tobii-captured data points, each metric column is concatenated horizontally across the 90 data points from the three unique timestamps in the Tobii-captured CSV file, followed by computing a row-wise mean.
  • Figure 3: Pupil diameter distribution of one participant during the recordings. Different pupil diameter measurements and webcam images were captured during the three-second-long sessions (50 sessions in total). The colors of the boxes indicate the display color used during the recordings (white, black, red, blue, yellow, green, gray, and white again).
  • Figure 4: Data preprocessing pipeline to crop the eyes. For face detection and landmark localization, we used Mediapipe to extract the respective cropped eye images (32x16), left and right, separately. We applied a pre-trained DepthAnythingV2 model on the entire image and cropped the depth maps around the eye regions with the help of landmarks detected from Mediapipe. Next, we applied blink detection on the cropped eyes using the Eye Aspect Ratio (EAR) and a pre-trained vision transformer for blink detection. Cropped eye images and the depth maps are then saved based on the EAR threshold and model confidence score.
  • Figure 5: Iris masks were extracted using Mediapipe landmarks (left) and Otsu's Binarization (right).
  • ...and 1 more figure
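
The alignment step described in the Figure 2 caption can be sketched as follows. This is one literal reading of the caption, not the authors' code: it assumes the 270 Tobii samples arrive ordered as three consecutive groups of 90 (one group per unique timestamp), places the groups side by side, and takes the mean across each row to obtain one value per webcam frame. The function name and the `n_groups` parameter are hypothetical.

```python
from statistics import fmean


def align_tobii_to_frames(samples, n_groups=3):
    """Reduce Tobii samples to one value per webcam frame.

    Assumption: `samples` holds one metric column (e.g. left pupil
    diameter) ordered as `n_groups` consecutive, equally sized groups.
    """
    samples = list(samples)
    if n_groups <= 0 or len(samples) % n_groups != 0:
        raise ValueError("sample count must be divisible by n_groups")
    per_group = len(samples) // n_groups
    # Split the column into its groups: 270 samples -> 3 groups of 90.
    groups = [
        samples[g * per_group:(g + 1) * per_group] for g in range(n_groups)
    ]
    # zip(*groups) lines the groups up side by side (90 rows of 3 values);
    # the row-wise mean then yields one value per frame.
    return [fmean(row) for row in zip(*groups)]


# Hypothetical input: 270 placeholder values standing in for one metric column.
aligned = align_tobii_to_frames(range(270))
```

If the samples were instead interleaved in time, averaging each run of three consecutive samples would be the natural alternative. The caption alone does not settle which layout the recorded CSV uses.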