Webcam-based Pupil Diameter Prediction Benefits from Upscaling

Vijul Shah, Brian B. Moser, Ko Watanabe, Andreas Dengel

TL;DR

Low-resolution webcam eye images hinder precise pupil diameter estimation for cognitive and physiological state assessment. The study evaluates five pre-trained super-resolution (SR) models as preprocessing on full-face images to produce the EyeDentify++ left/right eye datasets and trains three ResNet regressors on images upscaled at 2× and 4×. Findings indicate that SR generally improves prediction accuracy, with strong interactions between SR method and scale; while bicubic upsampling often performs well, several advanced SR models yield further gains and shift model attention, as observed via activation maps. These results provide practical guidance for selecting upscaling techniques to boost webcam-based pupillometry, enabling more reliable assessments of stress, cognitive load, and related states in real-world settings.

Abstract

Capturing pupil diameter is essential for assessing psychological and physiological states such as stress levels and cognitive load. However, the low resolution of images in eye datasets often hampers precise measurement. This study evaluates the impact of various upscaling methods, ranging from bicubic interpolation to advanced super-resolution, on pupil diameter predictions. We compare several pre-trained methods, including CodeFormer, GFPGAN, Real-ESRGAN, HAT, and SRResNet. Our findings suggest that pupil diameter prediction models trained on upscaled datasets are highly sensitive to the selected upscaling method and scale. Our results demonstrate that upscaling methods consistently enhance the accuracy of pupil diameter prediction models, highlighting the importance of upscaling in pupillometry. Overall, our work provides valuable insights for selecting upscaling techniques, paving the way for more accurate assessments in psychological and physiological research.

Paper Structure

This paper contains 13 sections, 4 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Pipeline of our data preprocessing with image SR. First, we super-resolve the raw data with a pre-defined scaling factor (here $2\times$). Next, we use MediaPipe for face detection and landmark localization and extract the cropped left and right eye images ($64\times32$). We then apply blink detection to the cropped eyes using the Eye Aspect Ratio (EAR) and a pre-trained vision transformer, as described in EyeDentify [shah2024eyedentify]. Cropped eye images are saved based on the EAR threshold and the model confidence score.
  • Figure 2: Comparison of applying image SR models to the cropped eye images versus applying them to the entire image. While SR on the entire image yields results that are plausible with respect to the input, SR applied directly to the cropped eye images yields visibly distorted results. For instance, GFPGAN (left) produces unnatural pupils, whereas HAT (right) introduces brightness shifts.
  • Figure 3: Comparison of applying pre-trained SR models on the EyeDentify Dataset.
  • Figure 4: Challenges in estimating pupil diameter without and with SR: Participants A, B, C show head movements and gaze shifts; Participant D shows eye size variation while smiling; Participants E, F, G, H experience different lighting effects—E in bright light, F with a yellow tint, G’s face appearing red, and H’s face appearing blue.
  • Figure 5: Class Activation Map (CAM) [zhou2016learning] visualizations for the final convolutional layer of ResNet18, ResNet50, and ResNet152, shown for a test participant viewing the same display color with No-SR, SRx2, and SRx4 eye images. The true and predicted values are the measured and estimated pupil diameters, respectively.
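
The blink-filtering step mentioned in the Figure 1 caption relies on the Eye Aspect Ratio (EAR). The sketch below shows the commonly used six-landmark EAR formula, (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), together with a simple keep/discard rule. It is an illustration only: the function names, the landmark ordering, both threshold values, and the way the EAR and the blink-model score are combined are assumptions, since this summary does not state the values or the exact rule used for EyeDentify++.

```python
import math


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """Six-landmark EAR.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5) are the
    two upper/lower eyelid landmark pairs. Each point is an (x, y) tuple.
    This landmark ordering is an assumption made for the sketch.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    if horizontal == 0:
        raise ValueError("eye corners coincide; EAR is undefined")
    return vertical / (2.0 * horizontal)


def keep_eye_crop(ear, blink_prob, ear_threshold=0.2, blink_prob_threshold=0.5):
    """Hypothetical saving rule: keep a crop only if the eye looks open.

    ear_threshold and blink_prob_threshold are placeholder values, not the
    ones used in the paper. blink_prob stands in for the confidence of a
    blink classifier (the paper uses a pre-trained vision transformer).
    """
    return ear >= ear_threshold and blink_prob < blink_prob_threshold


# Synthetic landmarks for an open eye and a nearly closed eye.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1)]

open_ear = eye_aspect_ratio(*open_eye)
closed_ear = eye_aspect_ratio(*closed_eye)
```

With these synthetic points the open eye has a much larger EAR than the nearly closed one, so only the open-eye crop passes the placeholder rule when the blink score is low. In a real pipeline the landmarks would come from the face-landmark model and the thresholds would need tuning on held-out data.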