Monitoring Viewer Attention During Online Ads
Mina Bishay, Graham Page, Waleed Emad, Mohammad Mavadati
TL;DR
The paper tackles the challenge of reliably measuring viewer attention during online ads by proposing a device-aware attention architecture that combines AFFDEX 2.0 and SmartEye SDK to detect four distractors: off-screen gaze, speaking, drowsiness, and unattended screen. It estimates where the viewer's gaze intersects the screen plane, using calibrated extrinsic parameters, and introduces models for gaze (eye and head), speaking (lip movement with a CNN), yawning (mouth ratio and AU features), and no-face detection. Validation across four proprietary datasets demonstrates that combining multiple distractors improves attention detection on both desktop and mobile platforms, outperforming prior approaches such as AFFDEX 1.0. The approach enables more accurate ad testing by filtering out inattentive participants and supports scalable, real-world deployment in online advertising contexts. Future work aims to refine gaze assessment when ads are not fullscreen and to broaden distractor coverage.
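To make the gaze-intersection idea concrete, here is a minimal sketch of a ray-plane intersection followed by an on-screen check. It is not the authors' implementation: the coordinate convention (camera at the top centre of the screen, screen lying in the plane z = 0, y pointing up), the screen sizes in `SCREEN_SIZES_M`, and all function names are assumptions made for illustration only.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical screen sizes (width, height) in metres; real values would
# come from per-device calibration, which this sketch does not model.
SCREEN_SIZES_M = {"desktop": (0.50, 0.30), "mobile": (0.07, 0.15)}


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def gaze_plane_intersection(
    origin: Vec3,
    direction: Vec3,
    plane_point: Vec3 = (0.0, 0.0, 0.0),
    plane_normal: Vec3 = (0.0, 0.0, 1.0),
    eps: float = 1e-9,
) -> Optional[Vec3]:
    """Intersect the gaze ray origin + t * direction (t >= 0) with a plane."""
    denom = dot(direction, plane_normal)
    if abs(denom) < eps:
        return None  # gaze is parallel to the screen plane
    to_plane = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(to_plane, plane_normal) / denom
    if t < 0:
        return None  # the screen plane is behind the viewer
    return tuple(o + t * d for o, d in zip(origin, direction))


def gaze_on_screen(origin: Vec3, direction: Vec3, device: str) -> bool:
    """Assumed layout: camera at top centre, screen spans
    x in [-w/2, w/2] and y in [-h, 0] within the plane z = 0."""
    point = gaze_plane_intersection(origin, direction)
    if point is None:
        return False
    width, height = SCREEN_SIZES_M[device]
    x, y, _ = point
    return -width / 2 <= x <= width / 2 and -height <= y <= 0.0


eye = (0.0, 0.0, 0.6)  # viewer 60 cm in front of the camera
looking_at_screen = gaze_on_screen(eye, (0.0, -0.1, -1.0), "desktop")
looking_aside = gaze_on_screen(eye, (1.0, 0.0, -1.0), "desktop")
```

Under these assumptions the first gaze ray lands slightly below the camera and inside the desktop screen bounds, while the second lands far to the side and counts as off-screen. Swapping the `device` key changes only the bounds, which is one simple way a device-specific gaze setting could be expressed.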
Abstract
Video ads now spread across numerous online platforms and are watched by millions of viewers worldwide. Big brands gauge the liking and purchase intent of their new ads by analyzing the facial responses of viewers recruited online to watch the ads from home or work. Although this approach captures naturalistic responses, it is susceptible to distractions inherent in the participants' environments, such as a movie playing on TV, a colleague speaking, or mobile notifications. Inattentive participants should be flagged and eliminated to avoid skewing the ad-testing process. In this paper, we introduce an architecture for monitoring viewer attention during online ads. Leveraging two behavior analysis toolkits, AFFDEX 2.0 and SmartEye SDK, we extract low-level facial features encompassing facial expressions, head pose, and gaze direction. These features are then combined to extract high-level features, including estimated gaze on the screen plane, yawning, and speaking. This enables the identification of four primary distractors: off-screen gaze, drowsiness, speaking, and unattended screen. Our architecture tailors the gaze settings according to the device type (desktop or mobile). We validate our architecture first on datasets annotated for specific distractors, and then on a real-world ad-testing dataset with various distractors. The proposed architecture shows promising results in detecting distraction across both desktop and mobile devices.
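The step from high-level features to the four distractors, and from there to flagging inattentive participants, can be sketched as follows. This is an illustrative assumption, not the method described in the paper: the `FrameSignals` fields, the rule that a missing face implies an unattended screen, and the `min_ratio` threshold of 0.8 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FrameSignals:
    """Hypothetical per-frame outputs of the high-level feature models."""
    face_detected: bool
    gaze_on_screen: bool
    speaking: bool
    yawning: bool


def frame_distractors(signals: FrameSignals) -> List[str]:
    """Map one frame's signals to the distractors named in the abstract."""
    if not signals.face_detected:
        # Without a face, the other signals are undefined in this sketch.
        return ["unattended_screen"]
    found = []
    if not signals.gaze_on_screen:
        found.append("off_screen_gaze")
    if signals.speaking:
        found.append("speaking")
    if signals.yawning:
        found.append("drowsiness")
    return found


def attention_ratio(frames: Iterable[FrameSignals]) -> float:
    """Fraction of frames with no distractor; 0.0 for an empty session."""
    frames = list(frames)
    if not frames:
        return 0.0
    attentive = sum(1 for f in frames if not frame_distractors(f))
    return attentive / len(frames)


def flag_participant(frames: Iterable[FrameSignals], min_ratio: float = 0.8) -> bool:
    """Flag a participant whose attention ratio is below an assumed threshold."""
    return attention_ratio(frames) < min_ratio


attentive = FrameSignals(True, True, False, False)
session = [
    attentive,
    attentive,
    attentive,
    FrameSignals(True, False, True, False),  # looking away while talking
]
ratio = attention_ratio(session)
flagged = flag_participant(session)
```

In this toy session three of four frames are distractor-free, so the ratio is 0.75 and the participant is flagged under the assumed threshold. A real system would likely smooth signals over time rather than judging frames independently.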
