ScreenTK: Seamless Detection of Time-Killing Moments Using Continuous Mobile Screen Text and On-Device LLMs

Le Fang, Shiquan Zhang, Hong Jia, Jorge Goncalves, Vassilis Kostakos

TL;DR

This work addresses the problem of detecting time-killing moments on smartphones amid a continuous flow of information. It introduces ScreenTK, which combines continuous screen text captured on-device via AWARE-Light with an on-device LLM (Llama 3) to identify and summarize time-killing moments. Evaluated in two case studies (explicit and implicit time-killing) with six participants and 1,034 records, ScreenTK improves on a state-of-the-art screenshot-based baseline by 38% and detects nearly all events, while providing richer contextual traces. The results highlight the limitations of screenshot-based methods and point toward privacy-preserving, on-device processing, with potential future integration of app-sensor data for personalized interventions.

Abstract

Smartphones have become essential to people's digital lives, providing a continuous stream of information and connectivity. However, this constant flow can lead to moments where users are simply passing time rather than engaging meaningfully. This underscores the importance of developing methods to identify these "time-killing" moments, enabling the delivery of important notifications in a way that minimizes interruptions and enhances user engagement. Recent work has used screenshots taken every 5 seconds to detect time-killing activities on smartphones. However, this method often fails to capture phone usage that occurs between screenshots. We demonstrate that up to 50% of time-killing instances go undetected using screenshots, leading to substantial gaps in understanding user behavior. To address this limitation, we propose a method called ScreenTK that detects time-killing moments by leveraging continuous screen text monitoring and on-device large language models (LLMs). Screen text contains more comprehensive information than screenshots and allows LLMs to summarize detailed phone usage. To verify our framework, we conducted experiments with six participants, capturing 1,034 records of different time-killing moments. Initial results show that our framework outperforms state-of-the-art solutions by 38% in our case study.
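
The gap between periodic screenshots and continuous screen text can be illustrated with a small sketch. It is not the authors' implementation: the `ScreenEvent` type, the `sampled_events` helper, and the event timings are hypothetical, chosen only to show how a fixed 5-second sampling interval can skip a screen that appears and disappears between two samples, whereas logging every screen change would record it.

```python
from dataclasses import dataclass


@dataclass
class ScreenEvent:
    """A hypothetical span during which one screen was visible."""
    start: float  # seconds since the session started
    end: float    # exclusive end of the span
    label: str


def sampled_events(events, interval=5.0):
    """Return labels of events visible at sample times 0, interval, 2*interval, ..."""
    horizon = max(e.end for e in events)
    seen = set()
    t = 0.0
    while t <= horizon:
        for e in events:
            if e.start <= t < e.end:
                seen.add(e.label)
        t += interval
    return seen


# Illustrative session (made-up timings, not data from the paper).
events = [
    ScreenEvent(0.0, 4.0, "home feed"),
    ScreenEvent(4.0, 6.5, "video reel"),
    ScreenEvent(6.5, 9.0, "comment thread"),
    ScreenEvent(9.0, 12.0, "shopping page"),
]

captured = sampled_events(events, interval=5.0)   # screenshot-style sampling
all_labels = {e.label for e in events}            # continuous logging sees every event
missed = all_labels - captured

print("captured by 5 s sampling:", sorted(captured))
print("missed by 5 s sampling:", sorted(missed))
```

In this toy session the samples fall at 0, 5, and 10 seconds, so the "comment thread" screen (visible from 6.5 to 9 seconds) is never sampled. The proportion missed in practice depends on how quickly users switch screens, which is what the paper measures empirically.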

Paper Structure (14 sections, 3 figures)


Figures (3)

  • Figure 1: Comparison between ScreenTK and the SOTA screenshot-based method (doi:10.1145/3544548.3580689) for detecting time-killing moments. The green boxes highlight the records of time-killing instances captured by the proposed framework, ScreenTK. Top: explicit time-killing study. Bottom: implicit time-killing study. Transparent boxes: captured by ScreenTK only. Non-transparent boxes: captured by both ScreenTK and the screenshot-based method.
  • Figure 2: Prompt engineering for analyzing smartphone screen text data to understand user time-killing behavior, including data format, expected insights, and response structure.
  • Figure 3: Examples of collected screen text. timestamp: the Unix timestamp at which the event occurred; text: text on the current screen.
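
As a rough illustration of how the record format in Figure 3 could feed the kind of prompt described in Figure 2, the sketch below assembles screen-text records into a single prompt string. The `build_prompt` function, the prompt wording, the response keys, and the sample records are all assumptions made for illustration; the paper's actual prompt and the call to the on-device model are not reproduced here.

```python
import json


def build_prompt(records):
    """Assemble a prompt from screen-text records (hypothetical wording).

    Each record is a dict with 'timestamp' (Unix timestamp) and 'text'
    (text on the current screen), the two fields named in Figure 3.
    """
    ordered = sorted(records, key=lambda r: r["timestamp"])
    # One JSON object per line; json.dumps escapes any newlines in the text.
    lines = [
        json.dumps({"timestamp": r["timestamp"], "text": r["text"]},
                   ensure_ascii=False)
        for r in ordered
    ]
    return "\n".join([
        # The three parts mirror Figure 2: data format, expected insights,
        # and response structure. The sentences themselves are assumed.
        "Data format: one JSON object per line with 'timestamp' "
        "(Unix timestamp) and 'text' (text on the current screen).",
        "Expected insights: decide whether the user is killing time and "
        "summarize what they were doing.",
        "Response structure: JSON with keys 'time_killing' (true or false) "
        "and 'summary' (string).",
        "Records:",
        *lines,
    ])


# Made-up records for demonstration only.
records = [
    {"timestamp": 1700000005, "text": "Trending videos\nWatch now"},
    {"timestamp": 1700000001, "text": "Home | Search | Notifications"},
]
prompt = build_prompt(records)
print(prompt)
```

Sorting by timestamp keeps the records in the order the screens were seen, and emitting one JSON object per line keeps each record unambiguous even when the screen text itself contains line breaks.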