Putting Language into Context Using Smartphone-Based Keyboard Logging

Florian Bemmann, Timo Koch, Maximilian Bergmann, Clemens Stachl, Daniel Buschek, Ramona Schoedel, Sven Mayer

TL;DR

Language data from smartphones often lacks contextual information and privacy protections. The authors propose context-enriched keyboard logging, which uses the input prompt text of each text field as metadata to infer the user's input motive and preprocesses the data on-device. A six-month field study (N=624) is used to derive the motive mapping, and the approach is shared as an Android library. They find that filtering by input motive yields clearer data and higher LIWC matches for messaging and social content, while search queries remain challenging, underscoring the value of motive-based data curation. The framework supports privacy-preserving collection and fine-grained analysis across linguistics, psychology, and HCI, offering practical tools and a roadmap for reproducible, on-device mobile language research.

Abstract

While the study of language as typed on smartphones offers valuable insights, existing data collection methods often fall short in providing contextual information and ensuring user privacy. We present a privacy-respectful approach - context-enriched keyboard logging - that allows for the extraction of contextual information on the user's input motive, which is meaningful for linguistics, psychology, and behavioral sciences. In particular, with our approach, we enable distinguishing language contents by their channel (i.e., comments, messaging, search inputs). Filtering by channel allows for better pre-selection of data, which is in the interest of researchers and improves users' privacy. We demonstrate our approach on a large-scale six-month user study (N=624) of language use in smartphone interactions in the wild. Finally, we highlight the implications for research on language use in human-computer interaction and interdisciplinary contexts.
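
The idea of filtering typed text by input motive rather than by originating app can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword rules, the `infer_motive` and `filter_by_motive` helpers, and the example prompt texts are hypothetical and are not the motive mapping derived in the study or the API of the authors' Android library.

```python
# Hypothetical keyword rules mapping a text field's input prompt text to an
# input motive. The paper derives its own mapping from field-study data;
# these rules only illustrate the principle.
MOTIVE_KEYWORDS = {
    "Search": ("search",),
    "Messaging": ("message",),
    "Comment": ("comment", "reply"),
    "Content Creation": ("what's happening", "write a caption"),
}


def infer_motive(prompt_text):
    """Return the first motive whose keyword occurs in the prompt text."""
    text = prompt_text.lower()
    for motive, keywords in MOTIVE_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return motive
    return "Other"  # e.g. form fields and other data input


def filter_by_motive(inputs, wanted):
    """Keep only typed texts whose text field matches a wanted motive.

    `inputs` is a list of (prompt_text, typed_text) pairs.
    """
    return [typed for prompt, typed in inputs if infer_motive(prompt) in wanted]


if __name__ == "__main__":
    logged = [
        ("Message", "see you at 5"),
        ("Search", "weather"),
        ("Add a comment...", "great photo"),
    ]
    print(filter_by_motive(logged, {"Messaging", "Comment"}))
```

Running such a filter on the device before any upload would mean that text typed into fields with unwanted motives never leaves the phone, which is the kind of pre-selection the abstract describes as serving both researchers and users' privacy.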

Paper Structure (35 sections, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Screenshots of text fields in three Android apps: WhatsApp (left), Google Search (middle), and Twitter (right). All three text fields have input prompt texts that give the user a hint about what the text field is intended to be used for.
  • Figure 2: Instead of categorizing collected in-the-wild text input data by the originating app, we propose to consider the originating text field's input prompt text. Using Instagram as an example, this figure shows that text inputs in Instagram are not just social media content such as posts and comments, but can also have other motives such as messaging, search, and data input.
  • Figure 3: Words typed per user per input motive. Search inputs are rather short (1 to 3 words), and Messaging inputs are rather long (5 to 50 words). Social network content such as posts (Content Creation) and Comments falls in between.