
Enabling Automatic Self-Talk Detection via Earables

Euihyeok Lee, Seonghyeon Kim, SangHun Im, Heung-Seon Oh, Seungwoo Kang

TL;DR

Self-talk is an elusive internal dialogue linked to emotion regulation and cognitive processing. This work introduces MutterMeter, a mobile system that automatically detects vocalized self-talk from earable audio using a hierarchical, context-aware pipeline that integrates acoustic, linguistic, and contextual cues. Evaluated on a first-of-its-kind in-the-wild tennis dataset (31.1 hours, 25 participants), MutterMeter achieves a macro-F1 of 0.84, substantially reduces latency via stage-wise early exit, and outperforms speech emotion recognition (SER), sentiment analysis, and several LLM baselines. The approach balances on-device efficiency with server-side linguistic processing, enabling real-time self-talk detection in everyday contexts while preserving user privacy, and lays a foundation for broader mental-state monitoring applications.
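
Stage-wise early exit can be pictured as a cascade: cheap stages run first, and a segment only reaches the later, costlier stages when the earlier ones are unsure. The sketch below is a generic confidence-threshold cascade, not MutterMeter's implementation. The `Stage` class, the `classify` function, the confidence rule max(p, 1 - p), the thresholds, and the acoustic, linguistic, contextual ordering are all assumptions for illustration; the summary does not specify the actual models or exit criteria.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class Stage:
    """One hypothetical pipeline stage (not MutterMeter's actual API)."""
    name: str
    predict: Callable[[dict], float]  # returns P(self-talk) for a segment
    threshold: float                  # exit once max(p, 1 - p) >= threshold


def classify(segment: dict, stages: Sequence[Stage]) -> Tuple[str, str]:
    """Run stages in order; return (label, name of the stage that decided)."""
    if not stages:
        raise ValueError("at least one stage is required")
    for i, stage in enumerate(stages):
        p = stage.predict(segment)
        confident = max(p, 1.0 - p) >= stage.threshold
        # The last stage always decides, even if it is not confident.
        if confident or i == len(stages) - 1:
            return ("self-talk" if p >= 0.5 else "other"), stage.name
    raise AssertionError("unreachable")


# Toy stages that read precomputed scores; in a real system the linguistic
# stage might run server-side. Order and thresholds are assumptions.
stages = [
    Stage("acoustic", lambda s: s["acoustic_p"], 0.9),
    Stage("linguistic", lambda s: s["linguistic_p"], 0.8),
    Stage("contextual", lambda s: s["context_p"], 0.0),
]

print(classify({"acoustic_p": 0.95, "linguistic_p": 0.3, "context_p": 0.5}, stages))
# ('self-talk', 'acoustic')
print(classify({"acoustic_p": 0.6, "linguistic_p": 0.1, "context_p": 0.5}, stages))
# ('other', 'linguistic')
print(classify({"acoustic_p": 0.6, "linguistic_p": 0.6, "context_p": 0.7}, stages))
# ('self-talk', 'contextual')
```

Because later stages run only on segments that earlier stages find ambiguous, the cost of the heavier stages is paid only where it is needed, which is the accuracy/latency trade-off early exit is meant to capture.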

Abstract

Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisible and unmeasurable in everyday life. In this paper, we present MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones in real-world settings. Detecting self-talk is technically challenging due to its diverse acoustic forms, semantic and grammatical incompleteness, and irregular occurrence patterns, which differ fundamentally from the assumptions underlying conventional speech understanding models. To address these challenges, MutterMeter employs a hierarchical classification architecture that progressively integrates acoustic, linguistic, and contextual information through a sequential processing pipeline, adaptively balancing accuracy and computational efficiency. We build and evaluate MutterMeter using a first-of-its-kind dataset comprising 31.1 hours of audio collected from 25 participants. Experimental results demonstrate that MutterMeter achieves robust performance with a macro-averaged F1 score of 0.84, outperforming conventional approaches, including LLM-based and speech emotion recognition models.
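
The macro-averaged F1 score is the unweighted mean of per-class F1 scores, where each class's F1 is 2TP / (2TP + FP + FN). Every class therefore counts equally regardless of how often it occurs, which matters when self-talk is rare relative to other audio. A minimal sketch follows; the labels "self" and "other" are placeholders, since this summary does not state MutterMeter's exact label set.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over classes seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred), key=str)
    if not classes:
        raise ValueError("no labels given")
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        # Per-class F1 = 2TP / (2TP + FP + FN); denominator > 0 for any seen class.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Placeholder labels: per-class F1 is 0.8 for "self" and 2/3 for "other".
y_true = ["self", "self", "self", "other"]
y_pred = ["self", "self", "other", "other"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

For real evaluations, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.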

Paper Structure

This paper contains 42 sections, 5 equations, 14 figures, 12 tables, and 1 algorithm.

Figures (14)

  • Figure 1: Non–self-talk examples
  • Figure 2: Indistinguishable self-talk examples
  • Figure 3: Distinguishable self-talk examples
  • Figure 4: Results of self-talk detection using existing methods
  • Figure 5: MutterMeter system overview
  • ...and 9 more figures