MeciFace: Mechanomyography and Inertial Fusion-based Glasses for Edge Real-Time Recognition of Facial and Eating Activities

Hymalai Bello, Sungho Suh, Bo Zhou, Paul Lukowicz

TL;DR

MeciFace presents a glasses-based wearable that performs real-time facial expression and eating/drinking activity recognition entirely on-device by fusing mechanomyography and inertial data through a two-stage hierarchical TinyML pipeline. The compact CNNs, deployed on a microcontroller via TensorFlow Lite for Microcontrollers, keep memory usage to 11–19 KB while achieving robust performance, with an on-edge power envelope below 0.55–0.65 W. In evaluations with unseen users, the system achieves a 94% F1-score for eating/drinking detection and approximately 86% F1-score for facial expressions, demonstrating practical viability for private, edge-based health monitoring. The work establishes a foundation for privacy-preserving, ubiquitous monitoring of stress-related eating and facial cues, with potential extensions to environmental sensing and multimodal data fusion.

Abstract

The increasing prevalence of stress-related eating behaviors and their impact on overall health highlights the importance of effective and ubiquitous monitoring systems. In this paper, we present MeciFace, an innovative wearable technology designed to monitor facial expressions and eating activities in real-time on-the-edge (RTE). MeciFace aims to provide a low-power, privacy-conscious, and highly accurate tool for promoting healthy eating behaviors and stress management. We employ lightweight convolutional neural networks as backbone models for facial expression and eating monitoring scenarios. The MeciFace system ensures efficient data processing with a tiny memory footprint, ranging from 11 KB to 19 KB. During RTE evaluation with unseen users (user-independent case), the system achieves an F1-score of < 86% for facial expression recognition and 94% for eating/drinking monitoring.

Paper Structure (11 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Facial Muscle Activities Dictionary; 6 Facial Expressions from the Warsaw Set of Emotional Facial Expression Photoset [Warsaw] and 2 Gestures from [bello2023inmyface]. The Facial Muscle Movement of Taking a Pill is Included to Differentiate an Eating/Drinking Episode from the Sporadic Gesture of Touching the Face/Mouth.
  • Figure 2: MeciFace Prototype (A). Hardware Connections Blocks: Motion and Environmental Station on The Glasses' Nose Bridge with BNO085 (IMU), SPH8878LR5H (Microphone) and BME688 (Barometer). On The Temples are The Force Sensitive Resistor (FSR), Piezoelectric Film (PEF), and QtPy ESP32 (MCU) (B).
  • Figure 3: Real-Time and on-the-Edge Flow Diagram Implementation for the Eating/Drinking Scenario with the Two-Stage Hierarchical Modeling; The First Stage is the Mechanomyography-based Model (MMG-Model) to Detect Null/Activity. The Second Stage is the Inertial-Model to Classify Eating and Drinking Episodes with a Window Size of One Second and a Window Step of Half a Second (A). Real-Time and on-the-Edge Flow Diagram Implementation for the Facial Expressions Scenario with Motion Threshold Detection and Two-Stage Hierarchical Modeling; The First Stage is the MMG-Model to Detect Null/Activity. The Second Stage is the Inertial-based Model to Classify the Facial Movements Dictionary in Figure 1 (B).
  • Figure 4: Results of the offline MMG-Model with Five Volunteers (Leave-one-session-out cross-validation) in Lunch/Dinner Scenario; F1-score = 83% (A). Results of the offline Inertial-Model with Five Volunteers (Leave-one-session-out cross-validation) in Lunch/Dinner Scenario; F1-score = 88% (B). Real-Time on-the-Edge Recognition Results for Five Unseen Volunteers (User-independent) in Snacking Scenario; F1-score = 94% (C).
  • Figure 5: Results of the offline Inertial-Model for Ten Sessions on Different Days with Leave-One-Session-Out Cross-Validation for the Recognition of the Dictionary in Figure 1; Joy/Surprise (1), Anger/Disgust/Anger (2), Winking (3), Fear (4) and Taking a Pill (5); F1-score = 95% (A). Real-Time and on-the-Edge Results of the Inertial-Model for Three Sessions on Different Days for the Recognition of the Facial Activities in the Dictionary in Figure 1; F1-score = 86% (B).
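
The two-stage flow described in the Figure 3 caption (an MMG-Model gating on Null/Activity, followed by an Inertial-Model on windows of one second with a half-second step) can be illustrated with a minimal Python sketch. This is not the authors' firmware: the function names, the shared sampling rate `rate_hz`, and the callables standing in for the two CNNs are all hypothetical and chosen only to show the control flow.

```python
from typing import Callable, Iterator, List, Sequence


def sliding_windows(n_samples: int, rate_hz: int,
                    window_s: float = 1.0, step_s: float = 0.5) -> Iterator[range]:
    """Yield index ranges for 1 s windows advanced by 0.5 s (values from the caption)."""
    win = int(window_s * rate_hz)
    step = int(step_s * rate_hz)
    for start in range(0, n_samples - win + 1, step):
        yield range(start, start + win)


def hierarchical_predict(
    mmg: Sequence[float],
    imu: Sequence[float],
    rate_hz: int,  # assumption: both streams share one sampling rate
    mmg_model: Callable[[Sequence[float]], str],       # stand-in for the MMG-Model
    inertial_model: Callable[[Sequence[float]], str],  # stand-in for the Inertial-Model
) -> List[str]:
    """Run stage 2 only on windows that stage 1 flags as activity."""
    labels = []
    for idx in sliding_windows(len(mmg), rate_hz):
        mmg_window = mmg[idx.start:idx.stop]
        if mmg_model(mmg_window) == "null":
            # Stage 1 rejected the window, so the inertial model is skipped.
            labels.append("null")
        else:
            labels.append(inertial_model(imu[idx.start:idx.stop]))
    return labels
```

Gating the second model behind the first means the inertial classifier runs only when muscle activity is detected, which is one plausible way a hierarchical design reduces computation on a microcontroller.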