Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning

Asaf Liberman; Oron Levy; Soroush Shahi; Cori Tymoszek Park; Mike Ralph; Richard Kang; Abdelkareem Bedri; Gierad Laput

TL;DR

Moonwalk introduces a passive gait-based authentication method for wireless headphones that uses the built-in accelerometer. It trains a self-supervised metric learning model with the NT-Xent loss to produce discriminative gait embeddings from 3D acceleration, so a user can enroll with as little as 10 seconds of walking and be recognized on-device without retraining the model for new users. In experiments with 50 participants and controlled variations in shoe type and floor surface, the approach achieves an average F1 score of 92.9% and an equal error rate (EER) of 2.3% on the GA dataset. It generalizes across floor surfaces without re-enrollment, and adaptive enrollment further improves performance. The work demonstrates both the practicality and the challenges of passive authentication for wearables, and outlines directions for making gait-based recognition more robust and deployable on real devices.
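For readers unfamiliar with NT-Xent, the following is a minimal NumPy sketch of the loss in its standard form, not the authors' training code. It assumes that row `i` of `z_a` and row `i` of `z_b` are embeddings of two segments from the same walking session (a positive pair), that every other row in the batch comes from a different session, and that similarity is cosine similarity. The function name and the temperature value of 0.5 are illustrative assumptions.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Standard NT-Xent loss for N positive pairs (z_a[i], z_b[i]).

    The temperature default is an assumed value, not taken from the paper.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit-normalize rows
    sim = z @ z.T / temperature                         # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                      # drop self-similarity terms

    # Row i's positive sits N rows away (a <-> b).
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-sum-exp over every other sample in the batch.
    row_max = sim.max(axis=1, keepdims=True)
    log_denominator = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))

    log_prob_pos = sim[np.arange(2 * n), pos_idx] - log_denominator
    return float(-log_prob_pos.mean())
```

The loss falls as positive pairs move closer together relative to all other samples in the batch, which is the behavior that lets new users be enrolled by embedding distance alone.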

Abstract

Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms and depend solely on the authentication of connected devices. We present Moonwalk, a novel method for passive user recognition utilizing the built-in headphone accelerometer. Our approach centers on gait recognition, enabling users to establish their identity simply by walking for a brief interval, despite the sensor's placement away from the feet. We employ self-supervised metric learning to train a model that yields a highly discriminative representation of a user's 3D acceleration, with no retraining required. We tested our method in a study involving 50 participants, achieving an average F1 score of 92.9% and an equal error rate of 2.3%. We extend our evaluation by assessing performance under various conditions (e.g., shoe types and surfaces). We discuss the opportunities and challenges these variations introduce and propose new directions for advancing passive authentication for wearable devices.
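As a rough illustration of what an equal error rate measures, the sketch below estimates it from two sets of embedding distances: genuine comparisons (same user) and impostor comparisons (different users). It assumes a sample is accepted when its distance is at or below a threshold. The function name and the threshold sweep are illustrative choices, not the paper's evaluation code.

```python
import numpy as np

def equal_error_rate(genuine_dist, impostor_dist):
    """Return (eer, threshold) where false accepts and false rejects balance."""
    genuine = np.asarray(genuine_dist, dtype=float)
    impostor = np.asarray(impostor_dist, dtype=float)

    # Sweep every observed distance as a candidate acceptance threshold.
    thresholds = np.unique(np.concatenate([genuine, impostor]))

    # False accept rate: impostors at or below the threshold.
    far = np.array([(impostor <= t).mean() for t in thresholds])
    # False reject rate: genuine samples above the threshold.
    frr = np.array([(genuine > t).mean() for t in thresholds])

    # The EER is taken where the two error rates are closest.
    i = int(np.argmin(np.abs(far - frr)))
    return float((far[i] + frr[i]) / 2.0), float(thresholds[i])
```

A lower EER means the genuine and impostor distance distributions overlap less, so a single threshold separates users more cleanly.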

Paper Structure (19 sections, 1 equation, 6 figures, 1 table)

Introduction
Related Work
Method
Datasets
Pre-Processing
Training
Loss
Testing Procedure and Evaluation Metrics
Evaluation and Results
Comparison with Conventional Methods
Metric Learning Model
Generalizability
Adaptive Enrollment
Usability of the Enrollment and Recognition Process
Discussion and Limitations
...and 4 more sections

Figures (6)

Figure 1: An overview of our user recognition process. Here, a user starts enrolling by entering their name (A), and the enrollment process involves 10 seconds of walking (B). Further, the user can walk for more than 10 seconds to increase the recognition accuracy (C). Our method recognizes the enrolled user based on gait similarity features obtained from the headphones' accelerometer, requiring no retraining (D). Moreover, our method rejects other people wearing the user's headphones, as indicated by a low similarity score (E). Furthermore, our method supports enrolling different "appearances" (à la facial recognition), such as different shoes (F, G). Finally, our studies show that our method can generalize across different ground surfaces without a need to re-enroll (H).
Figure 2: Overview of the Moonwalk pipeline. (a-b) Data acquisition from the headphones' accelerometer (3D signal). (c) Pre-processing, which converts the signal to its magnitude and then to a spectrogram; training data additionally receives the pixel dropout augmentation. (d) In the contrastive training scheme, segments from the same session are defined as positive pairs, and segments from other sessions (belonging to different users) are defined as negative pairs. (e) The model outputs an embedding for each segment, and (f) is trained using a contrastive loss to minimize the distance between embeddings of positive pairs while maximizing the distance between embeddings of negative pairs. The trained model yields a discriminative embedding space. (g) In the recognition stage, a new sample is compared to an already enrolled user's sample. A distance threshold determines whether the sample belongs to that user.
Figure 3: Data representations of several participants. (a) The raw accelerometer signals of Participant 9, Participant 15 and Participant 17. (b) The spectrograms that were calculated from the signals in (a). (c) t-SNE calculated on the embeddings created by the model on the spectrogram input. Participant samples are well separated. (d) t-SNE calculated on features (computed using the raw signal) from the literature [watanabe2020gait].
Figure 4: Illustration of the proposed adaptive enrollment technique in the context of a single user's walking sessions. (a) The enrollment step. The user walks through a variety of conditions, including terrain types, inclinations, and walking paces. The distance from the user's walking embedding is shown over the course of the session. When the distance rises above the threshold (here, 0.3) for multiple time windows, adaptive enrollment occurs, capturing the gait pattern in the problematic segments. Since the headphones are required to remain in-ear for adaptive enrollment to occur, we can assume that the genuine user is still present. As time progresses, the distance approaches the threshold less frequently as more of the user's gait is encompassed within the collected templates. (b) After the adaptive enrollment step, the same set of templates is used to validate the user over a test session. The user again walks through various conditions, and the distance is reported. Here, we do not make the assumption that the headphones remain in-ear, and thus adaptive enrollment does not occur, and we use a stricter threshold of 0.24. The FAR from this user's test session was 7.9% with 4 total enrolled templates.
Figure 5: Results from our GA dataset ($n=50$) with k-fold cross-validation. Performance is highly correlated with sample duration: a longer sample improves results because it contains more temporal context. Performance plateaus at longer sample durations, making a sample duration of 10 seconds ideal.
...and 1 more figures
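The recognition step in Figure 2(g) and the adaptive enrollment described in Figure 4 can be sketched roughly as follows. This is a hedged illustration under several assumptions rather than the authors' implementation: embeddings are compared with cosine distance, a window is scored against its nearest enrolled template, and a new template is captured after `patience` consecutive windows above the threshold. The thresholds 0.3 and 0.24 come from the Figure 4 caption; `patience` and all function names are hypothetical.

```python
import numpy as np

def cosine_distance(a, b):
    """Cosine distance between two embeddings (assumed metric)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def min_template_distance(embedding, templates):
    """Distance from an embedding to the closest enrolled template."""
    return min(cosine_distance(embedding, t) for t in templates)

def adaptive_enroll(window_embeddings, templates, threshold=0.3, patience=3):
    """Add a template when `patience` consecutive windows exceed the threshold.

    Intended only while the headphones stay in-ear, so the wearer can be
    assumed to be the genuine user. `patience` is a hypothetical parameter.
    """
    templates = [np.asarray(t, dtype=float) for t in templates]
    streak = 0
    for emb in window_embeddings:
        if min_template_distance(emb, templates) > threshold:
            streak += 1
            if streak >= patience:
                templates.append(np.asarray(emb, dtype=float))
                streak = 0
        else:
            streak = 0
    return templates

def is_enrolled_user(embedding, templates, threshold=0.24):
    """Recognition with the stricter test-time threshold from Figure 4."""
    return min_template_distance(embedding, templates) <= threshold
```

Keeping several templates per user mirrors the "appearances" idea in Figure 1: each captured template covers a gait condition that the earlier templates missed.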
