MARAuder's Map: Motion-Aware Real-time Activity Recognition with Layout-Based Trajectories

Zishuai Liu, Weihang You, Jin Lu, Fei Dou

TL;DR

This paper presents MARAuder’s Map, a real-time activity recognition framework for smart homes that operates on unsegmented ambient sensor streams by projecting sensor activations onto the floorplan to form trajectory-like image sequences. A CNN encodes the spatial layout of each image, a learnable time embedding captures hour-of-day and day-of-week context, and an attention-enabled LSTM models temporal dependencies so that activities can be classified robustly even when a window spans several activities. The approach is validated on three CASAS datasets (Milan, Kyoto7, Aruba), where it outperforms strong baselines and remains robust to temporal ambiguity and multi-activity windows. The results highlight the value of explicit layout-grounded representations, structured temporal cues, and attention for accurate, real-time HAR in ambient-sensor settings with practical deployment potential.
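To make the idea of projecting sensor activations onto the floorplan concrete, here is a minimal sketch. It is not the authors' implementation. The sensor coordinates (`SENSOR_XY`), the grid size, the split of a window into fixed-length chunks (`frame_len`), and the recency weighting that encodes visiting order are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[float]]

# Hypothetical floorplan coordinates (row, col) for a few motion sensors on a
# small 4x5 grid; a real deployment would take these from the home's layout.
SENSOR_XY: Dict[str, Tuple[int, int]] = {
    "M001": (0, 0),
    "M002": (0, 2),
    "M003": (1, 3),
    "M004": (3, 4),
}
HEIGHT, WIDTH = 4, 5


def events_to_image(events: Sequence[str]) -> Grid:
    """Rasterize one chunk of sensor events onto the floorplan grid.

    Later events receive larger values, so the image encodes the order in
    which locations were visited (an assumed, simple trajectory cue).
    """
    image = [[0.0] * WIDTH for _ in range(HEIGHT)]
    n = len(events)
    for i, sensor in enumerate(events):
        if sensor not in SENSOR_XY:
            continue  # skip sensors with no known floorplan location
        r, c = SENSOR_XY[sensor]
        image[r][c] = max(image[r][c], (i + 1) / n)
    return image


def window_to_sequence(window: Sequence[str], frame_len: int) -> List[Grid]:
    """Split a window of events into consecutive chunks and rasterize each."""
    return [
        events_to_image(window[start:start + frame_len])
        for start in range(0, len(window), frame_len)
    ]


if __name__ == "__main__":
    window = ["M001", "M002", "M002", "M003", "M004", "M004"]
    frames = window_to_sequence(window, frame_len=3)
    print(len(frames))  # two frames of three events each
    print(frames[0][0])  # top row of the first frame
```

Each resulting grid is one image-like timestep that a CNN could encode. A real system would likely use a finer grid derived from the actual floorplan.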

Abstract

Ambient sensor-based human activity recognition (HAR) in smart homes remains challenging due to the need for real-time inference, spatially grounded reasoning, and context-aware temporal modeling. Existing approaches often rely on pre-segmented, within-activity data and overlook the physical layout of the environment, limiting their robustness in continuous, real-world deployments. In this paper, we propose MARAuder's Map, a novel framework for real-time activity recognition from raw, unsegmented sensor streams. Our method projects sensor activations onto the physical floorplan to generate trajectory-aware, image-like sequences that capture the spatial flow of human movement. These representations are processed by a hybrid deep learning model that jointly captures spatial structure and temporal dependencies. To enhance temporal awareness, we introduce a learnable time embedding module that encodes contextual cues such as hour-of-day and day-of-week. Additionally, an attention-based encoder selectively focuses on informative segments within each observation window, enabling accurate recognition even under cross-activity transitions and temporal ambiguity. Extensive experiments on multiple real-world smart home datasets demonstrate that our method outperforms strong baselines, offering a practical solution for real-time HAR in ambient sensor environments.
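A learnable time embedding can be sketched as a set of lookup tables indexed by calendar components. Figure 5 lists weekday, hour, and minute as inputs to the time encoder. The sketch below assumes the three looked-up vectors are summed and that the table entries are drawn from a small Gaussian. Both choices are assumptions. In a trained model the tables would be parameters updated by backpropagation, and here they are only initialized. The class name `TimeEmbedding` is hypothetical.

```python
import random
from datetime import datetime
from typing import List


class TimeEmbedding:
    """Sum of three lookup tables indexed by weekday, hour, and minute.

    In training, these tables would be learnable parameters; this sketch
    only initializes them from a seeded Gaussian.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def table(rows: int) -> List[List[float]]:
            return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(rows)]

        self.dim = dim
        self.weekday = table(7)   # Monday = 0 ... Sunday = 6
        self.hour = table(24)     # 0 ... 23
        self.minute = table(60)   # 0 ... 59

    def __call__(self, ts: datetime) -> List[float]:
        w = self.weekday[ts.weekday()]
        h = self.hour[ts.hour]
        m = self.minute[ts.minute]
        # Element-wise sum of the three component embeddings.
        return [a + b + c for a, b, c in zip(w, h, m)]


if __name__ == "__main__":
    emb = TimeEmbedding(dim=8)
    vec = emb(datetime(2024, 1, 1, 7, 30))
    print(len(vec))
```

Summation keeps the output size fixed at `dim`, so the result can be fused with the image features at each timestep. Concatenating the three vectors would be an equally plausible alternative.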

Paper Structure

This paper contains 23 sections, 8 equations, 18 figures, 5 tables.

Figures (18)

  • Figure 1: Illustration of the inspiration behind our proposed Marauder’s Map framework. (a) The original Marauder’s Map from Harry Potter, which visualizes real-time movement within a physical space chatgpt_image2025. (b) Our adaptation for Human Activity Recognition (HAR), depicting a floorplan with a trajectory example in the kitchen area.
  • Figure 2: Distribution of the number of unique activity labels contained within each sliding window for all activity classes in the Milan dataset (window size = 60, step size = 6). Each donut chart represents a specific activity and shows the proportion of windows containing 1 to 5 distinct activity labels. Higher proportions of multi-label windows indicate greater label ambiguity, especially for activities like Bed_to_Toilet, Leave_Home, and Eve_Meds.
  • Figure 3: Spatial heatmaps illustrating the distribution and intensity of sensor activations during three specific activities: (a) Morning_Meds, (b) Kitchen_Activity, and (c) Eve_Meds. The color intensity of each circle represents the frequency of sensor triggers at the corresponding location, highlighting patterns of movement and activity hotspots within the home environment.
  • Figure 4: Hourly distribution of activity start times for Morning_Meds, Evening_Meds, and Kitchen_Activity. Line plots show the start times of medication-related activities and highlight clear temporal boundaries: Morning_Meds typically begins between 7 and 10 AM, while Evening_Meds starts around 7 to 9 PM. The green bars show the distribution of Kitchen_Activity start times, which occur throughout the day but peak around midday and early evening.
  • Figure 5: Overview of the proposed Marauder’s Map framework for cross-activity human activity recognition. Sensor activations are collected within temporal windows and transformed into spatially structured trajectory images. Simultaneously, timestamp information is encoded through a time positional encoder using weekday, hour, and minute components. Both image and time embeddings are fused into behavior embeddings at each timestep and then sequentially processed by an RNN encoder. An attention mechanism is applied to the RNN outputs, dynamically assigning weights to highlight salient temporal features across the sequence. The resulting context vector is passed to a classifier for final activity prediction. The architecture of the attention mechanism is shown on the left, consisting of a two-layer fully connected network with Tanh activation and softmax normalization to generate attention scores.
  • ...and 13 more figures
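The statistic in Figure 2 can be reproduced in outline with a sliding window over per-event activity labels. The defaults below use window size 60 and step size 6. The sketch assumes each window is attributed to the label of its last event, which is a common convention in real-time HAR but is not confirmed by the source. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def sliding_windows(labels: Sequence[str], size: int = 60, step: int = 6) -> List[Sequence[str]]:
    """Return all full-length windows of `size` events, advancing by `step`."""
    return [labels[s:s + size] for s in range(0, len(labels) - size + 1, step)]


def distinct_label_histogram(
    labels: Sequence[str], size: int = 60, step: int = 6
) -> Dict[str, Counter]:
    """For each activity, count windows by how many distinct labels they contain.

    A window is attributed to the label of its last event (an assumption).
    """
    hist: Dict[str, Counter] = {}
    for window in sliding_windows(labels, size, step):
        k = len(set(window))
        hist.setdefault(window[-1], Counter())[k] += 1
    return hist


if __name__ == "__main__":
    labels = ["Sleep"] * 100 + ["Bed_to_Toilet"] * 20 + ["Sleep"] * 100
    for activity, counts in distinct_label_histogram(labels).items():
        total = sum(counts.values())
        print(activity, {k: round(v / total, 2) for k, v in sorted(counts.items())})
```

Dividing each counter by its total gives the donut-chart proportions. A short activity such as the synthetic Bed_to_Toilet run above produces mostly multi-label windows, which is the ambiguity the figure highlights.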
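The attention mechanism described in the Figure 5 caption scores each RNN output with a two-layer fully connected network using Tanh, normalizes the scores with softmax, and returns their weighted sum as the context vector. The sketch below follows that description, with score_t = w2 · tanh(W1 h_t + b1) + b2. The weight values, dimensions, and the function name `attention_pool` are placeholders for illustration, not trained parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply matrix `m` (rows x cols) by vector `v` (cols)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Sequence[float]) -> Vector:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    hidden: Sequence[Sequence[float]],
    w1: Matrix,
    b1: Sequence[float],
    w2: Sequence[float],
    b2: float,
) -> Tuple[Vector, Vector]:
    """Return (context vector, attention weights) over RNN outputs `hidden`."""
    scores = []
    for h in hidden:
        # First FC layer followed by Tanh.
        z = [math.tanh(x + b) for x, b in zip(matvec(w1, h), b1)]
        # Second FC layer maps to a scalar score (b2 shifts all scores equally).
        scores.append(sum(a * b for a, b in zip(w2, z)) + b2)
    alphas = softmax(scores)
    dim = len(hidden[0])
    context = [sum(a * h[i] for a, h in zip(alphas, hidden)) for i in range(dim)]
    return context, alphas


if __name__ == "__main__":
    outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    ctx, weights = attention_pool(outputs, identity, [0.0, 0.0], [1.0, 1.0], 0.0)
    print(weights, ctx)
```

With these toy weights, the third timestep, which activates both features, receives the largest weight. Setting `w2` to zeros yields uniform weights, so the context vector becomes the mean of the RNN outputs.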