Transportation mode recognition based on low-rate acceleration and location signals with an attention-based multiple-instance learning network

Christos Siargkas, Vasileios Papapanagiotou, Anastasios Delopoulos

TL;DR

The paper addresses transportation mode recognition (TMR) using energy-efficient, low-rate smartphone signals. It introduces Fusion-MIL, an attention-based multiple-instance learning (MIL) framework with modality-specific encoders that map acceleration and location data into a shared embedding space for robust multi-modal fusion, followed by hidden Markov model (HMM) post-processing to exploit temporal structure. The model uses sequences of acceleration windows to make MIL more effective and to localize informative regions in the data. Evaluated on the SHL Preview Dataset with leave-one-subject-out (LOSO) cross-validation, Fusion-MIL achieves state-of-the-art performance across eight transportation modes and is robust to device placement and missing modality data, all while operating at low sampling rates ($f_s^a=10$ Hz for acceleration, $f_s^l=1/60$ Hz for location).

Abstract

Transportation mode recognition (TMR) is a critical component of human activity recognition (HAR) that focuses on understanding and identifying how people move within transportation systems. It is commonly based on leveraging inertial, location, or both types of signals, captured by modern smartphone devices. Each type has benefits (such as increased effectiveness) and drawbacks (such as increased battery consumption) depending on the transportation mode (TM). Combining the two types is challenging as they exhibit significant differences, such as very different sampling rates. This paper focuses on the TMR task and proposes an approach for combining the two types of signals in an effective and robust classifier. Our network includes two sub-networks for processing acceleration and location signals separately, using different window sizes for each signal. The two sub-networks are also designed to embed the two types of signals into the same space so that we can then apply an attention-based multiple-instance learning classifier to recognize TM. We use very low sampling rates for both signal types to reduce battery consumption. We evaluate the proposed methodology on a publicly available dataset and compare against other well-known algorithms.
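
The fusion step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes a common tanh-based attention pooling form (softmax over per-instance scores); the source text here does not give the exact attention formula, so this form is an assumption. The function name `attention_mil_pool`, the parameters `V` and `w`, and the random stand-in embeddings are all hypothetical, chosen only for illustration.

```python
import numpy as np

def attention_mil_pool(instances, V, w):
    """Aggregate n instance embeddings of shape (n, d) into one bag embedding (d,).

    Assumed attention form (not confirmed by the source):
        score_k = w^T tanh(V^T h_k), weights = softmax(scores).
    """
    scores = np.tanh(instances @ V) @ w          # one score per instance, shape (n,)
    scores = scores - scores.max()               # stabilize the softmax numerically
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ instances, weights          # weighted average and its weights

rng = np.random.default_rng(0)
d, hidden = 8, 4                                 # hypothetical sizes
acc_emb = rng.normal(size=(3, d))                # stand-in for 3 acceleration embeddings
loc_emb = rng.normal(size=(1, d))                # stand-in for 1 location embedding
V = rng.normal(size=(d, hidden))                 # hypothetical attention parameters
w = rng.normal(size=hidden)

# Both modalities live in the same d-dimensional space, so they share one bag.
bag = np.concatenate([acc_emb, loc_emb], axis=0)
z, weights = attention_mil_pool(bag, V, w)

# The pooling accepts any number of instances, e.g. acceleration only.
z_acc_only, weights_acc_only = attention_mil_pool(acc_emb, V, w)
```

Because the pooled vector is a weighted average over however many instances are present, the same function can be applied when one modality is absent, which is one plausible reading of how a shared embedding space supports missing-modality inputs.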

Paper Structure

This paper contains 27 sections, 7 equations, 5 figures, and 9 tables.

Figures (5)

  • Figure 1: Visual representation of the windows that are used for a single bag (for MIL). Given a target timestamp (dashed line), we include $n_l = 1$ window of location data that is $12$ minutes long and $n_a = 3$ successive windows of acceleration data that are each $1$ minute long. The next bag is obtained by shifting everything by $1$ minute.
  • Figure 2: Proposed multi-modal TMR network. The input is a bag containing two types of instances: acceleration ($n_a$) and location ($n_l$). Instances are processed in parallel by two modality-specific feature encoders, $\mathbf{f}_a$ and $\mathbf{f}_l$ respectively, which embed them in the same $d$-dimensional space. The attention-based MIL ANN $\mathbf{f}$ follows; it aggregates the $n$ instances (in the $\mathbb{R}^{d}$ embedding space) into a fused attentive encoding $\mathbf{z}$ of the same dimension $d$. Finally, a small classification network maps the fused encoding $\mathbf{z}$ into category-wise decision scores to make transportation mode predictions.
  • Figure 3: Accuracy comparison between Acc-CNN, Acc-MIL and Fusion-MIL versus the duration $d$ of acceleration input data.
  • Figure 4: Effect of data augmentation on the accuracy of Acc-MIL versus the number $d$ of one-minute acceleration instances.
  • Figure 5: Multi-class ROC curves for Fusion-MIL.
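
The bag layout in the Figure 1 caption can be sketched in a few lines of Python. The window lengths, $n_a = 3$, $n_l = 1$, the one-minute shift, and the sampling rates come from the text above; the alignment of the windows around the target timestamp is not stated in the caption, so centering both modalities on it is an assumption made only for this sketch. The helper names `bag_windows` and `samples_per_window` are hypothetical.

```python
ACC_RATE_HZ = 10.0        # acceleration sampling rate from the TL;DR
LOC_RATE_HZ = 1.0 / 60.0  # location sampling rate from the TL;DR

def samples_per_window(duration_min, rate_hz):
    """Number of samples in a window of the given duration."""
    return round(duration_min * 60 * rate_hz)

def bag_windows(target_min, n_a=3, acc_len_min=1, loc_len_min=12):
    """Return (acceleration windows, location window) as (start, end) in minutes.

    Assumption: both modalities are centered on the target timestamp.
    """
    acc_start = target_min - n_a * acc_len_min / 2
    acc = [(acc_start + i * acc_len_min, acc_start + (i + 1) * acc_len_min)
           for i in range(n_a)]
    loc = (target_min - loc_len_min / 2, target_min + loc_len_min / 2)
    return acc, loc

acc, loc = bag_windows(10)            # bag for a target timestamp at minute 10
next_acc, next_loc = bag_windows(11)  # next bag: everything shifted by 1 minute

acc_samples = samples_per_window(1, ACC_RATE_HZ)    # samples per acceleration window
loc_samples = samples_per_window(12, LOC_RATE_HZ)   # samples per location window
```

Under these rates, each one-minute acceleration window holds 600 samples and the twelve-minute location window holds 12 samples, which illustrates how different the two instance types are before they are embedded into a common space.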