Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data

Eun Som Jeon, Hongjun Choi, Ankita Shukla, Yuan Wang, Hyunglae Lee, Matthew P. Buman, Pavan Turaga

TL;DR

This work introduces Topological Persistence Guided Knowledge Distillation (TPKD), a framework that distills knowledge from two teachers trained on complementary representations of wearable-sensor signals (raw time-series and persistence images) into a single compact student that uses only time-series at test time. It fuses multimodal teacher information through merged activation maps and enforces orthogonal feature relationships, and it adds an annealing KD strategy to reduce the teacher–student gap. Empirical evaluation on GENEActiv and PAMAP2 shows that TPKD with orthogonal features consistently outperforms single- and multi-teacher baselines, achieving robust accuracy while maintaining test-time efficiency; ablations reveal dataset-dependent optimal hyperparameters and the benefit of preserving topological features. The approach advances edge-ready human activity recognition (HAR) by transferring topological insights into lightweight models without requiring topological inputs during deployment, with potential for broader multimodal integration and robustness to signal corruptions.

Abstract

Deep learning methods have achieved considerable success in applications that convert wearable sensor data into actionable health insights. A common application area is activity recognition, where deep-learning methods still suffer from limitations such as sensitivity to signal quality, variations in sensor characteristics, and variability between subjects. To mitigate these issues, robust features obtained by topological data analysis (TDA) have been suggested as a potential solution. However, there are two significant obstacles to using topological features in deep learning: (1) the large computational load of extracting topological features with TDA, and (2) the different signal representations produced by deep learning and TDA, which make fusion difficult. In this paper, to integrate the strengths of topological methods into deep learning for time-series data, we propose to use two teacher networks, one trained on the raw time-series data and another trained on persistence images generated by TDA methods. The distilled student model uses only the raw time-series data at test time. This approach addresses both issues. Knowledge distillation (KD) with multiple teachers exploits complementary information and yields a compact model with strong supervisory features and an integrated, richer representation. To assimilate desirable information from different modalities, we design new constraints, including orthogonality imposed on feature correlation maps, which improves feature expressiveness and allows the student to learn easily from the teachers. We also apply an annealing strategy in KD for fast saturation and better accommodation of different features, while reducing the knowledge gap between the teachers and the student. Finally, a robust student model is distilled, which uses only the time-series data as input while implicitly preserving topological features.
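
The two-teacher setup described in the abstract can be illustrated with a minimal sketch of a logit-level distillation loss. This is not the paper's exact objective: it assumes the standard temperature-softened KD formulation, and the names `two_teacher_kd_loss`, `teacher_weight`, `kd_weight`, and `temperature` are hypothetical. The feature-level orthogonality constraints and the annealing strategy are not modeled here.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax of temperature-scaled logits (numerically stable)."""
    z = logits / temperature
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch."""
    return float(np.mean(np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)))


def cross_entropy(logits, labels, eps=1e-12):
    """Mean cross-entropy between softmax(logits) and integer labels."""
    probs = softmax(logits)
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels] + eps)))


def two_teacher_kd_loss(student_logits, ts_teacher_logits, pi_teacher_logits,
                        labels, temperature=4.0, teacher_weight=0.5, kd_weight=0.9):
    """Assumed loss: cross-entropy plus a weighted mix of two teacher KL terms.

    ts_teacher_logits: outputs of the teacher trained on raw time-series.
    pi_teacher_logits: outputs of the teacher trained on persistence images.
    The student only ever sees time-series input; the persistence-image
    teacher is needed during training only.
    """
    p_student = softmax(student_logits, temperature)
    p_ts = softmax(ts_teacher_logits, temperature)
    p_pi = softmax(pi_teacher_logits, temperature)
    kd = (teacher_weight * kl_divergence(p_ts, p_student)
          + (1.0 - teacher_weight) * kl_divergence(p_pi, p_student))
    kd *= temperature ** 2  # usual gradient-scale correction for soft targets
    ce = cross_entropy(student_logits, labels)
    return (1.0 - kd_weight) * ce + kd_weight * kd
```

With `teacher_weight=0.5` both teachers contribute equally; shifting it toward 0 or 1 favors the persistence-image or time-series teacher, respectively. When the student reproduces both teachers exactly, the distillation term vanishes and only the scaled cross-entropy remains.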

Paper Structure

This paper contains 27 sections, 9 equations, 13 figures, 10 tables.

Figures (13)

  • Figure 1: An overview of Topological Persistence Guided Knowledge Distillation (TPKD). Two teachers, trained on different representations of the same raw time-series data, are used to train a compact student model.
  • Figure 2: A persistence diagram (PD) and its corresponding persistence image (PI). In the PD, longer lifetimes appear brighter.
  • Figure 3: Examples of activation similarity maps $G'$ produced by a layer at the indicated stage of the network for a batch on GENEActiv. High similarities between samples of the batch are represented by high values. The blockwise pattern is more distinctive for WRN16-3 networks, suggesting that the higher capacity of this network captures the semantics of the dataset well.
  • Figure 4: Framework for extracting orthogonal features. A and B denote mini-batch features at a layer of Teacher1 and Teacher2, respectively. C denotes mini-batch features at a layer of the Student.
  • Figure 5: Details of the GENEActiv dataset. Each sample has 500 time-steps. From bottom to top, labels correspond to classes 0 to 13.
  • ...and 8 more figures
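
The activation similarity maps mentioned in the Figure 3 caption can be sketched as follows. The exact definition of $G'$ is not given in this section, so the sketch assumes the row-normalized Gram-matrix formulation commonly used in similarity-preserving distillation; the function name and the feature shape are hypothetical.

```python
import numpy as np


def activation_similarity_map(features, eps=1e-12):
    """Batch-by-batch similarity map from one layer's activations.

    features: array of shape (batch, channels, time), an assumed layout.
    Returns a (batch, batch) matrix whose entry (i, j) is large when
    samples i and j produce similar activations at this layer.
    """
    batch = features.shape[0]
    q = features.reshape(batch, -1)       # flatten each sample's activations
    g = q @ q.T                           # pairwise inner products
    norms = np.linalg.norm(g, axis=1, keepdims=True)
    return g / (norms + eps)              # row-wise L2 normalization


# Example: a batch of 6 samples, 16 channels, 25 time-steps.
rng = np.random.default_rng(0)
g_prime = activation_similarity_map(rng.normal(size=(6, 16, 25)))
```

If a batch is sorted by class, samples from the same class tend to yield high mutual similarity, which would produce the blockwise pattern the caption describes.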