Terrain-Aware Stride-Level Trajectory Forecasting for a Powered Hip Exoskeleton via Vision and Kinematics Fusion

Ruoqi Zhao, Xingbang Yan, Yubo Fan

TL;DR

The Sandwich Fusion Transformer for Image and Kinematics (SFTIK) is a network that predicts the thigh angle of the ensuing stride from the terrain images captured at the beginning of the preceding and ensuing strides and the IMU time series recorded during the preceding stride.
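
The input/output pairing described above can be sketched as a small sample-construction helper. This is a minimal illustration, not the authors' data loader: the function name `build_sandwich_samples` and the dictionary keys (`"image"`, `"imu"`, `"thigh_angle"`) are hypothetical, and the sketch assumes the strides are consecutive strides from one continuous walk.

```python
from typing import Any, Dict, List, Tuple


def build_sandwich_samples(
    strides: List[Dict[str, Any]],
) -> List[Tuple[Dict[str, Any], Any]]:
    """Pair every stride n >= 1 with its predecessor n - 1.

    Each stride is a dict with hypothetical keys (assumptions, not from the paper):
      "image"       : terrain image captured at the beginning of the stride
      "imu"         : IMU time series recorded during the stride
      "thigh_angle" : thigh-angle trajectory of the stride (the label)
    """
    samples = []
    for prev, curr in zip(strides, strides[1:]):
        inputs = {
            "imu_prev": prev["imu"],        # kinematics of the preceding stride
            "image_prev": prev["image"],    # terrain at the start of the preceding stride
            "image_curr": curr["image"],    # terrain at the start of the ensuing stride
        }
        # The target is the thigh angle of the ensuing stride.
        samples.append((inputs, curr["thigh_angle"]))
    return samples
```

The first stride of a walk only ever appears as the "preceding" stride, so a walk of N strides yields N - 1 samples under this pairing.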

Abstract

Powered hip exoskeletons have been shown to assist locomotion during treadmill walking. However, providing suitable assistance in real-world walking scenarios that involve changing terrain remains challenging. Recent research suggests that forecasting lower-limb joint angles could provide target trajectories for exoskeletons and prostheses, and that forecasting performance could be improved with visual information. In this letter, we share a real-world dataset of 10 healthy subjects walking through five common types of terrain, with stride-level labels. We design a network called Sandwich Fusion Transformer for Image and Kinematics (SFTIK), which predicts the thigh angle of the ensuing stride given the terrain images at the beginning of the preceding and the ensuing strides and the IMU time series during the preceding stride. We introduce width-level patchify, tailored for egocentric terrain images, to reduce computational demands. We demonstrate that the proposed sandwich input and fusion mechanism significantly improves forecasting performance. Overall, SFTIK outperforms baseline methods, with a computational cost of 3.31 GFLOPs, a root mean square error (RMSE) of 3.445 ± 0.804° and a Pearson's correlation coefficient (PCC) of 0.971 ± 0.025. The results demonstrate that SFTIK can forecast the thigh angle accurately at low computational cost and could serve as a terrain-adaptive trajectory planning method for hip exoskeletons. Code and data are available at https://github.com/RuoqiZhao116/SFTIK.
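
For readers unfamiliar with the two reported metrics, the following is a minimal sketch of the standard definitions of RMSE and PCC applied to one predicted and one measured thigh-angle trajectory. The function names are illustrative, and how the paper aggregates the per-stride values across strides and subjects (the ± terms) is not stated in this section, so that step is not shown.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Root mean square error between two equally long trajectories (same unit as the input)."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared_error = sum((p - t) ** 2 for p, t in zip(pred, target))
    return math.sqrt(squared_error / len(pred))


def pcc(pred: Sequence[float], target: Sequence[float]) -> float:
    """Pearson's correlation coefficient between two equally long trajectories."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("trajectories must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in target)
    if var_p == 0 or var_t == 0:
        # Correlation is undefined for a constant trajectory.
        raise ValueError("PCC is undefined when a trajectory has zero variance")
    return cov / math.sqrt(var_p * var_t)
```

RMSE captures the absolute angular error in degrees, while PCC captures how well the shape of the predicted trajectory follows the measured one regardless of offset or scale, which is why the two are usually reported together.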

Paper Structure (23 sections, 5 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: The experimental procedure for data collection. (a) shows the sensors used. A D435 camera is fixed on the chest. One IMU is positioned on the posterior pelvis, and two IMUs are positioned on the left and right thighs, respectively. (b) illustrates the route used for data collection, which comprises three distinct trails. (c) presents an in-situ photo captured during data collection. All of these sensors are connected to a laptop carried in a backpack.
  • Figure 2: The framework of the proposed SFTIK. (a) For each step, paired kinematic data and a terrain image ($K_n$, $I_n$) are constructed based on the maximum hip extension. (b) The inputs of SFTIK ($K_{n-1}$, $I_{n-1}$, $I_n$) are patched and embedded into the same dimension $D_{emb}$. (c) The sandwich fusion mechanism first applies $N_1$ layers of transformer blocks to learn the relationship between $K_{n-1}$ and $I_{n-1}$. Then $N_2$ layers of transformer blocks forecast the latent feature of the thigh angle $A_n$ from $I_n$ and the previously learned relationship. Finally, a combination of mean pooling and a feed-forward network produces the value of $A_n$.
  • Figure 3: Comparative illustration of fusion methods with identical depth. (a) shows the Sandwich Fusion method, with its two-step concatenation of the previous and current strides' terrain images and kinematic data. (b) illustrates the Early Fusion method, where all patches are concatenated before being processed by the transformer blocks. (c) shows the Late Fusion approach, in which transformer blocks are first applied to the individual patches, followed by a collective concatenation and further transformer block processing.
  • Figure 4: Time series pattern comparison between sliding window and stride level. (a) Sliding window method with look-back, prediction and stride length = 100. (b) Stride-level method with one gait cycle length = 100.
  • Figure 5: Box-plot of RMSE across modality and patch methods of image (* means p < 0.05, ** means p < 0.01, and *** means p < 0.001).
  • ...and 1 more figures
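
The data flow described in the Figure 2(c) caption, and its contrast with the early fusion of Figure 3(b), can be sketched in a few lines. This is only an illustration of the concatenation order under stated assumptions: `toy_block` is a stand-in that mixes tokens with uniform weights and is not the paper's transformer block, the final feed-forward network that maps the pooled feature to $A_n$ is omitted, and all function names and token counts are hypothetical.

```python
from typing import Callable, List

Token = List[float]    # one embedded patch of dimension D_emb
Tokens = List[Token]   # a sequence of embedded patches
Block = Callable[[Tokens], Tokens]


def toy_block(tokens: Tokens) -> Tokens:
    """Placeholder for one transformer block (an assumption, not the paper's block).

    Each token is averaged with the mean of the sequence, i.e. self-attention
    with uniform weights plus a residual-like term. Sequence length is preserved.
    """
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[0.5 * t[d] + 0.5 * mean[d] for d in range(dim)] for t in tokens]


def run_blocks(tokens: Tokens, n_layers: int, block: Block) -> Tokens:
    for _ in range(n_layers):
        tokens = block(tokens)
    return tokens


def mean_pool(tokens: Tokens) -> Token:
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def sandwich_fusion(k_prev: Tokens, i_prev: Tokens, i_curr: Tokens,
                    n1: int, n2: int, block: Block = toy_block) -> Token:
    # Stage 1: N_1 blocks relate previous-stride kinematics to previous-stride terrain.
    stage1 = run_blocks(k_prev + i_prev, n1, block)
    # Stage 2: the current terrain tokens are appended, then N_2 blocks are applied.
    stage2 = run_blocks(stage1 + i_curr, n2, block)
    # Mean pooling; the feed-forward head producing A_n is omitted here.
    return mean_pool(stage2)


def early_fusion(k_prev: Tokens, i_prev: Tokens, i_curr: Tokens,
                 n1: int, n2: int, block: Block = toy_block) -> Token:
    # All tokens are concatenated up front and pass through the same total depth.
    return mean_pool(run_blocks(k_prev + i_prev + i_curr, n1 + n2, block))
```

With this uniform-mixing placeholder the two variants return the same pooled vector, so the sketch says nothing about accuracy. What it does show is that in sandwich fusion the first $N_1$ blocks operate on a shorter sequence (no $I_n$ tokens yet), whereas early fusion processes the full sequence at every layer.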