SPAQ-DL-SLAM: Towards Optimizing Deep Learning-based SLAM for Resource-Constrained Embedded Platforms

Niraj Pudasaini, Muhammad Abdullah Hanif, Muhammad Shafique

TL;DR

This paper presents SPAQ-DL-SLAM, a framework that strategically applies Structured Pruning and Quantization (SPAQ) to the architecture of one of the state-of-the-art DL-SLAM algorithms, DROID-SLAM, for resource and energy-efficiency.

Abstract

Optimizing Deep Learning-based Simultaneous Localization and Mapping (DL-SLAM) algorithms is essential for efficient implementation on resource-constrained embedded platforms, enabling real-time on-board computation in autonomous mobile robots. This paper presents SPAQ-DL-SLAM, a framework that strategically applies Structured Pruning and Quantization (SPAQ) to the architecture of one of the state-of-the-art DL-SLAM algorithms, DROID-SLAM, for resource and energy efficiency. Specifically, we perform structured pruning with fine-tuning based on layer-wise sensitivity analysis, followed by 8-bit post-training static quantization (PTQ), on the deep learning modules within DROID-SLAM. Our SPAQ-DROID-SLAM model, an optimized version of the DROID-SLAM model produced by our SPAQ-DL-SLAM framework with 20% structured pruning and 8-bit PTQ, achieves an 18.9% reduction in FLOPs and a 79.8% reduction in overall model size compared to the DROID-SLAM model. Our evaluations on the TUM-RGBD benchmark show that the SPAQ-DROID-SLAM model surpasses the DROID-SLAM model by an average of 10.5% on the absolute trajectory error (ATE) metric. Additionally, our results on the ETH3D SLAM training benchmark demonstrate enhanced generalization capabilities of the SPAQ-DROID-SLAM model, seen in a higher Area Under the Curve (AUC) score and success on 2 additional data sequences compared to the DROID-SLAM model. Despite these improvements, the model exhibits performance variance on the distinct Vicon Room sequences from the EuRoC dataset, which are captured at high angular velocities. This varying performance in some distinct scenarios suggests that designing DL-SLAM algorithms with operating environments and tasks in mind can achieve optimal performance and resource efficiency for deployment on resource-constrained embedded platforms.
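To make the "8-bit post-training static quantization" step more concrete, the following is a minimal sketch of affine 8-bit quantization with a calibration pass, written in plain Python. It is an illustration only, not the authors' implementation: the function names (`calibrate`, `quantize`, `dequantize`), the unsigned 0 to 255 range, and the per-tensor min/max calibration are all assumptions made for this example.

```python
# Illustrative sketch only: generic affine 8-bit quantization with
# min/max calibration. All names here are hypothetical, not from the paper.

def calibrate(samples):
    """Derive (scale, zero_point) from calibration data for the range [0, 255]."""
    lo = min(0.0, min(samples))  # include 0.0 so it maps exactly to an integer
    hi = max(0.0, max(samples))
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all samples are zero
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to an 8-bit integer code, clamping out-of-range values."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q, scale, zero_point):
    """Map an 8-bit integer code back to an approximate float."""
    return (q - zero_point) * scale


calibration_values = [-1.0, 0.5, 2.0]
scale, zero_point = calibrate(calibration_values)
codes = [quantize(x, scale, zero_point) for x in calibration_values]
restored = [dequantize(q, scale, zero_point) for q in codes]
```

"Static" here means that `scale` and `zero_point` are fixed once from calibration data before deployment, so inference needs no further range estimation. Each in-range value is recovered to within half a quantization step (`scale / 2`).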

Paper Structure (16 sections, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Pipeline for SPAQ-DL-SLAM. The original model layers (e.g., Layer k-1 and Layer k) are first analyzed for sensitivity to pruning impacts. This is followed by iterative fine-tuning and structured pruning across all sensitive layers. The model is then quantized to 8-bit using PTQ, enhancing computational efficiency while maintaining or improving performance metrics such as inference speed, model size, and accuracy.
  • Figure 2: DROID-SLAM operational flow, highlighting key components.
  • Figure 3: Feature and Context Network Architecture: Utilizes six residual blocks for 1/8 resolution feature extraction. The feature encoder uses instance normalization, producing $D=128$, whereas the context encoder, which omits normalization, uses $D=256$. Figure is adapted from teed2021droid.
  • Figure 4: Update Network: Iteratively integrates context, correlation, and flow features into a GRU, enabling prediction of revisions ($r$) and confidence weights ($w$) from the evolving hidden state (teed2021droid). Figure is adapted from teed2021droid.
  • Figure 5: (a) Unstructured pruning involves pruning individual weights in a neural network, leading to a sparse architecture. (b) Structured pruning targets entire filters or channels for removal, which can maintain the dense structure of the network while reducing the number of filters or channels.
  • ...and 7 more figures
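
The distinction drawn in the Figure 5 caption can be sketched on a toy weight matrix whose rows stand in for filters. This is a hedged illustration rather than the paper's procedure: the magnitude and L1-norm ranking criteria, the function names, and the example weights are assumptions chosen for clarity, and the layer-wise sensitivity analysis and fine-tuning described for the pipeline are not modeled.

```python
# Illustrative sketch only: rows of `weights` play the role of filters.
# The ranking criteria (weight magnitude, filter L1 norm) are assumptions.

def unstructured_prune(weights, fraction):
    """Zero the smallest-magnitude individual weights; the shape is unchanged."""
    ranked = sorted(
        (abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)
    )
    k = int(len(ranked) * fraction)
    pruned = [row[:] for row in weights]
    for _, i, j in ranked[:k]:
        pruned[i][j] = 0.0
    return pruned


def structured_prune(weights, fraction):
    """Remove whole filters (rows) with the smallest L1 norm; the result stays dense."""
    ranked = sorted((sum(abs(w) for w in row), i) for i, row in enumerate(weights))
    k = int(len(weights) * fraction)
    dropped = {i for _, i in ranked[:k]}
    return [row[:] for i, row in enumerate(weights) if i not in dropped]


weights = [
    [0.9, -0.1, 0.4],
    [0.05, 0.02, -0.03],
    [-0.7, 0.6, 0.2],
    [0.3, -0.8, 0.01],
]
sparse = unstructured_prune(weights, 0.25)  # same 4x3 shape, some entries zeroed
dense = structured_prune(weights, 0.25)     # one whole filter removed
```

In this sketch the unstructured result keeps its original shape and only saves work if the runtime exploits sparsity, whereas the structured result is a smaller dense matrix that standard dense kernels can run directly.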