REMOTE: Real-time Ego-motion Tracking for Various Endoscopes via Multimodal Visual Feature Learning

Liangjing Shao, Benshuang Chen, Shuting Zhao, Xinrong Chen

TL;DR

The paper tackles real-time ego-motion tracking for endoscopes across varied scenes. It introduces REMOTE, a multimodal visual feature learning framework that exploits scene, motion, and joint features from consecutive frames to predict relative pose, followed by absolute pose calculation from an initial pose. A novel attention-based joint feature extractor and depthwise-separable pose decoder enable rich representation and real-time performance, validated on NEPose, SimCol, and EndoSLAM with state-of-the-art accuracy and >30 fps. This approach has strong potential to enhance navigation and automation in robot-assisted endoscopy.

Abstract

Real-time ego-motion tracking for endoscopes is a significant task for efficient navigation and robotic automation of endoscopy. In this paper, a novel framework is proposed to perform real-time ego-motion tracking for endoscopes. First, a multimodal visual feature learning network is proposed for relative pose prediction, in which the motion feature from the optical flow, the scene features, and the joint feature from two adjacent observations are all extracted for prediction. Because the concatenated image carries more correlation information in its channel dimension, a novel feature extractor based on an attention mechanism is designed to integrate multi-dimensional information from the concatenation of two consecutive frames. To extract a more complete feature representation from the fused features, a novel pose decoder is proposed to predict the pose transformation from the concatenated feature map at the end of the framework. Finally, the absolute pose of the endoscope is calculated from the relative poses. Experiments are conducted on three datasets of various endoscopic scenes, and the results demonstrate that the proposed method outperforms state-of-the-art methods. In addition, the inference speed of the proposed method exceeds 30 frames per second, which meets the real-time requirement. The project page is here: remote-bmxs.netlify.app
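
The last step of the abstract, recovering absolute poses from predicted relative poses, can be illustrated with a minimal sketch. It assumes poses are 4x4 homogeneous matrices and that each relative pose is right-multiplied onto the previous absolute pose, which matches the form of the formula in the Figure 3 caption. The function names (`matmul4`, `make_pose`, `accumulate_poses`) and the toy yaw-only motion are hypothetical and are not taken from the paper.

```python
import math

def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def make_pose(yaw_rad, tx, ty, tz):
    """Build a homogeneous pose from a yaw rotation and a translation (toy model)."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [[c, -s, 0.0, tx],
            [s,  c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def accumulate_poses(initial_pose, relative_poses):
    """Chain relative poses onto an initial pose: P_i = P_{i-1} @ T_{i-1}^{i} (assumed convention)."""
    poses = [initial_pose]
    for rel in relative_poses:
        poses.append(matmul4(poses[-1], rel))
    return poses

# Two identical steps: move 1 unit along the local x-axis, then turn 90 degrees.
identity = make_pose(0.0, 0.0, 0.0, 0.0)
step = make_pose(math.pi / 2, 1.0, 0.0, 0.0)
trajectory = accumulate_poses(identity, [step, step])

# Extract the camera positions (translation column) along the trajectory.
positions = [(p[0][3], p[1][3], p[2][3]) for p in trajectory]
print(positions[-1])  # approximately (1.0, 1.0, 0.0)
```

Because each absolute pose is built on the previous estimate, any error in one relative pose propagates to all later poses, which is why accurate per-frame prediction matters for long trajectories.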

Paper Structure

This paper contains 21 sections, 12 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The pipelines of the proposed network and previous methods (categorized into structures A, B and C) for estimating the relative pose of endoscopes. A: features of adjacent observations are extracted separately and then concatenated [simcol, offset]. B: the feature is extracted from the concatenation of adjacent observations [endoslam, liu, shao]. C: based on A, with LSTM layers added at the end [borrego, li].
  • Figure 2: The pipeline of the proposed framework. $\mathcal{F}_{S}$ denotes the feature extractor that extracts the scene features from two adjacent endoscopic images and the motion feature from the corresponding optical flow. $\mathcal{F}_{J}$ denotes the feature extractor that extracts the joint feature from the concatenation of the two frames. In the pipeline of $\mathcal{F}_{J}$, 'Conv1' and 'Conv2' represent the first two layers of ResNet-34.
  • Figure 3: Trajectory tracking results of different methods. A-E are trajectories from the NEPose dataset, F-H are trajectories from the SimCol dataset (only the three best methods are compared for clear display), and I and J are trajectories from the EndoSLAM dataset. Specifically, the trajectories in I and J are generated by $\hat{P}_{i} = P_{i-1}\hat{P}_{i-1}^{i}$ to evaluate the performance of relative pose estimation. Sampled frames of the corresponding trajectories are shown at the bottom of the figure. A visualization of real-time position tracking can be found on the project page: https://remote-bmxs.netlify.app.
  • Figure 4: Visualization of the directions predicted by different methods. First row: results from the SimCol dataset. Second row: results from the EndoSLAM dataset. Third row: results from the NEPose dataset. More examples can be found on the project page: https://remote-bmxs.netlify.app.
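
The formula in the Figure 3 caption can be contrasted with fully chained tracking. The sketch below assumes that $P_{i}$ is the ground-truth absolute pose, $\hat{P}_{i-1}^{i}$ is the predicted relative pose between frames $i-1$ and $i$, and $P_{0}$ is the known initial pose. The chained form is an assumed reading of "absolute pose calculation from an initial pose", not a formula quoted from the paper.

$$
\begin{aligned}
\text{per-step evaluation (Figure 3, I and J):}\quad & \hat{P}_{i} = P_{i-1}\,\hat{P}_{i-1}^{i} \\
\text{chained tracking (assumed):}\quad & \hat{P}_{i} = \hat{P}_{i-1}\,\hat{P}_{i-1}^{i} = P_{0}\prod_{k=1}^{i}\hat{P}_{k-1}^{k}
\end{aligned}
$$

In the per-step form each estimate starts from the ground-truth previous pose, so the plotted error reflects only the current relative prediction. In the chained form, errors from all earlier steps accumulate.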