
Fine-grained Context and Multi-modal Alignment for Freehand 3D Ultrasound Reconstruction

Zhongnuo Yan, Xin Yang, Mingyuan Luo, Jiongquan Chen, Rusi Chen, Lian Liu, Dong Ni

TL;DR

This work proposes ReMamba, which mines multi-scale spatio-temporal information with a multi-directional SSM, together with an adaptive fusion strategy that introduces multiple inertial measurement units (IMUs) as auxiliary temporal information to enhance spatio-temporal perception.

Abstract

Fine-grained spatio-temporal learning is crucial for freehand 3D ultrasound reconstruction. Previous works mainly relied on coarse-grained spatial features and on temporal dependencies learned separately, and therefore struggle with fine-grained spatio-temporal learning. Mining spatio-temporal information at fine-grained scales is extremely challenging because long-range dependencies are difficult to learn. In this context, we propose a novel method that exploits the long-range dependency modeling capabilities of the state space model (SSM) to address this challenge. Our contribution is three-fold. First, we propose ReMamba, which mines multi-scale spatio-temporal information by devising a multi-directional SSM. Second, we propose an adaptive fusion strategy that introduces multiple inertial measurement units (IMUs) as auxiliary temporal information to enhance spatio-temporal perception. Last, we design an online alignment strategy that encodes the temporal information as pseudo labels for multi-modal alignment to further improve reconstruction performance. Extensive experimental validation on two large-scale datasets shows remarkable improvement of our method over competitors.
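
The abstract does not specify how ReMamba's multi-directional SSM is built, so the following is only a minimal sketch of the general idea of scanning a sequence in more than one direction with a linear state space recurrence. It assumes a scalar, time-invariant recurrence h_t = a*h_{t-1} + b*x_t, y_t = c*h_t and averages a forward and a backward pass; the function names and the parameters a, b, c are hypothetical and are not taken from the paper.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Linear SSM along axis 0: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    x has shape (T, D): T time steps (e.g. frames), D feature channels.
    a, b, c are scalars here purely for illustration.
    """
    h = np.zeros(x.shape[1], dtype=float)  # hidden state, one value per channel
    y = np.empty(x.shape, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]               # state update carries earlier context
        y[t] = c * h                       # readout at step t
    return y


def bidirectional_scan(x, a=0.9, b=1.0, c=1.0):
    """Average a forward and a backward scan so each step sees both directions."""
    forward = ssm_scan(x, a, b, c)
    backward = ssm_scan(x[::-1], a, b, c)[::-1]  # scan reversed input, flip back
    return 0.5 * (forward + backward)
```

With a decay a close to 1, the hidden state retains information from distant steps, which is the property the abstract refers to as long-range dependency handling. A multi-directional variant on image features would repeat the same scan along additional orderings of the flattened sequence (an assumption about the general technique, not a description of ReMamba).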

Paper Structure

This paper contains 9 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Pipeline of freehand 3D US reconstruction with multiple IMUs.
  • Figure 2: Overview of the proposed FiMA.
  • Figure 3: Detailed design of the ReMamba block.
  • Figure 4: Details of the fusion module. Its inputs are the image features from ReMamba and the acceleration and Euler angles of multiple IMUs. It outputs a multi-modal fused feature.
  • Figure 5: Reconstruction examples produced by our proposed method. The red surface denotes the reconstructed vessels. The probe trajectory represents the scanning path.