PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging

Mattias Paul Heinrich

TL;DR

PointVoxelFormer introduces a hybrid point-voxel framework for 3D medical imaging that alternates point-wise MLP processing with differentiable rasterisation to efficiently fuse high-resolution point features and low-resolution voxel features. The method enables deformable registration via an early fusion of source and target coordinates and a two-step, inverse-consistent formulation, achieving large gains in speed and memory while improving registration accuracy. Across segmentation and registration benchmarks on ultrasound and CT datasets, PointVoxelFormer outperforms kNN-based and pure rasterisation baselines, delivering up to threefold speed-ups, fivefold memory reduction, and substantial reductions in target registration error. The work demonstrates that hybrid point-voxel architectures can provide modality-agnostic, privacy-preserving, and on-device-friendly solutions for 3D medical imaging with strong practical impact.

Abstract

Point clouds are a very efficient way to represent volumetric data in medical imaging. First, they do not occupy resources for empty space and can therefore avoid the trade-off between resolution and field-of-view that voxel-based 3D convolutional networks (CNNs) face, leading to smaller and more robust models. Second, they provide a modality-agnostic representation of anatomical surfaces and shapes, which avoids domain gaps for generic geometric models. Third, they remove identifiable patient-specific information and may increase privacy preservation when data is shared publicly. Despite these benefits, point clouds are still underexplored in medical imaging compared to volumetric 3D CNNs and vision transformers. To date, both datasets and rigorous studies on the comparative strengths and weaknesses of methodological choices are missing. Interactions and information exchange between spatially close points within a point cloud (e.g. through k-nearest neighbour graphs in edge convolutions or point transformations) are crucial for learning geometrically meaningful features but may incur computational bottlenecks. This work presents a hybrid approach that combines point-wise operations with intermediate differentiable rasterisation and dense localised CNNs. For deformable point cloud registration, we devise an early fusion scheme for coordinate features that joins both clouds within a common reference frame and is coupled with an inverse-consistent, two-step alignment architecture. Our extensive experiments on three different datasets for segmentation and registration demonstrate that our method, PointVoxelFormer, enables very compact models that achieve threefold speed-ups, fivefold memory reduction and over 30% registration error reduction against edge convolutions and other state-of-the-art models in geometric deep learning.
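
The neighbourhood bottleneck mentioned in the abstract can be made concrete with a small sketch. The NumPy code below is an illustration under stated assumptions, not the paper's implementation: it builds a brute-force kNN graph and gathers edge features of the form [x_i, x_j - x_i], as commonly used in edge convolutions. The function name `knn_edge_features` and all sizes are hypothetical.

```python
import numpy as np

def knn_edge_features(points, k):
    """Return (N, k, 6) edge features [x_i, x_j - x_i] and the (N, k) neighbour indices."""
    # Pairwise squared distances form an (N, N) matrix: time and memory grow
    # quadratically with the number of points in this brute-force variant.
    diff = points[:, None, :] - points[None, :, :]
    dist2 = np.sum(diff ** 2, axis=-1)
    np.fill_diagonal(dist2, np.inf)            # a point is not its own neighbour
    idx = np.argsort(dist2, axis=1)[:, :k]     # indices of the k closest points
    neighbours = points[idx]                   # irregular gather, shape (N, k, 3)
    centre = np.repeat(points[:, None, :], k, axis=1)
    return np.concatenate([centre, neighbours - centre], axis=-1), idx

rng = np.random.default_rng(0)
cloud = rng.random((512, 3))                   # hypothetical toy cloud in [0, 1]^3
edges, idx = knn_edge_features(cloud, k=8)
print(edges.shape)
```

The distance matrix and the index-based gather are the two steps that a rasterisation-based block would replace with regular, dense operations on a voxel grid.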

Paper Structure

This paper contains 11 sections, 1 equation, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of our proposed geometric learning framework PointVoxelFormer that alternates between point-wise MLPs on sparse high-resolution point cloud features and differentiable rasterisation coupled with dense shallow 3D CNNs to efficiently incorporate neighbour information.
  • Figure 2: Comparison between previous work on dynamic graph CNNs (DGCNN), which need to compute kNN and gather features irregularly, and our approach, which first rasterises the feature maps to enable more efficient neighbourhood interactions using dense CNNs. Subsequently, the results are sampled at high-resolution point coordinates using trilinear interpolation.
  • Figure 3: Automatic extraction of 3D point clouds with edges in background (gray/black) based on original ultrasound scans (left) using the Canny edge detector (middle), morphological processing and point sampling (right).
  • Figure 4: Left: Visual results for the PVT1010 registration task (Case #4 of DirLab-COPD), showing overlay of inspiration (red) and expiration (blue) high-resolution point clouds. Purple indicates good alignment. The target registration errors are shown as average ($\varnothing$) over all ten cases in mm. Right: TRE results for the PVT1010 registration tasks evaluated at every epoch.
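
The rasterise, dense-filter, sample round trip described in the captions of Figures 1 and 2 can be sketched as follows. This is a minimal NumPy illustration under several assumptions that do not come from the paper: rasterisation is done by weight-normalised trilinear splatting, a fixed 3x3x3 box filter stands in for the learned shallow 3D CNN, and coordinates are normalised to [0, 1]^3. All function names are hypothetical.

```python
import numpy as np

def corner_weights(coords, grid_size):
    """Eight surrounding voxel indices and trilinear weights for coords in [0, 1]^3."""
    pos = np.clip(coords, 0.0, 1.0) * (grid_size - 1)            # continuous voxel positions
    base = np.minimum(np.floor(pos).astype(int), grid_size - 2)  # lower corner, kept in range
    frac = pos - base                                            # offsets in [0, 1]
    corners, weights = [], []
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                offset = np.array([dx, dy, dz])
                corners.append(base + offset)
                w = np.where(offset == 1, frac, 1.0 - frac)      # per-axis weight, (N, 3)
                weights.append(np.prod(w, axis=1))
    return np.stack(corners, axis=1), np.stack(weights, axis=1)  # (N, 8, 3), (N, 8)

def rasterise(coords, feats, grid_size):
    """Splat point features (N, C) onto a dense (G, G, G, C) grid, normalised by weight."""
    corners, weights = corner_weights(coords, grid_size)
    grid = np.zeros((grid_size,) * 3 + (feats.shape[1],))
    mass = np.zeros((grid_size,) * 3)
    ix, iy, iz = corners[..., 0], corners[..., 1], corners[..., 2]
    np.add.at(grid, (ix, iy, iz), weights[..., None] * feats[:, None, :])
    np.add.at(mass, (ix, iy, iz), weights)
    return grid / np.maximum(mass, 1e-8)[..., None]

def box_filter(grid):
    """3x3x3 mean filter: a fixed stand-in for a learned shallow 3D CNN (assumption)."""
    g = grid.shape[0]
    padded = np.pad(grid, ((1, 1), (1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(grid)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                out += padded[dx:dx + g, dy:dy + g, dz:dz + g]
    return out / 27.0

def sample(grid, coords):
    """Trilinear interpolation of a (G, G, G, C) grid at point coords, giving (N, C)."""
    corners, weights = corner_weights(coords, grid.shape[0])
    values = grid[corners[..., 0], corners[..., 1], corners[..., 2]]  # (N, 8, C)
    return np.sum(weights[..., None] * values, axis=1)

rng = np.random.default_rng(0)
coords = rng.random((2048, 3))               # hypothetical high-resolution point cloud
feats = rng.standard_normal((2048, 16))      # hypothetical per-point features
voxels = box_filter(rasterise(coords, feats, grid_size=12))
point_feats = sample(voxels, coords)         # neighbourhood-aware features per point
print(point_feats.shape)
```

In this sketch the cost of the neighbourhood step depends on the grid size rather than on a pairwise search over points, and every operation is a weighted sum, so an autodiff framework could propagate gradients through the equivalent code.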