PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging

Mattias Paul Heinrich

TL;DR

PointVoxelFormer introduces a hybrid point-voxel framework for 3D medical imaging that alternates point-wise MLP processing with differentiable rasterisation to efficiently fuse high-resolution point features with low-resolution voxel features. For deformable registration, it fuses source and target coordinates early in a common reference frame and uses a two-step, inverse-consistent formulation. Across segmentation and registration benchmarks on ultrasound and CT datasets, PointVoxelFormer outperforms kNN-based and pure rasterisation baselines, delivering up to threefold speed-ups, fivefold memory reduction, and a reduction in registration error of over 30%. The work demonstrates that hybrid point-voxel architectures can provide modality-agnostic, privacy-preserving, and on-device-friendly solutions for 3D medical imaging.

Abstract

Point clouds are a very efficient way to represent volumetric data in medical imaging. First, they do not spend resources on empty space and can therefore avoid the trade-off between resolution and field of view that voxel-based 3D convolutional neural networks (CNNs) face, leading to smaller and more robust models. Second, they provide a modality-agnostic representation of anatomical surfaces and shapes, which avoids domain gaps for generic geometric models. Third, they remove identifiable patient-specific information and may increase privacy preservation when data are shared publicly. Despite these benefits, point clouds are still underexplored in medical imaging compared to volumetric 3D CNNs and vision transformers. To date, both datasets and stringent studies on the comparative strengths and weaknesses of methodological choices are missing. Interactions and information exchange between spatially close points within a point cloud (e.g. through k-nearest neighbour graphs in edge convolutions or point transformations) are crucial for learning geometrically meaningful features but may incur computational bottlenecks. This work presents a hybrid approach that combines point-wise operations with intermediate differentiable rasterisation and dense localised CNNs. For deformable point cloud registration, we devise an early fusion scheme for coordinate features that joins both clouds within a common reference frame and is coupled with an inverse-consistent, two-step alignment architecture. Our extensive experiments on three different datasets for segmentation and registration demonstrate that our method, PointVoxelFormer, enables very compact models that excel with threefold speed-ups, fivefold memory reduction and over 30% registration error reduction against edge convolutions and other state-of-the-art models in geometric deep learning.
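
To make the hybrid idea more concrete, the following is a minimal NumPy sketch of one point-voxel block: a shared point-wise layer, rasterisation of the point features onto a coarse grid, local mixing on that grid, and sampling back at the point locations. It is an illustration under stated assumptions, not the paper's implementation. The trilinear splatting scheme, the fixed box filter standing in for learned 3D convolutions, the residual connection, the grid size, and all function names (`splat_trilinear`, `sample_trilinear`, `box_filter`, `point_voxel_block`) are hypothetical choices made for this sketch. Coordinates are assumed to be normalised to [0, 1].

```python
import numpy as np


def _corners(points, grid_size):
    """Lower-corner voxel index and fractional offset for each point."""
    pos = np.clip(points, 0.0, 1.0) * (grid_size - 1)   # continuous voxel coordinates
    base = np.minimum(np.floor(pos).astype(int), grid_size - 2)
    return base, pos - base                              # offsets lie in [0, 1]


def splat_trilinear(points, feats, grid_size):
    """Rasterise (N, C) point features onto a (D, D, D, C) grid, D >= 2.

    Each point spreads its feature over the 8 surrounding voxels with
    trilinear weights; voxels store the weighted mean of what they receive.
    """
    base, frac = _corners(points, grid_size)
    shape = (grid_size,) * 3
    grid = np.zeros(shape + (feats.shape[1],))
    weight = np.zeros(shape + (1,))
    for offset in np.ndindex(2, 2, 2):
        off = np.array(offset)
        w = np.prod(np.where(off == 1, frac, 1.0 - frac), axis=1, keepdims=True)
        idx = tuple((base + off).T)
        np.add.at(grid, idx, w * feats)    # unbuffered add handles repeated voxels
        np.add.at(weight, idx, w)
    return grid / np.maximum(weight, 1e-8)


def sample_trilinear(grid, points):
    """Read grid features back at the point locations (trilinear interpolation)."""
    base, frac = _corners(points, grid.shape[0])
    out = np.zeros((points.shape[0], grid.shape[-1]))
    for offset in np.ndindex(2, 2, 2):
        off = np.array(offset)
        w = np.prod(np.where(off == 1, frac, 1.0 - frac), axis=1, keepdims=True)
        out += w * grid[tuple((base + off).T)]
    return out


def box_filter(grid):
    """3x3x3 mean filter: a fixed stand-in (assumption) for a learned 3D conv."""
    d = grid.shape[0]
    padded = np.pad(grid, ((1, 1), (1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros_like(grid)
    for dx, dy, dz in np.ndindex(3, 3, 3):
        out += padded[dx:dx + d, dy:dy + d, dz:dz + d]
    return out / 27.0


def point_voxel_block(points, feats, weights, grid_size=8):
    """Point-wise layer, then voxel-space mixing, fused by a residual sum."""
    hidden = np.maximum(feats @ weights, 0.0)            # shared linear layer + ReLU
    grid = box_filter(splat_trilinear(points, hidden, grid_size))
    return hidden + sample_trilinear(grid, points)


rng = np.random.default_rng(0)
pts = rng.random((256, 3))                  # toy point cloud in the unit cube
out = point_voxel_block(pts, pts, rng.normal(size=(3, 16)))
print(out.shape)
```

Both the splatting and the sampling step are linear in the features and piecewise linear in the coordinates, so the same operations written in an automatic-differentiation framework would let gradients flow through the grid. Neighbourhood information is exchanged on the coarse grid rather than through an explicit k-nearest-neighbour search, which is the kind of bottleneck the abstract refers to.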
