LOOC: Localizing Organs using Occupancy Networks and Body Surface Depth Images

Pit Henrich; Franziska Mathis-Ullrich

TL;DR

This work tackles the challenge of non-invasively localizing 67 anatomical structures from a single depth image. It introduces a multi-class occupancy-network model conditioned on the sensor point cloud, augmented by a revised SortSample to handle densely packed internal structures, and it generates patient-specific 3D anatomical atlases. Trained on augmented CT-derived masks from the Atlas Dataset, the method outperforms a template-matching baseline on 50 held-out masks and yields qualitative real-world reconstructions for 12 clothed individuals, with an average inference time of about 3.2 seconds on a high-end GPU. The approach promises practical impact for automated medical imaging and diagnostic workflows by enabling accurate, non-invasive localization of critical structures from simple depth sensing, while acknowledging limitations related to pose variation, clothing, and complex anatomy.

Abstract

We introduce a novel approach for the precise localization of 67 anatomical structures from single depth images captured from the exterior of the human body. Our method uses a multi-class occupancy network, trained using segmented CT scans augmented with body-pose changes, and incorporates a specialized sampling strategy to handle densely packed internal organs. Our contributions include the application of occupancy networks for occluded structure localization, a robust method for estimating anatomical positions from depth images, and the creation of detailed, individualized 3D anatomical atlases. We outperform localization using template matching and provide qualitative real-world reconstructions. This method promises improvements in automated medical imaging and diagnostic procedures by offering accurate, non-invasive localization of critical anatomical structures.

Paper Structure (16 sections, 8 figures)

This paper contains 16 sections, 8 figures.

Introduction
Related Work
Contribution
Method
Preliminaries on Occupancy Networks
Revised SortSample
Training Data
Evaluation Data
Template Matching Baseline
Evaluation Method
Results and Discussion
Inference Time
Comparison to Baseline
Details on Results of Occupancy Network
Limitations
...and 1 more section

Figures (8)

Figure 1: A real-world depth image (a) is converted to a point cloud (b). Our occupancy network, conditioned on the point cloud, estimates the bounding boxes of 67 anatomical structures (c). It also generates a patient-specific 3D anatomical atlas (d). As shown in (e), changes in the patient's body pose are reflected in the output.
Figure 2: A Patient lying on the insertion table of a Medical Scanner. The Sensor Point Cloud from the fixed Depth Sensor is used to automatically estimate Axis-Aligned Bounding Boxes (AABBs) for $67$ Anatomical Structures (ANSs). The AABB of the target ANS is taken as the region of interest (ROI). The ROI is used as the Scan Region, which is imaged by the Medical Scanner to produce a volumetric Image Stack.
Figure 3: The Sensor Point Cloud is normalized and passed to a Point Drop node that randomly discards up to $70\%$ of all points. The remaining points are the input to PointNet++, which distills a latent vector $(l_1,\cdots,l_{1024})$. A query point $(x,y,z)$ is appended to the latent vector, and the combined vector is used as input to the MLP. Additionally, a skip connection to layer $5$ is used. Between each layer, ReLU is used as the activation function. During training, batch normalization is used for the hidden layers of the MLP. The outputs are the predicted occupancy value (one-hot) and the signed distance to the nearest surface. The output is denormalized back into the original camera coordinate system.
Figure 4: The data and training pipeline. A random mask is selected from the Atlas Dataset. Anatomical structures are grouped and meshes are extracted. To produce training data, all meshes obtained from a mask are augmented through deformations and camera movements. The improved SortSample is used to obtain the Occupancy Samples. Simultaneously, a Sensor Point Cloud from the camera perspective is created. The Query Point Cloud is obtained by removing class information from the Occupancy Samples. The Occupancy Network, conditioned on the Sensor Point Cloud, estimates the labels for all points in the Query Point Cloud. The loss is computed with respect to the Occupancy Samples and used to update the Occupancy Network.
Figure 5: Examples of the mask data augmentation using a $4\times4\times4$ lattice (red dots in (a)). The original mask (a) is augmented using the lattice to obtain augmentations (b), (c), and (d).
...and 3 more figures
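
Two steps named in the captions above can be illustrated with a minimal sketch: the Point Drop node from Figure 3 and the derivation of AABBs from labeled query points (Figures 1 and 2). This is not the authors' implementation. The function names `point_drop` and `aabbs_from_labels` are hypothetical, the drop fraction is assumed to be drawn uniformly from 0 to 70%, label `0` is assumed to mean "no structure", and taking the per-axis minimum and maximum over labeled query points is only one plausible way to obtain a box from occupancy predictions.

```python
import random
from collections import defaultdict


def point_drop(points, max_drop=0.7, rng=None):
    """Randomly discard a fraction of the points.

    Assumption: the fraction is drawn uniformly from [0, max_drop];
    the captions only state that up to 70% of points are discarded.
    """
    if not points:
        return []
    rng = rng or random.Random()
    drop_fraction = rng.uniform(0.0, max_drop)
    keep = max(1, round(len(points) * (1.0 - drop_fraction)))
    return rng.sample(points, keep)


def aabbs_from_labels(query_points, labels, background=0):
    """Return {label: (min_xyz, max_xyz)} for every non-background label.

    Assumption: each query point carries one predicted class label and
    label `background` marks points outside all structures.
    """
    grouped = defaultdict(list)
    for point, label in zip(query_points, labels):
        if label != background:
            grouped[label].append(point)
    return {
        label: (
            tuple(min(axis) for axis in zip(*pts)),  # lower corner
            tuple(max(axis) for axis in zip(*pts)),  # upper corner
        )
        for label, pts in grouped.items()
    }


if __name__ == "__main__":
    cloud = [(float(i), 0.0, 0.0) for i in range(100)]
    kept = point_drop(cloud, rng=random.Random(0))
    print(len(kept), "of", len(cloud), "sensor points kept")

    queries = [(0, 0, 0), (1, 2, 3), (4, 5, 6), (9, 9, 9)]
    predicted = [1, 1, 2, 0]
    print(aabbs_from_labels(queries, predicted))
```

In the scenario of Figure 2, the box returned for the target structure would serve as the scan region. A real pipeline would run the network on a dense grid of query points rather than the four points used here.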
