
3D Holistic OR Anonymization

Tony Danjun Wang

TL;DR

This work tackles privacy-preserving analysis of multi-view RGB-D operating-room videos by introducing a 3D-centric anonymization pipeline that first localizes faces in 3D and then reprojects texture-mapped replacements into all views, preserving the data distribution for downstream tasks. It contributes a new multi-view OR RGB-D dataset captured during real swine-based laparoscopic procedures, along with a complete pipeline that fuses 3D key-points, SMPL-based mesh fitting, and texture rendering via an adversarial autoencoder, together with occlusion-aware back-projection. Through extensive evaluation, the approach achieves superior face localization in challenging OR views and produces more realistic anonymized faces than state-of-the-art 2D-detection–based methods and GAN-based baselines, while maintaining task-relevant information better than naive obfuscation. The work highlights the practical impact of leveraging 3D information for privacy in surgical data, outlines limitations (e.g., dependence on 3D key-points, two-step mesh fitting), and points to open-source tooling for reproducibility and future improvements.
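The occlusion-aware back-projection step can be illustrated with a simple depth test. The following is a minimal sketch, not the authors' implementation. It assumes a pinhole camera, a depth map registered to the image in metres, zero meaning "no reading", and a hypothetical tolerance `tol`. The function name `depth_test_visibility` is illustrative. A face point counts as visible in a view only if no measured surface (for example, an overhead light) lies clearly in front of it.

```python
from typing import List, Optional, Tuple


def depth_test_visibility(
    point_cam: Tuple[float, float, float],
    depth_map: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    tol: float = 0.05,  # hypothetical tolerance in metres
) -> Optional[bool]:
    """Decide whether a 3D face point (camera coordinates, metres) is visible.

    Returns True if the point is the closest surface along its pixel ray,
    False if it is behind the camera, outside the image, or occluded,
    and None if the depth map has no reading (0) at that pixel.
    """
    x, y, z = point_cam
    if z <= 0:  # point lies behind the camera
        return False
    # Pinhole projection to the nearest pixel.
    u = int(round(fx * x / z + cx))
    v = int(round(fy * y / z + cy))
    height, width = len(depth_map), len(depth_map[0])
    if not (0 <= u < width and 0 <= v < height):
        return False
    measured = depth_map[v][u]
    if measured <= 0:  # no valid depth: visibility cannot be confirmed
        return None
    # Occluded if some surface (e.g. an overhead light) is clearly closer.
    return z <= measured + tol
```

Returning `None` for missing depth leaves the policy to the caller. For privacy, a conservative caller would still anonymize such regions.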

Abstract

We propose a novel method that leverages 3D information to automatically anonymize multi-view RGB-D video recordings of operating rooms (ORs). Our anonymization method preserves the original data distribution by replacing each face in every image with a different face, so that the data remains suitable for downstream tasks. In contrast to established anonymization methods, our approach first localizes faces in 3D space rather than in 2D image space. Each face is then anonymized by reprojecting a different face into every camera view, which replaces the original faces in the resulting images. Furthermore, we introduce a multi-view RGB-D dataset captured during a real operation in which experienced surgeons performed laparoscopic surgery on an animal subject (swine); the dataset encapsulates typical characteristics of ORs. Finally, we present experimental results on this dataset showing that leveraging 3D data can achieve better face localization in OR images and generate more realistic faces than the current state of the art. To our knowledge, there is no prior work that addresses the anonymization of multi-view OR recordings or 2D face localization that leverages 3D information.
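Localizing a face once in 3D and then reprojecting it into every camera view rests on standard pinhole projection. The sketch below is one way to express that step, not the paper's code. It assumes world-to-camera extrinsics of the form X_c = R X_w + t and intrinsics fx, fy, cx, cy. The `Camera` class and the `project` and `reproject_face` helpers are hypothetical names, and the camera names in the test are example values.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass
class Camera:
    """Hypothetical pinhole camera with world-to-camera extrinsics (R, t)."""
    R: Mat3   # rotation, world -> camera
    t: Vec3   # translation, world -> camera
    fx: float
    fy: float
    cx: float
    cy: float
    width: int
    height: int


def world_to_camera(cam: Camera, p: Vec3) -> Vec3:
    # X_c = R X_w + t
    return tuple(sum(cam.R[i][j] * p[j] for j in range(3)) + cam.t[i] for i in range(3))


def project(cam: Camera, p_world: Vec3) -> Optional[Tuple[float, float]]:
    """Project a 3D world point to pixel coordinates, or None if it falls outside the view."""
    x, y, z = world_to_camera(cam, p_world)
    if z <= 0:  # behind the camera
        return None
    u = cam.fx * x / z + cam.cx
    v = cam.fy * y / z + cam.cy
    if not (0 <= u < cam.width and 0 <= v < cam.height):
        return None
    return (u, v)


def reproject_face(face_center: Vec3, cameras: Dict[str, Camera]) -> Dict[str, Tuple[float, float]]:
    """Given one 3D face location, return its 2D position in every view that sees it."""
    hits = {}
    for name, cam in cameras.items():
        uv = project(cam, face_center)
        if uv is not None:
            hits[name] = uv
    return hits
```

Because the face is localized once in 3D, every view receives a consistent 2D location, including views where a 2D detector might miss a partially obstructed face. A replacement face would then be rendered at these locations. Rendering is not shown here.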

Paper Structure

This paper contains 42 sections, 17 figures, and 6 tables.

Figures (17)

  • Figure 1: Overview of the OR with Labels. The wide variety of equipment in the OR is one of the reasons the OR is a complex environment.
  • Figure 2: Overview of the Cameras' Positions in the OR. The four cameras were positioned to capture as much of the operating table and the OR as possible: cameras 01, 03, and 04 provide wide-angle views of the room, while camera 02 points straight down and focuses solely on the operating table. We use the cardinal directions, marked at each corner, to describe the OR setup.
  • Figure 3: Frame of Evaluation Dataset 2. This evaluation dataset resembles a sterile environment, since everyone wears surgical hats, gowns, scrubs, and medical masks. The images from cn02 (top right) and cn04 (bottom right) show how faces in this evaluation dataset are often partially obstructed by the overhead lights.
  • Figure 4: Depth Map From All Four Camera Views. Depth maps associate each RGB pixel with a depth value (if available). Since the FOV of the depth sensors is smaller than that of the RGB sensors, only the non-white, non-faint values inside the octagonal region are valid (i.e., non-zero).
  • Figure 5: Merged Point Cloud. This point cloud depicts a frame from the second evaluation dataset, viewed from the south side of the OR. The four per-camera point clouds were merged into one using each camera's extrinsic parameters.
  • ...and 12 more figures
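Figures 4 and 5 describe turning each camera's depth map into a point cloud and merging the four clouds with the extrinsic parameters. The following is a minimal sketch of that process, not the authors' pipeline. It assumes depth maps registered to the RGB image and given in metres, zero marking invalid pixels, and world-to-camera extrinsics X_c = R X_w + t, which are inverted as X_w = R^T (X_c - t). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


def depth_to_points(depth_map: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Vec3]:
    """Back-project a depth map (metres, registered to the image) into camera coordinates."""
    points = []
    for v, row in enumerate(depth_map):
        for u, z in enumerate(row):
            if z <= 0:  # zero = no valid depth, e.g. outside the depth sensor's FOV
                continue
            # Invert the pinhole model: u = fx * x / z + cx, v = fy * y / z + cy.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def camera_to_world(R: Mat3, t: Vec3, p: Vec3) -> Vec3:
    """Invert world-to-camera extrinsics: X_w = R^T (X_c - t)."""
    d = [p[i] - t[i] for i in range(3)]
    return tuple(sum(R[i][j] * d[i] for i in range(3)) for j in range(3))


def merge_point_clouds(
    depth_maps: Dict[str, List[List[float]]],
    intrinsics: Dict[str, Tuple[float, float, float, float]],  # (fx, fy, cx, cy)
    extrinsics: Dict[str, Tuple[Mat3, Vec3]],                  # (R, t), world -> camera
) -> List[Vec3]:
    """Back-project every camera's depth map and express all points in one world frame."""
    merged: List[Vec3] = []
    for name, depth_map in depth_maps.items():
        fx, fy, cx, cy = intrinsics[name]
        R, t = extrinsics[name]
        merged.extend(camera_to_world(R, t, p)
                      for p in depth_to_points(depth_map, fx, fy, cx, cy))
    return merged
```

Skipping zero-depth pixels keeps the invalid region outside the octagonal depth FOV out of the merged cloud.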