
Optimised ProPainter for Video Diminished Reality Inpainting

Pengze Li, Lihao Liu, Carola-Bibiane Schönlieb, Angelica I Aviles-Rivero

TL;DR

This work adapts the ProPainter video inpainting framework to medical use, specifically oral and maxillofacial surgery, by introducing zero-shot optimisations and preprocessing that yield temporally coherent reconstructions of occluded operative scenes. The method combines dual-domain (image and feature) propagation with a Mask-guided Sparse Video Transformer (MSVT) to keep motion and texture consistent across frames, supported by deformable flow completion. On a curated surgical video dataset it outperforms the baselines and Stable Diffusion across multiple metrics, and it achieved a top ranking in Phase 1 of the DREAMING challenge. The results demonstrate the potential of memory-efficient, zero-shot diminished reality for improving visualisation in real clinical workflows.
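The image-domain half of dual-domain propagation can be illustrated with a minimal NumPy sketch: occluded pixels in frame t are filled by following a (completed) optical flow field into a neighbouring frame s and copying the pixel found there, provided that pixel is itself visible. This is a simplification under stated assumptions, not the authors' or ProPainter's implementation: it uses a single neighbour, nearest-neighbour sampling, a flow stored as (dx, dy), and the hypothetical helper name `propagate_from_neighbour`; ProPainter additionally propagates in the feature domain.

```python
import numpy as np


def propagate_from_neighbour(frame_t, mask_t, frame_s, mask_s, flow_t_to_s):
    """Fill occluded pixels of frame t by copying visible pixels of frame s.

    frame_t, frame_s: (H, W, C) float arrays.
    mask_t, mask_s:   (H, W) bool arrays, True = occluded / to be inpainted.
    flow_t_to_s:      (H, W, 2) displacement (dx, dy) from t to s, assumed to be
                      already completed inside the mask (e.g. by flow completion).
    Returns the updated frame and the mask of pixels that are still unfilled.
    """
    h, w = mask_t.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour correspondence in frame s for every pixel of frame t.
    src_x = np.rint(xs + flow_t_to_s[..., 0]).astype(int)
    src_y = np.rint(ys + flow_t_to_s[..., 1]).astype(int)
    in_bounds = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    src_x = np.clip(src_x, 0, w - 1)
    src_y = np.clip(src_y, 0, h - 1)
    # A pixel is fillable only if its correspondence lands on a visible pixel of s.
    fillable = mask_t & in_bounds & ~mask_s[src_y, src_x]
    out = frame_t.copy()
    out[fillable] = frame_s[src_y[fillable], src_x[fillable]]
    return out, mask_t & ~fillable
```

In a fuller pipeline this step would be repeated over several neighbouring frames in both temporal directions, and any pixels still marked in the returned mask would be left for a synthesis stage such as the transformer.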

Abstract

In this paper, part of the DREAMING Challenge (Diminished Reality for Emerging Applications in Medicine through Inpainting), we introduce a refined video inpainting technique, optimised from the ProPainter method to meet the specialised demands of medical imaging, specifically oral and maxillofacial surgery. Our algorithm applies ProPainter zero-shot, with optimised parameters and pre-processing, to inpaint complex surgical video sequences without any training. It aims to produce temporally coherent, detail-rich reconstructions of occluded regions, giving clearer views of the operative field. We evaluate the approach on a range of metrics, and the results position it as a significant advance in the application of diminished reality to medicine.

Paper Structure

This paper contains 9 sections, 3 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Schematic of the ProPainter-based video inpainting pipeline used in our framework. MSVT: Mask-guided Sparse Video Transformer.
  • Figure 2: Visual comparison of video inpainting on surgical sequences: selected frames demonstrate our method's effectiveness across different time points $(t_1, t_2, t_3, t_{n-1}, t_n)$. 'Case 1' and 'Case 2' compare the original frames with obstructions ('Ground Truth') against the inpainted results ('Ours').
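Figure 1 names the Mask-guided Sparse Video Transformer. One way to picture its sparsity is sketched below under stated assumptions: the spatial grid is split into non-overlapping windows, attention queries are kept only for windows that the mask touches, and keys/values are drawn from a temporally strided subset of frames. The helper names `masked_query_windows` and `strided_kv_frames`, the union of the mask over all frames, and the fixed stride are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np


def masked_query_windows(mask: np.ndarray, window: int) -> np.ndarray:
    """Return (row, col) grid indices of windows that contain masked pixels.

    mask: (T, H, W) bool array, True where content must be inpainted.
    Assumes H and W are divisible by `window` (a simplification for this sketch).
    """
    t, h, w = mask.shape
    if h % window or w % window:
        raise ValueError("H and W must be divisible by window")
    # Split into (T, rows, window, cols, window) tiles.
    tiles = mask.reshape(t, h // window, window, w // window, window)
    # Keep a window if the mask touches it in any frame of the clip (union over T).
    hit = tiles.any(axis=(0, 2, 4))
    return np.argwhere(hit)


def strided_kv_frames(num_frames: int, stride: int) -> list[int]:
    """Indices of frames used as keys/values when sampling with a temporal stride."""
    return list(range(0, num_frames, stride))


if __name__ == "__main__":
    m = np.zeros((2, 8, 8), dtype=bool)
    m[1, 5, 1] = True  # one occluded pixel in frame 1
    print(masked_query_windows(m, window=4))  # [[1 0]]
    print(strided_kv_frames(10, stride=3))    # [0, 3, 6, 9]
```

Skipping windows with no masked pixels means attention cost scales with the occluded area rather than the full frame, which is one plausible reason such a design suits the memory-efficient setting described above.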