Airway Label Prediction in Video Bronchoscopy: Capturing Temporal Dependencies Utilizing Anatomical Knowledge

Ron Keuth; Mattias Heinrich; Martin Eichenlaub; Marian Himstedt

TL;DR

This paper tackles vision-only navigation for video bronchoscopy in the absence of electromagnetic tracking and patient-specific CT scans by localizing the bronchoscope within an interpatient airway model. It combines CNN-based single-frame airway classification with a Hidden Markov Model that enforces anatomical plausibility through a distance-based regularization on the bronchial tree. Temporal context is exploited by Viterbi-based inference, complemented by a Viterbi-based approximation of the forward-backward algorithm. Calibrated CNN likelihoods feed a dynamic program that balances a data term against an anatomical prior; the weighting parameter between the two is tuned on validation data. Phantom experiments demonstrate substantial improvements in frame-level accuracy and reduced spatial error when temporal context and anatomy-aware regularization are incorporated, highlighting the potential for guided bronchoscopy in intensive care units without CT scans or electromagnetic tracking. Overall, the work presents the first vision-only, topology-aware bronchoscopy guidance approach and points to a path toward broader clinical deployment outside biopsy settings.

Abstract

Purpose: Navigation guidance is a key requirement for a multitude of lung interventions using video bronchoscopy. State-of-the-art solutions focus on lung biopsies using electromagnetic tracking and intraoperative image registration w.r.t. preoperative CT scans for guidance. The requirement of patient-specific CT scans hampers the utilisation of navigation guidance in other settings such as intensive care units. Methods: This paper addresses navigation guidance solely incorporating bronchoscopy video data. In contrast to state-of-the-art approaches, we entirely omit the use of electromagnetic tracking and patient-specific CT scans. Guidance is enabled by means of topological bronchoscope localization w.r.t. an interpatient airway model. In particular, we take maximal advantage of the anatomical constraints of airway trees being traversed sequentially. This is realized by incorporating sequences of CNN-based airway likelihoods into a Hidden Markov Model. Results: Our approach is evaluated in multiple experiments inside a lung phantom model. By considering temporal context and using anatomical knowledge for regularization, we are able to improve the accuracy up to 0.98 compared to 0.81 (weighted F1: 0.98 compared to 0.81) for a classification based on individual frames. Conclusion: We combine CNN-based single-image classification of airway segments with anatomical constraints and temporal HMM-based inference for the first time. Our approach makes vision-only guidance for bronchoscopy interventions possible in the absence of electromagnetic tracking and patient-specific CT scans.
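The interplay of per-frame likelihoods and anatomical regularization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four-node toy tree, the likelihood values, the function names `tree_distances` and `viterbi`, and the cost form (negative log-likelihood plus `lam` times the hop distance between consecutive labels) are all assumptions made for illustration.

```python
import math
from collections import deque


def tree_distances(edges, num_nodes):
    """All-pairs hop distances on an undirected tree (BFS from every node)."""
    adj = [[] for _ in range(num_nodes)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[0] * num_nodes for _ in range(num_nodes)]
    for src in range(num_nodes):
        seen = {src}
        queue = deque([(src, 0)])
        while queue:
            node, d = queue.popleft()
            dist[src][node] = d
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    return dist


def viterbi(likelihoods, dist, lam):
    """Label sequence minimising data cost plus lam * tree distance between steps."""
    num_labels = len(dist)
    eps = 1e-12  # avoids log(0) for zero likelihoods
    cost = [-math.log(p + eps) for p in likelihoods[0]]
    back = []  # back[t][cur] = best predecessor of label cur at step t + 1
    for frame in likelihoods[1:]:
        new_cost, pointers = [], []
        for cur in range(num_labels):
            best_prev = min(range(num_labels),
                            key=lambda prev: cost[prev] + lam * dist[prev][cur])
            pointers.append(best_prev)
            new_cost.append(cost[best_prev] + lam * dist[best_prev][cur]
                            - math.log(frame[cur] + eps))
        cost = new_cost
        back.append(pointers)
    # Backtrack from the cheapest final label.
    label = min(range(num_labels), key=lambda k: cost[k])
    path = [label]
    for pointers in reversed(back):
        label = pointers[label]
        path.append(label)
    return path[::-1]


# Hypothetical toy tree: 0 = trachea, 1 and 2 = main bronchi, 3 = child of 2.
dist = tree_distances([(0, 1), (0, 2), (2, 3)], 4)
# Hypothetical calibrated likelihoods; the third frame wrongly favours label 1.
likelihoods = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.50, 0.35, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
framewise = viterbi(likelihoods, dist, lam=0.0)    # no regularization
regularized = viterbi(likelihoods, dist, lam=1.0)  # anatomy-aware
```

With `lam=0.0` the result equals the frame-wise argmax, including the anatomically implausible jump from label 2 to label 1 and on to label 3. With `lam=1.0` that jump costs more than accepting the slightly less likely label 2, so the decoded path stays on one branch.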

Paper Structure (19 sections, 9 equations, 7 figures, 2 tables)

Introduction
Related work
Airway classification
Electromagnetic Navigation Bronchoscopy
Endobronchial pose estimation
Summary
Methods
Dataset
Classification pipeline
Calibration of CNN classifier
Dynamic programming for time domain
HMM
Data and regularization term
Viterbi algorithm
Approximation of the forward-backward algorithm
...and 4 more sections

Figures (7)

Figure 1: Interpatient model with multi-label segmentation of airways generated based on falta2022. Bronchoscopy video frames are assigned labels according to this anatomical model. Our approach predicts the airway label of the current bronchoscope location in a topological manner (grey circle; dashed line).
Figure 2: Class distribution of each dataset split in percent. Please see Fig. 4 for the anatomical position of each label.
Figure 3: Structure of our proposed pipeline for the image-based localization of the endoscope during a video bronchoscopy (VB). $f$ maps the current frame $\bm{m}_{n\in[1,\dots, N]}$ to its corresponding semantic segmentation $\hat{s}_n$. The classifier $e$ predicts the likelihood $\hat{p}(\hat{s}_n\vert\omega_n)$ for each possible anatomical label $\omega$ based on $\hat{s}_n$. Finally, an HMM captures the temporal context and predicts the posterior probability $\hat{p}(\omega_n\vert\bm{m}_{1,\dots,N})$ for the current frame given the whole sequence of frames. $f$ and $e$ are implemented as two CNNs with trainable parameters $\bm{\theta}$.
Figure 4: An undirected tree graph modelling the bronchial tree with labeled bronchial branches (nodes) covered by our phantom (see Fig. 1). For simplicity, the distance between adjacent bronchial branches is 1 regardless of their actual anatomical distance.
Figure 5: Intuition for the approximation of the forward-backward algorithm via two Viterbi passes, enabling the calculation of a marginal distribution over all classes that is proportional to the posterior probabilities. The individual steps $n\in[1,\dots,N]$ of the sequence with their possible labels $\omega\in\Omega$ are shown from left to right. All paths considered by the forward-backward algorithm are drawn in red. The blue hull marks the incoming paths covered by the Viterbi pass $m_n^f(\omega_1)$ running forward through the sequence, and the orange hull marks the outgoing paths covered by the Viterbi pass $m_n^b(\omega_1)$ running backwards.
...and 2 more figures
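The idea in the Figure 5 caption, a forward and a backward Viterbi pass combined into per-step scores over all labels, can be sketched as a min-marginal computation. This is a hedged illustration, not the paper's exact formulation: the hard-coded distance matrix, the likelihood values, the function names, the additive cost form with weight `lam`, and the softmax normalisation are assumptions.

```python
import math


def pass_costs(neg_log, dist, lam):
    """costs[n][w]: cheapest path cost ending in label w at step n (one Viterbi pass)."""
    costs = [list(neg_log[0])]
    for frame in neg_log[1:]:
        prev = costs[-1]
        costs.append([
            frame[cur] + min(prev[p] + lam * dist[p][cur] for p in range(len(frame)))
            for cur in range(len(frame))
        ])
    return costs


def approx_posteriors(likelihoods, dist, lam):
    """Per-step pseudo-posteriors from forward and backward Viterbi costs."""
    eps = 1e-12  # avoids log(0)
    neg_log = [[-math.log(p + eps) for p in frame] for frame in likelihoods]
    fwd = pass_costs(neg_log, dist, lam)
    # The backward pass is a forward pass on the reversed sequence; this
    # relies on dist being symmetric, which holds for an undirected tree.
    bwd = pass_costs(neg_log[::-1], dist, lam)[::-1]
    posteriors = []
    for f_row, b_row, d_row in zip(fwd, bwd, neg_log):
        # Both passes contain the data term of step n, so subtract it once.
        total = [f + b - d for f, b, d in zip(f_row, b_row, d_row)]
        lowest = min(total)
        weights = [math.exp(lowest - c) for c in total]  # stable softmax of -cost
        norm = sum(weights)
        posteriors.append([w / norm for w in weights])
    return posteriors


# Hypothetical hop distances for a toy tree: 0-1, 0-2, 2-3.
dist = [
    [0, 1, 1, 2],
    [1, 0, 2, 3],
    [1, 2, 0, 1],
    [2, 3, 1, 0],
]
# Hypothetical calibrated likelihoods for four frames.
likelihoods = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.70, 0.10],
    [0.10, 0.50, 0.35, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
posteriors = approx_posteriors(likelihoods, dist, lam=1.0)
best_labels = [max(range(4), key=row.__getitem__) for row in posteriors]
```

Each row of `posteriors` scores every label by the best complete path passing through it at that step, so the per-step argmax coincides with the single Viterbi path whenever that path is unique, while the remaining entries indicate how much worse the alternatives are.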
