Dual Invariance Self-training for Reliable Semi-supervised Surgical Phase Recognition

Sahar Nasirihaghighi, Negin Ghamsarian, Raphael Sznitman, Klaus Schoeffmann

TL;DR

This work tackles the challenge of scarce labeled data in surgical phase recognition by introducing DIST, a semi-supervised framework that enforces Temporal and Transformation Invariance to obtain reliable pseudo-labels. The method operates in two stages: a teacher–student cycle first filters pseudo-labels using a reliability score derived from temporal consistency and dual invariance across two views, and a second cycle then refines the model with the improved pseudo-labels. DIST consistently surpasses supervised and state-of-the-art SSL baselines on Cataract-1k and Cholec80 across multiple architectures, especially in data-scarce settings, and reduces labeling requirements without increasing test-time cost. The results suggest that DIST's pseudo-label filtering aligns decision boundaries more closely with the true data distribution, with implications for broader video analysis tasks.

Abstract

Accurate surgical phase recognition is crucial for advancing computer-assisted interventions, yet the scarcity of labeled data hinders training reliable deep learning models. Semi-supervised learning (SSL), particularly with pseudo-labeling, shows promise over fully supervised methods but often lacks reliable pseudo-label assessment mechanisms. To address this gap, we propose a novel SSL framework, Dual Invariance Self-Training (DIST), that incorporates both Temporal and Transformation Invariance to enhance surgical phase recognition. Our two-step self-training process dynamically selects reliable pseudo-labels, ensuring robust pseudo-supervision. Our approach mitigates the risk of noisy pseudo-labels, steering decision boundaries toward the true data distribution and improving generalization to unseen data. Evaluations on the Cataract and Cholec80 datasets show that our method outperforms state-of-the-art SSL approaches, consistently surpassing both supervised and SSL baselines across various network architectures.
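
The idea of keeping only pseudo-labels that are stable across two views and across neighboring frames can be illustrated with a minimal sketch. This summary does not give the paper's actual reliability score, so the rule below is an assumption made purely for illustration: a frame is kept when two transformed views predict the same phase (transformation invariance) and most predictions in a small temporal window share that phase (temporal invariance). The function name `reliable_pseudo_labels` and the parameters `window` and `min_agreement` are hypothetical and do not come from the paper.

```python
def reliable_pseudo_labels(view_a, view_b, window=1, min_agreement=0.8):
    """Select frames whose pseudo-label is stable across views and time.

    view_a, view_b: per-frame predicted phase ids from two transformed
    views of the same video (hypothetical teacher outputs).
    Returns a dict {frame_index: phase_id} of retained pseudo-labels.
    NOTE: illustrative rule only, not the reliability score used in DIST.
    """
    view_a, view_b = list(view_a), list(view_b)
    if len(view_a) != len(view_b):
        raise ValueError("both views must cover the same frames")

    n = len(view_a)
    selected = {}
    for t in range(n):
        # Transformation invariance: both views must agree on this frame.
        if view_a[t] != view_b[t]:
            continue
        label = view_a[t]

        # Temporal invariance: predictions in the surrounding window
        # (from both views) should mostly carry the same label.
        lo, hi = max(0, t - window), min(n, t + window + 1)
        neighbours = view_a[lo:hi] + view_b[lo:hi]
        agreement = neighbours.count(label) / len(neighbours)

        if agreement >= min_agreement:
            selected[t] = label
    return selected


if __name__ == "__main__":
    # Two views disagree at frame 3, near a phase transition.
    a = [0, 0, 0, 0, 1, 1, 1, 1]
    b = [0, 0, 0, 1, 1, 1, 1, 1]
    print(reliable_pseudo_labels(a, b))
```

In this toy input the two views disagree only at frame 3, so that frame is dropped. Raising `min_agreement` additionally discards the frames adjacent to the phase transition. In a two-stage scheme like the one described, a filter of this kind would be applied to the teacher's predictions before training the student, and again after the student becomes the new teacher.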

Paper Structure

This paper contains 5 sections, 3 equations, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: The framework of the proposed model.
  • Figure 2: Comparison of the number of pseudo-labels in the proposed model.