Endo-TTAP: Robust Endoscopic Tissue Tracking via Multi-Facet Guided Attention and Hybrid Flow-point Supervision

Rulin Zhou; Wenlong He; An Wang; Qiqi Yao; Haijun Hu; Jiankun Wang; Xi Zhang and Hongliang Ren

TL;DR

This paper tackles robust tissue point tracking in endoscopic videos, where deformation, occlusion, and imaging artifacts degrade tracking and dense annotations are scarce. It introduces Endo-TTAP, which pairs a Multi-Facet Guided Attention (MFGA) module, fusing multi-scale flow, semantic embeddings, and motion cues, with a two-stage Auxiliary Curriculum Adapter (ACA) that eases the transition from synthetic to real data. A hybrid supervision scheme combines unsupervised optical-flow distillation with semi-supervised pseudo-label learning to reduce annotation dependence. Across SurgT, STIR, and the Endo-TAPC5 dataset, Endo-TTAP achieves state-of-the-art accuracy and robustness, especially under occlusion and on long sequences, showing potential for improved surgical navigation and scene understanding.

Abstract

Accurate tissue point tracking in endoscopic videos is critical for robotic-assisted surgical navigation and scene understanding, but remains challenging due to complex deformations, instrument occlusion, and the scarcity of dense trajectory annotations. Existing methods struggle with long-term tracking under these conditions because they make limited use of available features and depend heavily on annotations. We present Endo-TTAP, a novel framework that addresses these challenges through: (1) a Multi-Facet Guided Attention (MFGA) module that synergizes multi-scale flow dynamics, DINOv2 semantic embeddings, and explicit motion patterns to jointly predict point positions with uncertainty and occlusion awareness; and (2) a two-stage curriculum learning strategy employing an Auxiliary Curriculum Adapter (ACA) for progressive initialization and hybrid supervision. Stage I uses synthetic data with optical flow ground truth for uncertainty-occlusion regularization, while Stage II combines unsupervised flow consistency with semi-supervised learning on refined pseudo-labels from off-the-shelf trackers. Extensive validation on two MICCAI Challenge datasets and our collected dataset demonstrates that Endo-TTAP achieves state-of-the-art performance in tissue point tracking, particularly under complex endoscopic conditions. The source code and dataset will be available at https://anonymous.4open.science/r/Endo-TTAP-36E5.
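To make the Stage II idea concrete, the following is a minimal sketch of how an unsupervised flow-consistency term and a pseudo-label term could be combined into one objective. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the loss definitions, so the Huber penalty, the weight `lam`, the confidence threshold `conf_thresh`, and all function and argument names here are hypothetical.

```python
import numpy as np


def huber(residual, delta=1.0):
    """Elementwise Huber penalty (assumed here; the paper's penalty may differ)."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * abs_r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)


def hybrid_loss(pred_next, points_prev, flow_at_points,
                pseudo_next, pseudo_conf, visible,
                lam=0.5, conf_thresh=0.8):
    """Hypothetical hybrid objective for one frame pair.

    pred_next:      (N, 2) predicted point positions in frame t+1
    points_prev:    (N, 2) point positions in frame t
    flow_at_points: (N, 2) teacher optical flow sampled at points_prev
    pseudo_next:    (N, 2) pseudo-label positions from an off-the-shelf tracker
    pseudo_conf:    (N,)   confidence attached to each pseudo-label
    visible:        (N,)   boolean mask, False for occluded points
    """
    vis = visible.astype(float)

    # Unsupervised term: the predicted displacement should agree with
    # the teacher flow at every visible point.
    flow_residual = (pred_next - points_prev) - flow_at_points
    flow_term = huber(flow_residual).sum(axis=1)
    l_flow = (flow_term * vis).sum() / max(vis.sum(), 1.0)

    # Semi-supervised term: only pseudo-labels that are visible and
    # confident enough contribute (a simple stand-in for "refinement").
    keep = (pseudo_conf >= conf_thresh).astype(float) * vis
    pseudo_term = huber(pred_next - pseudo_next).sum(axis=1)
    l_pseudo = (pseudo_term * keep).sum() / max(keep.sum(), 1.0)

    return l_flow + lam * l_pseudo, l_flow, l_pseudo


# Toy usage: two points, predictions match the teacher flow exactly,
# and only the first pseudo-label passes the confidence threshold.
points_prev = np.zeros((2, 2))
flow = np.array([[1.0, 0.0], [0.0, 1.0]])
pred = np.array([[1.0, 0.0], [0.0, 1.0]])
pseudo = pred + np.array([[0.5, 0.0], [0.0, 0.0]])
conf = np.array([0.9, 0.1])
visible = np.array([True, True])
total, l_flow, l_pseudo = hybrid_loss(pred, points_prev, flow, pseudo, conf, visible)
```

Masking by visibility and by confidence keeps occluded points and unreliable pseudo-labels from driving the gradient, which is one plausible way to reduce dependence on dense annotations; the actual weighting and refinement used by Endo-TTAP may differ.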
