LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training

Qing Xu, Kun Yuan, Yuxiang Luo, Yuhao Zhai, Wenting Duan, Nassir Navab, Zhen Chen

TL;DR

LapFM introduces a hierarchical concept evolving pre-training framework for laparoscopic segmentation, unifying anatomy, tissue, and instruments under a Laparoscopic Concept Hierarchy and a confidence-driven pseudo-labeling loop to leverage unlabeled data. It builds LapBench-114K by progressively annotating unlabeled images with high-confidence pseudo-labels and uses a transformer-based encoder with a hierarchical mask decoder and hierarchical losses to enable granularity-adaptive segmentation. Extensive experiments across 20 categories show state-of-the-art performance and strong generalization to unseen data, including GynSurg, with expert validation supporting label quality. This work advances universal surgical scene understanding by enabling flexible granularity and prompt-free segmentation in diverse clinical scenarios.

Abstract

Surgical segmentation is pivotal for scene understanding yet remains hindered by annotation scarcity and semantic inconsistency across diverse procedures. Existing approaches typically fine-tune natural-image foundation models (e.g., SAM) with limited supervision, functioning merely as domain adapters rather than surgical foundation models. Consequently, they struggle to generalize across the vast variability of surgical targets. To bridge this gap, we present LapFM, a foundation model designed to evolve robust segmentation capabilities from large volumes of unlabeled surgical images. Distinct from medical foundation models relying on inefficient self-supervised proxy tasks, LapFM leverages a Hierarchical Concept Evolving Pre-training paradigm. First, we establish a Laparoscopic Concept Hierarchy (LCH) via a hierarchical mask decoder with parent-child query embeddings, unifying diverse entities (i.e., Anatomy, Tissue, and Instrument) into a scalable knowledge structure with cross-granularity semantic consistency. Second, we propose a Confidence-driven Evolving Labeling strategy that iteratively generates and filters pseudo-labels based on hierarchical consistency, progressively incorporating reliable samples from unlabeled images into training. This process yields LapBench-114K, a large-scale benchmark comprising 114K image-mask pairs. Extensive experiments demonstrate that LapFM significantly outperforms state-of-the-art methods, establishing new standards for granularity-adaptive generalization in universal laparoscopic segmentation. The source code is available at https://github.com/xq141839/LapFM.
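The abstract describes filtering pseudo-labels by confidence and hierarchical consistency but does not give the exact criterion here. The following is a minimal sketch of one plausible filter, not the authors' implementation: the function names, the mean-foreground-probability confidence score, the "child pixels must lie inside the parent mask" consistency score, and both thresholds (`conf_min`, `cons_min`) are illustrative assumptions.

```python
import numpy as np


def mask_confidence(prob, threshold=0.5):
    """Mean probability over pixels predicted as foreground (assumed score)."""
    fg = prob >= threshold
    if not fg.any():
        return 0.0
    return float(prob[fg].mean())


def hierarchical_consistency(parent_prob, child_probs, threshold=0.5):
    """Fraction of child foreground pixels that fall inside the parent mask."""
    parent_fg = parent_prob >= threshold
    child_fg = np.zeros_like(parent_fg)
    for prob in child_probs:
        child_fg |= prob >= threshold
    if not child_fg.any():
        return 1.0  # no child prediction, so nothing contradicts the parent
    return float((child_fg & parent_fg).sum() / child_fg.sum())


def accept_pseudo_label(parent_prob, child_probs, conf_min=0.9, cons_min=0.95):
    """Keep a sample only if it is both confident and hierarchy-consistent.

    conf_min and cons_min are hypothetical thresholds, not values from the paper.
    """
    conf = mask_confidence(parent_prob)
    cons = hierarchical_consistency(parent_prob, child_probs)
    return conf >= conf_min and cons >= cons_min
```

In an iterative loop of the kind the abstract outlines, samples passing such a filter would be added to the training set, the model retrained, and the remaining unlabeled images re-scored in the next round.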

Paper Structure

This paper contains 19 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: (a) The LCH that unifies diverse surgical entities into a scalable taxonomy with three fundamental branches: Anatomy, Tissue, and Instrument. Parent (P) nodes guide Child (C) nodes for granularity-adaptive segmentation from coarse to fine levels. (b) Distribution of segmentation labels per image in our LapBench-114K dataset. (c) Histogram of pixel-level annotations across surgical categories.
  • Figure 2: The overview of the proposed LapFM framework, consisting of (a) a transformer-based image encoder and a hierarchical mask decoder for multi-granularity segmentation. (b) The hierarchical mask decoder leverages parent and child query embeddings with explicit parent-child dependencies, where parent-specific features guide child-level concept segmentation. (c) Detailed architecture of the segmentation head. We illustrate anatomy segmentation as an example to demonstrate how LapFM achieves adaptive hierarchy traversal across granularity levels.
  • Figure 3: The overview of our Confidence-driven Evolving Labeling. This process exploits model-assisted pseudo-labeling and confidence filtering to maximize data utilization while ensuring high-quality annotations.
  • Figure 4: The progression of data expansion via the Confidence-driven Evolving Labeling. This process iteratively corrects low-confidence mask samples from reliable high-quality annotations and progressively integrates datasets with varying categories and granularities, exploiting cross-dataset surgical knowledge to construct LapBench-114K.
  • Figure 5: Comparison of average HD across different methods on anatomical structure segmentation. Our LapFM achieves substantially lower HD (186mm) compared to all baseline methods, demonstrating superior boundary localization precision.
  • ...and 4 more figures
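Figure 5 compares methods by HD. Assuming HD denotes the Hausdorff distance commonly used to assess boundary quality in segmentation, the sketch below computes the symmetric Hausdorff distance between two binary masks with NumPy. It returns a value in pixels; converting to millimetres would require a pixel spacing that this section does not provide. The function name is hypothetical, and the paper's own evaluation code may differ (for example by using boundary pixels only or a percentile variant).

```python
import numpy as np


def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance between two binary masks, in pixels."""
    a = np.argwhere(mask_a)  # (N, 2) coordinates of foreground pixels
    b = np.argwhere(mask_b)  # (M, 2) coordinates of foreground pixels
    if len(a) == 0 or len(b) == 0:
        raise ValueError("both masks must contain foreground pixels")
    # Pairwise Euclidean distances, shape (N, M); memory grows as N * M.
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest nearest-neighbour distance in each direction, then the maximum.
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```

A lower value means the predicted boundary stays closer to the reference boundary everywhere, which is why the caption reads it as a measure of boundary localization precision.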