Lite ENSAM: a lightweight cancer segmentation model for 3D Computed Tomography

Agnar Martin Bjørnstad, Elias Stenhede, Arian Ranjbar

TL;DR

Lite ENSAM introduces a memory- and compute-efficient adaptation of the ENSAM architecture to convert RECIST diameter annotations into 3D CT tumor segmentations under CPU constraints. The model uses a 3D U‑Net backbone with a prompt encoder and a SAM-style cross-attention mechanism guided by diameter endpoints encoded with Lie Rotational Positional Encoding, while applying substantial memory optimizations and a non-interactive workflow. Evaluated on the FLARE 2025 dataset, Lite ENSAM achieves a DSC of approximately 76% and an NSD around 79% on public validation, with CPU inference averaging ~14 seconds and RAM usage well below 8 GB. This work demonstrates that accurate volumetric tumor segmentation from RECIST annotations is feasible in resource-constrained clinical settings, potentially enabling broader adoption of volumetric response assessment in cancer care.

Abstract

Accurate tumor size measurement is a cornerstone of evaluating cancer treatment response. The most widely adopted standard for this purpose is the Response Evaluation Criteria in Solid Tumors (RECIST) v1.1, which relies on measuring the longest tumor diameter in a single plane. Volumetric measurements have been shown to provide a more reliable assessment of treatment effect, but their clinical adoption has been limited because manual volumetric annotation is labor-intensive. In this paper, we present Lite ENSAM, a lightweight adaptation of the ENSAM architecture designed for efficient volumetric tumor segmentation from CT scans with RECIST annotations. Lite ENSAM was submitted to the MICCAI FLARE 2025 Task 1: Pan-cancer Segmentation in CT Scans, Subtask 2, where it achieved a Dice Similarity Coefficient (DSC) of 60.7% and a Normalized Surface Dice (NSD) of 63.6% on the hidden test set. On the public validation dataset, it recorded an average total RAM time of 50.6 GB·s and an average inference time of 14.4 s on CPU.
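For readers unfamiliar with the DSC reported above, the following is a minimal sketch of the standard definition, DSC = 2|A ∩ B| / (|A| + |B|), for two binary masks. It is not the challenge's evaluation code: the function name `dice_coefficient`, the flat-list mask representation, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice Similarity Coefficient for two binary masks.

    `pred` and `truth` are flat sequences of 0/1 voxel labels of equal
    length (a 3D volume flattened in any consistent order).
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of voxels")

    # Voxels labeled as tumor in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total tumor voxels across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)

    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    prediction = [1, 1, 0, 0]
    reference = [1, 0, 0, 0]
    print(f"DSC = {dice_coefficient(prediction, reference):.3f}")
```

In this toy case the masks share one tumor voxel out of three labeled in total, so the score is 2·1/3. Multiplying by 100 gives the percentage form used in the abstract.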

Paper Structure

This paper contains 23 sections, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Network architecture of Lite ENSAM, consisting of three main components: image encoder, prompt encoder, and mask decoder. The diameter markings are incorporated via cross-attention between image embeddings and diameter embeddings in the bottom part of the U-Net.
  • Figure 2: Example slices from five volumes in the validation set. The volumes were chosen based on their Dice scores, corresponding to the 5th, 25th, 50th, 75th, and 95th percentiles. For each volume, the slice aligned with an input marker from class 1 was selected.
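The cross-attention step mentioned in the Figure 1 caption can be illustrated with a generic single-head scaled dot-product attention, in which image embeddings act as queries and diameter embeddings act as keys and values. This is only a sketch of the general mechanism under simplifying assumptions: it omits learned projections, multiple heads, and the positional encoding, and the names `softmax` and `cross_attention` are hypothetical, not taken from the Lite ENSAM code.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: image embeddings, one vector per image token.
    keys, values: prompt (diameter) embeddings, one vector per prompt token.
    Returns one output vector per query: a weighted mix of the values.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this image token to every prompt token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors, channel by channel.
        mixed = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


if __name__ == "__main__":
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]   # two toy image embeddings
    prompt_tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy diameter endpoints
    for row in cross_attention(image_tokens, prompt_tokens, prompt_tokens):
        print([round(x, 3) for x in row])
```

Each image token ends up with a mixture of prompt information weighted by similarity, which is how the diameter markings can steer the features at the bottom of the U-Net.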