Causal-SAM-LLM: Large Language Models as Causal Reasoners for Robust Medical Segmentation

Tao Tang; Shijie Xu; Jionglong Su; Zhixiang Lu

TL;DR

The paper tackles the generalization gap in medical image segmentation caused by domain-style confounds. It introduces Causal-SAM-LLM, which keeps a Segment Anything Model (SAM) encoder frozen and adds two causal mechanisms: Linguistic Adversarial Disentanglement (LAD), which purges style-related information from the features, and Test-Time Causal Intervention (TCI), in which an LLM modulates the decoder via FiLM in response to natural-language prompts. LAD uses a Vision-Language Model to generate detailed style descriptions and trains a contrastive objective with a CLIP-based embedding to enforce semantic disentanglement. On a composite benchmark spanning cross-scanner, cross-modality, and cross-anatomy shifts (BTCV, CHAOS, AMOS, BraTS), Causal-SAM-LLM achieves state-of-the-art out-of-distribution (OOD) robustness, improving average Dice by up to 6.2 points and reducing Hausdorff Distance (HD) by 15.8 mm over the strongest baseline while using under 9% of the full model's trainable parameters. It also enables practical, interactive error correction through language prompts.
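The FiLM modulation mentioned above can be illustrated with a minimal NumPy sketch. FiLM itself is a per-channel scale and shift, `gamma * x + beta`; everything else here is an assumption made for illustration. In particular, `prompt_to_film_params` is a hypothetical linear head standing in for whatever mapping the paper uses from the LLM's output to the modulation parameters, and the weight matrices `w_gamma` and `w_beta` are placeholders.

```python
import numpy as np


def film_modulate(features, gamma, beta):
    """Apply FiLM to a (C, H, W) feature map: per-channel scale, then shift."""
    features = np.asarray(features, dtype=np.float64)
    gamma = np.asarray(gamma, dtype=np.float64).reshape(-1, 1, 1)
    beta = np.asarray(beta, dtype=np.float64).reshape(-1, 1, 1)
    channels = features.shape[0]
    if gamma.shape[0] != channels or beta.shape[0] != channels:
        raise ValueError("gamma and beta need exactly one value per channel")
    return gamma * features + beta


def prompt_to_film_params(prompt_embedding, w_gamma, w_beta):
    """Hypothetical linear head: prompt embedding (D,) -> (gamma, beta), each (C,).

    Assumption for illustration only; the source does not specify this mapping.
    gamma is centered on 1 so that a zero embedding leaves features unchanged.
    """
    prompt_embedding = np.asarray(prompt_embedding, dtype=np.float64)
    gamma = 1.0 + np.asarray(w_gamma, dtype=np.float64) @ prompt_embedding
    beta = np.asarray(w_beta, dtype=np.float64) @ prompt_embedding
    return gamma, beta
```

Centering `gamma` on 1 is a design choice of this sketch: with no prompt signal the decoder features pass through untouched, so any intervention is an explicit departure from the default behavior.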

Abstract

The clinical utility of deep learning models for medical image segmentation is severely constrained by their inability to generalize to unseen domains. This failure is often rooted in the models learning spurious correlations between anatomical content and domain-specific imaging styles. To overcome this fundamental challenge, we introduce Causal-SAM-LLM, a novel framework that elevates Large Language Models (LLMs) to the role of causal reasoners. Our framework, built upon a frozen Segment Anything Model (SAM) encoder, incorporates two synergistic innovations. First, Linguistic Adversarial Disentanglement (LAD) employs a Vision-Language Model to generate rich, textual descriptions of confounding image styles. By training the segmentation model's features to be contrastively dissimilar to these style descriptions, the model learns a representation robustly purged of non-causal information. Second, Test-Time Causal Intervention (TCI) provides an interactive mechanism where an LLM interprets a clinician's natural language command to modulate the segmentation decoder's features in real time, enabling targeted error correction. We conduct an extensive empirical evaluation on a composite benchmark from four public datasets (BTCV, CHAOS, AMOS, BraTS), assessing generalization under cross-scanner, cross-modality, and cross-anatomy settings. Causal-SAM-LLM establishes a new state of the art in out-of-distribution (OOD) robustness, improving the average Dice score by up to 6.2 points and reducing the Hausdorff Distance by 15.8 mm over the strongest baseline, all while using less than 9% of the full model's trainable parameters. Our work charts a new course for building robust, efficient, and interactively controllable medical AI systems.
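To make the idea of features being "dissimilar to style descriptions" concrete, here is a minimal NumPy sketch. It does not reproduce the paper's contrastive objective, whose exact form is not given here. As a simple stand-in, it penalizes the squared cosine similarity between each image feature vector and the text embedding of its style description. The function names and the assumption that both inputs are already projected into a shared embedding space are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit L2 norm."""
    x = np.asarray(x, dtype=np.float64)
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def style_dissimilarity_penalty(features, style_embeddings):
    """Mean squared cosine similarity between paired rows of two (N, D) arrays.

    Stand-in for a disentanglement loss (an assumption, not the paper's
    objective): the value is 0 when every feature vector is orthogonal to its
    style embedding and approaches 1 when they are collinear.
    """
    f = l2_normalize(features)
    s = l2_normalize(style_embeddings)
    cosine = np.sum(f * s, axis=-1)
    return float(np.mean(cosine ** 2))
```

Minimizing a penalty of this kind alongside the segmentation loss would push the features toward carrying no linearly aligned style signal, which is the intuition behind purging non-causal information.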
