LINGUAL: Language-INtegrated GUidance in Active Learning for Medical Image Segmentation
Md Shazid Islam, Shreyangshu Bera, Sudipta Paul, Amit K. Roy-Chowdhury
TL;DR
LINGUAL addresses the heavy labeling burden of active learning (AL) for medical image segmentation by replacing dense polygonal delineation with language-guided, autonomous refinement. It translates an expert's natural language feedback into executable boundary refinement programs via an in-context learning Program Generator and an Executor, enabling iterative corrections without manual pixel-level annotation. In active domain adaptation (ADA) experiments on CHAOS MRI and BUSI ultrasound, LINGUAL achieves Dice scores competitive with patch-based AL and higher than superpixel-based AL while reducing annotation time by roughly 80%. These results point to a scalable, language-driven paradigm for efficient human-AI collaboration in medical image annotation.
Abstract
Although active learning (AL) in segmentation tasks enables experts to annotate selected regions of interest (ROIs) instead of entire images, annotation remains challenging, labor-intensive, and cognitively demanding because of the blurry and ambiguous boundaries commonly observed in medical images. Moreover, in conventional AL, annotation effort depends on ROI size: larger regions make the task cognitively easier but incur higher annotation costs, whereas smaller regions demand finer precision and more attention from the expert. In this context, language guidance provides an effective alternative, requiring minimal expert effort while bypassing the cognitively demanding task of precise boundary delineation. Towards this goal, we introduce LINGUAL: a framework that receives natural language instructions from an expert, translates them into executable programs through in-context learning, and automatically performs the corresponding sequence of sub-tasks without any human intervention. We demonstrate the effectiveness of LINGUAL in active domain adaptation (ADA), where it achieves comparable or superior performance to AL baselines while reducing estimated annotation time by approximately 80%.
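To make the instruction-to-program idea concrete, the following is a minimal toy sketch, not the method described in this paper. It assumes a binary mask, two hypothetical refinement operations (`dilate`, `erode`), and a keyword lookup (`generate_program`) standing in for the in-context learning Program Generator; all names and operations here are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple

Mask = List[List[int]]   # binary mask, 1 = foreground
Step = Tuple[str, int]   # (operation name, repeat count)


def _neighbors(r: int, c: int) -> List[Tuple[int, int]]:
    """4-connected neighbours of pixel (r, c)."""
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def dilate(mask: Mask) -> Mask:
    """Grow the foreground by one pixel (4-connectivity)."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0 and any(
                0 <= i < h and 0 <= j < w and mask[i][j] == 1
                for i, j in _neighbors(r, c)
            ):
                out[r][c] = 1
    return out


def erode(mask: Mask) -> Mask:
    """Shrink the foreground by one pixel; out-of-bounds counts as background."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 1 and any(
                not (0 <= i < h and 0 <= j < w) or mask[i][j] == 0
                for i, j in _neighbors(r, c)
            ):
                out[r][c] = 0
    return out


# Hypothetical operation registry used by the toy executor.
OPS: Dict[str, Callable[[Mask], Mask]] = {"dilate": dilate, "erode": erode}


def generate_program(instruction: str) -> List[Step]:
    """Assumed stand-in for the Program Generator: a plain keyword lookup."""
    text = instruction.lower()
    if "expand" in text or "grow" in text:
        return [("dilate", 1)]
    if "shrink" in text or "tighten" in text:
        return [("erode", 1)]
    return []  # unrecognised instruction: leave the mask unchanged


def execute(program: List[Step], mask: Mask) -> Mask:
    """Run each step of the program in order, with no human in the loop."""
    for op_name, repeats in program:
        for _ in range(repeats):
            mask = OPS[op_name](mask)
    return mask


def dice(a: Mask, b: Mask) -> float:
    """Dice coefficient between two binary masks of equal shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 2 * inter / total if total else 1.0


if __name__ == "__main__":
    truth = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
    pred = [[0] * 5 for _ in range(5)]
    pred[2][2] = 1  # under-segmented prediction: a single pixel
    refined = execute(generate_program("Expand the boundary slightly"), pred)
    print(dice(pred, truth), dice(refined, truth))
```

In this toy setting the expert only supplies a short sentence, and the refinement is applied automatically; a real system would replace the keyword lookup with a language model prompted with in-context examples and would use a richer set of sub-tasks.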
