LINGUAL: Language-INtegrated GUidance in Active Learning for Medical Image Segmentation

Md Shazid Islam, Shreyangshu Bera, Sudipta Paul, Amit K. Roy-Chowdhury

TL;DR

LINGUAL addresses the heavy labeling burden in medical image segmentation within active learning (AL) by replacing dense polygonal delineation with language-guided, autonomous refinement. It translates expert natural language feedback into executable boundary refinement programs via an in-context learning Program Generator and a Program Executor, enabling iterative corrections without manual pixel-level annotation. In active domain adaptation (ADA) experiments on CHAOS MRI and BUSI ultrasound, LINGUAL achieves Dice scores competitive with patch-based AL and surpasses superpixel-based AL while reducing annotation time by roughly 80%. This highlights a scalable, language-driven paradigm for efficient human-AI collaboration in medical image annotation.

Abstract

Although active learning (AL) in segmentation tasks enables experts to annotate selected regions of interest (ROIs) instead of entire images, it remains highly challenging, labor-intensive, and cognitively demanding due to the blurry and ambiguous boundaries commonly observed in medical images. Moreover, in conventional AL, annotation effort is a function of the ROI size: larger regions make the task cognitively easier but incur higher annotation costs, whereas smaller regions demand finer precision and more attention from the expert. In this context, language guidance provides an effective alternative, requiring minimal expert effort while bypassing the cognitively demanding task of precise boundary delineation in segmentation. Towards this goal, we introduce LINGUAL: a framework that receives natural language instructions from an expert, translates them into executable programs through in-context learning, and automatically performs the corresponding sequence of sub-tasks without any human intervention. We demonstrate the effectiveness of LINGUAL in active domain adaptation (ADA), achieving comparable or superior performance to AL baselines while reducing estimated annotation time by approximately 80%.

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: In both (a) and (b), $P_{init}$ denotes the initial prediction map, where two regions of interest (ROIs) are selected for expert review. The green dotted line indicates the ground truth boundary. (a) In conventional AL, the expert manually annotates the ROIs through polygonal delineation to obtain the refined segmentation $Y_{AL}$. (b) In contrast, LINGUAL enables the expert to provide high-level language feedback, which is executed autonomously to achieve corrections equivalent to those in (a), thereby reducing manual effort.
  • Figure 2: (a) On the initial prediction map, two ROIs have been chosen (red box: smaller; blue box: larger). Conventional AL methods would require more manual effort to annotate the blue ROI. However, LINGUAL lets the same language instruction ("Expand to Bottom-Right") perform the required corrective operation irrespective of ROI size, keeping human effort unchanged. (b) The area inside the yellow contour indicates a superpixel containing regions from both inside and outside the ground truth (GT) boundary. Annotating this entire superpixel as foreground leads to false positive errors. In contrast, executing the "Expand to Bottom-Right" command through LINGUAL reduces annotation error by fitting the updated boundary more closely to the GT boundary.
  • Figure 3: Workflow Overview: (Figure 2a) A pretrained source model $f_{\theta}$ is adapted to the target domain through active domain adaptation (ADA). Given the training data in the target domain $X_{\text{train}}$, $f_{\theta}$ produces an initial prediction map $P_{\text{init}}$. Through acquisition (Figure 2b), a budget of $b$ regions $\{R_i\}_{i=1}^b$ is sampled, and for each region $R_i$ the expert provides language feedback $L_i$. Each region $R_i$ is refined using a refinement block (Figure 2c), composed of a Program Generator and a Program Executor. The Program Generator translates $L_i$ into a program (a sequence of operations), which the Program Executor applies to produce the refined patch $R'_i$. Substituting $\{R'_i\}_{i=1}^b$ for $\{R_i\}_{i=1}^b$ yields the refined segmentation map $Y_{\text{AL}}$. Subsequently, $f_{\theta}$ is updated using the loss computed between $P_{\text{init}}$ and $Y_{\text{AL}}$. After convergence, the adapted and frozen $f_{\theta}$ is deployed for inference.
  • Figure 4: Segmentation refinement by LINGUAL: (a) Program Generator: We show the initial prediction map $P_{init}$ (red overlay on the input image) along with the ROI (yellow box) and the ground truth (GT) boundary (green dotted line) inside the ROI. The GT boundary is shown only for illustration; the expert does not use it. To refine the segmentation, the expert provides natural language instructions (target example), which are translated into program sequences using the in-context learning capability of GPT-3.5 with manually crafted in-context examples. (b) Program Executor: Each step of the generated program (purple text box) is shown along with its output image, its output variable name (orange text box), and its interpretation (blue text box). We note that the input image is used during both the EXPAND and SHRINK operations. For simplicity, it is not passed explicitly as a function argument, since the program already has access to it internally.
  • Figure 5: Illustration of each step involved in the EXPAND and SHRINK operations: The corresponding language commands are "EXPAND to Right" and "SHRINK at Top-Right". The process is depicted as follows: (a) Initial prediction map (red) overlaid on the input image. (b) The yellow bounding box (ROI) obtained by AQF. Inside the ROI, true positive (orange), false positive (blue), and false negative (green) regions are shown. (c) Boundary of the prediction map, with the direction of operation shown by a red arrow. (d) Points sampled from inside ($P_{\text{in}}$) and outside ($P_{\text{out}}$) the boundary within the ROI are shown as blue and yellow dots, respectively. (e) $P_{\text{in}}$ and $P_{\text{out}}$ are also highlighted on the input image. (f) $A_{\text{in}}$ (pink) and $A_{\text{out}}$ (violet) are highlighted; $A_{\text{in}} \cap A_{\text{out}}$ is shown in brown. (g) Updated prediction after the first iteration using the EXPAND and SHRINK update equations. (h) Results after multiple iterations. (i) Plot of $\eta_{t}$ versus iteration number $t$; the minimum point on the plot indicates the best output. (j) Final refined prediction corresponding to the optimal iteration, showing notable improvement over the initial state in (b).
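
Two steps described in the captions above are simple enough to sketch in code: substituting the refined patches $R'_i$ for the regions $R_i$ to form $Y_{AL}$ (Figure 3), and keeping the iteration at which $\eta_t$ is smallest (Figure 5i). The following Python sketch is illustrative only and is not the authors' implementation. The `(top, left, height, width)` box format, the function names, and the example values are assumptions, and $\eta_t$ is treated as an already-computed list because its definition is not given in this section.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]            # binary segmentation map, 1 = foreground
Box = Tuple[int, int, int, int]   # (top, left, height, width); hypothetical ROI format


def substitute_patches(p_init: Mask, boxes: Sequence[Box],
                       refined: Sequence[Mask]) -> Mask:
    """Return a copy of p_init in which each ROI is replaced by its refined patch."""
    y_al = [row[:] for row in p_init]  # copy rows so p_init is left untouched
    for (top, left, h, w), patch in zip(boxes, refined):
        for r in range(h):
            # overwrite one row of the ROI with the matching row of the patch
            y_al[top + r][left:left + w] = patch[r][:w]
    return y_al


def best_iteration(etas: Sequence[float]) -> int:
    """Index t of the smallest eta_t; the first one wins if there is a tie."""
    return min(range(len(etas)), key=lambda t: etas[t])


# Toy example: one 2x2 ROI whose refined patch expands the mask to the right.
p_init = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
boxes = [(1, 2, 2, 2)]
refined = [[[1, 1],
            [1, 1]]]
y_al = substitute_patches(p_init, boxes, refined)

etas = [0.9, 0.4, 0.2, 0.35]   # made-up values for illustration
t_star = best_iteration(etas)
```

In this toy case only the pixels inside the ROI change, which mirrors the caption's description that the expert's feedback affects the selected regions while the rest of $P_{init}$ is carried over unchanged.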