
Prompt-Conditioned FiLM and Multi-Scale Fusion on MedSigLIP for Low-Dose CT Quality Assessment

Tolga Demiroglu, Mehmet Ozan Unal, Metin Ertas, Isa Yildirim

TL;DR

This work addresses no-reference quality assessment of low-dose CT (LDCT) images with a prompt-conditioned framework built on the MedSigLIP backbone. The method injects textual priors via FiLM and jointly aggregates global, local, and texture cues through dedicated pooling streams, with a pairwise ranking objective guiding learning. Key contributions are (i) a MedSigLIP-based, prompt-guided formulation, (ii) FiLM-based text injection combined with multi-scale pooling, and (iii) a ranking loss tailored to mean opinion score (MOS) ordering. The approach achieves PLCC 0.9575, SROCC 0.9561, and KROCC 0.8301 on LDCTIQA2023. It is data-efficient, can be adapted quickly by editing prompts, and can serve as a plug-in quality criterion for LDCT restoration systems in clinical workflows.
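The exact form of the MOS-ordering ranking loss is not given here. A common choice, assumed in the minimal Python sketch below, is a margin hinge over every pair of images whose MOS differs: the model is penalized whenever its predicted score gap disagrees with the MOS order or agrees by less than a margin. The function name `pairwise_ranking_loss` and the default `margin=0.1` are hypothetical.

```python
from itertools import combinations


def pairwise_ranking_loss(pred, mos, margin=0.1):
    """Hinge-style pairwise ranking loss (an assumed form, not necessarily the paper's).

    For every pair whose MOS differs, the predicted score gap should have the
    same sign as the MOS gap and exceed `margin`; otherwise a linear penalty applies.
    """
    if len(pred) != len(mos):
        raise ValueError("pred and mos must have the same length")
    total, n_pairs = 0.0, 0
    for i, j in combinations(range(len(pred)), 2):
        if mos[i] == mos[j]:
            continue  # tied MOS values carry no ordering information
        sign = 1.0 if mos[i] > mos[j] else -1.0
        # Positive when the predicted gap has the wrong sign or is smaller than the margin.
        total += max(0.0, margin - sign * (pred[i] - pred[j]))
        n_pairs += 1
    # Average over informative pairs; a batch with only ties contributes zero loss.
    return total / n_pairs if n_pairs else 0.0


if __name__ == "__main__":
    mos = [3.5, 1.2, 2.8]  # hypothetical MOS labels for a batch of three images
    print(pairwise_ranking_loss([3.0, 1.0, 2.0], mos))  # 0.0: order respected with gaps above the margin
    print(pairwise_ranking_loss([1.0, 3.0, 2.0], mos))  # positive: predicted order contradicts MOS
```

Because the loss only looks at score differences, it constrains the ordering of predictions rather than their absolute scale, which is what rank correlations such as SROCC and KROCC reward.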

Abstract

We propose a prompt-conditioned framework built on MedSigLIP that injects textual priors via Feature-wise Linear Modulation (FiLM) and aggregates the modulated features with multi-scale pooling. Text prompts condition patch-token features on clinical intent, enabling data-efficient learning and rapid adaptation. The architecture combines global, local, and texture-aware pooling through separate regression heads fused by a lightweight MLP, and is trained with a pairwise ranking loss. Using the 1,000 training images of LDCTIQA2023, a public LDCT quality assessment challenge, we achieve PLCC = 0.9575, SROCC = 0.9561, and KROCC = 0.8301, surpassing the top-ranked published challenge submissions and demonstrating the effectiveness of our prompt-guided approach.
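PLCC, SROCC, and KROCC are the Pearson linear, Spearman rank-order, and Kendall rank-order correlation coefficients between predicted scores and MOS. The stdlib-only Python sketch below computes all three; it assigns average ranks to ties for SROCC and uses the tau-b variant of Kendall's coefficient, which is the default in `scipy.stats.kendalltau`. Which implementation the challenge or the authors used is an assumption here, and some IQA protocols fit a nonlinear logistic mapping before computing PLCC, which this sketch omits. The helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    # PLCC: covariance normalized by both standard deviations (undefined for constant input).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(v):
    # 1-based ranks; tied values share the mean of the rank positions they span.
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    # SROCC: Pearson correlation computed on the ranks.
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    # KROCC (tau-b): (concordant - discordant) / sqrt(pairs untied in x * pairs untied in y).
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx = (x[i] > x[j]) - (x[i] < x[j])
        dy = (y[i] > y[j]) - (y[i] < y[j])
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0]  # hypothetical model scores
    mos = [1.1, 1.9, 3.5, 3.2]   # hypothetical MOS labels
    print("PLCC ", round(pearson(pred, mos), 4))
    print("SROCC", round(spearman(pred, mos), 4))       # 0.8
    print("KROCC", round(kendall_tau_b(pred, mos), 4))  # 0.6667
```

In the toy example a single swapped pair lowers KROCC more than SROCC, which illustrates why KROCC values (0.8301 in the reported results) typically sit below SROCC for the same predictions.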

Paper Structure

This paper contains 16 sections, 8 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: Prompt-conditioned FiLM with multi-scale (global/local/texture) pooling.
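Figure 1 is not reproduced here, but the pipeline it names can be illustrated with a stdlib-only Python sketch of the forward pass. Random vectors stand in for MedSigLIP patch tokens and the prompt embedding. FiLM predicts a per-channel scale `gamma` and shift `beta` from the prompt embedding and applies `gamma * x + beta` to every token. The three pooling streams are assumed definitions (token mean for global, channel-wise max over 2×2 window means for local, channel-wise standard deviation for texture), as are the single-layer regression heads and the 3→hidden→1 ReLU fusion MLP. All function names and sizes are hypothetical.

```python
import random


def linear(weight, bias, x):
    # y = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(tokens, text_emb, w_gamma, b_gamma, w_beta, b_beta):
    # FiLM: per-channel scale (gamma) and shift (beta) predicted from the prompt embedding.
    gamma = linear(w_gamma, b_gamma, text_emb)
    beta = linear(w_beta, b_beta, text_emb)
    return [[g * v + b for g, v, b in zip(gamma, tok, beta)] for tok in tokens]


def global_pool(tokens):
    # Global stream (assumed): channel-wise mean over all patch tokens.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def local_pool(tokens, grid_h, grid_w, win=2):
    # Local stream (assumed): mean inside each non-overlapping win x win window,
    # then channel-wise max over windows so the strongest local response survives.
    window_means = []
    for r in range(0, grid_h - win + 1, win):
        for c in range(0, grid_w - win + 1, win):
            idx = [(r + dr) * grid_w + (c + dc) for dr in range(win) for dc in range(win)]
            window_means.append(global_pool([tokens[i] for i in idx]))
    return [max(col) for col in zip(*window_means)]


def texture_pool(tokens):
    # Texture stream (assumed): channel-wise standard deviation, a simple second-order statistic.
    mean = global_pool(tokens)
    n = len(tokens)
    return [(sum((v - m) ** 2 for v in col) / n) ** 0.5 for col, m in zip(zip(*tokens), mean)]


def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(dim, text_dim, hidden=4, seed=0):
    rng = random.Random(seed)
    return {
        "film": (rand_matrix(rng, dim, text_dim), [1.0] * dim,   # gamma bias 1: starts near identity
                 rand_matrix(rng, dim, text_dim), [0.0] * dim),  # beta bias 0: starts near no shift
        "heads": [(rand_matrix(rng, 1, dim), [0.0]) for _ in range(3)],
        "fuse1": (rand_matrix(rng, hidden, 3), [0.0] * hidden),
        "fuse2": (rand_matrix(rng, 1, hidden), [0.0]),
    }


def quality_score(tokens, text_emb, grid_h, grid_w, params):
    x = film(tokens, text_emb, *params["film"])
    pooled = [global_pool(x), local_pool(x, grid_h, grid_w), texture_pool(x)]
    # One scalar regression head per pooling stream.
    head_scores = [linear(w, b, p)[0] for (w, b), p in zip(params["heads"], pooled)]
    # Lightweight fusion MLP: 3 head scores -> hidden (ReLU) -> 1 quality score.
    hidden = [max(0.0, h) for h in linear(*params["fuse1"], head_scores)]
    return linear(*params["fuse2"], hidden)[0]


if __name__ == "__main__":
    rng = random.Random(1)
    dim, text_dim, grid = 8, 5, 4
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(grid * grid)]
    prompt = [rng.gauss(0.0, 1.0) for _ in range(text_dim)]  # stand-in prompt embedding
    print(quality_score(tokens, prompt, grid, grid, init_params(dim, text_dim)))
```

Initializing the `gamma` bias at 1 and the `beta` bias at 0 makes the modulation start close to identity, a common choice for conditioning layers; whether the paper does this is not stated. Because the prompt enters only through `gamma` and `beta`, editing the prompt changes the conditioning without altering the pooling or head structure.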