ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Müller, Georgios Kaissis, Daniel Rueckert
TL;DR
ChEX addresses the lack of interactivity and localized interpretability in chest X-ray report generation by introducing a multitask architecture that jointly handles textual prompts and bounding boxes. The model combines a ViT-based image encoder, a frozen CLIP-based prompt encoder, a DETR-style prompt detector, and a GPT-2 language model with P-tuning v2 to produce region-specific descriptions, and the design extends to zero-shot inference. Trained on multi-source data (MIMIC-CXR, VinDr-CXR, CIG, NIH8, MS-CXR), ChEX is evaluated across nine tasks, where it performs competitively with state-of-the-art baselines while offering interactive prompting and interpretable, region-grounded outputs. These capabilities support clinical applicability by enabling radiologist-guided, transparent, and customizable chest X-ray interpretation pipelines.
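To make the prompt-to-region data flow concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It assumes a single prompt embedding acting as a query over image patch features, as a DETR-style detector would, and returns a pooled region feature that a language model could then describe. The helper names (`attend`, `box_from_attention`) and the toy box head, which derives a box from attention-weighted patch centers, are hypothetical simplifications; an actual DETR-style detector would typically use learned transformer decoder layers and a learned box-regression head.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, patch_feats):
    """Single-head cross-attention of one prompt query over patch features.

    Returns the attention weights and the attention-pooled region feature.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, p) / scale for p in patch_feats])
    dim = len(patch_feats[0])
    pooled = [sum(w * p[i] for w, p in zip(weights, patch_feats)) for i in range(dim)]
    return weights, pooled

def box_from_attention(weights, centers):
    """Toy box head (assumption, not from the paper): attention-weighted
    mean +/- 2 standard deviations of the patch centers, clipped to [0, 1]."""
    cx = sum(w * x for w, (x, _) in zip(weights, centers))
    cy = sum(w * y for w, (_, y) in zip(weights, centers))
    sx = math.sqrt(sum(w * (x - cx) ** 2 for w, (x, _) in zip(weights, centers)))
    sy = math.sqrt(sum(w * (y - cy) ** 2 for w, (_, y) in zip(weights, centers)))
    return (max(0.0, cx - 2 * sx), max(0.0, cy - 2 * sy),
            min(1.0, cx + 2 * sx), min(1.0, cy + 2 * sy))

# Toy 2x2 patch grid with 2-dimensional features (made-up values).
patch_feats = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 0.0]]
centers = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
prompt_query = [4.0, 0.0]  # stands in for an encoded textual prompt

weights, region_feat = attend(prompt_query, patch_feats)
box = box_from_attention(weights, centers)  # (x1, y1, x2, y2), normalized
```

In this toy setup the query aligns with the first patch, so the attention mass and the resulting box concentrate around that patch's center. In a full pipeline, `region_feat` would be the conditioning input from which the language model generates the region description.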
Abstract
Report generation models offer fine-grained textual interpretations of medical images like chest X-rays, yet they often lack interactivity (i.e. the ability to steer the generation process through user queries) and localized interpretability (i.e. visually grounding their predictions), which we deem essential for future adoption in clinical practice. While there have been efforts to tackle these issues, they are either limited in their interactivity by not supporting textual queries or fail to also offer localized interpretability. Therefore, we propose a novel multitask architecture and training paradigm integrating textual prompts and bounding boxes for diverse aspects like anatomical regions and pathologies. We call this approach the Chest X-Ray Explainer (ChEX). Evaluations across a heterogeneous set of 9 chest X-ray tasks, including localized image interpretation and report generation, showcase its competitiveness with SOTA models while additional analysis demonstrates ChEX's interactive capabilities. Code: https://github.com/philip-mueller/chex
