e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report Generation

Aaron Nicolson, Jinghui Liu, Jason Dowling, Anthony Nguyen, Bevan Koopman

TL;DR

The paper tackles automated radiology report generation from chest X-rays in the RRG24 shared task. It introduces EAST, an entropy-augmented extension of self-critical sequence training (SCST) that optimises a RadGraph-F1 reward. Its objective is $L_{\mathrm{EAST}}(\theta) = L_{\mathrm{SCST}}(\theta) + \lambda H(\pi)$, where $H(\pi) = -\sum_{v \in V} \pi(v \mid x; \theta) \log \pi(v \mid x; \theta)$ is the entropy of the token distribution over the vocabulary $V$, added to promote exploration. The CXRMate-RRG24 architecture combines a UniFormer-based image encoder with a Llama decoder and uses special tokens to handle missing report sections. It is trained in two stages (teacher forcing followed by reinforcement learning) with tuned hyperparameters (e.g., $\lambda = 0.05$). Empirically, EAST achieves robust gains and first-place finishes on several RRG24 test sets, indicating improved generalisation across diverse data sources and radiology-report styles. The model is available on Hugging Face for public use and benchmarking.
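
The entropy term and the EAST objective can be made concrete with a short sketch. It assumes that $L_{\mathrm{SCST}}$ is read as an objective to be maximised, so the $+\lambda H(\pi)$ term rewards higher entropy (a minimising implementation would negate the whole expression). It also assumes that the per-token entropy is averaged over the generated time steps. Both choices, and the helper names `softmax`, `token_entropy`, and `east_objective`, are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the vocabulary."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs: Sequence[float]) -> float:
    """H(pi) = -sum_v pi(v) log pi(v); zero-probability tokens contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def east_objective(scst_objective: float,
                   step_logits: Sequence[Sequence[float]],
                   lam: float = 0.05) -> float:
    """SCST objective plus lambda times the mean per-step token entropy.

    Assumption: scst_objective is a quantity to MAXIMISE, so adding the
    entropy bonus pushes the policy towards higher-entropy distributions.
    Assumption: entropy is averaged (not summed) over generated time steps.
    """
    if not step_logits:
        raise ValueError("need logits for at least one generated step")
    entropies = [token_entropy(softmax(z)) for z in step_logits]
    mean_entropy = sum(entropies) / len(entropies)
    return scst_objective + lam * mean_entropy
```

For example, uniform logits over a four-token vocabulary give the maximum entropy $\log 4 \approx 1.386$, so with $\lambda = 0.05$ the entropy bonus contributes about 0.069, while a sharply peaked distribution contributes close to nothing. Averaging rather than summing keeps the bonus independent of report length; summing is an equally plausible choice.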

Abstract

The Shared Task on Large-Scale Radiology Report Generation (RRG24) aims to expedite the development of assistive systems for interpreting and reporting on chest X-ray (CXR) images. The task challenges participants to develop models that generate the findings and impression sections of a radiology report from the CXRs of a patient's study, using five different datasets. This paper outlines the e-Health CSIRO team's approach, which achieved multiple first-place finishes in RRG24. The core novelty of our approach is the addition of entropy regularisation to self-critical sequence training (SCST), which maintains a higher entropy in the token distribution. This prevents overfitting to common phrases and ensures broader exploration of the vocabulary during training, which is essential for handling the diversity of the radiology reports in the RRG24 datasets. Our model is available on Hugging Face: https://huggingface.co/aehrc/cxrmate-rrg24.
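
The self-critical part of SCST can be sketched as follows. The reward of the model's own greedy decode serves as the baseline, so a sampled report is reinforced only if it scores higher than what the model would produce at test time. This is a minimal sketch under stated assumptions: `unigram_f1` is a hypothetical stand-in reward (token-overlap F1), not RadGraph-F1, which requires RadGraph's entity and relation extraction. The function names are illustrative. In EAST, the entropy regularisation described above would be combined with this term.

```python
from collections import Counter
from typing import List


def unigram_f1(candidate: List[str], reference: List[str]) -> float:
    """HYPOTHETICAL stand-in reward: token-overlap F1 (NOT RadGraph-F1)."""
    if not candidate or not reference:
        return 0.0
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def scst_loss(sampled_logprobs: List[float],
              sampled: List[str],
              greedy: List[str],
              reference: List[str]) -> float:
    """Self-critical policy-gradient loss (to minimise) for one sampled report.

    advantage = r(sampled) - r(greedy): the greedy decode is the baseline.
    In an autodiff framework the advantage is treated as a constant and the
    gradient flows only through the summed log-probabilities of the sample.
    """
    advantage = unigram_f1(sampled, reference) - unigram_f1(greedy, reference)
    return -advantage * sum(sampled_logprobs)
```

If the sample beats the greedy decode, the advantage is positive and minimising the loss raises the sample's log-probability. If the sample merely matches the greedy decode, the loss is zero and nothing is learned from it. The returned value is a loss; its negation is the corresponding objective.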

Paper Structure (8 sections, 3 equations, 1 figure, 2 tables)

Figures (1)

  • Figure 1: e-Health CSIRO's submission into RRG24, named CXRMate-RRG24. [BOS] denotes the beginning-of-sentence special token, [SEP] denotes the separator special token, and [EOS] denotes the end-of-sentence special token. $\textbf{E}_k[i]$ is the $i^{th}$ output of the projected last hidden state of the encoder for the $k^{th}$ image of the study.
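
The decoder layout in Figure 1 can be sketched as follows. This sketch rests on several assumptions. The target sequence is taken to be [BOS] findings [SEP] impression [EOS], and a missing section is simply left empty so the special tokens still mark where it would be. The projected encoder outputs $\textbf{E}_k$ of all images in a study are assumed to be concatenated along the sequence axis before being passed to the decoder. These mechanics, the whitespace tokenisation standing in for the Llama tokenizer, and the helper names are illustrative assumptions rather than the released CXRMate-RRG24 code.

```python
from typing import List, Optional, Tuple

BOS, SEP, EOS = "[BOS]", "[SEP]", "[EOS]"


def build_target(findings: Optional[str], impression: Optional[str]) -> List[str]:
    """Assumed layout: [BOS] findings [SEP] impression [EOS].

    A missing section contributes no tokens, but [SEP] is always present, so
    the position of each section stays unambiguous. Whitespace splitting
    stands in for the real subword tokenizer.
    """
    f = findings.split() if findings else []
    i = impression.split() if impression else []
    return [BOS, *f, SEP, *i, EOS]


def split_sections(tokens: List[str]) -> Tuple[str, str]:
    """Recover (findings, impression) from a generated token sequence."""
    body = tokens[1:] if tokens and tokens[0] == BOS else tokens
    if EOS in body:
        body = body[: body.index(EOS)]
    if SEP not in body:
        # Hypothetical fallback: with no separator, treat all text as findings.
        return " ".join(body), ""
    sep = body.index(SEP)
    return " ".join(body[:sep]), " ".join(body[sep + 1:])


def concat_image_embeddings(per_image: List[List[List[float]]]) -> List[List[float]]:
    """Concatenate E_1, ..., E_K (one list of vectors per image) along the sequence axis."""
    return [vec for image in per_image for vec in image]
```

For example, `build_target(None, "No acute findings.")` yields `[BOS] [SEP] No acute findings. [EOS]`, and `split_sections` maps it back to an empty findings section and the original impression.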