e-Health CSIRO at RRG24: Entropy-Augmented Self-Critical Sequence Training for Radiology Report Generation
Aaron Nicolson, Jinghui Liu, Jason Dowling, Anthony Nguyen, Bevan Koopman
TL;DR
The paper tackles automated radiology report generation from chest X-rays under the RRG24 benchmark. It introduces EAST, an entropy-augmented extension of self-critical sequence training that optimizes a RadGraph-F1 reward, with the objective $L_{\mathrm{EAST}}(\theta) = L_{\mathrm{SCST}}(\theta) + \lambda H(\pi)$, where the entropy term $H(\pi) = -\sum_{v \in V} \pi(v \mid x; \theta) \log \pi(v \mid x; \theta)$ over the vocabulary $V$ promotes exploration. The CXRMate-RRG24 architecture combines a UniFormer-based image encoder with a Llama decoder and uses special tokens to handle missing sections. It is trained in two stages (teacher forcing followed by reinforcement learning) with tuned hyperparameters (e.g., $\lambda = 0.05$). Empirically, EAST achieves first-place finishes on several RRG24 test sets, indicating improved generalisation across diverse data sources and radiology-report styles. The model is available on Hugging Face for public use and benchmarking.
Abstract
The Shared Task on Large-Scale Radiology Report Generation (RRG24) aims to expedite the development of assistive systems for interpreting and reporting on chest X-ray (CXR) images. The task challenges participants to develop models that generate the findings and impression sections of radiology reports from the CXRs of a patient's study, using five different datasets. This paper outlines the e-Health CSIRO team's approach, which achieved multiple first-place finishes in RRG24. The core novelty of our approach is the addition of entropy regularisation to self-critical sequence training, which maintains a higher entropy in the token distribution. This prevents overfitting to common phrases and ensures broader exploration of the vocabulary during training, which is essential for handling the diversity of the radiology reports in the RRG24 datasets. Our model is available on Hugging Face: https://huggingface.co/aehrc/cxrmate-rrg24.
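
To make the entropy regularisation concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It computes the entropy of a token distribution and combines it with a self-critical policy-gradient term for one sampled report. The function names (`softmax`, `entropy`, `east_style_loss`), the averaging of entropy over decoding steps, the use of a greedy-decoded report as the baseline (the usual choice in self-critical sequence training), and the placeholder reward values are all assumptions made for illustration. The sign of the entropy term depends on whether the objective is minimised or maximised; this sketch minimises a loss and therefore subtracts the weighted entropy so that higher entropy is rewarded.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy H = -sum_v p(v) log p(v), skipping zero-probability tokens."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def east_style_loss(step_logits, sampled_ids, reward_sample, reward_baseline, lam=0.05):
    """Hypothetical EAST-style loss for one sampled report.

    step_logits: one list of vocabulary logits per decoding step.
    sampled_ids: the token id sampled at each step.
    reward_sample / reward_baseline: scalar rewards (e.g. a RadGraph-F1 score)
        for the sampled report and for the baseline report.
    lam: entropy weight (assumption: applied to the mean per-step entropy).
    """
    advantage = reward_sample - reward_baseline
    log_prob_sum = 0.0
    entropy_sum = 0.0
    for logits, token_id in zip(step_logits, sampled_ids):
        probs = softmax(logits)
        log_prob_sum += math.log(probs[token_id])
        entropy_sum += entropy(probs)
    # Self-critical term: raise the likelihood of samples that beat the baseline.
    scst = -advantage * log_prob_sum
    mean_entropy = entropy_sum / len(sampled_ids)
    # Minimising this loss rewards higher entropy (sign convention is an assumption).
    return scst - lam * mean_entropy, scst, mean_entropy


if __name__ == "__main__":
    # Two decoding steps over a toy four-token vocabulary with uniform logits.
    logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
    loss, scst, h = east_style_loss(logits, [0, 1], reward_sample=0.6, reward_baseline=0.4)
    print(f"loss={loss:.4f} scst={scst:.4f} mean_entropy={h:.4f}")
```

In this sketch a peaked distribution yields a small entropy and so a smaller bonus, which is the mechanism by which the regulariser discourages collapse onto a few common phrases.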
