Radiology Report Generation via Multi-objective Preference Optimization

Ting Xiao, Lei Shi, Peng Liu, Zhe Wang, Chenjia Bai

TL;DR

This work tackles automatic radiology report generation under heterogeneous radiologist preferences by proposing Multi-objective Preference Optimization (MPO). MPO conditions generation on a low-dimensional preference vector $\mathbf{p}$ and optimizes a weighted, multi-dimensional reward $R=\sum_{i=1}^m p_i r^i(Y)$ via reinforcement learning, with a Preference Vector Fusion (PVF) network that fuses $\mathbf{p}$ into encoded image features. The model is trained in two stages (MLE pretraining and RL) with random sampling of diverse $\mathbf{p}$ values to cover the entire preference space, enabling inference-time control without fine-tuning. Experiments on IU-Xray and MIMIC-CXR show state-of-the-art natural language generation metrics and competitive clinical-efficacy scores, demonstrating effective accommodation of varied radiologist preferences within a single model.
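The weighted reward $R=\sum_{i=1}^m p_i r^i(Y)$ and the random sampling of preference vectors can be illustrated with a short sketch. It assumes that $\mathbf{p}$ lies on the probability simplex and is drawn uniformly, as a Dirichlet(1, ..., 1) sample. The summary does not state the paper's actual sampling distribution. The two reward functions are hypothetical toy stand-ins for real objectives such as an NLG metric and a clinical-efficacy metric, not the rewards MPO uses.

```python
import random
from typing import Callable, List, Sequence

# A reward function maps a generated report Y to a scalar score r^i(Y).
RewardFn = Callable[[str], float]


def sample_preference(m: int, rng: random.Random) -> List[float]:
    """Sample a preference vector p uniformly from the (m-1)-simplex.

    Assumption: uniform Dirichlet(1, ..., 1) sampling. Normalising i.i.d.
    Exp(1) draws yields exactly such a sample, so p_i >= 0 and sum(p) == 1.
    """
    draws = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_reward(p: Sequence[float], report: str, rewards: Sequence[RewardFn]) -> float:
    """Linearly weighted reward R = sum_i p_i * r^i(Y), i.e. the dot product p . r(Y)."""
    if len(p) != len(rewards):
        raise ValueError("p and rewards must both have length m")
    return sum(p_i * r_i(report) for p_i, r_i in zip(p, rewards))


# Hypothetical toy objectives, NOT the rewards used in the paper.
def toy_fluency(report: str) -> float:
    # Crude length-based proxy, capped at 1.0.
    return min(len(report.split()) / 20.0, 1.0)


def toy_clinical(report: str) -> float:
    # Crude keyword proxy for mentioning a finding.
    return 1.0 if "effusion" in report else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    p = sample_preference(2, rng)
    y = "No pleural effusion or pneumothorax."
    print(p, weighted_reward(p, y, [toy_fluency, toy_clinical]))
```

Shifting weight between the two entries of `p` changes which objective dominates $R$. This lets a single scalar reward express different radiologist priorities.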

Abstract

Automatic Radiology Report Generation (RRG) is an important topic for alleviating the substantial workload of radiologists. Existing RRG approaches rely on supervised regression based on different architectures or additional knowledge injection, while the generated reports may not align optimally with radiologists' preferences. This is especially problematic because radiologists' preferences are inherently heterogeneous and multidimensional: e.g., some may prioritize report fluency, while others emphasize clinical accuracy. To address this problem, we propose a new RRG method via Multi-objective Preference Optimization (MPO) to align the pre-trained RRG model with multiple human preferences, which can be formulated as multi-dimensional reward functions and optimized by multi-objective reinforcement learning (RL). Specifically, we use a preference vector to represent the weights of the preferences and use it as a condition for the RRG model. Then, a linearly weighted reward is obtained via a dot product between the preference vector and the multi-dimensional reward. Next, the RRG model is optimized to align with the preference vector by optimizing this reward via RL. In the training stage, we randomly sample diverse preference vectors from the preference space and align the model by optimizing the weighted multi-objective rewards, which leads to an optimal policy over the entire preference space. At inference time, our model can generate reports aligned with specific preferences without further fine-tuning. Extensive experiments on two public datasets show that the proposed method can generate reports that cater to different preferences in a single model and achieves state-of-the-art performance.
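The RL step ("optimizing such a reward via RL") can be illustrated with a REINFORCE-with-baseline policy-gradient objective. The summary does not name the RL algorithm MPO uses. The self-critical-style baseline here (the reward of a greedy-decoded report) is therefore an assumption, and `policy_gradient_loss` is a hypothetical name. Plain floats keep the sketch self-contained. In an actual training loop the log-probabilities would be autograd tensors (for example in PyTorch), and differentiating the returned loss would give the policy-gradient estimate.

```python
from typing import Sequence


def policy_gradient_loss(
    sample_logprobs: Sequence[Sequence[float]],
    sample_rewards: Sequence[float],
    baseline_rewards: Sequence[float],
) -> float:
    """REINFORCE-with-baseline surrogate loss, averaged over a batch.

    sample_logprobs[k]  : per-token log pi(y_t | y_<t, image, p) of sampled report k
    sample_rewards[k]   : weighted reward R = p . r(Y_k) of that report
    baseline_rewards[k] : baseline computed under the same p (assumption:
                          reward of a greedy-decoded report, self-critical style)
    """
    n = len(sample_rewards)
    if n == 0 or len(sample_logprobs) != n or len(baseline_rewards) != n:
        raise ValueError("batch inputs must be non-empty and equally long")
    total = 0.0
    for logps, reward, baseline in zip(sample_logprobs, sample_rewards, baseline_rewards):
        # Positive advantage: the sample beat the baseline under preference p.
        advantage = reward - baseline
        # Minimising -advantage * log pi(Y) raises log pi(Y) when advantage > 0
        # and lowers it when advantage < 0.
        total += -advantage * sum(logps)
    return total / n
```

The same $\mathbf{p}$ both conditions the generator and weights the reward. Each sampled preference vector therefore defines its own RL objective, and sampling many of them during training is how one model is trained across the preference space.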

Paper Structure

This paper contains 21 sections, 14 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: The architecture of our MPO, where the blue dashed box represents the encoder-decoder with the PVF network, and the green dashed box represents the MOO module. The red dashed arrows represent the back-propagation of gradients.
  • Figure 2: Reports from ground truth and MPO with different preference configurations on MIMIC-CXR, where the same color highlights the descriptions of the same content.
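Figure 1 places the PVF network inside the encoder-decoder, and the TL;DR says it fuses $\mathbf{p}$ into the encoded image features. Neither specifies how. The sketch below assumes the simplest option: a linear embedding of $\mathbf{p}$ added to every visual token. The real PVF network may differ, for example by using attention-based or gated fusion. The function names and the projection parameters are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    """Affine map y = W x + b, where W has shape (d_out, d_in)."""
    return [sum(w * xj for w, xj in zip(row, x)) + b for row, b in zip(weight, bias)]


def fuse_preference(
    image_feats: Sequence[Sequence[float]],  # encoded visual tokens, each of dimension d
    p: Sequence[float],                      # preference vector of dimension m
    weight: Matrix,                          # hypothetical learned (d x m) projection
    bias: Sequence[float],                   # hypothetical learned (d,) bias
) -> List[List[float]]:
    """Hypothetical additive fusion: embed p into the d-dim feature space and add
    the embedding to every visual token, giving preference-aware features."""
    if len(weight) != len(bias) or any(len(row) != len(p) for row in weight):
        raise ValueError("weight must be (d x m) and bias must have length d")
    p_emb = linear(p, weight, bias)
    if any(len(token) != len(p_emb) for token in image_feats):
        raise ValueError("each image feature must have the projection's output dimension d")
    return [[f + e for f, e in zip(token, p_emb)] for token in image_feats]
```

Additive fusion keeps the feature shape unchanged, so a standard decoder can attend to the fused features without modification. Changing $\mathbf{p}$ at inference then shifts every visual token, which is one way a single model could be steered without fine-tuning.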