Multi-Dimensional Optimization for Text Summarization via Reinforcement Learning

Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

TL;DR

This paper tackles the problem of balancing multiple quality dimensions in summarization with a multi-objective reinforcement learning framework that uses UniEval-based rewards. It presents two optimization strategies: MDO_min, which rewards the currently lowest-scoring dimension, and MDO_pro, which mitigates conflicting gradients across dimensions through gradient projection. It replaces ROUGE-based rewards with a QA-based reward model and reports substantial gains on BillSum and CNN/DM, including improved relevance and consistency and shorter, information-rich outputs. It also shows that summary length can be controlled via the discount factor $\gamma$, enabling concise but informative summaries. The approach performs competitively with larger LLMs while using smaller models, suggesting practical benefits for scalable, balanced summarization.
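One possible intuition for the length effect of $\gamma$ can be sketched in a few lines of Python. The sketch assumes that the reward arrives only at the final token of the summary, which is an assumption made here for illustration and not a detail stated above. The helper names `discounted_returns` and `first_token_return` are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Standard discounted return G_t = r_t + gamma * G_{t+1}, computed backwards."""
    returns = []
    running = 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def first_token_return(length: int, final_reward: float, gamma: float) -> float:
    """Return seen at the first token when only the last token is rewarded.

    Assumption (not from the source): a single terminal reward per summary.
    """
    rewards = [0.0] * (length - 1) + [final_reward]
    return discounted_returns(rewards, gamma)[0]


if __name__ == "__main__":
    # With gamma < 1, the same final reward is worth less at the start of a
    # longer summary, so shorter summaries are favoured under this assumption.
    for gamma in (1.0, 0.99, 0.95):
        short = first_token_return(20, 1.0, gamma)
        long_ = first_token_return(100, 1.0, gamma)
        print(f"gamma={gamma}: short={short:.4f}, long={long_:.4f}")
```

Under this assumption, $\gamma = 1$ makes the return independent of length, while smaller values penalize longer outputs more strongly.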

Abstract

The evaluation of summary quality encompasses diverse dimensions such as consistency, coherence, relevance, and fluency. However, existing summarization methods often target a specific dimension, facing challenges in generating well-balanced summaries across multiple dimensions. In this paper, we propose multi-objective reinforcement learning tailored to generate balanced summaries across all four dimensions. We introduce two multi-dimensional optimization (MDO) strategies for adaptive learning: 1) MDO_min, which rewards the current lowest dimension score, and 2) MDO_pro, which optimizes multiple dimensions in a manner similar to multi-task learning and resolves conflicting gradients across dimensions through gradient projection. Unlike prior ROUGE-based rewards relying on reference summaries, we use a QA-based reward model that aligns with human preferences. Further, we discover the capability to regulate the length of summaries by adjusting the discount factor, seeking the generation of concise yet informative summaries that encapsulate crucial points. Our approach achieved substantial performance gains compared to baseline models on representative summarization datasets, particularly in the overlooked dimensions.
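The two strategies named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation. The function names, the plain-list gradient representation, and the fixed projection order are all assumptions made here. The projection step follows the general gradient-surgery idea from multi-task learning (remove the component of one gradient that points against another); whether the paper uses exactly this variant is not stated in the abstract.

```python
from typing import Dict, List, Sequence

DIMENSIONS = ("coherence", "consistency", "fluency", "relevance")


def mdo_min_reward(scores: Dict[str, float]) -> float:
    """MDO_min idea: use the currently lowest dimension score as the reward."""
    return min(scores[d] for d in DIMENSIONS)


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def project_conflicts(grads: List[List[float]]) -> List[float]:
    """Gradient-projection sketch for the MDO_pro idea.

    For each per-dimension gradient, subtract its projection onto any other
    gradient it conflicts with (negative inner product), then sum the results.
    Assumption: others are visited in fixed order; published variants of
    gradient surgery typically shuffle this order.
    """
    projected = []
    for i, grad in enumerate(grads):
        g = list(grad)
        for j, other in enumerate(grads):
            if i == j:
                continue
            inner = dot(g, other)
            if inner < 0:  # conflict: drop the component along `other`
                scale = inner / dot(other, other)
                g = [a - scale * b for a, b in zip(g, other)]
        projected.append(g)
    return [sum(column) for column in zip(*projected)]


if __name__ == "__main__":
    scores = {"coherence": 0.9, "consistency": 0.8, "fluency": 0.95, "relevance": 0.6}
    print(mdo_min_reward(scores))  # relevance is lowest here, so it drives the reward
    print(project_conflicts([[1.0, 0.0], [-1.0, 1.0]]))
```

In the toy call above the two gradients conflict, so each is projected onto the normal plane of the other before summation; gradients that do not conflict are summed unchanged.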

Paper Structure

This paper contains 34 sections, 2 equations, 8 figures, 8 tables, 2 algorithms.

Figures (8)

  • Figure 1: While the baseline model produces an imbalanced summary (blue), we aim to generate overall high-quality summaries (red). The radar chart illustrates UniEval scores for four dimensions.
  • Figure 2: Entire process of Multi-dimensional Optimization (MDO). Through MDO, we optimize the scores for each dimension while training the policy. $d1$, $d2$, $d3$, and $d4$ refer to coherence, consistency, fluency, and relevance, respectively.
  • Figure 3: Examples of summaries generated by the baseline model and our MDO$_\text{pro}$ for the same document. Unimportant content is highlighted in yellow, and unnatural or structurally disruptive content is marked in green.
  • Figure 4: Multi-dimensional evaluation results with ChatGPT on BillSum.
  • Figure 5: Human preferences for each model. Rank 1 signifies the most preferred summary among the evaluated summaries.
  • ...and 3 more figures