Reward Engineering for Generating Semi-structured Explanation

Jiuzhou Han, Wray Buntine, Ehsan Shareghi

TL;DR

The paper tackles semi-structured explanation generation (SEG): producing an explicit reasoning graph alongside the answer. It argues that supervised fine-tuning (SFT) alone falls short for this task. It introduces reward engineering within a reinforcement learning (RL) framework, combining a reward model with graph-based metric rewards, and optimizes with PPO to align generated explanations with ground-truth explanations. On ExplaGraph and COPA-SSE, the proposed SFT+RL approach achieves new state-of-the-art results, with Graph-BERTScore emerging as a particularly effective metric for guiding learning. The work also analyzes reward-hacking risks and the impact of reward-aggregation settings, and reports human evaluation insights, highlighting both the potential and the challenges of RL for SEG. Overall, the approach offers a principled path to improving the fidelity and interpretability of reasoning graphs in semi-structured explanations, with practical implications for evaluating and deploying SEG-enabled systems.

Abstract

Semi-structured explanation depicts the implicit process of a reasoner with an explicit representation. This explanation highlights how available information in a specific query is utilised and supplemented with information a reasoner produces from its internal weights towards generating an answer. Despite the recent improvements in generative capabilities of language models, producing structured explanations to verify a model's true reasoning capabilities remains a challenge. This issue is particularly pronounced for not-so-large LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) to better address this problem. We investigate multiple reward aggregation methods and provide a detailed discussion which sheds light on the promising potential of RL for future research. Our proposed method on two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE) achieves new state-of-the-art results.

Paper Structure (29 sections, 2 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Given the belief and argument, the task is to predict the stance (support/counter) and generate an explanation graph representing the reasoning process. The explanation graph under SFT+RL is more expressive.
  • Figure 2: Comparison, on ExplaGraph, of SFT and various RL configurations to calculate $R_m$. The KL Coefficient $\beta$ is 0.3 for all experiments. (left) RL using only reward metric, (right) RL using both reward model and metric without any weights.
  • Figure 3: FLAN-T5-XXL - SFT in comparison (on ExplaGraph dev set) with SFT+RL under (a) different values of KL Coefficient $\beta$ (we use the aggregation method without weights), and (b) different values of weight factor $\alpha$ (fixing $\beta=0.3$).
  • Figure 4: An illustration of the mean reward and the KL divergence during RL training on ExplaGraph: (a) as training continues, the rewards of both settings increase; (b) when $\beta$ is 0.1, the large KL divergence indicates significant deviation from the original SFT model, which leads to reward hacking.
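
The captions above refer to a KL coefficient $\beta$, a weight factor $\alpha$, and aggregation of the reward model with the metric reward "without any weights". The exact formulas are not given in this summary, so the following is only a minimal sketch under stated assumptions: the unweighted aggregation is taken to be a plain sum, the weighted one a convex combination controlled by $\alpha$, and the KL term the usual PPO-style penalty on the log-probability gap between the RL policy and the SFT model. The function names `aggregate_reward` and `kl_penalised_reward` are hypothetical and do not come from the paper.

```python
from typing import Optional


def aggregate_reward(model_reward: float,
                     metric_reward: float,
                     alpha: Optional[float] = None) -> float:
    """Combine a reward-model score with a graph-metric score.

    ASSUMPTION: alpha=None stands for aggregation "without weights" and is
    modelled here as a plain sum; otherwise a convex combination is used.
    Neither form is confirmed by the source text.
    """
    if alpha is None:
        return model_reward + metric_reward
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * model_reward + (1.0 - alpha) * metric_reward


def kl_penalised_reward(reward: float,
                        logprob_policy: float,
                        logprob_sft: float,
                        beta: float = 0.3) -> float:
    """Subtract a KL penalty from the aggregated reward.

    ASSUMPTION: the KL term is approximated per sample by the difference of
    log-probabilities under the RL policy and the frozen SFT model, as is
    common in PPO-based fine-tuning. beta=0.3 mirrors the value in the
    captions for Figures 2 and 3.
    """
    kl_estimate = logprob_policy - logprob_sft
    return reward - beta * kl_estimate


if __name__ == "__main__":
    r = aggregate_reward(model_reward=0.5, metric_reward=0.25)
    print(kl_penalised_reward(r, logprob_policy=-1.0, logprob_sft=-2.0))
```

Under these assumptions, a smaller $\beta$ penalises drift from the SFT model less, which is consistent with the Figure 4 caption associating $\beta = 0.1$ with a large KL and reward hacking.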