Science Out of Its Ivory Tower: Improving Accessibility with Reinforcement Learning

Haining Wang, Jason Clark, Hannah McKelvey, Leila Sterman, Zheng Gao, Zuoyu Tian, Sandra Kübler, Xiaozhong Liu

TL;DR

The paper tackles the barrier of accessibility in scholarly abstracts by introducing RLAM, a PPO-based reinforcement learning framework that rewrites abstracts using balanced word-level and sentence-level accessibility rewards. It leverages the SASS corpus to train and evaluate the approach, achieving approximately a six-grade-level readability improvement while maintaining semantic fidelity, as shown by metrics such as BERTScore and SARI. Token-distribution analyses reveal systematic shifts in generation behavior under reinforcement learning, with RLAM producing more common, shorter words and meaningful lexical substitutions without sacrificing quality. The work demonstrates a practical path toward bridging open science with a broader audience and suggests avenues for extending the approach to full-text summaries and other domains.

Abstract

A vast amount of scholarly work is published daily, yet much of it remains inaccessible to the general public due to dense jargon and complex language. To address this challenge in science communication, we introduce a reinforcement learning framework that fine-tunes a language model to rewrite scholarly abstracts into more comprehensible versions. Guided by a carefully balanced combination of word- and sentence-level accessibility rewards, our language model effectively substitutes technical terms with more accessible alternatives, a task that models trained with supervised fine-tuning or guided by conventional readability measures struggle to accomplish. Our best model adjusts the readability level of scholarly abstracts by approximately six U.S. grade levels -- in other words, from a postgraduate to a high school level. This translates to roughly a 90% relative boost over the supervised fine-tuning baseline, all while maintaining factual accuracy and high-quality language. An in-depth analysis of our approach shows that balanced rewards lead to systematic modifications in the base model, likely contributing to smoother optimization and superior performance. We envision this work as a step toward bridging the gap between scholarly research and the general public, particularly younger readers and those without a college degree.

Paper Structure

This paper contains 28 sections, 4 equations, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: RLAM Training Workflow: RLAM rewrites scholarly abstracts using PPO (see the RLAM section), guided by a reward function that balances average sentence length and word accessibility. The policy model is optimized iteratively through an actor-critic framework: it generates simplified abstracts, whose quality is assessed by the reward function and regularized through KL divergence from the frozen reference model (the supervised fine-tuning model). The reward signal is further contrasted with expected returns estimated by the value model, implemented as a linear head atop the policy model. The resulting advantage is distributed across tokens via Generalized Advantage Estimation (GAE). RLAM enables the lightweight Gemma-2B model to reduce abstracts from postgraduate to high school readability, achieving a 90% improvement over the supervised baseline.
  • Figure 2: Discipline and readability distributions of abstracts and significance statements found in the training set of the Scientific Abstract-Significance Statement corpus. The count of paired samples in different disciplines is shown in blue bars on a log10 scale (disciplines with fewer than ten samples are not shown). Readability is measured using the Automated Readability Index (ARI), which estimates the number of years of schooling required to understand a text. On average, abstracts have a readability slightly below 20 ARI, indicating a post-graduate level. Significance statements are generally more readable than their corresponding abstracts. Orange arrows indicate the change in readability from abstracts to significance statements.
  • Figure 3: Results from the annotation of 5% of the generated outputs from reinforcement learning models guided by ARI (RLARI) and accessibility measures (RLAM, with $\beta_{\text{WA}} = 4.0$ and varying $\beta_{\text{SL}}$ values), assessing language quality, faithfulness, and completeness.
  • Figure 4: Token distribution shift analysis for reinforcement learning models. The figure illustrates the average distribution of marginal and shifted tokens across RLAM models (top row) and RLARI models (bottom row). The left column represents the proportion of marginal tokens, while the right column shows the proportion of shifted tokens, both relative to the total token count at each position in the generated sequences.
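
The Figure 2 caption measures readability with the Automated Readability Index (ARI). As a rough illustration, the standard ARI formula, 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43, can be computed as sketched below. This is not the paper's evaluation code: the function name and the regex-based sentence and word splitting are assumptions made here for illustration, and a proper tokenizer would likely give somewhat different scores.

```python
import re


def automated_readability_index(text: str) -> float:
    """Approximate ARI of `text` (hypothetical helper, not from the paper).

    ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43
    """
    # Assumption: a sentence ends at any run of '.', '!' or '?'.
    # This mis-splits decimals such as "4.0"; a real sentence splitter is better.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    # Assumption: a word is a run of letters or digits, optionally joined
    # by internal hyphens or apostrophes (e.g. "word-level", "model's").
    words = re.findall(r"[A-Za-z0-9]+(?:[-'][A-Za-z0-9]+)*", text)

    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")

    # Count only letters and digits as characters, as ARI conventionally does.
    characters = sum(len(re.sub(r"[^A-Za-z0-9]", "", w)) for w in words)

    return (
        4.71 * characters / len(words)
        + 0.5 * len(words) / len(sentences)
        - 21.43
    )


# Two toy inputs: short everyday words versus dense jargon.
simple_text = "The cat sat. The dog ran."
dense_text = (
    "Mitochondrial dysfunction precipitates neurodegenerative pathophysiology."
)

simple_score = automated_readability_index(simple_text)
dense_score = automated_readability_index(dense_text)
print(f"simple: {simple_score:.2f}, dense: {dense_score:.2f}")
```

Longer words and longer sentences both push the score up, which is why the jargon-heavy toy sentence scores far above the simple one. The absolute values for such short inputs are not meaningful grade levels; the index is intended for passages of several sentences.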