Learning to Substitute Words with Model-based Score Ranking
Hongye Liu, Ricardo Henao
TL;DR
This work tackles smart word substitution (SWS) without relying on costly human annotations by using a model-based sentence score (BARTScore) to guide both the generation and the evaluation of substitutions. It formalizes token substitution with a probabilistic model, introduces a null-score distribution to assess substitution quality, and develops a preference-aware learning framework that aligns model predictions with model-based quality scores via ranking losses and score-improvement objectives, including Direct Preference Optimization variants. Empirically, the proposed MR+AS approach outperforms masked language models and large language models across multiple datasets, both in substitution quality (as measured by BARTScore-based metrics) and in the statistical significance of scores, while requiring no human-labeled data for training. The method offers a cost-effective, scalable path to high-quality word substitutions and shows promise for real-world writing and translation tasks, with attention to the limitations of automated scoring and potential biases. Overall, the paper contributes an annotation-free paradigm for SWS that aims to close the gap between model-generated substitutions and human judgments through principled, score-driven optimization.
Abstract
Smart word substitution aims to enhance sentence quality by improving word choices; however, current benchmarks rely on human-labeled data. Since word choices are inherently subjective, ground-truth word substitutions generated by a small group of annotators are often incomplete and likely not generalizable. To circumvent this issue, we instead employ a model-based score (BARTScore) to quantify sentence quality, thus forgoing the need for human annotations. Specifically, we use this score to define a distribution for each word substitution, allowing one to test whether a substitution is statistically superior relative to others. In addition, we propose a loss function that directly optimizes the alignment between model predictions and sentence scores, while also enhancing the overall quality score of a substitution. Crucially, model learning no longer requires human labels, thus avoiding the cost of annotation while maintaining the quality of the text modified with substitutions. Experimental results show that the proposed approach outperforms both masked language models (BERT, BART) and large language models (GPT-4, LLaMA). The source code is available at https://github.com/Hyfred/Substitute-Words-with-Ranking.
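The two ideas in the abstract, testing a substitution against a distribution of scores and aligning model predictions with sentence scores, can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the empirical p-value with add-one smoothing, and the pairwise hinge loss are assumptions chosen for illustration, and the scores are made-up stand-ins for BARTScore values (log-likelihoods, hence negative).

```python
def empirical_p_value(candidate_score, null_scores):
    """Share of reference scores at least as high as the candidate's score.

    Hypothetical helper: add-one smoothing keeps the estimate above zero.
    A small value suggests the candidate scores unusually well.
    """
    at_least = sum(1 for s in null_scores if s >= candidate_score)
    return (at_least + 1) / (len(null_scores) + 1)


def pairwise_ranking_loss(model_logits, sentence_scores, margin=0.0):
    """Average hinge penalty over pairs the scorer orders as i better than j.

    Hypothetical loss: it is zero when the model's logits order the
    candidates the same way the sentence scores do (up to the margin).
    """
    total, pairs = 0.0, 0
    for i in range(len(model_logits)):
        for j in range(len(model_logits)):
            if sentence_scores[i] > sentence_scores[j]:
                gap = model_logits[i] - model_logits[j]
                total += max(0.0, margin - gap)
                pairs += 1
    return total / pairs if pairs else 0.0


# Made-up scores for three candidate substitutions at one position.
scores = [-1.0, -2.0, -3.0]
aligned = pairwise_ranking_loss([2.0, 1.0, 0.0], scores)
misaligned = pairwise_ranking_loss([0.0, 1.0, 2.0], scores)
p_value = empirical_p_value(-1.0, [-3.0, -2.5, -2.0, -4.0])
```

With these inputs, the aligned logits incur no penalty while the reversed logits do, which is the behavior a score-alignment objective is meant to reward.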
