Linguistically-Controlled Paraphrase Generation
Mohamed Elgaar, Hadi Amiri
TL;DR
This work addresses controlled paraphrase generation with LingConv, an encoder-decoder framework that supports fine-grained control over 40 linguistic attributes. LingConv pairs a linguistic attribute predictor and a semantic equivalence classifier with an inference-time quality-control loop that iteratively refines outputs toward the target attributes while preserving meaning; the approach also relies on decoder-side attribute injection and MICE-based imputation. Empirically, LingConv reduces attribute error by up to 34% relative to strong baselines, with quality control contributing a further 14% improvement, and it proves useful for data augmentation in downstream tasks. The paper also introduces a Novel Target Challenge that tests adaptability to unseen attribute combinations, and it discusses ethical considerations and possible extensions to multilingual settings.
Abstract
Controlled paraphrase generation produces paraphrases that preserve meaning while allowing precise control over linguistic attributes of the output. We introduce LingConv, an encoder-decoder framework that enables fine-grained control over 40 linguistic attributes in English. To improve reliability, we introduce a novel inference-time quality control mechanism that iteratively refines attribute embeddings to generate paraphrases that closely match target attributes without sacrificing semantic fidelity. LingConv reduces attribute error by up to 34% over existing models, with the quality control mechanism contributing an additional 14% improvement.
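To make the idea of inference-time quality control concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes the loop can be approximated as simple feedback: generate, measure the realized attributes, and nudge the conditioning vector toward the target, keeping only candidates that a semantic check accepts. All names here (quality_control, toy_realize, toy_semantic_score, step_size, min_semantic) are hypothetical, and the generator and classifier are replaced by trivial numeric stand-ins.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def attribute_error(realized: Vector, target: Vector) -> float:
    """Mean squared error between realized and target attribute values."""
    return sum((r - t) ** 2 for r, t in zip(realized, target)) / len(target)


def quality_control(
    target: Vector,
    realize: Callable[[Vector], Vector],
    semantic_score: Callable[[Vector], float],
    steps: int = 20,
    step_size: float = 0.5,
    min_semantic: float = 0.8,
) -> Tuple[Optional[Vector], float]:
    """Hypothetical feedback loop; returns (best realized attributes, its error)."""
    control = list(target)  # start by conditioning on the target itself
    best: Optional[Vector] = None
    best_error = float("inf")
    for _ in range(steps):
        realized = realize(control)
        error = attribute_error(realized, target)
        # Accept a candidate only if meaning is judged to be preserved.
        if semantic_score(realized) >= min_semantic and error < best_error:
            best, best_error = realized, error
        # Nudge the conditioning vector in the direction of the remaining gap.
        control = [c + step_size * (t - r) for c, t, r in zip(control, target, realized)]
    return best, best_error


# Toy stand-ins (assumptions): a "generator plus attribute predictor" that
# distorts the requested attributes, and a semantic classifier that always
# reports high equivalence.
def toy_realize(control: Vector) -> Vector:
    return [0.7 * c + 0.1 for c in control]


def toy_semantic_score(realized: Vector) -> float:
    return 0.9


target = [1.0, 2.0, 3.0]
initial_error = attribute_error(toy_realize(target), target)
best, final_error = quality_control(target, toy_realize, toy_semantic_score)
print(f"initial error: {initial_error:.4f}, after quality control: {final_error:.2e}")
```

In this toy setting the attribute error shrinks with each step because the update compensates for the generator's systematic distortion. A real system would operate on learned attribute embeddings and use trained predictor and classifier models in place of the stand-ins.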
