ICLEF: In-Context Learning with Expert Feedback for Explainable Style Transfer

Arkadiy Saakyan, Smaranda Muresan

TL;DR

ICLEF is a human-AI collaboration framework that uses scarce expert feedback to distill style transfer explanations from large teacher LLMs into smaller student models. Augmenting GYAFC and WNC with explanations yields the e-GYAFC and e-WNC datasets; student models fine-tuned on them outperform one-shot teacher models and perform competitively with few-shot teacher models. In an extrinsic authorship attribution evaluation, explanations from smaller models fine-tuned on e-GYAFC are more predictive of authorship than explanations from few-shot teacher models. The work releases data, models, and code to support research on explainability, learning from scarce expert feedback, and style transfer.

Abstract

While state-of-the-art large language models (LLMs) can excel at adapting text from one style to another, current work does not address the explainability of style transfer models. Recent work has explored generating textual explanations from larger teacher models and distilling them into smaller student models. One challenge with such an approach is that LLM outputs may contain errors that require expertise to correct, but gathering and incorporating expert feedback is difficult due to cost and availability. To address this challenge, we propose ICLEF, a novel human-AI collaboration approach to model distillation that incorporates scarce expert human feedback by combining in-context learning and model self-critique. We show that our method leads to the generation of high-quality synthetic explainable style transfer datasets for formality (e-GYAFC) and subjective bias (e-WNC). Via automatic and human evaluation, we show that specialized student models fine-tuned on our datasets outperform generalist teacher models on the explainable style transfer task in one-shot settings, and perform competitively compared to few-shot teacher models, highlighting the quality of the data and the role of expert feedback. In an extrinsic task of authorship attribution, we show that explanations generated by smaller models fine-tuned on e-GYAFC are more predictive of authorship than explanations generated by few-shot teacher models.
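
The combination of in-context learning and self-critique described above can be pictured with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the `FeedbackExample` structure, the prompt wording, the `Revision:` marker, and the `llm` callable (any function that maps a prompt string to a completion string) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class FeedbackExample:
    """One expert-corrected demonstration (hypothetical structure)."""
    draft: str      # initial teacher-model generation
    critique: str   # expert's note on what is wrong with the draft
    revision: str   # expert-approved corrected generation


def build_critique_prompt(examples: List[FeedbackExample], draft: str) -> str:
    """Put expert-corrected examples in context, then ask for a critique of a new draft."""
    parts = ["Critique the draft explanation and rewrite it, following the expert examples."]
    for ex in examples:
        parts.append(f"Draft: {ex.draft}\nCritique: {ex.critique}\nRevision: {ex.revision}")
    # The final block is left open so the model continues with its own critique.
    parts.append(f"Draft: {draft}\nCritique:")
    return "\n\n".join(parts)


def refine(draft: str, examples: List[FeedbackExample], llm: Callable[[str], str]) -> str:
    """Ask the model to critique and revise a draft; keep the draft if no revision is returned."""
    completion = llm(build_critique_prompt(examples, draft))
    _, marker, revision = completion.partition("Revision:")
    return revision.strip() if marker and revision.strip() else draft
```

In this sketch the expert only writes a handful of `FeedbackExample` entries; the same entries are reused as demonstrations for every new draft, which is one way scarce feedback could be spread across a large synthetic dataset.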

Paper Structure

This paper contains 49 sections, 9 figures, 18 tables.

Figures (9)

  • Figure 1: Generating e-GYAFC: the formality style transfer dataset GYAFC (Rao and Tetreault, 2018) is augmented with semi-structured natural language explanations. The LLM generates the informal attributes of the input sentence, a formal paraphrase, and the formal attributes of the resulting sentence. Expert feedback is incorporated via in-context learning and self-critique to refine the initial generations.
  • Figure 2: Generating e-WNC: WNC (Pryzant et al., 2020) is augmented with natural language explanations. The LLM generates the bias attributes of the input sentence and an unbiased paraphrase. Expert feedback is incorporated via in-context learning and self-critique to refine the initial generations.
  • Figure 3: Top 10 informal attributes. The top 50 informal and formal attributes are shown in the Appendix figures (fig:informalDistr, fig:formalDistr).
  • Figure 4: ICLEF performance increases with the amount of feedback, reaching satisfactory accuracy at around 35 shots.
  • Figure 5: Distribution of 50 most frequent informal attributes in the e-GYAFC dataset.
  • ...and 4 more figures