Controlling Language Difficulty in Dialogues with Linguistic Features

Shuyao Xu, Wenguang Wang, Handong Gao, Wei Kang, Long Qin, Weizhi Wang

TL;DR

This work proposes a framework for controlling language proficiency in educational dialogue systems and demonstrates that training LLMs on linguistically annotated dialogue data enables precise modulation of language proficiency, outperforming prompt-based methods in both flexibility and stability.

Abstract

Large language models (LLMs) have emerged as powerful tools for supporting second language acquisition, particularly in simulating interactive dialogues for speaking practice. However, adapting the language difficulty of LLM-generated responses to match learners' proficiency levels remains a challenge. This work addresses this issue by proposing a framework for controlling language proficiency in educational dialogue systems. Our approach leverages three categories of linguistic features to quantify and regulate text complexity: readability features (e.g., Flesch-Kincaid Grade Level), syntactic features (e.g., syntactic tree depth), and lexical features (e.g., simple word ratio). We demonstrate that training LLMs on linguistically annotated dialogue data enables precise modulation of language proficiency, outperforming prompt-based methods in both flexibility and stability. To evaluate this, we introduce Dilaprix, a novel metric that integrates these features and correlates strongly with expert judgments of language difficulty. Empirical results show that our approach achieves superior controllability of language proficiency while maintaining high dialogue quality.
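
To make two of the features named in the abstract concrete, the following is a minimal Python sketch of the Flesch-Kincaid Grade Level and a simple word ratio. It is an illustration under stated assumptions, not the authors' implementation: the syllable counter is a rough vowel-group heuristic, the tokenization is a plain regular expression, and the `simple_words` set is a hypothetical stand-in for whatever word list the paper actually uses. Only the Flesch-Kincaid coefficients (0.39, 11.8, 15.59) come from the standard published formula.

```python
import re

# Assumption: naive tokenization; a real system would likely use an NLP tokenizer.
WORD = re.compile(r"[A-Za-z']+")
SENTENCE_END = re.compile(r"[.!?]+")
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Heuristic syllable count: vowel groups, minus a silent trailing 'e'."""
    w = word.lower().strip("'")
    groups = len(VOWEL_GROUP.findall(w))
    if w.endswith("e") and not w.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Standard formula: 0.39 * words/sentences + 11.8 * syllables/words - 15.59."""
    words = WORD.findall(text)
    sentences = [s for s in SENTENCE_END.split(text) if s.strip()]
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def simple_word_ratio(text: str, simple_words: set) -> float:
    """Fraction of tokens found in a (hypothetical) list of simple words."""
    words = [w.lower() for w in WORD.findall(text)]
    if not words:
        return 0.0
    return sum(w in simple_words for w in words) / len(words)


if __name__ == "__main__":
    easy = "I like cats. Do you like dogs?"
    hard = ("Notwithstanding considerable methodological limitations, "
            "the investigation demonstrated substantial improvements.")
    vocab = {"i", "like", "cats", "do", "you", "dogs"}  # hypothetical word list
    print(flesch_kincaid_grade(easy), flesch_kincaid_grade(hard))
    print(simple_word_ratio(easy, vocab), simple_word_ratio(hard, vocab))
```

Short, monosyllabic utterances score low (even negative) on the grade-level scale, while long sentences with polysyllabic words score high, which is the kind of signal the framework uses to quantify difficulty.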

Paper Structure

This paper contains 27 sections, 2 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Dialogues at different language proficiency levels.
  • Figure 2: A data example of language proficiency control in textbook dialogues. The LLMs are tasked with continuing a conversation based on a given context while completing the specified dialogue task. Additionally, the generated responses are expected to adhere as closely as possible to predefined linguistic feature constraints.
  • Figure 3: Dialogue quality (RSR) achieved versus language proficiency (Dilaprix). Comparisons are presented for (a) LLAMA and (b) QWEN architectures, illustrating performance across the controllable complexity range.
  • Figure 4: Comparison of linguistic feature controllability across models for varying target language proficiency ($t$). Each subfigure presents the distribution of 11 linguistic features at different target complexity levels: $t = 0.0, 0.2, 0.4, 0.6, 0.8, 1.0$. The concentric circles represent increasing values of each feature. Colors indicate the actual values achieved by the model for each feature under the specified target configuration. A greater deviation of the colored line from the corresponding circle indicates weaker control over that feature.
  • Figure 5: An example of a textbook dialogue topic.
  • ...and 4 more figures