Chain-of-MetaWriting: Linguistic and Textual Analysis of How Small Language Models Write Young Students Texts
Ioana Buhnila, Georgeta Cislaru, Amalia Todirascu
TL;DR
The paper addresses the fact that large and small language models lack a meta-representation of the writing process. It proposes Chain-of-MetaWriting (CoMW), a method that guides models through planning, revision, and evaluation on education-focused writing tasks in French. It conducts a cross-lingual evaluation of open-source 3B-parameter SLMs and ChatGPT-4o, using a rich dataset that includes keystroke logging to capture the actual writing process. Findings show that while SLMs can imitate high-level writing steps, they struggle with audience-appropriate vocabulary, narrative authenticity, and sensitive topics such as school violence; CoMW can elicit more structured outputs but does not fully replicate human writing dynamics. The work highlights practical implications for the safe educational use of AI and points to ways of improving meta-writing capabilities, including data design, content structuring, and cross-lingual robustness.
Abstract
Large Language Models (LLMs) have been used to generate texts in response to various writing tasks, such as reports, essays, and storytelling. However, language models have no meta-representation of the text writing process, nor inherent communication and learning needs comparable to those of young human students. This paper presents a fine-grained linguistic and textual analysis of the writing of multilingual Small Language Models (SLMs). With our method, Chain-of-MetaWriting, SLMs can imitate some steps of the human writing process, such as planning and evaluation. We focus mainly on short story and essay writing tasks in French, for schoolchildren and undergraduate students respectively. Our results show that SLMs have difficulty assisting young students on sensitive topics such as violence in the schoolyard, and that they sometimes use words too complex for the target audience. In particular, their output differs markedly from human-produced texts in terms of cohesion and coherence, notably in temporal connectors, topic progression, and reference.
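The staged prompting that Chain-of-MetaWriting applies (planning, writing, revision, evaluation) can be pictured as a chain in which each step sees the outputs of the previous ones. The sketch below is illustrative only: the stage instructions, the `chain_of_metawriting` function, and the `generate` callable (standing in for any SLM or LLM call) are hypothetical and are not the authors' actual prompts or code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stage instructions; the actual CoMW prompts are not reproduced here.
STAGES: List[Tuple[str, str]] = [
    ("plan", "Before writing, list your ideas and the order in which you will present them."),
    ("write", "Write the text following your plan."),
    ("revise", "Reread your text and revise it so that it suits the target audience."),
    ("evaluate", "Explain whether the revised text fulfils the task and suits the audience."),
]


def chain_of_metawriting(task: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Run each stage in order, feeding earlier stage outputs back into the prompt."""
    outputs: Dict[str, str] = {}
    context = f"Task: {task}"
    for name, instruction in STAGES:
        prompt = f"{context}\n\n{instruction}"
        outputs[name] = generate(prompt)
        # Append this stage's output so the next stage can build on it.
        context += f"\n\n[{name}]\n{outputs[name]}"
    return outputs
```

Passing the model as a plain callable keeps the chain independent of any particular inference library; a stub function that returns fixed strings is enough to check that each stage receives the earlier outputs.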
