Syntax-Guided Diffusion Language Models with User-Integrated Personalization
Ruqian Zhang, Yijiao Zhang, Juan Shen, Zhongyi Zhu, Annie Qu
TL;DR
This work tackles the tendency of large language models to produce stylistically uniform text by introducing syntax-guided diffusion with user-integrated personalization. It presents a cascaded STDiff pipeline in which syntax generation informs text realization, and a noncascaded variant that uses overlapped diffusion for bidirectional refinement; both are augmented with a shared personality space for fine-grained personalization. The approach demonstrates improved fluency, diversity, and stylistic fidelity across sentiment and emotion tasks, with strong zero-shot generalization through interpolated personality weights. The results suggest that explicit syntactic structure and cross-style information sharing can enhance controllability and interpretability in diffusion-based language generation, with practical value for personalized text synthesis and style-aware NLP applications.
Abstract
Large language models have made revolutionary progress in generating human-like text, yet their outputs tend to be generic and show insufficient structural diversity, which limits personalized expression. Recent advances in diffusion models have opened new opportunities for improving language generation beyond the limitations of autoregressive paradigms. In this work, we propose a syntax-guided diffusion language model that integrates structural supervision and personalized conditioning to enhance text quality, diversity, and controllability. We introduce a cascaded framework that generates syntactic guidance before conditional text generation, and further generalize it to a novel noncascaded architecture for better alignment between structure and content. By incorporating syntactic information into the generation process, the proposed model better captures the lexical and structural characteristics of stylistic sentence construction. To enable fine-grained personalization, we develop a shared representation mechanism that facilitates information integration across users, supporting both faithful stylistic generation and generalizable zero-shot inference. Extensive experiments on multiple tasks demonstrate the superiority of our approach in fluency, diversity, and stylistic fidelity. Further qualitative analyses highlight its interpretability and flexibility in learning personalized patterns.
