LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking

Mohamed Elgaar, Hadi Amiri

TL;DR

LingGen addresses the challenge of fine-grained, real-valued linguistic control in controlled text generation by introducing a dedicated attribute encoder whose output is injected into the BOS embedding of a base LLM. A key contribution is P-MASKING, which samples per-example masking rates from a truncated Pareto distribution to enable robust control across variable attribute subsets from 1 to $k$ (up to $k=40$). Empirical results show LingGen achieves the lowest average attribute-control error (MSE) while preserving high fluency and inference efficiency, outperforming a wide set of baselines including prompting, fine-tuning, and decoding-time control methods. The work also analyzes attribute interactions, uncovering synergies and conflicts among linguistic attributes, and demonstrates LingGen’s robustness across different attribute counts and seeds. This approach enables scalable, controllable text generation with practical impact in accessibility, personalization, and education while highlighting ethical considerations for responsible deployment.

Abstract

We present LingGen, a controlled text generation model that allows fine-grained control over a large number of real-valued linguistic attributes. It encodes target attribute values with a dedicated linguistic attribute encoder and conditions the language model by injecting the resulting representation into the beginning-of-sequence (BOS) embedding. To improve robustness when controlling different attribute subsets, we introduce P-MASKING, which samples per-example attribute masking rates from a truncated Pareto distribution during training. When controlling between 1 and 40 attributes, LingGen achieves the lowest average control error among the evaluated methods, while remaining efficient at inference and receiving the highest fluency scores in human evaluation. Ablations show that Pareto-sampled masking and BOS-based injection are effective choices compared to alternative masking and integration variants.
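
The conditioning path described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-attribute linear projection, the sum pooling, the additive injection, and all names (`encode_attributes`, `inject_into_bos`, `value_proj`, `type_emb`) are assumptions made for illustration. Masked attributes are represented as `None` and contribute nothing to the representation.

```python
import random

def encode_attributes(values, value_proj, type_emb):
    """Sum per-attribute vectors; attributes set to None are treated as masked.

    Assumption: each real-valued attribute is scaled along a learned direction
    (value_proj[k]) and offset by an attribute-specific type embedding (type_emb[k]).
    """
    dim = len(type_emb[0])
    out = [0.0] * dim
    for k, v in enumerate(values):
        if v is None:  # masked attribute: skip it entirely
            continue
        for d in range(dim):
            out[d] += v * value_proj[k][d] + type_emb[k][d]
    return out

def inject_into_bos(bos_embedding, attr_repr):
    """Additive injection (assumption): shift the BOS embedding by the attribute representation."""
    return [b + a for b, a in zip(bos_embedding, attr_repr)]

# Toy parameters standing in for learned weights (hypothetical sizes).
rng = random.Random(0)
K, DIM = 4, 8
value_proj = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(K)]
type_emb = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(K)]
bos = [rng.gauss(0, 1) for _ in range(DIM)]

# Control attributes 0 and 2 only; attributes 1 and 3 are masked.
targets = [0.7, None, 12.0, None]
conditioned_bos = inject_into_bos(bos, encode_attributes(targets, value_proj, type_emb))
```

In this sketch, masking every attribute leaves the BOS embedding unchanged, so the same decoder can run with any subset of attributes specified.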

Paper Structure

This paper contains 42 sections, 6 equations, 3 figures, 12 tables.

Figures (3)

  • Figure 1: Overview of the LingGen architecture for controlled text generation. 1) Masking Rate Sampler (Training): Implements P-MASKING by sampling attribute masking rates ($\rho_{mask}$) from a Pareto distribution per sample. 2) Attribute Encoder: Encodes linguistic attributes ($L_1..L_K$) into a combined representation using embeddings and attribute-specific token types ($T_1..T_K$). 3) Language Model: A Transformer Decoder generates text ($\hat{y}_1..\hat{y}_n$) conditioned on the attribute representation, which is injected into the BOS token (<s>) embedding to steer generation.
  • Figure 2: Interaction effect ($\Delta \text{MSE}_{i \leftarrow j}$) of controlling attribute $j$ on controlling attribute $i$ (statistically significant interactions only, row-normalized). Cell values are the interaction effect ($\Delta \text{MSE}_{i \leftarrow j}$); negative values (blue) indicate synergy, while positive values (red) indicate conflict.
  • Figure 3: Effect of the Pareto shape parameter $b$ on the distribution of sampled masking proportions. Each bar aggregates Monte Carlo samples into masking-rate buckets, with darker segments indicating higher masking.
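
The kind of analysis summarized in Figure 3 can be approximated with a short Monte Carlo sketch. The parameterization here is an assumption, not taken from the paper: a Pareto distribution with scale 1 and shape $b$ is truncated to $[1, 2]$ and shifted by 1 so that the sampled masking rate lies in $[0, 1)$. The function names and the choice of five buckets are likewise hypothetical.

```python
import random
from collections import Counter

def sample_masking_rate(b, rng, lo=1.0, hi=2.0):
    """Inverse-CDF draw from Pareto(shape=b, scale=lo) truncated to [lo, hi].

    The result is shifted by lo so the masking rate lies in [0, hi - lo).
    The [1, 2] truncation window is an assumption made for this sketch.
    """
    u = rng.random()
    tail = 1.0 - (lo / hi) ** b          # probability mass kept by the truncation
    x = lo * (1.0 - u * tail) ** (-1.0 / b)
    return min(x, hi) - lo

def bucket_counts(b, n_samples=10000, n_buckets=5, seed=0):
    """Aggregate Monte Carlo masking rates into equal-width buckets over [0, 1)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        rate = sample_masking_rate(b, rng)
        counts[min(int(rate * n_buckets), n_buckets - 1)] += 1
    return [counts[i] for i in range(n_buckets)]

# Compare how the shape parameter redistributes mass across masking-rate buckets.
for b in (0.5, 2.0, 8.0):
    print(b, bucket_counts(b))
```

Under this parameterization, larger values of $b$ concentrate samples in the low-masking buckets, while smaller values spread them more evenly toward high masking rates.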