JAM: Controllable and Responsible Text Generation via Causal Reasoning and Latent Vector Manipulation

Yingbing Huang; Deming Chen; Abhishek K. Umrawal

TL;DR

JAM introduces a causal, latent-space approach to controllable text generation (CTG) in which small moves of a latent vector steer the output while preserving the causality of LLM generation. A binary linear classifier is trained on latent representations to detect an attribute; at inference, JAM computes a minimal perturbation with respect to the classifier's decision hyperplane to manipulate the output, improving alignment with the Harmless, Honest, and Helpful (HHH) criteria. Across multiple LLMs, with GPT-4 as a judge, JAM shows gains of up to 10% on HHH metrics and favorable human-aligned preferences, with negligible overhead compared to prior CTG methods. The work advances interpretability and reliability in CTG and suggests broader applicability to real-world AI systems, with possible future extensions to more complex latent manipulations and agent-based architectures.
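The idea of moving a latent vector by the smallest amount needed to change a linear classifier's decision can be illustrated with a short sketch. This is not the authors' implementation: it assumes NumPy, a plain logistic-regression probe trained by gradient descent, and the closed-form shortest L2 step toward a hyperplane. The names `fit_linear_probe` and `minimal_move` and the `margin` parameter are hypothetical, chosen for illustration only, and how JAM extracts latent vectors from an LLM and feeds them back is not shown.

```python
import numpy as np


def fit_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a binary logistic-regression probe on latent vectors.

    X: (n, d) array of latent vectors (assumed already extracted).
    y: (n,) array of 0/1 attribute labels.
    Returns the weight vector w (d,) and bias b of the hyperplane w.h + b = 0.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(attribute = 1)
        w -= lr * (X.T @ (p - y)) / n           # gradient of mean log-loss w.r.t. w
        b -= lr * np.mean(p - y)                # gradient of mean log-loss w.r.t. b
    return w, b


def minimal_move(h, w, b, margin=0.0):
    """Return the closest point to h (in L2 norm) with w.h + b >= margin.

    If h already satisfies the constraint it is returned unchanged;
    otherwise h is shifted along the hyperplane normal w.
    """
    score = h @ w + b
    if score >= margin:
        return h.copy()
    return h + ((margin - score) / (w @ w)) * w


if __name__ == "__main__":
    # Synthetic stand-in for latent vectors: two Gaussian clusters in 8 dimensions.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1.0, 0.5, (50, 8)), rng.normal(1.0, 0.5, (50, 8))])
    y = np.array([0] * 50 + [1] * 50)

    w, b = fit_linear_probe(X, y)
    h = X[0]                                   # a vector from the "0" cluster
    moved = minimal_move(h, w, b, margin=1.0)  # push it to the "1" side
    print("score before:", h @ w + b)
    print("score after: ", moved @ w + b)
    print("move length: ", np.linalg.norm(moved - h))
```

In this sketch the shift is always parallel to `w`, so its length equals the shortfall in score divided by the norm of `w`; any other direction reaching the same score would require a longer move.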

Abstract

While large language models (LLMs) have made significant strides in generating coherent and contextually relevant text, they often function as opaque black boxes, trained on vast unlabeled datasets with statistical objectives, lacking an interpretable framework for responsible control. In this paper, we introduce JAM (Just A Move), a novel framework that interprets and controls text generation by integrating cause-effect analysis within the latent space of LLMs. Based on our observations, we uncover the inherent causality in LLM generation, which is critical for producing responsible and realistic outputs. Moreover, we explore latent vectors as fundamental components in LLM architectures, aiming to understand and manipulate them for more effective and efficient controllable text generation. We evaluate our framework using a range of tools, including the HHH criteria, toxicity reduction benchmarks, and GPT-4 alignment measures. Our results show that JAM achieves up to a 22% improvement over previous Controllable Text Generation (CTG) methods across multiple quantitative metrics and human-centric evaluations. Furthermore, JAM demonstrates greater computational efficiency compared to other CTG methods. These results highlight the effectiveness and efficiency of JAM for responsible and realistic text generation, paving the way for more interpretable and controllable models.
