Controlling Chat Style in Language Models via Single-Direction Editing

Zhenyu Xu; Victor S. Sheng

TL;DR

This paper provides strong empirical evidence that distinct stylistic attributes, from emotional tone to linguistic structure, are encoded as linear directions in a model's activation space, and presents a lightweight, training-free method for precise style control.

Abstract

Controlling stylistic attributes in large language models (LLMs) remains challenging, with existing approaches relying on either prompt engineering or post-training alignment. This paper investigates the challenge through the lens of representation engineering, testing the hypothesis that distinct stylistic attributes, from emotional tone to linguistic structure, are encoded as linear directions in the model's activation space. We provide strong empirical evidence for this hypothesis across a wide range of styles and, based on this finding, present a lightweight, training-free method for precise style control. Our approach supports linear style composition, enhances safety by ablating undesirable behaviors, and, as confirmed by experiments on over a dozen models, achieves high style adherence while preserving core capabilities at minimal computational cost.
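
The abstract names three ingredients: a style encoded as a linear direction, weight editing that ablates a direction, and linear composition of styles. The NumPy sketch below illustrates these ideas on synthetic data. It is not the authors' implementation: the difference-of-means estimator, the projection formula, the renormalization in `compose`, and all function names are assumptions chosen for illustration, since this page does not give the method's details.

```python
import numpy as np

def style_direction(styled_acts, neutral_acts):
    """Assumed estimator: unit vector from the mean neutral activation
    toward the mean styled activation (difference of means)."""
    diff = styled_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def ablate_direction(weight, direction):
    """Remove `direction` from the output space of `weight`.
    weight has shape (d_model, d_in) and is applied as weight @ x,
    so the edited matrix can no longer write along `direction`."""
    r = direction / np.linalg.norm(direction)
    return weight - np.outer(r, r) @ weight

def compose(directions, coeffs):
    """Hypothetical composition rule: weighted sum, then renormalize."""
    mix = sum(c * d for c, d in zip(coeffs, directions))
    return mix / np.linalg.norm(mix)

# Synthetic activations: the "style" shifts activations along axis 0.
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[0] = 1.0
neutral = rng.normal(size=(200, d_model))
styled = rng.normal(size=(200, d_model)) + 3.0 * true_dir

r = style_direction(styled, neutral)
W = rng.normal(size=(d_model, 8))
W_ablated = ablate_direction(W, r)

# After ablation, every output of W_ablated is orthogonal to r.
print(np.abs(r @ W_ablated).max())
```

In a real model the activations would come from a chosen layer's residual stream on styled versus neutral prompts, and the projection would be applied to each matrix that writes into that stream; both choices are outside what this page specifies.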

Paper Structure (39 sections, 4 equations, 6 figures, 8 tables)

Introduction
Related Work
Human Preference Optimization
Vector-based Editing and Activation Steering
Method
Data Collection
Chat-Style Direction Extraction
Weight Modification via Orthogonalization
Style Direction Composition
Experimental Setup
Models and Datasets
Evaluation Metrics
Eval Score
Style Adherence Rate
Unsafe Score
...and 24 more sections

Figures (6)

Figure 1: Single direction vector steering chat-style: editing style vectors transform a neutral LLM into expressive personas. Linear addition of vectors also yields hybrid styles.
Figure 2: Overview of our style direction extraction and orthogonalization approach. The modified model generates outputs that consistently exhibit the target style.
Figure 3: Chat-style controllable image descriptions using our modified LLaVA-1.5 model. (a) Original output from the base model; (b) Pessimistic output after injecting a chat-style vector. Both are generated from the same image input.
Figure 4: Chat-style controllable image descriptions using our modified LLaVA-1.5 model. (a) Original output from the base model; (b) Safer output after injecting a chat-style vector. Both are generated from the same image input.
Figure 5: GPT-4 Eval Scores for base models and their chat-style edited variants across 14 instruction-tuned models. Chat-style edits consistently preserve high generation quality across architectures.
...and 1 more figure
