Representation Engineering for Large-Language Models: Survey and Research Challenges

Lukasz Bartoszcze; Sarthak Munshi; Bryan Sukidi; Jennifer Yen; Zejia Yang; David Williams-King; Linh Le; Kosi Asuzu; Carsten Maple

TL;DR

This survey maps Representation Engineering for LLMs into Reading and Control, arguing that high-level concepts are encoded in latent subspaces that can be read and edited without full retraining. It develops a taxonomy of linear and optimized steering vectors, input-contrast methods, and dynamic strength strategies, supported by theoretical notions like the Linear Representation Hypothesis and the Superposition Hypothesis. The work systematically compares RepE to prompt-engineering, fine-tuning, and mechanistic interpretability, and discusses evaluation pipelines, open problems, and ethical considerations. Its findings underscore the potential for inference-time control to achieve personalized, safe, and high-performing LLMs, while highlighting standardization, generalization, and multimodal challenges that must be addressed to deploy RepE broadly.

Abstract

Large language models are capable of completing a variety of tasks, but remain unpredictable and intractable. Representation engineering seeks to resolve this problem through a new approach that uses samples of contrasting inputs to detect and edit high-level representations of concepts such as honesty, harmfulness or power-seeking. We formalize the goals and methods of representation engineering to present a cohesive picture of work in this emerging discipline. We compare it with alternative approaches, such as mechanistic interpretability, prompt-engineering and fine-tuning. We outline risks such as degraded performance, increased compute time and steerability issues. We present a clear agenda for future research to build predictable, dynamic, safe and personalizable LLMs.
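To make the idea of contrasting inputs concrete, the following is a minimal sketch of one common construction, a difference-of-means steering vector. It is an illustration under stated assumptions, not the survey's own implementation: the activations are hard-coded toy lists rather than hidden states extracted from a real model, and the helper names (`mean_vector`, `steering_vector`, `read_concept`, `control`) and the strength parameter `alpha` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(positive: List[Vector], negative: List[Vector]) -> Vector:
    """Difference of class means: one simple way to derive a concept direction
    from activations collected on contrasting inputs."""
    mean_pos = mean_vector(positive)
    mean_neg = mean_vector(negative)
    return [p - q for p, q in zip(mean_pos, mean_neg)]


def read_concept(hidden: Vector, direction: Vector) -> float:
    """Reading: score a hidden state by its dot product with the direction."""
    return sum(h * d for h, d in zip(hidden, direction))


def control(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Control: shift a hidden state along the direction with strength alpha."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy stand-ins for activations on "honest" vs. "dishonest" prompts (assumed data).
honest = [[1.0, 0.0], [3.0, 2.0]]
dishonest = [[0.0, 0.0], [0.0, 2.0]]

v = steering_vector(honest, dishonest)
score = read_concept([1.0, 5.0], v)
steered = control([1.0, 5.0], v, alpha=0.5)
```

In a real setting the vectors would come from a chosen layer of the model, and `alpha` would need tuning, since too large a shift is one source of the performance risks mentioned above.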
