Designing Role Vectors to Improve LLM Inference Behaviour

Daniele Potertì, Andrea Seveso, Fabio Mercorio

TL;DR

This paper introduces role vectors (latent activation-space directions extracted from internal model activations) as a mechanism to steer LLMs toward domain-specific expertise, offering an alternative to traditional persona-based prompting. By constructing 29 role directions from difference-of-means activations and applying two interventions, activation addition and directional ablation, the authors demonstrate measurable improvements on domain-relevant benchmarks, with limited impact on unrelated tasks. The approach relies on a three-part pipeline: generating role-specific prompts from PersonaHub and Alpaca to form $D_r$ and $D_{\text{base}}$, computing directions $d_{i,r}^{(l)} = \mu_{i,r}^{(l)} - \nu_i^{(l)}$, and evaluating interventions across layers $l$, positions $i$, and coefficients $\alpha$ on the MMLU-style test set $\mathcal{D}_{\text{test}}$. Results show that larger models exhibit stronger, more interpretable directional signals, and that only a subset of improving directions align with the intended role according to patch-scoping; ablation studies reveal that removing these directions can degrade domain-specific performance while sometimes sparing or slightly improving unrelated tasks. These findings indicate that manipulating internal representations can more effectively steer performance than persona-based prompting, with significant implications for controllable LLM behaviour and future mechanistic analyses using Activation Patching.
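As a rough illustration of the pipeline summarised above, the following NumPy sketch computes a difference-of-means direction and applies the two interventions to toy activations. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy data, and the exact form of the interventions (adding $\alpha d$ to the activation, and projecting out the unit-normalised direction) are assumptions for illustration, since this summary does not spell out those formulas.

```python
import numpy as np

def role_direction(role_acts, base_acts):
    """Difference of means d = mu - nu for one layer and token position.

    role_acts, base_acts: arrays of shape (n_prompts, hidden_dim).
    """
    return role_acts.mean(axis=0) - base_acts.mean(axis=0)

def activation_addition(x, d, alpha):
    """Assumed form of activation addition: shift activations along d by alpha."""
    return x + alpha * d

def directional_ablation(x, d):
    """Assumed form of directional ablation: remove the component of x along d."""
    d_hat = d / np.linalg.norm(d)
    return x - np.outer(x @ d_hat, d_hat)

# Toy activations standing in for a real model's hidden states (hypothetical data).
rng = np.random.default_rng(0)
hidden_dim = 8
base_acts = rng.normal(size=(50, hidden_dim))
role_acts = rng.normal(size=(50, hidden_dim)) + np.linspace(0.5, 1.5, hidden_dim)

d = role_direction(role_acts, base_acts)
x = rng.normal(size=(4, hidden_dim))

x_added = activation_addition(x, d, alpha=2.0)
x_ablated = directional_ablation(x, d)
```

In a real setting these functions would be applied inside forward hooks at a chosen layer $l$ and position $i$, with $\alpha$ swept over a range of coefficients; the sketch only shows the vector arithmetic.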

Abstract

The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We measure two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving task performance in relevant domains while marginally affecting unrelated tasks. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.

Paper Structure

This paper contains 19 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Illustrative example demonstrating how role vectors (e.g., chemist) can influence model outputs.
  • Figure 2: Prompt template for generating persona-specific tasks.
  • Figure 3: Diverse interpretations of $\blacklozenge$, before and after model intervention.
  • Figure 4: Spearman correlation of the percentage improvement in performance (relative to baseline) between each model after applying activation addition. * corresponds to p-values $\leq 0.05$, ** $\leq 0.01$, *** $\leq 0.001$.
  • Figure 5: Prompt for evaluating patch scoping output provided to Claude 3.5 Haiku.