BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation

Tsung-Min Pai, Jui-I Wang, Li-Chun Lu, Shao-Hua Sun, Hung-Yi Lee, Kai-Wei Chang

TL;DR

BILLY addresses the inefficiency of multi-LLM collaboration with training-free activation steering that blends multiple persona vectors within a single LLM. It extracts per-layer persona directions from contrastive activations, fuses them offline into a composite vector, and applies this vector during inference to produce multi-perspective, creative outputs at far lower cost and latency than traditional multi-LLM setups. Across four TTCT-derived benchmarks, BILLY outperforms prompting baselines and even costly LLM-Discussion approaches, achieving higher originality while reducing token usage and inference time. The work also analyzes vector composition and activation projections, showing improved controllability and interpretability over prompting, and discusses limitations and future directions for more nuanced vector weighting and composition strategies.

Abstract

Multi-LLM systems enhance the creativity of large language models by simulating human collective intelligence but suffer from significant drawbacks, such as high computational costs and inference latency. To address these limitations, we propose BILLY (BlendIng persona vectors for Large Language model creativitY), a training-free framework that captures the benefits of multi-LLM collaboration, i.e., inducing diverse perspectives and specialized expertise, within a single model. BILLY operates by extracting and blending multiple distinct persona vectors directly in the model's activation space. We steer the model's generation process with this merged vector during inference, enabling multi-perspective output without explicit multi-LLM communication. Our experiments across creativity-oriented benchmarks demonstrate that BILLY surpasses single-model prompting and traditional multi-LLM approaches while substantially reducing inference time and computational costs. Our analyses further reveal that distinct persona vectors can be blended to achieve both effective control over complementary aspects of generation and greater interpretability.
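
The extract, blend, and steer steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single layer, a difference-of-means rule for the contrastive extraction, a weighted average as the merge rule, and an additive update with a hypothetical scale `alpha`. All function names and the toy activations are made up for illustration.

```python
from typing import List, Optional, Sequence

Vector = List[float]


def mean_activation(acts: Sequence[Sequence[float]]) -> Vector:
    """Average a batch of hidden states (one row per sample) per dimension."""
    n = len(acts)
    return [sum(col) / n for col in zip(*acts)]


def persona_vector(persona_acts: Sequence[Sequence[float]],
                   neutral_acts: Sequence[Sequence[float]]) -> Vector:
    """Assumed contrastive rule: mean persona activation minus mean neutral activation."""
    p = mean_activation(persona_acts)
    q = mean_activation(neutral_acts)
    return [a - b for a, b in zip(p, q)]


def blend(vectors: Sequence[Vector],
          weights: Optional[Sequence[float]] = None) -> Vector:
    """Assumed merge rule: weighted average of persona vectors (uniform by default)."""
    if weights is None:
        weights = [1.0 / len(vectors)] * len(vectors)
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def steer(hidden: Sequence[Sequence[float]], vector: Vector,
          alpha: float = 1.0) -> List[Vector]:
    """Add the scaled merged vector to the hidden state at every token position."""
    return [[h + alpha * v for h, v in zip(row, vector)] for row in hidden]


# Toy 3-dimensional activations (hypothetical values, two samples each).
neutral = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
creative = [[1.0, 1.0, 0.0], [3.0, 1.0, 0.0]]
environmentalist = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]

v_cre = persona_vector(creative, neutral)
v_env = persona_vector(environmentalist, neutral)
merged = blend([v_cre, v_env])  # computed once, offline

# At inference time, only the merged vector is applied to the hidden states.
hidden_states = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
steered = steer(hidden_states, merged, alpha=2.0)
```

Because the merged vector is computed once and reused, inference in this sketch costs a single forward pass plus one vector addition per steered layer, which matches the abstract's point that no multi-LLM communication is needed.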

Paper Structure

This paper contains 32 sections, 6 equations, 6 figures, 22 tables.

Figures (6)

  • Figure 1: BILLY (BlendIng persona vectors for Large Language model creativitY). To enhance the creativity of a single LLM, we extract and fuse the persona vectors of a Creative Professional and an Environmentalist, steering a base model by this composite vector to generate outputs based on both domains.
  • Figure 2: Persona Vector Combinations Analysis. Starting from the default four vectors, we vary the number of combined persona vectors from one to seven.
  • Figure 3: Qualitative Results. Responses generated by models that are steered by BILLY (ENV), BILLY (CRE), BILLY (CRE + ENV), and MRP (CRE + ENV).
  • Figure 4: Projection of Different Methods. Projection of activation changes onto the layer-specific Creative Professional and Environmentalist persona vectors. Panels (a) and (b) compare the Base Model (without system prompt), Prompt (CRE+ENV), and BILLY (CRE+ENV). Panels (c) and (d) show the projections when applying (i) only BILLY (CRE), (ii) only BILLY (ENV), and (iii) BILLY (CRE+ENV) at Layer 20.
  • Figure 5: Amortized Average Input Token Per Query. The per-query token cost is amortized as the number of inference queries grows.
  • ...and 1 more figure
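
The quantities behind the Figure 4 and Figure 5 captions can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact procedure: the projection is taken as the scalar projection of the activation change onto a persona vector, and the amortized cost is taken as a one-time extraction cost spread over all queries plus a fixed per-query cost. The function names and numbers are hypothetical.

```python
import math
from typing import Sequence


def scalar_projection(delta: Sequence[float], persona: Sequence[float]) -> float:
    """Length of the activation change `delta` along the persona direction."""
    dot = sum(d * p for d, p in zip(delta, persona))
    norm = math.sqrt(sum(p * p for p in persona))
    return dot / norm


def amortized_tokens_per_query(one_time_tokens: float,
                               per_query_tokens: float,
                               num_queries: int) -> float:
    """One-time extraction cost shared across queries, plus the per-query cost."""
    return one_time_tokens / num_queries + per_query_tokens


# Hypothetical activation change and persona direction.
proj = scalar_projection([3.0, 4.0], [2.0, 0.0])

# Hypothetical token budget: the shared cost shrinks as queries accumulate.
cost_at_10 = amortized_tokens_per_query(1000.0, 50.0, 10)
cost_at_100 = amortized_tokens_per_query(1000.0, 50.0, 100)
```

Under these assumptions the amortized cost approaches the per-query cost as the number of queries grows, which is the trend the Figure 5 caption describes.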