
Multi-Personality Generation of LLMs at Decoding-time

Rongxin Chen, Yunfan Li, Yige Yuan, Bingbing Xu, Huawei Shen

TL;DR

This work addresses the challenge of generating text that embodies multiple personalization attributes at decoding time, without retraining. It introduces Multi-Personality Generation (MPG), a density-ratio-based framework that aggregates single-attribute preferences into a target distribution, enabling flexible control over multiple traits. To make decoding efficient, it proposes Speculative Chunk-level based Rejection Sampling (SCR), which proposes multi-token chunks and validates them in parallel, using online thresholds and prefix salvage to maintain correctness. On MBTI personality simulation and Role-Playing tasks, MPG improves over baselines by up to 16–18%, and SCR preserves output quality while substantially reducing computational overhead; the approach also benefits from using specialized reference models as proposals. Code and data are open-source, and the authors discuss practical implications for deploying dynamic, multi-dimensional personalization in LLM applications.
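
To make "aggregating single-attribute preferences into a target distribution" concrete, the sketch below combines one reference next-token distribution with several single-attribute distributions. It assumes (this is not confirmed by the summary) a weighted log-linear form, log p*(y) = log p_ref(y) + Σ_i α_i (log p_i(y) − log p_ref(y)), renormalized over the vocabulary. The function name `mpg_target` and the toy probabilities are hypothetical.

```python
import math


def mpg_target(ref_probs, pref_probs_list, alphas):
    """Combine single-attribute models into one next-token distribution.

    Hypothetical form: log p*(y) = log p_ref(y)
        + sum_i alpha_i * (log p_i(y) - log p_ref(y)), then renormalized.
    Each alpha_i weights the density ratio p_i / p_ref of attribute i.
    """
    scores = []
    for y, p_ref in enumerate(ref_probs):
        # Weighted sum of log density ratios for token y
        log_ratio_sum = sum(
            a * (math.log(p[y]) - math.log(p_ref))
            for a, p in zip(alphas, pref_probs_list)
        )
        scores.append(math.log(p_ref) + log_ratio_sum)
    # Log-sum-exp normalization for numerical stability
    m = max(scores)
    z = sum(math.exp(s - m) for s in scores)
    return [math.exp(s - m) / z for s in scores]


# Toy 3-token vocabulary: reference model plus two single-attribute models
ref = [0.5, 0.3, 0.2]
p_extravert = [0.2, 0.5, 0.3]
p_thinking = [0.6, 0.1, 0.3]
mixed = mpg_target(ref, [p_extravert, p_thinking], [0.5, 0.5])
```

Under this assumed form, setting every α_i to zero recovers the reference distribution, setting a single α_i to one recovers that attribute's model, and a negative α_i pushes generation away from attribute i.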

Abstract

Multi-personality generation for LLMs, i.e., enabling a model to embody multiple personalization attributes simultaneously, is a fundamental challenge. Existing retraining-based approaches are costly and scale poorly, while decoding-time methods often rely on external models or heuristics, which limits their flexibility and robustness. In this paper, we propose a novel Multi-Personality Generation (MPG) framework under the decoding-time combination paradigm. MPG flexibly controls multiple personality attributes without relying on scarce multi-dimensional models or extra training: it leverages the implicit density ratios in single-dimensional models as a "free lunch" and reformulates the task as sampling from a target strategy that aggregates these ratios. To implement MPG efficiently, we design Speculative Chunk-level based Rejection sampling (SCR), which generates responses in chunks and validates them in parallel against thresholds estimated within a sliding window. This significantly reduces computational overhead while maintaining high-quality generation. Experiments on MBTI personality and Role-Playing demonstrate the effectiveness of MPG, with improvements of up to 16%–18%. Code and data are available at https://github.com/Libra117/MPG.
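
The abstract states only that chunks are validated "via estimated thresholds within a sliding window". The sketch below shows one plausible reading, built on standard rejection sampling: if the target is p_ref(y)·exp(r(y)) with r the aggregated weighted log density ratio, proposing from p_ref and accepting with probability exp(r(y) − M) yields exact target samples whenever M ≥ max r. Because M is unknown, the sketch estimates it as the largest chunk score seen in the last few proposals. The class and function names, the window size, and the max-over-window rule are all hypothetical, and scoring is sequential here rather than parallel.

```python
import math
import random
from collections import deque


class SlidingThreshold:
    """Hypothetical online estimate of the rejection-sampling bound M.

    M is taken as the largest chunk score among the last `window` proposals.
    """

    def __init__(self, window=8):
        self.scores = deque(maxlen=window)  # old scores fall out automatically

    def update(self, score):
        self.scores.append(score)

    def bound(self):
        return max(self.scores) if self.scores else 0.0


def chunk_score(ref_logps, pref_logps_list, alphas):
    """Aggregated score of a chunk: sum_i alpha_i * (log p_i - log p_ref),
    where each log-probability is summed over the chunk's tokens."""
    ref_total = sum(ref_logps)
    return sum(a * (sum(lps) - ref_total) for a, lps in zip(alphas, pref_logps_list))


def accept_chunk(score, threshold, rng):
    """Accept the chunk with probability exp(score - M), capped at 1."""
    threshold.update(score)  # include the current score so M >= score
    log_accept = min(0.0, score - threshold.bound())
    return rng.random() < math.exp(log_accept)


rng = random.Random(0)
th = SlidingThreshold(window=4)
# Two-token chunk scored by two single-attribute models with weights 0.7 / 0.3
score = chunk_score([-1.2, -0.8], [[-0.9, -0.6], [-1.5, -0.7]], [0.7, 0.3])
accepted = accept_chunk(score, th, rng)
```

Because M is estimated rather than a guaranteed bound, this sketch samples only approximately from the target; it illustrates the threshold mechanism, not the paper's correctness argument.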

Paper Structure

This paper contains 25 sections, 20 equations, 4 figures, 8 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Role-Playing in Multi-personality generation.
  • Figure 2: An illustration of the proposed Speculative Chunk-level based Rejection sampling (SCR) algorithm. Given a prompt, the reference model generates $k$-token speculative chunks, which are scored by multiple preference models in parallel via weighted density ratios relative to the reference. Chunk-level acceptance is performed using the aggregated score, with prefix salvage applied if a full chunk is rejected. This integrates speculative decoding with rejection sampling to efficiently sample from the multi-preference target distribution while reducing large-model evaluations.
  • Figure 3: Iterative tuning process for $\alpha$. Bars indicate prediction accuracy (left axis) for each MBTI dimension; dashed lines track $\alpha$ values (right axis) at each optimization step. (a) ESTP-targeted tuning shows monotonic $\alpha$ progression. (b) INFJ-targeted tuning shows non-monotonic adjustments with negative $\alpha$ phases.
  • Figure 4: Variation of the model's Overall Score with the number of $\alpha$ adjustments.
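
The SCR pipeline in the Figure 2 caption (score a $k$-token chunk, run a chunk-level test on the aggregated score, fall back to prefix salvage on rejection) can be sketched as a single step. The salvage rule here is an assumption borrowed from standard speculative decoding: each token gets its own acceptance test, and tokens are kept up to the first failure. The function names and the fixed bounds `chunk_bound` and `token_bound` are hypothetical. The paper states that its prefix salvage maintains correctness; this simplified sketch does not attempt to reproduce that guarantee.

```python
import math
import random


def token_log_ratio(ref_lp, pref_lps, alphas):
    """Weighted log density ratio of one token: sum_i a_i * (log p_i - log p_ref)."""
    return sum(a * (lp - ref_lp) for a, lp in zip(alphas, pref_lps))


def scr_step(chunk, ref_lps, pref_lps, alphas, chunk_bound, token_bound, rng):
    """Return the tokens kept from one speculative chunk (hypothetical rule).

    chunk       : k proposed token ids from the reference model
    ref_lps     : reference log-prob of each proposed token (length k)
    pref_lps    : for each token, one log-prob per preference model (k x n)
    chunk_bound : estimated upper bound on the chunk-level log ratio
    token_bound : estimated upper bound on a single token's log ratio
    """
    ratios = [token_log_ratio(r, p, alphas) for r, p in zip(ref_lps, pref_lps)]
    # 1) Chunk-level rejection test on the aggregated score
    if rng.random() < math.exp(min(0.0, sum(ratios) - chunk_bound)):
        return list(chunk)
    # 2) Prefix salvage: keep tokens until the first one fails its own test
    kept = []
    for tok, r in zip(chunk, ratios):
        if rng.random() >= math.exp(min(0.0, r - token_bound)):
            break
        kept.append(tok)
    return kept


rng = random.Random(0)
# The third token is strongly disfavored, so the full chunk is rejected
# and only the two-token prefix is salvaged.
kept = scr_step(
    [11, 12, 13],
    [-1.0, -1.0, -1.0],
    [[-1.0], [-1.0], [-1001.0]],
    [1.0],
    0.0,
    0.0,
    rng,
)
# kept == [11, 12]
```

Salvaging a prefix rather than discarding the whole chunk is what keeps rejected large-model evaluations from being wasted.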