PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning

Hyeong Kyu Choi, Yixuan Li

TL;DR

PICLe addresses the challenge of eliciting diverse target personas from large language models by treating the model as a Bayesian mixture of persona distributions and steering it with carefully chosen in-context demonstrations. The core idea is to select a small set of demonstrations via a likelihood-ratio criterion that maximizes the target persona's influence, thereby increasing the posterior probability of the desired persona $\tilde{\phi}$ given a prompt. Empirically, PICLe consistently surpasses a wide range of baselines across Llama-2, Vicuna, and GPT-J, reaching high action consistency (e.g., 88.1% on Llama-2) and remaining robust to hyperparameters and data regimes. The approach also improves performance on non-RLHF models, and label-aware selection further boosts results, indicating that it is a practical way to steer LLM behavior while staying data-efficient and keeping compute overhead manageable.

Abstract

Large Language Models (LLMs) are trained on massive text corpora, which are encoded with diverse personality traits. This triggers an interesting goal of eliciting a desired personality trait from the LLM, and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, aiming to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At the core, PICLe introduces a new ICL example selection criterion based on likelihood ratio, which is designed to optimally guide the model in eliciting a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.
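
The selection step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the exact form of the criterion is not reproduced in this summary, so the sketch assumes that each candidate example is scored by the difference between its log-likelihood under a persona-adapted model and its log-likelihood under the base model, and that the $K$ highest-scoring examples are kept. The function name `select_demonstrations` and its parameters are hypothetical, and the log-likelihoods are assumed to be precomputed elsewhere.

```python
import heapq
from typing import List, Sequence


def select_demonstrations(
    candidates: Sequence[str],
    logp_persona: Sequence[float],
    logp_base: Sequence[float],
    k: int,
) -> List[str]:
    """Return the k candidates with the largest log-likelihood ratio.

    Assumption (not taken from the source): the score of a candidate is
    log p_persona(x) - log p_base(x), with both values precomputed.
    """
    if not (len(candidates) == len(logp_persona) == len(logp_base)):
        raise ValueError("all inputs must have the same length")
    if k < 0:
        raise ValueError("k must be non-negative")

    # Log of the likelihood ratio for each candidate.
    scores = [lp - lb for lp, lb in zip(logp_persona, logp_base)]

    # Indices of the k highest-scoring candidates, best first.
    top = heapq.nlargest(k, range(len(candidates)), key=scores.__getitem__)
    return [candidates[i] for i in top]
```

The selected statements would then be prepended to the query as in-context demonstrations. Working in log space keeps the ratio numerically stable for long sequences, since only a subtraction of log-likelihoods is needed.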

Paper Structure

This paper contains 45 sections, 15 equations, 2 figures, 16 tables, 1 algorithm.

Figures (2)

  • Figure 1: Persona ICL. PICLe aims to elicit a target persona $\tilde{\phi}$ by providing the LLM with the $K$ best demonstrative examples selected via our likelihood-ratio-based criterion in Eq. \ref{eq:objective-alt}. The figure depicts $\tilde{\phi} =$ "narcissism", with the selected examples shown in green.
  • Figure 2: Effect of number of ICL examples. Action Consistency values of PICLe and the Similarity baseline are compared.