PICLe: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning
Hyeong Kyu Choi, Yixuan Li
TL;DR
PICLe addresses the challenge of eliciting diverse target personas from large language models by casting the model as a Bayesian mixture of persona distributions and using informed in-context demonstrations. The core idea is to select a small set of demonstrations via a likelihood-ratio criterion that maximizes the target persona’s influence, thereby increasing the posterior probability of the desired persona $\tilde{\phi}$ given a prompt. Empirically, PICLe consistently surpasses a wide range of baselines across Llama-2, Vicuna, and GPT-J, achieving high action-consistency (e.g., 88.1% on Llama-2) and demonstrating robustness to hyperparameters and data regimes. The approach also shows improved performance for non-RLHF models, and label-aware selection further boosts results, indicating practical viability for steering LLM behavior while maintaining data efficiency and manageable compute overhead.
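The link between the likelihood-ratio criterion and the posterior of the target persona can be sketched with Bayes' rule. The notation below is an assumption made for illustration and is not taken from the text above: $x$ denotes a text sequence, $\phi$ a latent persona, and $\tilde{\phi}$ the target persona.

```latex
% Assumed notation: x is a text sequence, \phi a latent persona,
% \tilde{\phi} the target persona.
\begin{align}
P(x) &= \int P(x \mid \phi)\, P(\phi)\, d\phi
  && \text{(model as a mixture over personas)} \\
P(\tilde{\phi} \mid x) &= \frac{P(x \mid \tilde{\phi})\, P(\tilde{\phi})}{P(x)}
  && \text{(Bayes' rule)} \\
\frac{P(\tilde{\phi} \mid x)}{P(\tilde{\phi})} &= \frac{P(x \mid \tilde{\phi})}{P(x)}
  && \text{(likelihood ratio = posterior-to-prior ratio)}
\end{align}
```

Under these assumptions, a demonstration $x$ with a large likelihood ratio $P(x \mid \tilde{\phi}) / P(x)$ is one that raises the posterior of $\tilde{\phi}$ the most relative to its prior, which is consistent with selecting demonstrations by that ratio.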
Abstract
Large Language Models (LLMs) are trained on massive text corpora that encode diverse personality traits. This motivates the goal of eliciting a desired personality trait from an LLM and probing its behavioral preferences. Accordingly, we formalize the persona elicitation task, which aims to customize LLM behaviors to align with a target persona. We present Persona In-Context Learning (PICLe), a novel persona elicitation framework grounded in Bayesian inference. At its core, PICLe introduces a new in-context learning (ICL) example selection criterion based on the likelihood ratio, designed to optimally guide the model toward a specific target persona. We demonstrate the effectiveness of PICLe through extensive comparisons against baseline methods across three contemporary LLMs. Code is available at https://github.com/deeplearning-wisc/picle.
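A likelihood-ratio selection criterion of this kind might be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `select_by_likelihood_ratio`, the two scoring callables, and the toy log-probabilities are all hypothetical. It assumes that a log-likelihood for each candidate is available under a persona-conditioned model and under the original model, and it keeps the `k` candidates with the largest log-likelihood difference.

```python
from typing import Callable, List, Sequence, Tuple


def select_by_likelihood_ratio(
    candidates: Sequence[str],
    log_prob_persona: Callable[[str], float],
    log_prob_base: Callable[[str], float],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k candidates with the highest log-likelihood ratio.

    log_prob_persona and log_prob_base are hypothetical scorers that return
    log P(x | target persona) and log P(x) for a candidate text x.
    """
    # Log of the likelihood ratio is the difference of the two log-likelihoods.
    scored = [
        (text, log_prob_persona(text) - log_prob_base(text))
        for text in candidates
    ]
    # Highest ratio first; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy log-probabilities, invented purely for illustration.
persona_lp = {"a": -2.0, "b": -5.0, "c": -3.0, "d": -4.0}
base_lp = {"a": -3.0, "b": -4.0, "c": -6.0, "d": -4.5}

top = select_by_likelihood_ratio(
    list(persona_lp), persona_lp.__getitem__, base_lp.__getitem__, k=2
)
print(top)  # [('c', 3.0), ('a', 1.0)]
```

In practice the two scorers would wrap language-model forward passes, and the selected texts would be prepended to the query as in-context demonstrations; how the persona-conditioned likelihood is obtained is left open here.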
