Personality Editing for Language Models through Adjusting Self-Referential Queries

Seojin Hwang, Yumin Kim, Byeongjeong Kim, Donghoon Shin, Hwanhee Lee

TL;DR

This work addresses the challenge of robust, data-efficient personality control for large language models (LLMs), where prompt-based approaches lack stability and fine-tuning approaches scale poorly. It introduces PALETTE, a knowledge-editing framework that targets internal self-representations by generating self-referential adjustment queries grounded in MBTI assessments and applying a rank-one weight update via Rank-One Model Editing (r-ROME) to align responses with a desired personality. Across MBTI and Big Five evaluations on two LLMs, PALETTE achieves 5–25% improvements in targeted personality alignment using only 12 adjustment queries, while preserving overall response quality and remaining robust to prompts that push toward the opposite personality. The approach offers a data-efficient, scalable alternative to fine-tuning and prompt engineering for reliable personality control in practical AI deployments.
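
To make the idea of a rank-one edit concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single linear layer `W`, a hypothetical key vector `k` (standing in for the hidden representation of an adjustment query's subject), and a hypothetical target value `v` (the output that would yield the desired response). It also assumes an identity key covariance; the published ROME formulation additionally weights the key by an estimated covariance of keys, which is omitted here. All names are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, k: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector k (cols)."""
    return [sum(w_ij * k_j for w_ij, k_j in zip(row, k)) for row in W]


def rank_one_edit(W: Matrix, k: Vector, v: Vector) -> Matrix:
    """Return W' = W + (v - W k) k^T / (k^T k).

    The added term is an outer product, so it has rank one. After the
    edit, W' maps k exactly to v, while any input orthogonal to k is
    mapped as before. Assumption: identity key covariance.
    """
    residual = [v_i - y_i for v_i, y_i in zip(v, matvec(W, k))]
    norm_sq = sum(k_j * k_j for k_j in k)
    if norm_sq == 0.0:
        raise ValueError("key vector must be non-zero")
    return [
        [w_ij + r_i * k_j / norm_sq for w_ij, k_j in zip(row, k)]
        for row, r_i in zip(W, residual)
    ]


# Toy example: a 2x3 layer, one key, one desired output.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 0.0]   # hypothetical key for the edited association
v = [0.5, -0.5]       # hypothetical target output for that key
W_edited = rank_one_edit(W, k, v)

edited_out = matvec(W_edited, k)            # now matches v
orthogonal_in = [2.0, -1.0, 0.0]            # orthogonal to k
before = matvec(W, orthogonal_in)
after = matvec(W_edited, orthogonal_in)     # unchanged by the edit
```

The design point this illustrates is locality: the edit changes the layer's response to one key while leaving orthogonal inputs untouched, which is consistent with the summary's claim that a handful of adjustment queries can shift personality-related responses without degrading overall response quality.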

Abstract

Large Language Models (LLMs) are integral to applications such as conversational agents and content creation, where precise control over a model's personality is essential for maintaining tone, consistency, and user engagement. However, prevailing prompt-based or fine-tuning approaches either lack robustness or demand large-scale training data, making them costly and impractical. In this paper, we present PALETTE (Personality Adjustment by LLM SElf-TargeTed quEries), a novel method for personality editing in LLMs. Our approach introduces adjustment queries, where self-referential statements grounded in psychological constructs are treated analogously to factual knowledge, enabling direct editing of personality-related responses. Unlike fine-tuning, PALETTE requires only 12 editing samples to achieve substantial improvements in personality alignment across personality dimensions. Experimental results from both automatic and human evaluations demonstrate that our method enables more stable and well-balanced personality control in LLMs.

Paper Structure

This paper contains 42 sections, 3 equations, 5 figures, 26 tables.

Figures (5)

  • Figure 1: Lack of consistency in prompt-based personality control: (1) Certain personalities resist control due to biases. (2) The expressed personality shifts drastically between prompts.
  • Figure 2: Overview of PALETTE's pipeline for the Thinking dimension in MBTI. We (1) produce adjustment queries based on the MBTI questionnaire, then (2) edit the personality through relevant knowledge editing. (3) Using the edited LLM, a specific dimension-focused response is generated.
  • Figure 3: Target personality alignment comparison of PALETTE and base model across MBTI dimensions, evaluated by ChatGPT (left) and human annotators (right) on Qwen-2.5-1.5B. P-values computed with $n=200$. Statistically significant p-values ($p < 0.05$) are underlined. The consistent alignment trend supports the reliability of our automated evaluation.
  • Figure 4: Robustness to prompt-induced bias toward the opposite MBTI dimension (E/I, N/S, F/T, P/J) for Qwen-2.5-1.5B.
  • Figure 5: MBTI personality comparison results under the opposing-prompt condition for Qwen-2.5-1.5B.