Personality Editing for Language Models through Adjusting Self-Referential Queries

Seojin Hwang; Yumin Kim; Byeongjeong Kim; Donghoon Shin; Hwanhee Lee

TL;DR

This work addresses robust, data-efficient personality control for large language models (LLMs), a setting where prompt-based methods tend to be unstable and fine-tuning scales poorly. It introduces PALETTE, a knowledge-editing framework that targets the model's internal self-representation: it generates self-referential adjustment queries grounded in MBTI assessments and applies a rank-one weight update via r-ROME, a variant of Rank-One Model Editing (ROME), to align responses with a desired personality. Across MBTI and Big Five evaluations on two LLMs, PALETTE achieves 5–25% improvements in targeted personality alignment using only 12 adjustment queries, while preserving overall response quality and remaining robust to prompts that push toward the opposing personality. The approach offers a data-efficient, scalable alternative to fine-tuning and prompt engineering for reliable personality control in practical deployments.

Abstract

Large Language Models (LLMs) are integral to applications such as conversational agents and content creation, where precise control over a model's personality is essential for maintaining tone, consistency, and user engagement. However, prevailing prompt-based or fine-tuning approaches either lack robustness or demand large-scale training data, making them costly and impractical. In this paper, we present PALETTE (Personality Adjustment by LLM SElf-TargeTed quEries), a novel method for personality editing in LLMs. Our approach introduces adjustment queries, where self-referential statements grounded in psychological constructs are treated analogously to factual knowledge, enabling direct editing of personality-related responses. Unlike fine-tuning, PALETTE requires only 12 editing samples to achieve substantial improvements in personality alignment across personality dimensions. Experimental results from both automatic and human evaluations demonstrate that our method enables more stable and well-balanced personality control in LLMs.
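
To make the editing step concrete, the following is a minimal sketch of a rank-one weight edit, the general mechanism behind ROME-style methods. It is not the authors' implementation: it assumes an identity key covariance (ROME-style editors use covariance statistics estimated from data), it takes the target value `v` as given (in practice it is found by optimization), and the names `matvec`, `rank_one_edit`, `W`, `k`, and `v` are hypothetical. In PALETTE's setting, `k` would correspond to the internal representation of a self-referential adjustment query and `v` to the representation that yields the desired personality-consistent answer.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k == v.

    Assumption: identity key covariance. This is a simplification of
    ROME-style updates, used here only for illustration.
    """
    kk = sum(x * x for x in k)
    if kk == 0:
        raise ValueError("key vector must be non-zero")
    # Residual between the desired output and the current output for key k.
    residual = [v_i - y_i for v_i, y_i in zip(v, matvec(W, k))]
    # Add the outer product residual * k^T / (k^T k) to W.
    return [
        [w_ij + r_i * k_j / kk for w_ij, k_j in zip(row, k)]
        for row, r_i in zip(W, residual)
    ]


# Toy example: a 2x3 weight matrix, one key, one target value.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
k = [1.0, 2.0, 0.0]
v = [0.5, -0.5]
W_edited = rank_one_edit(W, k, v)
```

After the edit, `matvec(W_edited, k)` equals `v` up to floating-point error, while any input orthogonal to `k` maps to the same output as before. This locality is why a handful of edits can change targeted responses without retraining the whole model.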
