Editing Personality for Large Language Models
Shengyu Mao, Xiaohan Wang, Mengru Wang, Yong Jiang, Pengjun Xie, Fei Huang, Ningyu Zhang
TL;DR
The paper addresses the challenge of editing the personality traits that LLMs express when giving opinions on specific topics. It introduces PersonalityEdit, a benchmark built on Big Five facets, with topic-constrained, trait-guided data generated by GPT-4 and then filtered through automated and human quality control. The study evaluates multiple baselines (MEND, SERAC, IKE, PROMPT) on GPT-J-6B and Llama-2-chat using novel metrics such as ES, DD, Accuracy, TPEI, and PAE, and employs a personality classifier to quantify trait alignment. Findings indicate that while some methods can steer trait expression, achieving fluent, accurate, and consistently targeted edits remains challenging, highlighting opportunities for further research in model editing, evaluation, and the ethics of personality manipulation in LLMs.
Abstract
This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics, since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct PersonalityEdit, a new benchmark dataset to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that align with a specified topic and embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can stimulate further annotation in model editing and personality-related research. Code is available at https://github.com/zjunlp/EasyEdit.
