Editing Personality for Large Language Models

Shengyu Mao, Xiaohan Wang, Mengru Wang, Yong Jiang, Pengjun Xie, Fei Huang, Ningyu Zhang

TL;DR

The paper addresses the challenge of editing the personality traits that an LLM expresses in its opinions on specific topics. It introduces PersonalityEdit, a benchmark built on three Big Five traits (Neuroticism, Extraversion, and Agreeableness), with data generated by GPT-4 under topic and trait constraints and then passed through automated and human quality control. The study evaluates several baselines (MEND, SERAC, IKE, PROMPT) on GPT-J-6B and Llama-2-chat using metrics such as ES, DD, Accuracy, TPEI, and PAE, and employs a personality classifier to quantify trait alignment. The findings indicate that although some methods can steer trait expression, producing fluent, accurate, and consistently targeted edits remains difficult, which leaves open questions for research on model editing, evaluation, and the ethics of personality manipulation in LLMs.

Abstract

This paper introduces an innovative task focused on editing the personality traits of Large Language Models (LLMs). This task seeks to adjust the models' responses to opinion-related questions on specified topics since an individual's personality often manifests in the form of their expressed opinions, thereby showcasing different personality traits. Specifically, we construct PersonalityEdit, a new benchmark dataset to address this task. Drawing on the theory in Social Psychology, we isolate three representative traits, namely Neuroticism, Extraversion, and Agreeableness, as the foundation for our benchmark. We then gather data using GPT-4, generating responses that align with a specified topic and embody the targeted personality trait. We conduct comprehensive experiments involving various baselines and discuss the representation of personality behavior in LLMs. Our findings uncover potential challenges of the proposed task, illustrating several remaining issues. We anticipate that our work can stimulate further annotation in model editing and personality-related research. Code is available at https://github.com/zjunlp/EasyEdit.

Paper Structure (22 sections, 2 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: The diagram of our proposed task to edit personality for LLMs.
  • Figure 2: Overview of our PersonalityEdit benchmark construction, including selecting personality traits, topic filtering, data generation, and quality control.
  • Figure 3: Panel (A) shows the personality traits predicted for the original expressions of the LLMs. The original LLMs predominantly exhibit Extraversion and Neuroticism, whereas viewpoints expressing Agreeableness are less frequent. Panel (B) shows the predicted traits for different target personalities when llama-2-7b-chat is edited with IKE.
  • Figure 4: A case of personality editing for the topic Justin Bieber.