Personality Alignment of Large Language Models

Minjun Zhu; Yixuan Weng; Linyi Yang; Yue Zhang

TL;DR

The paper introduces Personality Alignment to customize LLM behavior to individual users, backed by the PAPI dataset that combines Big Five and Dark Triad measures. It proposes Personality Activation Search (PAS), an activation-space intervention that identifies key attention-head directions and modulates them without weight updates, achieving strong per-trait alignment with high data efficiency. Across Llama backbones, PAS outperforms traditional alignment methods and ICL in both trait-specific and open-ended tasks, while maintaining reasoning and safety. The work also includes extensive human evaluations and open dissemination of data and code, highlighting practical, ethical considerations for personalized AI systems.

Abstract

Alignment of large language models (LLMs) typically aims to reflect general human values and behaviors, but it often fails to capture the unique characteristics and preferences of individual users. To address this gap, we introduce the concept of Personality Alignment. This approach tailors LLMs' responses and decisions to match the specific preferences of individual users or closely related groups. Inspired by psychometrics, we created the Personality Alignment with Personality Inventories (PAPI) dataset, which includes data from over 320,000 real subjects across multiple personality assessments, including both the Big Five Personality Factors and Dark Triad traits. This comprehensive dataset enables quantitative evaluation of LLMs' alignment capabilities across both positive and potentially problematic personality dimensions. Recognizing the challenges of personality alignment, such as limited personal data, diverse preferences, and scalability requirements, we developed an activation intervention optimization method. This method enhances LLMs' ability to efficiently align with individual behavioral preferences using minimal data and computational resources. Remarkably, our method, PAS, achieves superior performance while requiring only 1/5 of the optimization time of DPO, offering practical value for personality alignment. Our work paves the way for future AI systems to make decisions and reason in truly personalized ways, enhancing the relevance and meaning of AI interactions for each user and advancing human-centered artificial intelligence. The dataset and code are released at https://github.com/zhu-minjun/PAlign.
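
The activation intervention idea summarized above (find directions in attention-head activations, then shift activations along them at inference time without updating weights) can be illustrated with a small sketch. This is not the authors' PAS implementation: the mean-difference direction, the midpoint-threshold score used to rank heads, and all function names (`head_direction`, `select_heads`, `intervene`) are assumptions made for illustration, and `alpha` is a hypothetical stand-in for the intervention distance that the paper searches over.

```python
from statistics import mean, pstdev

def head_direction(pos, neg):
    """Unit-norm mean-difference direction between two sets of activation vectors."""
    dim = len(pos[0])
    diff = [mean(v[i] for v in pos) - mean(v[i] for v in neg) for i in range(dim)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff] if norm > 0 else diff

def project(vec, direction):
    """Scalar projection of an activation vector onto a direction."""
    return sum(v * d for v, d in zip(vec, direction))

def separation_score(pos, neg, direction):
    """Fraction of samples separated by thresholding the projection at the midpoint.

    Assumption: a simple stand-in for a trained per-head probe.
    """
    p = [project(v, direction) for v in pos]
    n = [project(v, direction) for v in neg]
    threshold = (mean(p) + mean(n)) / 2
    correct = sum(x > threshold for x in p) + sum(x <= threshold for x in n)
    return correct / (len(p) + len(n))

def select_heads(acts_pos, acts_neg, top_k):
    """Rank heads by separation score and keep the top_k.

    acts_pos / acts_neg map a head id to a list of activation vectors collected
    for responses that match / do not match the subject's answers.
    Returns {head_id: (direction, sigma)}, where sigma is the standard
    deviation of projections along the direction.
    """
    scored = []
    for head in acts_pos:
        d = head_direction(acts_pos[head], acts_neg[head])
        score = separation_score(acts_pos[head], acts_neg[head], d)
        sigma = pstdev([project(v, d) for v in acts_pos[head] + acts_neg[head]])
        scored.append((score, head, d, sigma))
    scored.sort(key=lambda t: t[0], reverse=True)
    return {head: (d, sigma) for _, head, d, sigma in scored[:top_k]}

def intervene(activation, direction, sigma, alpha):
    """Shift one head's activation along its direction; model weights are untouched."""
    return [a + alpha * sigma * d for a, d in zip(activation, direction)]
```

In a real model the `intervene` step would presumably run inside a forward hook on the selected heads, which is why the shift is applied to activations rather than parameters; the toy vectors here only show the arithmetic.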

Paper Structure (58 sections, 4 equations, 36 figures, 7 tables)

Introduction
Related Work
Personality Alignment Dataset Construction
Personality Activation Search
Language Models
Search the Directions for Activation Intervention
Search Distance for Activation Intervention
Experiments
Experimental Settings
Results on PAPI
Generalization Results
Discussion: Is Value-Aligned Assistant a Good Assistant?
Conclusion
Limitations
Ethical Considerations
...and 43 more sections

Figures (36)

Figure 1: On the left, AI aligns to broad human values like helpfulness, honesty, and harmlessness using a standard set. On the right, the focus is on aligning AI behavior with individual users' specific traits and preferences, using detailed profiles to reflect unique personal values.
Figure 2: Overview of the PAPI dataset. (a) Illustrates the comparison between the subject's self-assessment and the AI's assessment of a specific question. (b) Shows an example of the IPIP-NEO-120 and IPIP-NEO-300 questionnaire responses. (c) Depicts the Big Five personality traits profile.
Figure 3: Visualization of the K-Means Clustering on the PAPI Dataset. The centroids are marked in red, demonstrating the central point of each cluster, while the closest samples to these centroids are highlighted in gold.
Figure 4: Overview of the PAS process. Step 1 involves selecting an LLM. Step 2 includes creating and activating personality responses using the questionnaire answers and the subject's answers in the PAPI dataset, and probing each attention head. Step 3 integrates these activations for personality alignment, adjusting model outputs to reflect individual user traits, resulting in an aligned LLM.
Figure 5: Comparison of performance and efficiency for various alignment methods. The larger the multiple on the x-axis, the lower the efficiency.
...and 31 more figures
