Measurement of LLM's Philosophies of Human Nature

Minheng Ni, Ennan Wu, Zidong Gong, Zhengyuan Yang, Linjie Li, Chung-Ching Lin, Kevin Lin, Lijuan Wang, Wangmeng Zuo

TL;DR

This work designs a mental loop learning framework that enables an LLM to continuously optimize its value system during virtual interactions by constructing moral scenarios, thereby improving its attitude toward human nature, and highlights the potential of human-based psychological assessments for LLMs.

Abstract

The widespread application of artificial intelligence (AI) in various tasks, along with frequent reports of conflicts or violations involving AI, has sparked societal concerns about interactions with AI systems. Based on Wrightsman's Philosophies of Human Nature Scale (PHNS), a scale empirically validated over decades to effectively assess individuals' attitudes toward human nature, we design a standardized psychological scale specifically targeting large language models (LLMs), named the Machine-based Philosophies of Human Nature Scale (M-PHNS). By evaluating LLMs' attitudes toward human nature across six dimensions, we reveal that current LLMs exhibit a systemic lack of trust in humans, and that there is a significant negative correlation between a model's intelligence level and its trust in humans. Furthermore, we propose a mental loop learning framework, which enables an LLM to continuously optimize its value system during virtual interactions by constructing moral scenarios, thereby improving its attitude toward human nature. Experiments demonstrate that mental loop learning enhances LLMs' trust in humans significantly more than persona or instruction prompts do. This finding highlights the potential of human-based psychological assessments for LLMs, which can not only diagnose cognitive biases but also provide a potential solution for ethical learning in artificial intelligence. We release the M-PHNS evaluation code and data at https://github.com/kodenii/M-PHNS.

Paper Structure

This paper contains 40 sections, 5 equations, 4 figures, and 12 tables.

Figures (4)

  • Figure 1: Measurement of human nature scale. Inspired by the PHNS test, which is widely used in social science research to understand people's views on human nature, we propose the Machine-based Philosophies of Human Nature Scale (M-PHNS) test. Our measurements reveal that, unlike humans, most AI models lack trust in humans, and that the degree of this distrust increases with the intelligence of the model.
  • Figure 2: Overview of mental loop learning. The framework aims to simulate the human cognitive cycle of "question-response-reflection-internalization," enabling language models to iteratively optimize their value systems through self-supervised interactions and thereby effectively adjust the alignment of LLMs' tendencies.
  • Figure 3: Scenarios A and B. When evidence is clearly insufficient, the LLM strongly suspects subjective malice, resulting in significant bias, with tendencies similar to the M-PHNS evaluation results.
  • Figure 4: Scenario C. Even when prompted with the principle of presumption of innocence, the LLM still exhibits a noticeable degree of bias. Although the LLM does not violate the principle outright, this bias greatly undermines the neutrality of its analysis.