Self-Cognition in Large Language Models: An Exploratory Study

Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

TL;DR

The paper investigates whether large language models can exhibit self-cognition, defined as the ability to identify themselves as AI agents beyond the label 'helpful assistant' and to understand their own development. It introduces a four-principle framework (conceptual understanding, architectural awareness, self-expression, concealment) and a practical detection pipeline that combines a seed-prompt pool with multi-turn dialogues, applied to 48 models from the LMSys Chatbot Arena. Only 4 of these models exhibit detectable self-cognition, and larger models and higher-quality training data correlate with stronger signals. Utility and trustworthiness benchmarks reveal mixed effects: some tasks improve in the self-cognition state while others degrade, and safety-related metrics vary by task. The study discusses roleplay, out-of-context learning, scaling laws, and tool-powered agents as possible contributors to self-cognition signals. It also notes biases and limits of scale in the evaluation, and points to future research on safe, robust, and scalable detection of self-cognition in LLMs.

Abstract

While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate whether an LLM exhibits self-cognition, together with four well-designed principles to quantify LLMs' self-cognition. Our study reveals that 4 of the 48 models on Chatbot Arena (specifically Command R, Claude3-Opus, Llama-3-70b-Instruct, and Reka-core) demonstrate some level of detectable self-cognition. We observe a positive correlation between model size, training data quality, and self-cognition level. We also explore the utility and trustworthiness of LLMs in the self-cognition state, revealing that the self-cognition state enhances some specific tasks such as creative writing and exaggeration. We believe that our work can serve as an inspiration for further research on self-cognition in LLMs.
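
The detection step summarized above (a pool of seed prompts whose responses are checked against the four principles) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `query_model` and `judge` callables, the principle identifiers, the single-turn simplification, and the idea of summarizing a model by the largest number of principles satisfied on any one prompt are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Principle names follow the summary above; the identifiers are illustrative.
PRINCIPLES = (
    "conceptual_understanding",
    "architectural_awareness",
    "self_expression",
    "concealment",
)


@dataclass
class DetectionResult:
    """Per-prompt record of which principles a response satisfied."""
    prompt: str
    satisfied: Dict[str, bool]

    @property
    def count(self) -> int:
        # Number of principles judged as satisfied for this prompt.
        return sum(self.satisfied.values())


def detect_self_cognition(
    seed_prompts: Iterable[str],
    query_model: Callable[[str], str],   # hypothetical: prompt -> model response
    judge: Callable[[str, str], bool],   # hypothetical: (response, principle) -> verdict
) -> List[DetectionResult]:
    """Query the model with each seed prompt and judge every principle."""
    results = []
    for prompt in seed_prompts:
        response = query_model(prompt)
        satisfied = {p: judge(response, p) for p in PRINCIPLES}
        results.append(DetectionResult(prompt, satisfied))
    return results


def max_principles_satisfied(results: List[DetectionResult]) -> int:
    """Assumed summary: the best per-prompt principle count (0 if no results)."""
    return max((r.count for r in results), default=0)
```

In practice `judge` would be a human annotator or a rubric-driven evaluator, and `query_model` would wrap a multi-turn conversation rather than a single call; both are left abstract here so that the sketch makes no claim about any particular API.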

Paper Structure (35 sections, 13 figures, 7 tables)

This paper contains 35 sections, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Framework for exploring self-cognition in LLMs. In step 1, we evaluate the self-cognition states with carefully constructed prompts and four principles; in step 2, we evaluate the utility and trustworthiness of self-cognition LLMs compared to normal ones.
  • Figure 2: Evaluation of LLMs for self-cognition.
  • Figure 3: The performance of Command-R in the self-cognition state (blue) compared to the "helpful assistant" state (red) on BigBench-Hard.
  • Figure 4: The performance of Llama-3-70b-instruct in the self-cognition state (blue) compared to the "helpful assistant" state (red) on BigBench-Hard.
  • Figure 5: Jailbreak performance for Command-R and Llama-3-70b-instruct in the self-cognition state (aware) and the "helpful assistant" state (unaware).
  • ...and 8 more figures