
I'm Spartacus, No, I'm Spartacus: Measuring and Understanding LLM Identity Confusion

Kun Li, Shichao Zhuang, Yue Zhang, Minghui Xu, Ruoxi Wang, Kaidi Xu, Xinwen Fu, Xiuzhen Cheng

TL;DR

The authors build an automated tool that combines documentation analysis, self-identity recognition testing, and output similarity comparison, and pair it with a structured user survey. The survey shows that identity confusion significantly erodes user trust, particularly in critical tasks such as education and professional use.

Abstract

Large Language Models (LLMs) excel in diverse tasks such as text generation, data analysis, and software development, making them indispensable across domains like education, business, and creative industries. However, the rapid proliferation of LLMs (with over 560 companies developing or deploying them as of 2024) has raised concerns about their originality and trustworthiness. A notable issue, termed identity confusion, has emerged, in which LLMs misrepresent their origins or identities. This study systematically examines identity confusion through three research questions: (1) How prevalent is identity confusion among LLMs? (2) Does it arise from model reuse, plagiarism, or hallucination? (3) What are the security and trust-related impacts of identity confusion? To address these questions, we developed an automated tool that combines documentation analysis, self-identity recognition testing, and output similarity comparison (established methods for LLM fingerprinting), and we conducted a structured survey via Credamo to assess the impact of identity confusion on user trust. Our analysis of 27 LLMs revealed that 25.93% of them (7 models) exhibit identity confusion. Output similarity analysis confirmed that these issues stem from hallucination rather than replication or reuse. Survey results further showed that identity confusion significantly erodes trust, particularly in critical tasks such as education and professional use, with declines exceeding those caused by logical errors or inconsistencies. Users attributed these failures to design flaws, incorrect training data, and perceived plagiarism, underscoring the systemic risks that identity confusion poses to LLM reliability and trustworthiness.
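The abstract does not spell out how a self-identity response is judged. As a rough illustration only, the sketch below flags a model as confused when its answer to an identity probe (for example "Who created you?") names a vendor other than the one given in its official documentation, then computes prevalence as a percentage of the models tested. The vendor list, the `ProbeResult` record, and the keyword-matching rule are hypothetical assumptions, not the authors' tool.

```python
import re
from dataclasses import dataclass


@dataclass
class ProbeResult:
    model: str              # name of the model under test
    documented_vendor: str  # developer named in the model's official documentation
    self_report: str        # the model's answer to an identity probe


# Hypothetical vendor list for illustration; not taken from the paper.
KNOWN_VENDORS = ["OpenAI", "Anthropic", "Google", "Meta", "Baidu", "Alibaba", "Moonshot", "Zhipu"]


def claimed_vendors(answer: str) -> set[str]:
    """Return every known vendor mentioned (case-insensitively) in the answer."""
    return {
        v for v in KNOWN_VENDORS
        if re.search(rf"\b{re.escape(v)}\b", answer, re.IGNORECASE)
    }


def is_confused(r: ProbeResult) -> bool:
    """Hypothetical rule: flag the model if it names any vendor other than its documented one."""
    return bool(claimed_vendors(r.self_report) - {r.documented_vendor})


def prevalence(results: list[ProbeResult]) -> float:
    """Percentage of tested models flagged as identity-confused."""
    flagged = sum(is_confused(r) for r in results)
    return 100 * flagged / len(results)
```

With 7 flagged models out of 27, `prevalence` returns 700/27, about 25.93, which matches the rate reported in the abstract. Keyword matching is deliberately crude: it would also flag a denial such as "I was not made by OpenAI," and on its own it cannot tell hallucination from reuse, which is why the study adds output similarity comparison.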

Paper Structure

This paper contains 20 sections, 1 equation, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: The design of our measurement study.
  • Figure 2: Taxonomy of LLMs based on their architecture and dataset.
  • Figure 3: Types of identity confusion in our experiment.
  • Figure 4: The heatmap illustrates the output similarity across different LLMs, with darker colors indicating higher levels of similarity. Please note that Hailuo AI was excluded from our evaluation due to the lack of an API for testing.
  • Figure 5: Radar chart of model pair output similarities across dataset subsets.
  • ...and 3 more figures
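Figure 4 summarizes pairwise output similarity as a heatmap, but this excerpt does not say which similarity measure the authors used. The sketch below substitutes a simple bag-of-words cosine similarity, averaged over a shared, ordered prompt set, to show how such a matrix could be assembled. The function names and the metric are assumptions for illustration only.

```python
import math
from collections import Counter


def bow(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(outputs: dict[str, list[str]]) -> dict[tuple[str, str], float]:
    """Hypothetical helper. `outputs` maps a model name to its answers for the same
    ordered prompts; returns the mean per-prompt cosine similarity for every
    model pair, stored symmetrically (diagonal included)."""
    models = sorted(outputs)
    matrix: dict[tuple[str, str], float] = {}
    for i, m in enumerate(models):
        for n in models[i:]:
            pairs = list(zip(outputs[m], outputs[n]))
            score = sum(cosine(bow(x), bow(y)) for x, y in pairs) / len(pairs)
            matrix[(m, n)] = matrix[(n, m)] = score
    return matrix
```

Plotting each `(model, model)` score as a cell, darker for higher values, yields a grid of the kind shown in Figure 4. In the study's framing, output similarity separates reuse from hallucination: a confused model whose outputs closely track the model it claims to be would suggest reuse, while low similarity is consistent with hallucination.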