Idiosyncrasies in Large Language Models

Mingjie Sun, Yida Yin, Zhiqiu Xu, J. Zico Kolter, Zhuang Liu

TL;DR

This work investigates the provenance of AI-generated text by demonstrating that outputs from different large language models (LLMs) carry distinctive, model-specific signatures. It introduces a synthetic N-way classification framework, using text embeddings (notably LLM2vec with LoRA) to identify the source LLM from generated text across chat, instruct, and base families, achieving high accuracy (up to 97.1%) and robust generalization, even under rewrites and semantic transformations. Analyses reveal that idiosyncrasies arise from word-level distributions, markdown formatting, and semantic content, with semantics playing a growing role when text is transformed. The study discusses implications for synthetic-data training, model-similarity estimation, and robust evaluation pipelines, including potential risks to leaderboard integrity and model-provenance efforts.

Abstract

In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs) -- unique patterns in their outputs that can be used to distinguish the models. To do so, we consider a simple classification task: given a particular text output, the objective is to predict the source LLM that generated the text. We evaluate this synthetic task across various groups of LLMs and find that simply fine-tuning text embedding models on LLM-generated texts yields excellent classification accuracy. Notably, we achieve 97.1% accuracy on held-out validation data in the five-way classification problem involving ChatGPT, Claude, Grok, Gemini, and DeepSeek. Our further investigation reveals that these idiosyncrasies are rooted in word-level distributions. These patterns persist even when the texts are rewritten, translated, or summarized by an external LLM, suggesting that they are also encoded in the semantic content. Additionally, we leverage LLMs as judges to generate detailed, open-ended descriptions of each model's idiosyncrasies. Finally, we discuss the broader implications of our findings, including training on synthetic data, inferring model similarity, and robust evaluation of LLMs. Code is available at https://github.com/locuslab/llm-idiosyncrasies.
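
To make the classification task concrete, the sketch below predicts a source label from word counts alone. It is not the authors' method: the paper fine-tunes text embedding models, whereas this stand-in is a unigram naive Bayes classifier written with the Python standard library. The class name `UnigramNaiveBayes`, the labels `model_a` and `model_b`, and the four training sentences are all made up for illustration and do not come from the paper's data.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase whitespace split; a deliberately crude stand-in for a real tokenizer.
    return text.lower().split()


class UnigramNaiveBayes:
    """Hypothetical toy classifier: multinomial naive Bayes over word counts."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter(labels)        # label -> number of training texts
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus add-one smoothed log likelihood of each token.
            score = math.log(self.doc_counts[label] / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training texts; "model_a" and "model_b" are placeholder labels.
train_texts = [
    "certainly! here is a short overview of the topic",
    "certainly! below is a quick summary of the idea",
    "according to the text, the main point is as follows",
    "based on the text, the key point is the following",
]
train_labels = ["model_a", "model_a", "model_b", "model_b"]

clf = UnigramNaiveBayes().fit(train_texts, train_labels)
print(clf.predict("certainly! here is a summary"))
```

A word-count model like this only captures the word-level part of the signal. It would not be expected to account for the markdown-formatting and semantic patterns that the abstract and TL;DR also describe.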

Paper Structure

This paper contains 23 sections, 18 figures, and 30 tables.

Figures (18)

  • Figure 1: Our framework for studying idiosyncrasies in Large Language Models (LLMs). We show that each LLM is unique in its expression. In the example shown here on ChatGPT, Claude, Grok, Gemini, and DeepSeek, a neural network classifier is able to distinguish them with a near-perfect 97.1% accuracy.
  • Figure 2: Ablations on input length of text embedding models. Classification accuracies improve as the text embedding models capture more context. Performance begins to saturate beyond an input sequence length of 256. Note that the three lines represent different groups of LLMs and are not directly comparable.
  • Figure 3: Different numbers of training samples. Our sequence classifiers benefit from more training samples. The classification performance converges when using about 10K training samples.
  • Figure 4: Example responses from ChatGPT and Claude, showcasing their idiosyncrasies: characteristic phrases (left) and unique markdown formatting (right). For clarity, we highlight each characteristic phrase with underline and model-specific color.
  • Figure 5: Frequencies of words and letters. The top 20 most frequently used words of LLMs (left) exhibit distinct patterns for each model, but their letter frequencies (right) are very similar. Results are on the chat API models.
  • ...and 13 more figures
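
The comparison described in the Figure 5 caption, top words versus letter frequencies, can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' analysis code: the function names `top_words` and `letter_frequencies` are hypothetical, tokenization is a plain whitespace split, and only ASCII letters are counted.

```python
import string
from collections import Counter


def top_words(texts, k=20):
    # Count lowercase whitespace-separated tokens across all texts
    # and return the k most common as (word, count) pairs.
    counts = Counter(word for text in texts for word in text.lower().split())
    return counts.most_common(k)


def letter_frequencies(texts):
    # Relative frequency of each ASCII letter a-z, ignoring every other character.
    counts = Counter(
        ch for text in texts for ch in text.lower() if ch in string.ascii_lowercase
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: counts[ch] / total for ch in string.ascii_lowercase}


# Made-up responses, for illustration only.
responses = ["The cat sat on the mat", "The dog sat too"]
print(top_words(responses, k=3))
print(letter_frequencies(responses)["t"])
```

Running both functions on each model's responses separately would give per-model word rankings and letter distributions to compare side by side.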