Table of Contents
Fetching ...

A Linguistic Comparison between Human and ChatGPT-Generated Conversations

Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David

TL;DR

This paper addresses how linguistic patterns differ between human and LLM-generated dialogues, focusing on ChatGPT-3.5 versus the EmpathicDialogues corpus. It employs LIWC analysis across 118 categories on a large GPT-generated companion dataset (2GPTEmpathicDialogues) to quantify differences in social processing, cognition, and emotion, and it augments this with valence classification on embeddings using SVM, RF, and MLP alongside UMAP visualizations. A key contribution is the 2GPTEmpathicDialogues dataset, which mirrors human dialogues and enables rigorous cross-domain comparisons in language modeling. The findings indicate humans are more variable and authentic, while GPT-3.5 demonstrates stronger performance in social behaviors, analytical thinking, cognition, and attentional focus, with GPT embeddings showing latent affect cues; these results bear on AI-human interaction design and the ongoing effort to detect AI-generated text, though limitations such as model version and data-generation discrepancies warrant further study.

Abstract

This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings enhance understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.

A Linguistic Comparison between Human and ChatGPT-Generated Conversations

TL;DR

This paper addresses how linguistic patterns differ between human and LLM-generated dialogues, focusing on ChatGPT-3.5 versus the EmpathicDialogues corpus. It employs LIWC analysis across 118 categories on a large GPT-generated companion dataset (2GPTEmpathicDialogues) to quantify differences in social processing, cognition, and emotion, and it augments this with valence classification on embeddings using SVM, RF, and MLP alongside UMAP visualizations. A key contribution is the 2GPTEmpathicDialogues dataset, which mirrors human dialogues and enables rigorous cross-domain comparisons in language modeling. The findings indicate humans are more variable and authentic, while GPT-3.5 demonstrates stronger performance in social behaviors, analytical thinking, cognition, and attentional focus, with GPT embeddings showing latent affect cues; these results bear on AI-human interaction design and the ongoing effort to detect AI-generated text, though limitations such as model version and data-generation discrepancies warrant further study.

Abstract

This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being "more human than human." However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings enhance understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.
Paper Structure (6 sections, 2 figures, 5 tables)

This paper contains 6 sections, 2 figures, 5 tables.

Figures (2)

  • Figure 1: Framework for generating the 2GPTEmpathicDialogues dataset, along with the prompts used. In this setup, two instances of the ChatGPT-3.5-Turbo API engage in conversation via a coordinating program. We observed instances of role confusion during some conversation turns, where the receiving speaker responded as though they were the initiating speaker, and vice versa. To address this issue, we modified the prompts to include the instruction, "Do not generate responses for the listener/speaker." However, this adjustment was not entirely effective due to the stochastic nature of LLMs, a point that is further discussed in the limitations section.
  • Figure 2: 3-D UMAP visualizations of human- and ChatGPT-generated dialogues. The dialogues are color-coded by positive or negative valence values, determined by each dialogue's underlying emotion category.