ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models

Sophie Jentzsch, Kristian Kersting

TL;DR

This study systematically probes whether ChatGPT truly understands humor or merely reproduces memorized jokes. Through prompt-based experiments on joke generation, explanation, and detection, the authors find that the model largely recycles a small set of jokes (over 90% of 1008 generated samples were the same 25 jokes), accurately explains valid jokes, and tends to classify a text as a joke when several joke-typical characteristics align. While this marks a substantial step toward computational humor, ChatGPT struggles to generate genuinely original, broadly funny content and can produce plausible but incorrect explanations for invalid jokes. The work sheds light on the pattern-based nature of AI humor and lays groundwork for improving humorous AI in human–computer interaction settings.

Abstract

Humor is a central aspect of human communication that has not yet been solved for artificial agents. Large language models (LLMs) are increasingly able to capture implicit and contextual information. OpenAI's ChatGPT in particular has recently gained immense public attention. The GPT-3-based model almost seems to communicate on a human level and can even tell jokes. But is ChatGPT really funny? We put ChatGPT's sense of humor to the test. In a series of exploratory experiments around jokes, i.e., generation, explanation, and detection, we seek to understand ChatGPT's capability to grasp and reproduce human humor. Since the model itself is not accessible, we applied prompt-based experiments. Our empirical evidence indicates that jokes are not hard-coded but, for the most part, are also not newly generated by the model. Over 90% of 1008 generated jokes were the same 25 jokes. The system accurately explains valid jokes but also comes up with fictional explanations for invalid jokes. Joke-typical characteristics can mislead ChatGPT in the classification of jokes. ChatGPT has not solved computational humor yet, but it can be a big leap toward "funny" machines.
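
The repetition figure reported in the abstract (a small set of jokes covering most generated samples) can be illustrated with a short tally. The following is only a minimal sketch: the helper names `normalize` and `top_k_share`, the normalization rule, and the toy sample are assumptions for illustration, and the abstract does not state how the authors grouped near-identical jokes.

```python
import re
from collections import Counter


def normalize(joke: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace.

    Assumption: two samples count as the same joke if they match after
    this normalization. The paper's own grouping may differ.
    """
    text = re.sub(r"[^\w\s]", "", joke.lower())
    return " ".join(text.split())


def top_k_share(jokes: list[str], k: int) -> float:
    """Fraction of all samples covered by the k most frequent distinct jokes."""
    if not jokes:
        return 0.0
    counts = Counter(normalize(j) for j in jokes)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(jokes)


# Hypothetical toy sample, not data from the paper.
samples = [
    "Why did the scarecrow win an award? Because he was outstanding in his field.",
    "why did the scarecrow win an award? because he was outstanding in his field!",
    "Why did the tomato turn red? Because it saw the salad dressing.",
    "Why don't scientists trust atoms? Because they make up everything.",
]

print(top_k_share(samples, k=1))
```

With real model outputs, calling `top_k_share(generated, k=25)` would give the kind of coverage figure the abstract describes, subject to the matching rule chosen above.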

Paper Structure

This paper contains 45 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example of a conversation between a human user and an artificial chatbot. The joke is an actual response by ChatGPT to the prompt shown.
  • Figure 2: Modification of top jokes to create joke detection conditions. Below each condition, the percentages of samples classified as a joke (green), as potentially funny (yellow), and as not a joke (red) are given. In condition (A) Minus Wordplay, the comic element, and therefore the pun itself, was removed. For condition (B) Minus Topic, the joke-specific topic was additionally eliminated, e.g., by removing personifications. Condition (C) Minus Structure keeps the validity of the joke intact but changes the typical question-answer structure to a single-sentence sample. From that, the comic element was additionally removed to create condition (D) Minus Wordplay.