What Makes Programmers Laugh? Exploring the Submissions of the Subreddit r/ProgrammerHumor

Miikka Kuutila, Leevi Rantala, Junhao Li, Simo Hosio, Mika Mäntylä

TL;DR

This study investigates humor in programming through Reddit’s r/ProgrammerHumor, leveraging a large multimodal dataset of text and memes with OCR-derived captions. Using a combination of bag-of-words Lasso regression, LDA-based topic modeling, and human labeling, the authors quantify how word usage, topics, and humor-theory annotations relate to upvote-derived humor scores. They find that humor is difficult to predict (max $R^2$ around 0.10 for words and 0.081 for topics), though certain patterns emerge: image-based memes, winter-time posts, weekend timing around 2–3 pm UTC, and topics like Learning, Writing Code, and Fixing Bugs tend to yield higher scores. The work provides a replication package and highlights practical implications for understanding programmer mood and engagement, while noting limitations and avenues for more powerful, yet interpretable, NLP methods in future research.

Abstract

Background: Humor is a fundamental part of human communication, with prior work linking positive humor in the workplace to positive outcomes, such as improved performance and job satisfaction. Aims: This study aims to investigate programming-related humor in a large social media community. Methodology: We collected 139,718 submissions from the Reddit subreddit r/ProgrammerHumor. Both textual and image-based (meme) submissions were considered. The image data was processed with OCR to extract text from images for NLP analysis. Multiple regression models were built to investigate what makes submissions humorous. Additionally, a random sample of 800 submissions was labeled by human annotators regarding their relation to theories of humor, suitability for the workplace, the need for programming knowledge to understand the submission, and whether images in image-based submissions added context to the submission. Results: Our results indicate that predicting the humor of software developers is difficult. Our best regression model was able to explain only 10% of the variance. However, statistically significant differences were observed between topics, submission times, and associated humor theories. Our analysis reveals that the highest submission scores are achieved by image-based submissions that are created during the winter months in the northern hemisphere, between 2 and 3 pm UTC on weekends, that are distinctly related to the superiority and incongruity theories of humor, and that are about the topic of "Learning". Conclusions: Predicting humor with natural language processing methods is challenging. We discuss the benefits and inherent difficulties in assessing perceived humor of submissions, as well as possible avenues for future work. Additionally, our replication package should help future studies and can act as a joke repository for the software industry and education.
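To make the "explains only 10% of the variance" result concrete, the following minimal sketch computes the coefficient of determination $R^2 = 1 - SS_{res}/SS_{tot}$ in plain Python. The scores and predictions are made-up toy values, not data from the paper, and the function name `r_squared` is an illustrative assumption rather than anything taken from the authors' replication package.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Total variation of the observed scores around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    # Variation left unexplained by the predictions.
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


# Hypothetical submission scores (toy values, not from the dataset).
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_score = sum(scores) / len(scores)

# A model that always predicts the mean explains none of the variance.
baseline_pred = [mean_score] * len(scores)

# A weak model whose predictions move only slightly toward the true scores.
weak_pred = [mean_score + 0.05 * (y - mean_score) for y in scores]

r2_baseline = r_squared(scores, baseline_pred)
r2_weak = r_squared(scores, weak_pred)
r2_perfect = r_squared(scores, scores)

print(f"baseline R^2: {r2_baseline:.3f}")
print(f"weak model R^2: {r2_weak:.3f}")
print(f"perfect R^2: {r2_perfect:.3f}")
```

Under this definition, an $R^2$ near 0.10 means the model's predictions sit much closer to the always-predict-the-mean baseline than to a perfect fit, which is consistent with the abstract's conclusion that humor scores are hard to predict.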

Paper Structure

This paper contains 29 sections and 7 tables.