Do LLMs Find Human Answers To Fact-Driven Questions Perplexing? A Case Study on Reddit
Parker Seegmiller, Joseph Gatto, Omar Sharif, Madhusudan Basak, Sarah Masud Preum
TL;DR
The paper investigates whether large language models (LLMs) can model the breadth of human answers to fact-driven questions posted on social media, focusing on Reddit's r/Ask{Topic} communities. It builds a dataset of 409 fact-driven questions and 7,534 answers from 15 subreddits. It then evaluates a 1.3B-parameter version of Sheared LLaMA in two settings, out-of-the-box (SL) and fine-tuned (SLFT), and compares the model's perplexity on each answer with that answer's human rating. The study finds that LLM perplexity correlates with human preference: highly-rated answers receive lower perplexity than poorly-rated ones, and fine-tuning further improves this alignment across topics, although notable outliers reveal the model's limitations and blind spots. The dataset and evaluation setup offer a framework for probing socio-technical aspects of LLM behavior in online discourse and a resource for future social science and NLP research. The work also points to targeted fine-tuning and extension to other domains as ways to better capture the diversity of human responses on social platforms.
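The perplexity-versus-rating comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the per-token natural-log probabilities of each answer (conditioned on its question) have already been obtained from a causal LM such as Sheared LLaMA. The answers, their log-probabilities, and the use of the Reddit vote score as the "human rating" are all hypothetical. Spearman rank correlation and the score threshold for splitting answers into "highly-rated" and "poorly-rated" groups are illustrative choices, not necessarily the statistics used in the paper.

```python
import math
from statistics import mean


def perplexity(token_logprobs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return math.exp(-mean(token_logprobs))


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ssx * ssy)


# Hypothetical answers: a vote score plus per-token log-probs from an LM.
answers = [
    {"score": 120, "logprobs": [-1.2, -0.8, -1.0]},
    {"score": 45, "logprobs": [-1.5, -1.7, -1.6]},
    {"score": 3, "logprobs": [-2.4, -2.6, -2.5]},
    {"score": -2, "logprobs": [-3.0, -3.2, -3.1]},
]

scores = [a["score"] for a in answers]
ppls = [perplexity(a["logprobs"]) for a in answers]
rho = spearman(scores, ppls)

# Hypothetical threshold separating "highly-rated" from "poorly-rated".
THRESHOLD = 10
high = mean(p for p, s in zip(ppls, scores) if s >= THRESHOLD)
low = mean(p for p, s in zip(ppls, scores) if s < THRESHOLD)

print(f"Spearman rho(score, perplexity) = {rho:.2f}")
print(f"mean perplexity: highly-rated {high:.2f}, poorly-rated {low:.2f}")
```

On this toy data the ranks of score and perplexity are exactly reversed, so rho is -1. A negative correlation, together with a lower mean perplexity in the highly-rated group, is the pattern the summary describes. A rank-based statistic is used here because vote scores are heavy-tailed and only their ordering is compared.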
Abstract
Large language models (LLMs) have been shown to be proficient in correctly answering questions in the context of online discourse. However, the study of using LLMs to model human-like answers to fact-driven social media questions is still under-explored. In this work, we investigate how LLMs model the wide variety of human answers to fact-driven questions posed on several topic-specific Reddit communities, or subreddits. We collect and release a dataset of 409 fact-driven questions and 7,534 diverse, human-rated answers from 15 r/Ask{Topic} communities across 3 categories: profession, social identity, and geographic location. We find that LLMs are considerably better at modeling highly-rated human answers to such questions, as opposed to poorly-rated human answers. We present several directions for future research based on our initial findings.
