Large Language Model-Informed Feature Discovery Improves Prediction and Interpretation of Credibility Perceptions of Visual Content
Yilang Peng, Sijia Qian, Yingdan Lu, Cuihua Shen
TL;DR
Predicting the perceived credibility of visual content and identifying the features that drive it are addressed with an LLM-informed feature discovery workflow: GPT-4o reasons about visuals and captions, targeted prompts extract interpretable credibility-related features, and these features are incorporated into predictive models. The approach achieves $r=0.76$ and $R^2=0.58$ with $MSE=0.28$, a $13\%$ relative improvement in $R^2$ over zero-shot predictions, and uncovers key cues such as information concreteness, caption-image alignment, and image format. SHAP analysis reveals GPT-rated credibility as a strong predictor but also highlights post- and image-level features that drive human judgments, providing interpretable insights into what makes visual content appear credible. Overall, the framework demonstrates a scalable, interpretable use of multimodal LLMs for social science, with practical implications for misinformation mitigation and visual credibility assessment.
Abstract
In today's visually dominated social media landscape, predicting the perceived credibility of visual content and understanding what drives human judgment are crucial for countering misinformation. However, these tasks are challenging due to the diversity and richness of visual features. We introduce a Large Language Model (LLM)-informed feature discovery framework that leverages multimodal LLMs, such as GPT-4o, to evaluate content credibility and explain its reasoning. We extract and quantify interpretable features using targeted prompts and integrate them into machine learning models to improve credibility predictions. We tested this approach on 4,191 visual social media posts across eight topics in science, health, and politics, using credibility ratings from 5,355 crowdsourced workers. Our method outperformed zero-shot GPT-based predictions by 13 percent in $R^2$ and revealed key features like information concreteness and image format. We discuss the implications for misinformation mitigation, visual credibility, and the role of LLMs in social science.
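As a reading aid for the evaluation metrics named above (Pearson $r$, $R^2$, MSE, and the relative improvement in $R^2$ over a zero-shot baseline), the following is a minimal sketch of how such quantities are commonly computed. It is not the authors' evaluation code. The function names and the toy ratings are hypothetical and chosen only for illustration; they are not data from the study.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error between human ratings and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, e.g. 0.13 means 13 percent."""
    return (new - baseline) / baseline


# Hypothetical toy data (NOT from the paper): mean human credibility ratings
# and two sets of model predictions for five posts.
human = [1.0, 2.0, 3.0, 4.0, 5.0]
zero_shot = [2.0, 2.0, 3.0, 4.0, 4.0]   # stand-in for zero-shot LLM ratings
augmented = [1.5, 2.0, 3.0, 4.0, 4.5]   # stand-in for feature-augmented model

r2_base = r_squared(human, zero_shot)
r2_new = r_squared(human, augmented)

print("baseline R^2:", r2_base)
print("augmented R^2:", r2_new)
print("relative improvement in R^2:", relative_improvement(r2_new, r2_base))
print("augmented Pearson r:", pearson_r(human, augmented))
print("augmented MSE:", mse(human, augmented))
```

Note that a relative improvement is a ratio of the gain to the baseline value, so a 13 percent relative improvement in $R^2$ is not the same as a gain of 0.13 in absolute $R^2$.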
