When AI Gives Advice: Evaluating AI and Human Responses to Online Advice-Seeking for Well-Being
Harsh Kumar, Jasmine Chahal, Yinuo Zhao, Zeling Zhang, Annika Wei, Louis Tay, Ashton Anderson
TL;DR
The paper examines whether frontier LLMs give higher-quality everyday well-being advice than crowdsourced Reddit replies, and how lightweight human–AI collaboration can improve advice. Two pre-registered studies compare GPT-4o and GPT-5 against top-voted Reddit comments and test augmentation pipelines; an exploratory survey of undergraduates then surfaces user preferences for coach-like versus friend-like AI personas. On single-shot advice, frontier LLMs generally outperform the crowd, but a newer model is not automatically a better adviser: GPT-4o beat GPT-5 on all metrics except sycophancy, suggesting that benchmark gains need not improve advice-giving. User preferences vary by persona and trust. The work also shows that human edits and expert input can meaningfully shape advice, and that human advice can be unobtrusively polished to compete with AI-generated comments, pointing to design patterns for hybrid ecosystems that balance quality, transparency, and safety. These results inform design implications for advice-giving agents and ecosystems that blend AI, crowd input, and expert oversight.
Abstract
Seeking advice is a core human behavior that the Internet has reinvented twice: first through forums and Q&A communities that crowdsource public guidance, and now through large language models (LLMs) that deliver private, on-demand counsel at scale. Yet the quality of this synthesized LLM advice remains unclear. How does it compare, not only against arbitrary human comments, but against the wisdom of the online crowd? We conducted two studies (N = 210) in which experts compared top-voted Reddit advice with LLM-generated advice. LLMs ranked significantly higher overall and on effectiveness, warmth, and willingness to seek advice again. GPT-4o beat GPT-5 on all metrics except sycophancy, suggesting that benchmark gains need not improve advice-giving. In our second study, we examined how human and algorithmic advice could be combined, and found that human advice can be unobtrusively polished to compete with AI-generated comments. Finally, to surface user expectations, we ran an exploratory survey with undergraduates (N = 148) that revealed heterogeneous, persona-dependent preferences for agent qualities (e.g., coach-like: goal-focused structure; friend-like: warmth and humor). We conclude with design implications for advice-giving agents and ecosystems blending AI, crowd input, and expert oversight.
