LLMs Reproduce Human Purchase Intent via Semantic Similarity Elicitation of Likert Ratings
Benjamin F. Maier, Ulf Aslak, Luca Fiaschi, Nina Rismal, Kemble Fletcher, Christian C. Luhmann, Robbie Dow, Kli Pappas, Thomas V. Wiecki
TL;DR
This paper tackles the high cost and biases of traditional consumer panels by introducing semantic similarity rating (SSR): LLMs are prompted to produce free-text purchase-intent statements, which are then mapped to a 5-point Likert scale via embedding similarity to anchor statements. Across 57 personal-care product surveys (9,300 human responses), SSR reaches about 90% of human test-retest reliability, the maximum correlation with human data that can be expected, and produces realistic response distributions (KS similarity > 0.85). SSR requires no training data or fine-tuning, making it a scalable, interpretable plug-in for concept testing. It preserves traditional survey metrics while adding qualitative rationales for each rating and a broader distribution of responses. The approach shows substantial potential to augment or accelerate early-stage product research, with caveats around anchor design, demographic coverage, and the domain knowledge encoded in the LLMs.
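The mapping step described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the free-text response and the five anchor statements have already been embedded, and it uses toy 2-D vectors in place of real embeddings. The function names and the normalization (subtract the smallest similarity, then rescale to sum to one) are assumptions made for illustration.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ssr_likert_pmf(response_vec, anchor_vecs, eps=1e-9):
    """Map one embedded response to a distribution over Likert points.

    Assumed normalization: shift similarities so the least similar anchor
    gets (almost) zero mass, then rescale so the values sum to 1.
    """
    sims = np.array([cosine_similarity(response_vec, a) for a in anchor_vecs])
    shifted = sims - sims.min() + eps  # eps avoids division by zero
    return shifted / shifted.sum()

def expected_rating(pmf):
    """Mean Likert rating (1..K) implied by a distribution."""
    ratings = np.arange(1, len(pmf) + 1)
    return float(ratings @ pmf)

# Toy stand-ins for embeddings: five anchors spread over a quarter circle,
# from "definitely would not buy" (0 degrees) to "definitely would buy" (90).
angles = np.deg2rad([0.0, 22.5, 45.0, 67.5, 90.0])
anchor_vecs = np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Hypothetical response embedding that points close to the fourth anchor.
theta = np.deg2rad(70.0)
response_vec = np.array([np.cos(theta), np.sin(theta)])

pmf = ssr_likert_pmf(response_vec, anchor_vecs)
print("Likert distribution:", np.round(pmf, 3))
print("Expected rating:", round(expected_rating(pmf), 2))
```

In practice the vectors would come from a text-embedding model, and the per-response distributions would be averaged over many synthetic respondents to obtain a survey-level Likert distribution.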
Abstract
Consumer research costs companies billions annually yet suffers from panel biases and limited scale. Large language models (LLMs) offer an alternative by simulating synthetic consumers, but produce unrealistic response distributions when asked directly for numerical ratings. We present semantic similarity rating (SSR), a method that elicits textual responses from LLMs and maps these to Likert distributions using embedding similarity to reference statements. Testing on an extensive dataset comprising 57 personal care product surveys conducted by a leading corporation in that market (9,300 human responses), SSR achieves 90% of human test-retest reliability while maintaining realistic response distributions (KS similarity > 0.85). Additionally, these synthetic respondents provide rich qualitative feedback explaining their ratings. This framework enables scalable consumer research simulations while preserving traditional survey metrics and interpretability.
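The distribution-similarity figure quoted above can be made concrete with a short sketch. It assumes that "KS similarity" means one minus the Kolmogorov-Smirnov distance, that is, one minus the largest absolute gap between the two cumulative Likert distributions. The abstract does not spell out this definition, so it should be read as an assumption. The two distributions below are made-up illustrative values, not data from the study.

```python
import numpy as np

def ks_similarity(pmf_a, pmf_b):
    """1 minus the largest absolute gap between the two CDFs (assumed definition)."""
    cdf_a = np.cumsum(np.asarray(pmf_a, dtype=float))
    cdf_b = np.cumsum(np.asarray(pmf_b, dtype=float))
    return 1.0 - float(np.max(np.abs(cdf_a - cdf_b)))

# Illustrative 5-point Likert distributions (hypothetical, not from the paper).
human_pmf = [0.05, 0.10, 0.25, 0.35, 0.25]
synthetic_pmf = [0.02, 0.08, 0.30, 0.40, 0.20]

similarity = ks_similarity(human_pmf, synthetic_pmf)
print("KS similarity:", round(similarity, 3))
```

Under this definition, identical distributions score 1, and distributions concentrated at opposite ends of the scale score 0.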
