GPT-ology, Computational Models, Silicon Sampling: How should we think about LLMs in Cognitive Science?
Desmond C. Ong
TL;DR
This paper surveys how cognitive science uses large language models by outlining three main paradigms (GPT-ology, LLMs as computational models, and silicon sampling) and arguing for a bird's-eye framework to assess the epistemic status and reliability of inferences drawn under each. It highlights core methodological challenges, including model access, prompt sensitivity, data provenance, and reproducibility, that threaten robust inferences. The author emphasizes the need for standard conventions, open-source evaluation, and attention to generalizability to ensure lasting insights as LLM technology evolves. Overall, the paper advocates a cautious, standards-driven approach to integrating LLMs into cognitive science that prioritizes reliability, transparency, and long-term interpretability.
Abstract
Large Language Models have taken the cognitive science world by storm. It is perhaps timely now to take stock of the various research paradigms that have been used to make scientific inferences about "cognition" in these models or about human cognition. We review several emerging research paradigms (GPT-ology, LLMs-as-computational-models, and "silicon sampling") and survey recent papers that have used LLMs under these paradigms. In doing so, we discuss their claims as well as challenges to scientific inference under these various paradigms. We highlight several outstanding issues about LLMs that have to be addressed to push our science forward: closed-source vs. open-source models; (the lack of visibility of) training data; and reproducibility in LLM research, including forming conventions on new task "hyperparameters" like instructions and prompts.
