Can Large Language Models generalize analogy solving like children can?

Claire E. Stevenson; Alexandra Pafford; Han L. J. van der Maas; Melanie Mitchell

TL;DR

The study investigates whether large language models (LLMs) can generalize letter-string analogy solving from the Latin alphabet to a near transfer domain (Greek alphabet) and a far transfer domain (a list of symbols), as humans do. The authors tested 42 children, 62 adults, and multiple LLMs on a controlled set of Latin, Greek, and Symbol alphabet problems with varied prompts and item variants. Humans generalized well across domains, whereas LLM performance degraded consistently on both near and far transfer. Analyses indicate that LLMs struggle to generalize across rules (e.g., predecessor and second successor) and, unlike children, rely less on flexible, on-the-fly representations of the alphabet. The results suggest that current LLMs lack robust, human-like analogical transfer and highlight gaps in scaling and representation, with implications for evaluating AI generalization beyond surface similarity.

Abstract

In people, the ability to solve analogies such as "body : feet :: table : ?" emerges in childhood, and appears to transfer easily to other domains, such as the visual domain "( : ) :: < : ?". Recent research shows that large language models (LLMs) can solve various forms of analogies. However, can LLMs generalize analogy solving to new domains like people can? To investigate this, we had children, adults, and LLMs solve a series of letter-string analogies (e.g., a b : a c :: j k : ?) in the Latin alphabet, in a near transfer domain (Greek alphabet), and a far transfer domain (list of symbols). Children and adults easily generalized their knowledge to unfamiliar domains, whereas LLMs did not. This key difference between human and AI performance is evidence that these LLMs still struggle with robust human-like analogical transfer.
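To make the task format concrete, the following is a minimal sketch of how a letter-string analogy such as "a b : a c :: j k : ?" can be solved over any ordered list of items. It is an illustration only, not the authors' materials or scoring code: the function name `solve_analogy`, the per-position shift rule, and the contents of `SYMBOLS` are assumptions made for this example (the actual symbol list used in the study is not given here).

```python
# Illustrative sketch only; not the study's stimuli or analysis code.
LATIN = list("abcdefghijklmnopqrstuvwxyz")
GREEK = list("αβγδεζηθικλμνξοπρστυφχψω")
SYMBOLS = list("*@%!^#~$&+")  # hypothetical ordered symbol list (assumption)


def solve_analogy(alphabet, a, b, c):
    """Solve a : b :: c : ? by copying each position's index shift from a->b onto c.

    `alphabet` is an ordered list of items; `a`, `b`, `c` are equal-length lists.
    """
    index = {item: i for i, item in enumerate(alphabet)}
    shifts = [index[y] - index[x] for x, y in zip(a, b)]
    answer = []
    for item, shift in zip(c, shifts):
        target = index[item] + shift
        if not 0 <= target < len(alphabet):
            raise ValueError("rule steps outside the alphabet")
        answer.append(alphabet[target])
    return answer


# The same successor rule applied in the familiar, near, and far domains.
print(solve_analogy(LATIN, "a b".split(), "a c".split(), "j k".split()))
print(solve_analogy(GREEK, "α β".split(), "α γ".split(), "κ λ".split()))
print(solve_analogy(SYMBOLS, "* @".split(), "* %".split(), "! ^".split()))
```

The point of the sketch is that the rule is defined over positions in an ordered list, so it carries over unchanged when the list is swapped. That is the kind of transfer the study probes in children, adults, and LLMs.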
