Sequential Enumeration in Large Language Models

Kuinan Hou; Marco Zorzi; Alberto Testolin

TL;DR

The paper investigates whether state-of-the-art LLMs can perform exact sequential enumeration, using counting (naming) and production tasks across letters and five-letter words with homogeneous and heterogeneous stimuli. It examines multiple prompting strategies (explicit, spontaneous, mental, forbid) and analyzes internal representations via PCA on last-layer embeddings in a large model (Llama-70B), linking behavior to latent dynamics. Key findings show counting is reliable mainly when explicitly prompted, while spontaneous counting is rare; mental counting reveals counter-like internal dynamics, whereas explicit counting relies on surface token strategies. The results reveal a persistent gap between neural and symbolic approaches to numeracy in LLMs and highlight the need for grounding numerosity or developing architectures that support robust counting.

Abstract

Reliably counting and generating sequences of items remain a significant challenge for neural networks, including Large Language Models (LLMs). Although this capability is readily handled by rule-based symbolic systems based on serial computation, systematically deploying counting procedures is difficult for neural models, which must acquire these skills through learning. Previous research has demonstrated that recurrent architectures can only approximately track and enumerate sequences of events, and it remains unclear whether modern deep learning systems, including LLMs, can deploy systematic counting procedures over sequences of discrete symbols. This paper aims to fill this gap by investigating the sequential enumeration abilities of five state-of-the-art LLMs, including proprietary, open-source, and reasoning models. We probe LLMs in sequential naming and production tasks involving lists of letters and words, adopting a variety of prompting instructions to explore the role of chain-of-thought in the spontaneous emergence of counting strategies. We also evaluate open-source models with the same architecture but increasing size to see whether the mastery of counting principles follows scaling laws, and we analyze the embedding dynamics during sequential enumeration to investigate the emergent encoding of numerosity. We find that some LLMs are indeed capable of deploying counting procedures when explicitly prompted to do so, but none of them spontaneously engage in counting when simply asked to enumerate the number of items in a sequence. Our results suggest that, despite their impressive emergent abilities, LLMs cannot yet robustly and systematically deploy counting procedures, highlighting a persistent gap between neural and symbolic approaches to compositional generalization.
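To make the two task types concrete, the following is a minimal sketch of how naming and production trials over letters and five-letter strings might be built and scored by exact match. It is not the authors' code: the function names, the prompt wording, the use of random five-letter strings in place of real words, and the rule of reading the last number in a response as the answer are all assumptions made for illustration.

```python
import random
import re
import string


def make_sequence(n, kind="letters", homogeneous=True, seed=0):
    """Build n items: single letters or five-letter strings (hypothetical stimuli)."""
    rng = random.Random(seed)

    def item():
        if kind == "letters":
            return rng.choice(string.ascii_lowercase)
        # Placeholder five-letter strings, not real vocabulary words.
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(5))

    if homogeneous:
        return [item()] * n  # the same item repeated n times
    return [item() for _ in range(n)]  # items drawn independently


def naming_prompt(items):
    """Naming task: ask for the number of items (assumed wording)."""
    return "How many items are in this list? " + " ".join(items)


def production_prompt(item, n):
    """Production task: ask for exactly n copies of an item (assumed wording)."""
    return f"Write the item '{item}' exactly {n} times, separated by spaces."


def score_naming(response, n):
    """Exact match, taking the last number in the response as the answer (assumption)."""
    numbers = re.findall(r"\d+", response)
    return bool(numbers) and int(numbers[-1]) == n


def score_production(response, item, n):
    """Correct only if the response is exactly n whitespace-separated copies of item."""
    tokens = response.split()
    return len(tokens) == n and all(t == item for t in tokens)
```

Exact-match scoring is used here because the abstract frames the question as exact rather than approximate enumeration; an off-by-one response counts as an error under this sketch.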
