A Small Math Model: Recasting Strategy Choice Theory in an LLM-Inspired Architecture

Roussel Rahman, Jeff Shrager

TL;DR

The paper reframes Strategy Choice Theory (SCT) within a contemporary LLM-inspired Small Math Model (SMM) to study how children adaptively select arithmetic strategies. It introduces number embeddings, a gated attention mechanism, and a Gaussian curriculum to train counting and addition, with confidence-based switching between strategies. The main findings are that counting can scaffold the learning of addition, that introducing addition too early causes transient interference which dissipates with further training, and that biases decrease with experience, in line with SCT predictions. The work provides a mechanistic, interpretable platform for exploring number sense and symbolic reasoning in AI, with the potential to scale to broader numerical tasks and to support adaptive strategy discovery.

Abstract

Strategy Choice Theory (SCT; Siegler and Shrager, 1984; Siegler, 2000) explains important aspects of children's arithmetic learning based upon principles including learning from developmentally naturalistic data, probabilistic representation, confidence-based retrieval, and the phase-like importance of scaffolding strategies, such as finger-counting. Here we recast SCT as a "Small Math Model" (SMM), employing a neural-network-based architecture analogous to LLMs. The SMM extends SCT to include counting practice, symbol (number) embedding, and gated attention. Similar to earlier work, the SMM demonstrates constructive and destructive interference between counting and addition, and the "wave-like" use of finger-counting as sum recall improves. We plan to extend the SMM to later aspects of the decades-long SCT program, including adaptive strategy choice and eventually strategy discovery, providing a unified platform to investigate the understanding of numerical characteristics and relationships essential for mathematical reasoning, as it can emerge in LLM-based agents.
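
As an illustration of the confidence-based retrieval idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes that recall yields a probability distribution over candidate sums, that an answer is stated only when its probability clears a confidence threshold, and that finger-counting is the backup strategy otherwise. The function names, the threshold value of 0.6, and the use of the most probable answer (rather than a sampled one) are all assumptions made for this example; the SMM itself is a neural network whose details are not given in this summary.

```python
def count_up(a, b):
    """Hypothetical backup strategy: finger-count by starting at a and adding 1, b times."""
    total = a
    for _ in range(b):
        total += 1
    return total


def choose_answer(answer_probs, a, b, threshold=0.6):
    """Return (answer, strategy) for the problem a + b.

    answer_probs maps candidate sums to recall probabilities.
    threshold is an assumed confidence criterion, not a value from the paper.
    """
    # Simplification: take the most probable recalled answer.
    best = max(answer_probs, key=answer_probs.get)
    if answer_probs[best] >= threshold:
        # Confident enough: state the recalled answer, right or wrong.
        return best, "retrieval"
    # Not confident: fall back on the slower counting strategy.
    return count_up(a, b), "counting"


# Early in learning the distribution for 3 + 4 is flat, so counting is used.
early = {6: 0.3, 7: 0.4, 8: 0.3}
# Later the distribution is peaked, so the sum is recalled directly.
late = {6: 0.05, 7: 0.9, 8: 0.05}

print(choose_answer(early, 3, 4))
print(choose_answer(late, 3, 4))
```

Under these assumptions, reliance on counting falls as recall distributions sharpen with practice, which is the qualitative pattern the abstract describes for finger-counting.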

Paper Structure

This paper contains 6 sections, 2 figures.

Figures (2)

  • Figure 1: Addition correctness (a) and use of finger-counting (b) as a function of training step, for different addition-start times. Counting is always trained from the beginning. Destructive interference takes place near the beginning of practice if addition is introduced too early, although this is rapidly overcome by training.
  • Figure 2: Addition answer confidence over the whole 50,000 training cycles, for the problem $3+4=?$, depicted in four quarters, early through late. In each quarter, the lines represent distributions for problems sampled once in every 10 instances. (The training itself is not divided into four periods.)