Characterizing Pattern Matching and Its Limits on Compositional Task Structures
Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo
TL;DR
This work formalizes pattern matching in neural generalization as functional-equivalence-based generalization within a data-driven framework, introducing $k$-equivalence, $k$-coverage, and a substitution graph to define the boundary of pattern-matching capabilities. It shows that instance-wise success correlates with the number of supporting contexts, and proves a tight data-scaling law for a two-hop structure, $N_{\mathrm{req}} = \tilde{\Theta}(n^c)$ with $c = 2.5 - 0.5/k$. Empirical results match this prediction across architectures, under up to a 20x parameter increase, and across tasks (2-Hop, 3-Hop, etc.). The study identifies path ambiguity as a structural barrier: when multiple computation paths exist, models fail to form unified intermediate-state representations. Chain-of-Thought reduces data requirements but does not fully resolve this issue. The work also proposes a taxonomy of generalization mechanisms that distinguishes functional-equivalence-based pattern matching from property-based and shared-operator generalization. This taxonomy offers a principled diagnostic for when pattern matching can account for generalization, and it can guide targeted data augmentation and future research on non-pattern-matching mechanisms.
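As a quick arithmetic check on the exponent formula $c = 2.5 - 0.5/k$ stated above, plain substitution gives the values below. This is only a worked evaluation of the stated formula, not an additional result, and the meaning of $n$ and $k$ follows the paper's own definitions, which this summary does not spell out.

$$
c(1) = 2.5 - 0.5 = 2, \qquad c(2) = 2.5 - 0.25 = 2.25, \qquad c(4) = 2.5 - 0.125 = 2.375, \qquad \lim_{k \to \infty} c(k) = 2.5 .
$$

The exponent therefore grows with $k$ and stays strictly below $2.5$.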
Abstract
Despite their impressive capabilities, LLMs often owe their successes to pattern-matching behaviors, which are also linked to out-of-distribution (OOD) generalization failures in compositional tasks. However, behavioral studies commonly employ task setups that allow multiple sources of generalization (e.g., algebraic invariances, structural repetition), which obscures a precise and testable account of how well LLMs generalize through pattern matching and where that mechanism breaks down. To address this ambiguity, we first formalize pattern matching as functional equivalence, i.e., identifying pairs of input subsequences that consistently lead to identical results when the rest of the input is held constant. We then systematically study how decoder-only Transformer and Mamba models behave in controlled tasks with compositional structures that isolate this mechanism. Our formalism yields predictive and quantitative insights: (1) Instance-wise success of pattern matching is well predicted by the number of contexts witnessing the relevant functional equivalence. (2) We prove a tight sample complexity bound for learning a two-hop structure by identifying the exponent of the data scaling law for perfect in-domain generalization. Our empirical results align with the theoretical prediction under 20x parameter scaling and across architectures. (3) Path ambiguity is a structural barrier: when a variable influences the output via multiple paths, models fail to form unified intermediate state representations, impairing accuracy and interpretability. (4) Chain-of-Thought reduces data requirements yet does not resolve path ambiguity. Hence, we provide a predictive, falsifiable boundary for pattern matching and a foundational diagnostic for disentangling mixed generalization mechanisms.
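The notion of functional equivalence described above can be illustrated with a minimal sketch. It assumes a hypothetical toy dataset of `(prefix, context, output)` triples, where `prefix` is the subsequence being compared and `context` is the rest of the input. The helper `witness_counts` and the data are illustrative only and are not the paper's implementation or its exact definition of $k$-equivalence: the sketch simply counts, for each pair of prefixes, the shared contexts on which their outputs agree, and drops any pair that disagrees on a shared context.

```python
from collections import defaultdict
from itertools import combinations


def witness_counts(examples):
    """Count agreeing shared contexts for each pair of prefixes.

    `examples` is an iterable of (prefix, context, output) triples.
    A pair of prefixes is kept only if it shares at least one context
    and its outputs agree on every shared context (hypothetical
    reading of "functional equivalence" used for illustration).
    """
    # prefix -> {context: output}
    by_prefix = defaultdict(dict)
    for prefix, context, output in examples:
        by_prefix[prefix][context] = output

    counts = {}
    for p, q in combinations(sorted(by_prefix), 2):
        shared = by_prefix[p].keys() & by_prefix[q].keys()
        if shared and all(by_prefix[p][c] == by_prefix[q][c] for c in shared):
            counts[(p, q)] = len(shared)
    return counts


# Toy data: ("a", "b") and ("c", "d") behave identically in contexts
# "u" and "v"; ("e", "f") disagrees with both in context "v".
examples = [
    (("a", "b"), "u", 1),
    (("a", "b"), "v", 2),
    (("c", "d"), "u", 1),
    (("c", "d"), "v", 2),
    (("e", "f"), "u", 1),
    (("e", "f"), "v", 3),
]
counts = witness_counts(examples)
print(counts)
```

In this toy setting, the count attached to a pair plays the role of "the number of contexts witnessing the equivalence" mentioned in insight (1); thresholding it at some value is one plausible way to model a minimum-evidence requirement.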
