Representing Rule-based Chatbots with Transformers
Dan Friedman, Abhishek Panigrahi, Danqi Chen
TL;DR
This work investigates how Transformer-based chatbots could implement rule-based dialogue by modeling ELIZA as a formal target. It presents a decoder-only Transformer construction that realizes ELIZA via a modular pipeline: template matching with finite-state automata, two copying strategies (induction-head and position-based), cycling through reassembly rules, and memory-queue mechanisms, along with alternative substructures (gridworld vs. intermediate outputs). Through synthetic data and controlled experiments, the authors show that Transformers can learn to replicate ELIZA behavior, that induction-head copying and intermediate-output scratchpad usage emerge as prevalent mechanisms, and that the data distribution shapes which mechanisms are preferred. The results connect neural chatbots to interpretable, symbolic dynamics, offering a framework for mechanistic analysis, and propose ELIZA as a benchmark for studying learning dynamics and interpretability in conversational models.
Abstract
What kind of internal mechanisms might Transformers use to conduct fluid, natural-sounding conversations? Prior work has illustrated by construction how Transformers can solve various synthetic tasks, such as sorting a list or recognizing formal languages, but it remains unclear how to extend this approach to a conversational setting. In this work, we propose using ELIZA, a classic rule-based chatbot, as a setting for formal, mechanistic analysis of Transformer-based chatbots. ELIZA allows us to formally model key aspects of conversation, including local pattern matching and long-term dialogue state tracking. We first present a theoretical construction of a Transformer that implements the ELIZA chatbot. Building on prior constructions, particularly those for simulating finite-state automata, we show how simpler mechanisms can be composed and extended to produce more sophisticated behavior. Next, we conduct a set of empirical analyses of Transformers trained on synthetically generated ELIZA conversations. Our analysis illustrates the kinds of mechanisms these models tend to prefer: for example, models favor an induction head mechanism over a more precise, position-based copying mechanism, and they use intermediate generations to simulate recurrent data structures, akin to an implicit scratchpad or Chain-of-Thought. Overall, by drawing an explicit connection between neural chatbots and interpretable, symbolic mechanisms, our results provide a new framework for the mechanistic analysis of conversational agents.
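To make the target behavior concrete, the following is a minimal sketch of an ELIZA-style responder. It shows the three ingredients the abstract refers to: local pattern matching against templates, cycling through reassembly rules, and a memory queue for long-term dialogue state. The rules, the class name TinyEliza, and the simplified memory behavior are assumptions made for illustration; they are not taken from the paper's dataset or from the original ELIZA script.

```python
import re
from collections import deque

# Hypothetical rules for illustration only.
# Each entry: (decomposition pattern, reassembly templates used in rotation).
RULES = [
    (re.compile(r"^i need (.+)$"),
     ["why do you need {0} ?", "would {0} help you ?"]),
    (re.compile(r"^i am (.+)$"),
     ["how long have you been {0} ?"]),
]
# Simplified memory rule: a matching input is stored for later, not answered now.
MEMORY_RULE = (re.compile(r"^my (.+)$"), "earlier you mentioned your {0} .")
FALLBACK = "please go on ."


class TinyEliza:
    def __init__(self):
        # One counter per rule tracks which reassembly template comes next.
        self.counters = [0] * len(RULES)
        # FIFO queue of responses saved by the memory rule.
        self.memory = deque()

    def respond(self, utterance):
        text = utterance.lower().strip()

        # Memory rule: enqueue a transformed copy and give a generic reply.
        match = MEMORY_RULE[0].match(text)
        if match:
            self.memory.append(MEMORY_RULE[1].format(match.group(1)))
            return FALLBACK

        # Template matching: first rule whose pattern matches wins.
        for i, (pattern, templates) in enumerate(RULES):
            match = pattern.match(text)
            if match:
                # Cycle through this rule's reassembly templates.
                template = templates[self.counters[i] % len(templates)]
                self.counters[i] += 1
                # Copy the matched span into the response.
                return template.format(match.group(1))

        # No rule matched: replay the oldest stored memory, if any.
        if self.memory:
            return self.memory.popleft()
        return FALLBACK
```

In this sketch, repeating "I need ..." alternates between the two reassembly templates, and an input beginning with "my" is only surfaced later, when no other rule applies. A Transformer reproducing this behavior has to track the per-rule counter and the queue contents across turns, which is the kind of dialogue state the analysis examines.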
