Detecting and explaining (in)equivalence of context-free grammars

Marko Schmellenkamp; Thomas Zeume; Sven Argo; Sandra Kiefer; Cedric Siems; Fynn Stebel

Detecting and explaining (in)equivalence of context-free grammars

Marko Schmellenkamp, Thomas Zeume, Sven Argo, Sandra Kiefer, Cedric Siems, Fynn Stebel

TL;DR

The paper presents a scalable framework to decide, prove, and explain (in)equivalence of context-free grammars in educational settings, addressing the undecidability of CFG equivalence with a practical combination of grammar canonization, a pattern-based transformation language, and theory for bounded CFGs. It introduces an implementation that integrates these components with caching and clustering to handle large datasets from educational tools, and demonstrates effectiveness on 55,662 student attempts with substantial automatic classification and meaningful high-level explanations for many inequivalent cases. Key contributions include a formalized pattern-based grammar transformation language, an explicit algorithmic treatment of bounded languages via Presburger arithmetic, and empirically validated reductions in manual grading workload. The work has practical impact for intelligent tutoring systems and CS education research, enabling rapid, interpretable feedback and insights into common student mistakes while outlining avenues for extension to broader syntax specification tasks.

Abstract

We propose a scalable framework for deciding, proving, and explaining (in)equivalence of context-free grammars. We present an implementation of the framework and evaluate it on large data sets collected within educational support systems. Even though the equivalence problem for context-free languages is undecidable in general, the framework is able to handle a large portion of these datasets. It introduces and combines techniques from several areas, such as an abstract grammar transformation language to identify equivalent grammars as well as sufficiently similar inequivalent grammars, theory-based comparison algorithms for a large class of context-free languages, and a graph-theory-inspired grammar canonization that allows to efficiently identify isomorphic grammars.

Detecting and explaining (in)equivalence of context-free grammars

TL;DR

Abstract

Paper Structure (44 sections, 7 equations, 1 figure, 5 tables, 4 algorithms)

This paper contains 44 sections, 7 equations, 1 figure, 5 tables, 4 algorithms.

Motivation and Introduction
Conceptual insights
Implementation
Evaluation
Theoretical foundations for educational support systems
Relation to other grammar frameworks
Conceptual framework and key ingredients
A pattern-based grammar transformation specification language
An algorithmic framework for bounded context-free languages
Testing boundedness with respect to boundedness witness
Constructing Presburger formulas
Testing equivalence against a bounded language
Natural descriptions of bounded languages in set notation
Evaluation
Data collection and preparation
...and 29 more sections

Figures (1)

Figure 1: Illustration of the high-level architecture of our framework and how individual modules interact.

Detecting and explaining (in)equivalence of context-free grammars

TL;DR

Abstract

Detecting and explaining (in)equivalence of context-free grammars

Authors

TL;DR

Abstract

Table of Contents

Figures (1)