Hashing Modulo Context-Sensitive $α$-Equivalence

Lasse Blaauwbroek; Miroslav Olšák; Herman Geuvers

TL;DR

This work addresses the challenge of comparing λ-terms when free variables are bound in a surrounding context, proposing context-sensitive α-equivalence. It formalizes this notion in two complementary ways—as fork equivalence and as bisimulation on graph representations—and proves their equivalence. A practical $O(n\log n)$ hashing algorithm is developed to assign equal hashes to context-sensitively α-equivalent subterms, enabling scalable subterm sharing in large λ-term graphs. The authors validate the approach with a large Coq-derived graph, achieving substantial reductions in node counts and demonstrating the method's potential for compiler optimizations and large-scale formal knowledge processing.

Abstract

The notion of $α$-equivalence between $λ$-terms is commonly used to identify terms that are considered equal. However, due to the primitive treatment of free variables, this notion falls short when comparing subterms occurring within a larger context. Depending on the usage of the Barendregt convention (choosing different variable names for all involved binders), it will equate either too few or too many subterms. We introduce a formal notion of context-sensitive $α$-equivalence, where two open terms can be compared within a context that resolves their free variables. We show that this equivalence coincides exactly with the notion of bisimulation equivalence. Furthermore, we present an efficient $O(n\log n)$ runtime hashing scheme that identifies $λ$-terms modulo context-sensitive $α$-equivalence, generalizing over traditional bisimulation partitioning algorithms and improving upon a previously established $O(n\log^2 n)$ bound for hashing modulo ordinary $α$-equivalence by Maziarz et al. Hashing $λ$-terms is useful in many applications that require common subterm elimination and structure sharing. We have employed the algorithm to obtain a large-scale, densely packed, interconnected graph of mathematical knowledge from the Coq proof assistant for machine learning purposes.
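The problem the abstract describes can be made concrete with a small sketch. It is only an illustration of why open subterms cannot be compared without their context; it is not the paper's hashing algorithm, nor its formal definition of the equivalence. The tuple encoding of de Bruijn terms, the path steps `"b"`, `"f"`, `"a"`, and the helpers `subterm` and `binder_path` are all assumptions made for this example.

```python
# Illustrative de Bruijn encoding (an assumption of this sketch):
#   ("var", i) | ("lam", body) | ("app", fun, arg)
# A path is a tuple of steps: "b" = body of a lam, "f"/"a" = fun/arg of an app.
STEP = {"b": ("lam", 1), "f": ("app", 1), "a": ("app", 2)}


def subterm(t, path):
    """Return the subterm of t reached by following path from the root."""
    for step in path:
        tag, idx = STEP[step]
        assert t[0] == tag, f"step {step!r} does not fit node {t[0]!r}"
        t = t[idx]
    return t


def binder_path(t, path):
    """Path of the lambda that binds the variable at `path`, or None if free in t."""
    node = subterm(t, path)
    assert node[0] == "var"
    # Every "b" step enters the body of a lambda located at the prefix before it.
    binders = [path[:k] for k, step in enumerate(path) if step == "b"]
    i = node[1]
    return binders[-1 - i] if i < len(binders) else None


# Case 1: λ. 0 (λ. 1). The two variables look different as raw indices,
# yet both refer to the root binder.
t1 = ("lam", ("app", ("var", 0), ("lam", ("var", 1))))
p1, q1 = ("b", "f"), ("b", "a", "b")
print(subterm(t1, p1) == subterm(t1, q1))          # raw comparison: False
print(binder_path(t1, p1) == binder_path(t1, q1))  # same binder: True

# Case 2: (λ. 0) (λ. 0 0). The two variables are identical as raw indices,
# yet they are bound by lambdas whose terms differ structurally.
t2 = ("app", ("lam", ("var", 0)), ("lam", ("app", ("var", 0), ("var", 0))))
p2, q2 = ("f", "b"), ("a", "b", "f")
print(subterm(t2, p2) == subterm(t2, q2))          # raw comparison: True
print(subterm(t2, binder_path(t2, p2)) ==
      subterm(t2, binder_path(t2, q2)))            # binders' terms equal: False
```

In both cases the raw comparison of the open subterms disagrees with what the surrounding context says about their free variables, which is the gap a context-sensitive notion of equivalence is meant to close.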

Paper Structure

This paper contains 23 sections, 8 theorems, 30 equations, 8 figures.

Introduction
Problem Description
Fork Equivalence
Equivalence through Bisimulation
Hashing versus Partitioning Modulo Bisimulation
Context-Sensitive $\alpha$-Equivalence versus Ordinary $\alpha$-Equivalence
Applications
Contributions
Definitions
Terms, Positions and Indexing
Locally Closed Subterms
Term Nodes
Fork Equivalence
Bisimilarity
Deciding Context-Sensitive $\alpha$-Equivalence Through Globalization
...and 8 more sections

Key Result

Theorem 2.16

$t_1\llbracket p_1\rrbracket \sim_\text{b} t_2\llbracket p_2\rrbracket$ if and only if $t_1\llbracket p_1\rrbracket \sim_\text{f} t_2\llbracket p_2\rrbracket$.

Figures (8)

Figure 1: Illustrations of forks in terms built from de Bruijn indices. Equivalent sub-terms are related through a squiggly line. Back-edges from variables to binders are illustrative only.
Figure 2: Illustrations of $\lambda$-terms as unordered graphs with labeled edges. Subject terms are related through a squiggly line. De Bruijn indices in variables are for illustrative purposes only.
Figure 3: To decide whether two variables are bisimilar, we must examine bisimilarity between far-away terms $A$ and $B$.
Figure 4: Proof strategy.
Figure 5: A visualization of a maximally shared graph of CIC terms extracted from Coq's Prelude.
...and 3 more figures

Theorems & Definitions (31)

Definition 2.1: $\lambda$-terms
Definition 2.2: term indexing
Example 2.3
Definition 2.4: position sets
Definition 2.5
Example 2.6
Definition 2.7
Example 2.9
Definition 2.10: term node
Definition 2.11: term node transitions
...and 21 more
