A Comparative Study of Rule Extraction for Recurrent Neural Networks

Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia, Xinyu Xing, Xue Liu, C. Lee Giles

TL;DR

This work investigates how deterministic finite automata (DFA) can be extracted from diverse recurrent neural networks (RNNs) trained on Tomita grammars, using two complexity measures, entropy $H(G)$ and average edit distance $D(G)$, to categorize grammar difficulty. It compares five architectures—Elman-RNN, second-order RNN, MI-RNN, LSTM, and GRU—under a compositional DFA-extraction pipeline based on hidden-state quantization, transition counting, and Hopcroft minimization. The results show that extraction performance generally declines with grammar complexity, with second-order RNN and MI-RNN delivering the most reliable DFAs, particularly on the high-complexity grammars where others fail. The findings highlight the value of the proposed grammar-complexity metrics for diagnosing extraction feasibility and suggest architectural choices (notably quadratic interaction models) that support robust rule extraction in sequence models, with implications for interpretability and verification of trained RNNs.
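The extraction pipeline summarized above (hidden-state quantization, transition counting, minimization) can be illustrated with a short Python sketch. It is not the authors' code, and several parts are assumptions. Hidden activations are assumed to lie in [0, 1], and each dimension is split into `bins` equal intervals. For each (state, symbol) pair, the most frequent successor is kept. A quantized state accepts if most strings ending there were accepted. For brevity, the sketch uses Moore's partition refinement in place of Hopcroft's algorithm. Both yield the minimal DFA; Hopcroft's is asymptotically faster. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def quantize(h, bins):
    """Map a hidden vector with entries in [0, 1] to a tuple of bin indices.

    Assumption: activations lie in [0, 1] (e.g. sigmoid units); tanh
    units would need rescaling to that interval first.
    """
    return tuple(min(int(x * bins), bins - 1) for x in h)


def extract_dfa(traces, labels, bins=2):
    """Build a (possibly non-minimal) DFA by quantization and transition counting.

    traces: one list per string of (symbol, hidden_vector) pairs, where the
            first entry is (None, h0) and each later vector is the hidden
            state reached after reading that symbol.
    labels: True if the network accepted the corresponding string.
    """
    counts = defaultdict(Counter)  # (state, symbol) -> successor counts
    votes = defaultdict(Counter)   # state -> accept/reject votes
    start = None
    for trace, label in zip(traces, labels):
        states = [quantize(h, bins) for _, h in trace]
        start = states[0]  # every trace is assumed to share the same h0
        for (sym, _), prev, nxt in zip(trace[1:], states, states[1:]):
            counts[(prev, sym)][nxt] += 1
        votes[states[-1]][label] += 1
    # Keep the most frequent successor for each (state, symbol) pair.
    delta = {key: c.most_common(1)[0][0] for key, c in counts.items()}
    # A state accepts if most strings ending there were accepted.
    accept = {s for s, c in votes.items() if c[True] > c[False]}
    return start, delta, accept


def minimize(start, delta, accept, alphabet):
    """Moore-style partition refinement (used here in place of Hopcroft)."""
    sink = ("sink",)  # absorbing reject state for missing transitions
    states = {start, sink} | {s for s, _ in delta} | set(delta.values())
    full = {(s, a): delta.get((s, a), sink) for s in states for a in alphabet}
    # Keep only states reachable from the start state.
    reach, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for a in alphabet:
            t = full[(s, a)]
            if t not in reach:
                reach.add(t)
                frontier.append(t)
    # Start from the accept/reject split and refine until the block count is stable.
    block = {s: int(s in accept) for s in reach}
    n_blocks = len(set(block.values()))
    while True:
        sig = {s: (block[s],) + tuple(block[full[(s, a)]] for a in alphabet)
               for s in reach}
        ids = {}
        block = {s: ids.setdefault(sig[s], len(ids)) for s in reach}
        if len(ids) == n_blocks:
            break
        n_blocks = len(ids)
    m_delta = {(block[s], a): block[full[(s, a)]] for s in reach for a in alphabet}
    m_accept = {block[s] for s in reach if s in accept}
    return block[start], m_delta, m_accept


def accepts(dfa, string):
    """Run a complete DFA given as (start, delta, accept) on a string."""
    state, delta, accept = dfa
    for ch in string:
        state = delta[(state, ch)]
    return state in accept
```

In this sketch, the quantization granularity `bins` is a free parameter. Coarser bins merge more hidden states. Minimization then collapses quantized states that behave identically, such as two distinct clusters that both play the role of the same grammar state.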

Abstract

Understanding recurrent networks through rule extraction has a long history. The topic has attracted renewed interest because of the need to interpret and verify neural networks. One basic form for representing stateful rules is the deterministic finite automaton (DFA). Previous research shows that extracting DFAs from trained second-order recurrent networks is not only possible but also relatively stable. Recently, several new types of recurrent networks with more complicated architectures have been introduced to handle challenging learning tasks, usually involving sequential data. However, it remains an open problem whether DFAs can be adequately extracted from these models. Specifically, it is not clear how DFA extraction is affected when it is applied to different recurrent networks trained on data sets with different levels of complexity. Here, we investigate DFA extraction on several widely adopted recurrent networks trained to learn a set of seven regular Tomita grammars. We first formally analyze the complexity of Tomita grammars and categorize these grammars according to that complexity. We then empirically evaluate the DFA extraction performance of different recurrent networks on all Tomita grammars. Our experiments show that for most recurrent networks, extraction performance decreases as the complexity of the underlying grammar increases. On grammars of lower complexity, most recurrent networks achieve desirable extraction performance. On the grammars with the highest level of complexity, several complicated models fail, and only certain recurrent networks achieve satisfactory extraction performance.
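The abstract refers to the seven Tomita grammars without listing them. As a hedged illustration, the commonly cited definitions can be written as Python membership predicates, which is enough to generate labeled binary strings. The names `TOMITA` and `labeled_strings` are hypothetical. Grammar 3 is omitted because its informal statement differs between sources. Nothing here claims to reproduce the paper's actual training-set construction.

```python
import re
from itertools import product

# Commonly cited definitions of Tomita grammars over {0, 1}; grammar 3 is
# left out because its informal wording differs between sources.
TOMITA = {
    1: lambda s: re.fullmatch(r"1*", s) is not None,                # 1*
    2: lambda s: re.fullmatch(r"(10)*", s) is not None,             # (10)*
    4: lambda s: "000" not in s,                                     # no "000" substring
    5: lambda s: s.count("0") % 2 == 0 and s.count("1") % 2 == 0,   # even #0s and even #1s
    6: lambda s: (s.count("0") - s.count("1")) % 3 == 0,            # #0s - #1s divisible by 3
    7: lambda s: re.fullmatch(r"0*1*0*1*", s) is not None,          # 0*1*0*1*
}


def labeled_strings(grammar, max_len):
    """Hypothetical helper: label every binary string of length 0..max_len."""
    accept = TOMITA[grammar]
    return [("".join(bits), accept("".join(bits)))
            for n in range(max_len + 1)
            for bits in product("01", repeat=n)]
```

For example, `labeled_strings(1, 3)` returns 15 pairs, and only `""`, `"1"`, `"11"`, and `"111"` are labeled positive.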

Paper Structure

This paper contains 28 sections, 3 theorems, 19 equations, 8 figures, 4 tables.

Key Result

Proposition 1

Figures (8)

  • Figure 1: Example DFA for Tomita grammar 2. Red arrow indicates the initial state, shaded circles indicate non-accept states. Dotted lines indicate input "0" and solid lines indicate input "1".
  • Figure 2: Graphical representation of the distribution of strings of length $N$ ($1 \leq N \leq 8$) for grammars 2, 4, and 5. Each concentric ring of each graph contains the $2^N$ strings of one length $N$, arranged in lexicographic order starting at $\theta = 0$. White and black areas represent positive and negative strings, respectively.
  • Figure 3: Mean and variance of the accuracy obtained by DFAs extracted from all models on grammars 2, 4, and 5. Second-order RNNs with sigmoid and tanh activation functions are denoted 2nd-Sig and 2nd-Tanh; Elman-RNNs with these two activation functions are denoted Elman-Sig and Elman-Tanh, respectively.
  • Figure 4: Average accuracy of DFAs extracted from recurrent models on Tomita grammars. Left vertical axis: entropy. Right vertical axis: average accuracy of extracted DFAs.
  • Figure 5: Success rates of DFA extraction for all models on Tomita grammars.
  • ...and 3 more figures
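The objects in Figures 1 and 2 can be reproduced in miniature. The sketch below assumes the standard three-state minimal DFA for grammar 2, the language (10)*, and takes it to correspond to Figure 1 up to state names. The state names `q0`, `q1`, and `dead` are hypothetical. The function `ring` lists the labels of all $2^N$ strings of length $N$ in lexicographic order, which is the data behind one ring of Figure 2. The function `boundaries` counts white/black transitions around that ring. This boundary count is an illustrative statistic of the sketch, not a measure defined in the paper.

```python
from itertools import product

# Complete three-state DFA for Tomita grammar 2, (10)*; state names are hypothetical.
DELTA = {
    ("q0", "1"): "q1", ("q0", "0"): "dead",
    ("q1", "0"): "q0", ("q1", "1"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
START, ACCEPT = "q0", {"q0"}  # q0 is both initial and the only accept state


def accepts(s):
    """Run the grammar-2 DFA on a binary string."""
    q = START
    for ch in s:
        q = DELTA[(q, ch)]
    return q in ACCEPT


def ring(accept, n):
    """Labels of all 2**n strings of length n in lexicographic order."""
    return [accept("".join(bits)) for bits in product("01", repeat=n)]


def boundaries(labels):
    """Count label changes between neighbours, treating the ring as circular."""
    return sum(labels[i] != labels[i - 1] for i in range(len(labels)))
```

For grammar 2, every ring of odd length is entirely negative, and each ring of even length contains exactly one positive string (for instance, `"1010"` when $N = 4$). This matches the sparse white regions one would expect for that grammar.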

Theorems & Definitions (12)

  • Definition 1: Rule Extraction Problem
  • Definition 2: Entropy
  • Proposition 1
  • Theorem 1
  • Definition 3: Shift Space
  • Definition 4: Entropy of Shift Space
  • Definition 5: Edit Distance
  • Definition 6: Average Edit Distance
  • Proposition 2
  • proof
  • ...and 2 more
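This page does not reproduce the formulas behind Definitions 2 to 6. The Python sketch below therefore illustrates standard notions under stated assumptions rather than the paper's exact measures.

- `growth_entropy` estimates the standard entropy of a shift space, $h = \lim_{n \to \infty} \frac{1}{n}\log_2 |\mathcal{B}_n|$, where $\mathcal{B}_n$ is the set of allowed length-$n$ blocks. It obtains $|\mathcal{B}_n|$ by counting the strings a complete DFA accepts, using dynamic programming. This reading applies cleanly to grammar 4 (no `000` substring), whose language is closed under taking substrings. Whether it matches the paper's $H(G)$ is an assumption.
- `levenshtein` is the usual edit distance with unit-cost insertion, deletion, and substitution.
- `average_edit_distance` is a hypothetical reading of an average edit distance: the mean, over positive strings of length $n$, of the distance to the nearest negative string of the same length. The paper's $D(G)$ may average differently.

```python
import math
from itertools import product

# Complete DFA for Tomita grammar 4 (no "000"); zK = K trailing zeros (hypothetical names).
T4_DELTA = {
    ("z0", "0"): "z1", ("z1", "0"): "z2", ("z2", "0"): "dead", ("dead", "0"): "dead",
    ("z0", "1"): "z0", ("z1", "1"): "z0", ("z2", "1"): "z0", ("dead", "1"): "dead",
}
T4_ACCEPT = {"z0", "z1", "z2"}


def count_accepted(start, delta, accept, alphabet, n):
    """Number of length-n strings accepted by a complete DFA (dynamic programming)."""
    ways = {start: 1}
    for _ in range(n):
        nxt = {}
        for q, c in ways.items():
            for a in alphabet:
                r = delta[(q, a)]
                nxt[r] = nxt.get(r, 0) + c
        ways = nxt
    return sum(c for q, c in ways.items() if q in accept)


def growth_entropy(start, delta, accept, alphabet, n):
    """Finite-n estimate log2(|B_n|) / n of the growth-rate (shift-space) entropy."""
    if n < 1:
        raise ValueError("n must be at least 1")
    count = count_accepted(start, delta, accept, alphabet, n)
    if count == 0:
        raise ValueError("no accepted strings of length n")
    return math.log2(count) / n


def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion, and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def average_edit_distance(accept, n):
    """Hypothetical: mean distance from each positive string to its nearest negative one."""
    strings = ["".join(bits) for bits in product("01", repeat=n)]
    pos = [s for s in strings if accept(s)]
    neg = [s for s in strings if not accept(s)]
    if not pos or not neg:
        return None
    return sum(min(levenshtein(p, q) for q in neg) for p in pos) / len(pos)
```

For grammar 4, the counts follow the recurrence $a_n = a_{n-1} + a_{n-2} + a_{n-3}$ (2, 4, 7, 13, 24, ...). The exact growth rate is therefore $\log_2 \rho \approx 0.879$, where $\rho \approx 1.839$ is the largest root of $x^3 = x^2 + x + 1$. The finite-$n$ estimate approaches this value slowly from above.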