Stability Analysis of Various Symbolic Rule Extraction Methods from Recurrent Neural Network

Neisarg Dave, Daniel Kifer, C. Lee Giles, Ankur Mali

TL;DR

This work addresses the stability of interpretable rule extraction from recurrent neural networks by comparing quantization-based DFA extraction with the $L^{*}$ equivalence-query method across Tomita and Dyck grammars using four RNN cells, including the second-order O2RNN. The authors conduct a large-scale empirical study (3600 RNNs, 18000 DFAs via quantization, 3600 via $L^{*}$) and show that quantization-based extraction paired with O2RNN delivers the most stable and accurate DFAs, even when networks are not perfectly trained. In contrast, $L^{*}$ exhibits significant instability, especially for partially trained networks, producing DFAs with far more states than ground-truth and inconsistent accuracy. The results consistently favor the quantization approach and the O2RNN architecture, across both Tomita and Dyck languages, underscoring a practical path toward robust neuro-symbolic rule extraction. This has implications for building reliable, interpretable AI systems that can infer concise formal-rule representations from neural models.
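To make the idea of quantization-based extraction concrete, the following is a minimal sketch rather than the authors' pipeline. It swaps in uniform binning of each hidden coordinate for the k-means and SOM clustering used in the paper, and it explores a hypothetical one-neuron "network" (`toy_step`) that tracks the parity of the number of 1s. All function names, the bin count, and the toy network are assumptions made for illustration.

```python
import math
from collections import deque


def quantize(h, bins=2):
    """Map each coordinate in [0, 1] to one of `bins` equal-width intervals."""
    return tuple(min(int(v * bins), bins - 1) for v in h)


def extract_dfa(step, h0, is_accepting, alphabet, bins=2):
    """Breadth-first exploration of the quantized hidden-state space.

    Each quantized cell becomes a DFA state; one continuous representative
    per cell is kept and used to compute outgoing transitions.
    """
    start = quantize(h0, bins)
    representative = {start: h0}
    transitions = {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for symbol in alphabet:
            h_next = step(representative[q], symbol)
            q_next = quantize(h_next, bins)
            transitions[(q, symbol)] = q_next
            if q_next not in representative:
                representative[q_next] = h_next
                queue.append(q_next)
    accepting = {q for q, h in representative.items() if is_accepting(h)}
    return start, transitions, accepting


def run_dfa(start, transitions, accepting, string):
    """Run the extracted DFA on a sequence of symbols."""
    q = start
    for symbol in string:
        q = transitions[(q, symbol)]
    return q in accepting


def toy_step(h, symbol):
    """Hypothetical one-neuron recurrent update: symbol 1 flips the state, 0 keeps it."""
    p = h[0]
    z = 8.0 * (p - 0.5) if symbol == 0 else 8.0 * (0.5 - p)
    return (1.0 / (1.0 + math.exp(-z)),)


# Low activation is read as "even number of 1s so far" (accepting).
start, transitions, accepting = extract_dfa(
    toy_step, (0.02,), lambda h: h[0] < 0.5, alphabet=[0, 1]
)
```

With a real RNN, `step` would wrap the trained cell and the binning would be replaced by the clustering method under study; the exploration loop stays the same.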

Abstract

This paper analyzes two competing rule extraction methodologies: quantization and equivalence query. We trained $3600$ RNN models, extracting $18000$ DFAs with a quantization approach (k-means and SOM) and $3600$ DFAs with the equivalence query ($L^{*}$) method across $10$ initialization seeds. We sampled the datasets from $7$ Tomita and $4$ Dyck grammars and trained $4$ RNN cells on them: LSTM, GRU, O2RNN, and MIRNN. Our experiments establish the superior performance of O2RNN and quantization-based rule extraction over the alternatives. $L^{*}$, primarily proposed for regular grammars, performs similarly to quantization methods on Tomita languages when the neural networks are perfectly trained. However, for partially trained RNNs, $L^{*}$ shows instability in the number of DFA states; e.g., for the Tomita 5 and Tomita 6 languages, $L^{*}$ produced more than $100$ states. In contrast, quantization methods yield rules whose number of states is very close to that of the ground-truth DFA. Among RNN cells, O2RNN consistently produces more stable DFAs than the other cells. For Dyck languages, we observe that although GRU outperforms the other RNNs in network performance, the DFAs extracted from O2RNN have higher performance and better stability. Stability is computed as the standard deviation of test-set accuracy over networks trained across $10$ seeds. On Dyck languages, quantization methods outperformed $L^{*}$, with better stability in both accuracy and the number of states. $L^{*}$ often showed instability in accuracy on the order of $16\%-22\%$ for GRU and MIRNN, while the deviation for quantization methods varied in the $5\%-15\%$ range. In many instances with LSTM and GRU, DFAs extracted by $L^{*}$ even failed to beat chance accuracy ($50\%$), while those extracted by quantization methods had a standard deviation in the $7\%-17\%$ range. For O2RNN, both rule extraction methods had a deviation in the $0.5\%-3\%$ range.
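The stability measure described in the abstract can be sketched in a few lines. This is an illustration under assumptions: the abstract does not say whether the population or the sample standard deviation is used, so the population form (`pstdev`) is chosen here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def stability(accuracies):
    """Return (mean accuracy, standard deviation) over per-seed test accuracies.

    A lower standard deviation means the extracted DFAs behave more
    consistently across initialization seeds.
    Assumption: population standard deviation; use statistics.stdev
    instead if the sample form is intended.
    """
    return mean(accuracies), pstdev(accuracies)


# Illustrative placeholder values only, not results from the paper.
example_mean, example_std = stability([1.0, 0.0])
```

In the paper's setting the input list would hold $10$ values, one test accuracy per seed, for a fixed grammar, RNN cell, and extraction method.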

Paper Structure (17 sections, 3 theorems, 3 equations, 24 figures, 4 tables)

Key Result

Theorem 3.1

[mali2023computational] Given a DFA $M$ with $n$ states and $m$ input symbols, there exists a $k$-neuron bounded-precision O2RNN with sigmoid activation function ($h_H$), where $k = n+1$, initialized from an arbitrary distribution, that can simulate any DFA in real time, $O(T)$.
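To give some intuition for how a second-order RNN can track DFA states, here is a simplified sketch. It is not the construction behind Theorem 3.1: it uses $n$ neurons with a one-hot state encoding instead of $n+1$, hand-set weights of strength `H = 20.0` instead of trained ones, and the standard second-order update $h_i^{t+1} = \sigma(\sum_{j,k} W_{ijk} h_j^t x_k^t + b_i)$. The example DFA recognizes the language $1^{*}$ (commonly listed as Tomita 1); all names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_weights(delta, n_states, n_symbols, H=20.0):
    """Hand-set second-order weights: W[i][j][k] = +H if delta[j][k] == i, else -H."""
    W = [
        [[H if delta[j][k] == i else -H for k in range(n_symbols)]
         for j in range(n_states)]
        for i in range(n_states)
    ]
    b = [-H / 2.0] * n_states
    return W, b


def run(W, b, string, start=0):
    """Apply the second-order update once per input symbol (T steps for length T).

    With a one-hot input x, the double sum over (j, k) reduces to a sum
    over j using the slice of W selected by the current symbol.
    """
    n = len(b)
    h = [1.0 if i == start else 0.0 for i in range(n)]
    for k in string:
        h = [sigmoid(sum(W[i][j][k] * h[j] for j in range(n)) + b[i])
             for i in range(n)]
    return h


# DFA for 1*: state 0 accepts (only 1s seen so far), state 1 is a rejecting sink.
# DELTA[state][symbol] gives the next state.
DELTA = [[1, 0], [1, 1]]
W, b = build_weights(DELTA, n_states=2, n_symbols=2)


def accepts(string):
    """Accept when the neuron for state 0 is active after reading the string."""
    return run(W, b, string)[0] > 0.5
```

Each input symbol triggers exactly one update, which is the sense in which the simulation runs in real time: a string of length $T$ costs $T$ steps.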

Figures (24)

  • Figure 1: Distribution of strings of Tomita grammar in lexicographical order
  • Figure 2: Distribution of strings of Dyck grammar in lexicographical order
  • Figure 3: Min, max, and mode of the number of states extracted from $1^{st}$ and $2^{nd}$ order RNNs trained on Tomita grammars
  • Figure 4: Mean Accuracy and Standard Deviation of $1^{st}$ and $2^{nd}$ order Recurrent Neural Networks with restricted training on Tomita languages. The training was stopped when validation accuracy crossed $85\%$.
  • Figure 5: Mean and Standard Deviation of the accuracy of DFAs extracted from $1^{st}$ and $2^{nd}$ order RNNs with restricted training ($85\%$ validation accuracy) on Tomita languages.
  • ...and 19 more figures

Theorems & Definitions (5)

  • Theorem 3.1
  • Theorem 3.2
  • Definition 3.3
  • Corollary 3.4
  • Definition 3.5