Incremental Context-free Grammar Inference in Black Box Settings

Feifei Li, Xiao Chen, Xi Xiao, Xiaoyu Sun, Chuan Chen, Shaohua Wang, Jitao Han

TL;DR

Kedavra, a novel method that segments example strings into smaller units and infers the grammar incrementally, shows better grammar quality, faster runtime, and improved readability in empirical comparison.

Abstract

Black-box context-free grammar inference is a significant challenge in many practical settings because access to example programs is limited. The state-of-the-art methods, Arvada and Treevada, use heuristics to generalize grammar rules, starting from flat parse trees and exploring diverse generalization sequences. We observe that these approaches produce grammars of low quality and readability, primarily because they process entire example strings, which adds complexity and substantially slows computation. To overcome these limitations, we propose a novel method that segments example strings into smaller units and infers the grammar incrementally. In empirical comparison, our approach, named Kedavra, achieves better grammar quality (higher precision and recall), faster runtime, and improved readability.

Paper Structure (25 sections, 5 figures, 5 tables, 5 algorithms)


Figures (5)

  • Figure 1: A simple example of Arvada workflow
  • Figure 2: Workflow of Kedavra
  • Figure 3: Results after pre-tokenization
  • Figure 4: Average F1 score over 10 runs of Arvada, Treevada, and Kedavra on each dataset (R0, R1, R2, R5). The horizontal bars in each sub-figure were added manually as a reference, to make the fluctuations in each inference algorithm's F1 score across datasets easier to see.
  • Figure 5: Precision of Arvada, Treevada, and Kedavra under different sampling methods (A = Arvada's sampling algorithm, T = Treevada's sampling algorithm, LPP10 = LimitPerProd10). The horizontal bars in each sub-figure were added manually as a reference, to make the fluctuations in each inference algorithm's precision across sampling algorithms easier to see.
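
Figures 4 and 5 report precision and F1. As a rough illustration only, the sketch below shows one common way such metrics are computed in black-box grammar inference: precision as the fraction of strings sampled from the inferred grammar that the oracle accepts, and recall as the fraction of strings sampled from the reference grammar that the inferred grammar accepts. This is an assumption about the evaluation setup, not the paper's exact protocol; the toy acceptors and sample lists are hypothetical.

```python
from typing import Callable, Iterable


def acceptance_rate(samples: Iterable[str], accepts: Callable[[str], bool]) -> float:
    """Fraction of samples for which accepts() is True (0.0 if there are none)."""
    items = list(samples)
    if not items:
        return 0.0
    return sum(1 for s in items if accepts(s)) / len(items)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical oracle: accepts balanced-parenthesis strings.
def oracle_accepts(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        else:
            return False
        if depth < 0:
            return False
    return depth == 0


# Hypothetical inferred grammar: accepts only fully nested strings "((...))".
def inferred_accepts(s: str) -> bool:
    n = len(s) // 2
    return len(s) % 2 == 0 and s == "(" * n + ")" * n


# Hypothetical samples; real evaluations draw many strings from each grammar.
sampled_from_inferred = ["()", "(())", "((()))"]
sampled_from_reference = ["()", "()()", "(())", "(()())"]

precision = acceptance_rate(sampled_from_inferred, oracle_accepts)
recall = acceptance_rate(sampled_from_reference, inferred_accepts)
f1 = f1_score(precision, recall)
```

In this toy case the inferred grammar is too narrow: everything it generates is valid, but it misses strings such as "()()", so precision is high while recall is lower.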