A Tree Sampler for Bounded Context-Free Languages

Breandan Considine

TL;DR

This paper tackles uniform sampling of parse trees from bounded context-free languages (BCFLs) defined by porous strings, i.e., strings containing holes. It introduces an algebraic, nested datatype framework using $\mathbb{T}_3$ and $\mathbb{T}_2$ to compactly represent candidate parse forests and to compute a fixed point $M_\infty$ that encodes the feasible derivations for a given template. It develops two sampling modes: with replacement, via a recursive Multinoulli sampler related to Boltzmann sampling that avoids rejection; and without replacement, via a counting and pairing approach that maps trees to indices and lazily decodes trees from uniformly drawn integers. The method is claimed to be sound and complete for BCFL sampling, supports bounded generation and parallelization, and has practical applications in code completion and program repair. A Kotlin reference implementation of the $\mathbb{T}_2$ datatype is provided.

Abstract

In this paper, we present a simple method for sampling trees with or without replacement from BCFLs. A BCFL is a context-free language (CFL) corresponding to an incomplete string with holes, which can be completed by valid terminals. To solve this problem, we introduce an algebraic datatype that compactly represents candidate parse forests for porous strings. Once this datatype is constructed, sampling trees is a straightforward matter of sampling integers uniformly without replacement, then lazily decoding them into trees.
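The idea of drawing integers and decoding them into trees can be illustrated with a minimal Python sketch. It is not the paper's $\mathbb{T}_2$ datatype or its pairing function: it assumes a toy balanced-bracket grammar in Chomsky normal form (the `BINARY` and `UNARY` tables are hypothetical), counts parse trees per span with a memoized CYK-style recursion, and decodes an index by mixed-radix unranking. The helper names `make_sampler`, `count`, `decode`, and `tokens` are likewise illustrative.

```python
import random
from functools import lru_cache

# Hypothetical toy grammar in Chomsky normal form (balanced brackets).
BINARY = {"S": [("L", "R"), ("L", "X"), ("S", "S")], "X": [("S", "R")]}
UNARY = {"L": ["("], "R": [")"]}
HOLE = "_"  # marks an unknown token in the porous string


def make_sampler(template):
    """Return (count, decode) closures for one fixed porous string."""

    def leaves(nt, i):
        # Terminals that nt can emit at position i, respecting the template.
        return [t for t in UNARY.get(nt, ()) if template[i] in (HOLE, t)]

    @lru_cache(maxsize=None)
    def count(nt, i, j):
        # Number of parse trees rooted at nt that span template[i:j].
        if j - i == 1:
            return len(leaves(nt, i))
        return sum(count(b, i, k) * count(c, k, j)
                   for b, c in BINARY.get(nt, ())
                   for k in range(i + 1, j))

    def decode(nt, i, j, n):
        # Map an index n in [0, count(nt, i, j)) to a distinct tree.
        if j - i == 1:
            return (nt, leaves(nt, i)[n])
        for b, c in BINARY.get(nt, ()):
            for k in range(i + 1, j):
                right = count(c, k, j)
                block = count(b, i, k) * right
                if n < block:
                    # Split the index into a left and a right sub-index.
                    left_idx, right_idx = divmod(n, right)
                    return (nt, decode(b, i, k, left_idx),
                            decode(c, k, j, right_idx))
                n -= block
        raise IndexError("index out of range for this span")

    return count, decode


def tokens(tree):
    # Flatten a tree into its sequence of terminals.
    if len(tree) == 2:
        return [tree[1]]
    return tokens(tree[1]) + tokens(tree[2])


template = ["_", "_", "(", "_", "_", "_"]
count, decode = make_sampler(template)
total = count("S", 0, len(template))
# Distinct integers decode to distinct trees: sampling without replacement.
for n in random.sample(range(total), k=min(3, total)):
    print(n, "".join(tokens(decode("S", 0, len(template), n))))
```

Because `decode` is a bijection from indices to trees, drawing distinct integers yields distinct trees without rejection. The toy grammar is ambiguous, so two different trees may share the same string. For tree counts beyond the platform's maximum sequence length, `random.sample(range(total), ...)` would need to be replaced by another way of drawing distinct integers.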

Paper Structure (7 sections, 8 equations, 1 figure, 1 table)

Figures (1)

  • Figure 1: A partial $\mathbb{T}_2$ for the grammar with productions $P=\{S \rightarrow BC \mid \ldots \mid AB, B\rightarrow RD \mid \ldots, A\rightarrow QC \mid \ldots\}$.

Theorems & Definitions (1)

  • Definition 1: Completion