Simpler Context-Dependent Logical Forms via Model Projections

Reginald Long, Panupong Pasupat, Percy Liang

TL;DR

This work addresses learning context-dependent semantic parsers from denotations alone, without a seed lexicon. It introduces a projection framework that starts from a full model over anchored logical forms (Model A) and successively projects it onto simpler spaces (Models B and C), trading expressivity for computation. A new left-to-right parser handles ellipsis and anaphora. On three new datasets grounded in world states (Alchemy, Scene, Tangrams), Model C performs strongly under constrained computation, while Model A performs best given ample compute, and bootstrapping Model A from the simpler models makes it more practical. The results show a computation-expressivity tradeoff and suggest projection and bootstrapping as a path toward scalable context-dependent semantic parsing, with data and code provided for reproducibility.

Abstract

We consider the task of learning a context-dependent mapping from utterances to denotations. With only denotations at training time, we must search over a combinatorially large space of logical forms, which is even larger with context-dependent utterances. To cope with this challenge, we perform successive projections of the full model onto simpler models that operate over equivalence classes of logical forms. Though less expressive, we find that these simpler models are much faster and can be surprisingly effective. Moreover, they can be used to bootstrap the full model. Finally, we collect three new context-dependent semantic parsing datasets and develop a new left-to-right parser.

Paper Structure

This paper contains 25 sections, 4 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Our task is to learn to map a piece of text in some context to a denotation. An example from the Alchemy dataset is shown. In this paper, we ask: what intermediate logical form is suitable for modeling this mapping?
  • Figure 2: Derivations generated for the last utterance in Figure 1. All derivations above execute to mix(beaker2). Model A generates anchored logical forms (derivations) where words are aligned to predicates, which leads to multiple derivations with the same logical form. Model B discards these alignments, and Model C collapses the arguments of the logical forms to denotations.
  • Figure 3: Scene dataset: Each person has a shirt of some color and a hat of some color. They enter, leave, move around on a stage, and trade hats.
  • Figure 4: Tangrams dataset: One can add figures, remove figures, and swap the position of figures. All the figures slide to the left.
  • Figure 5: Suppose we have already constructed delete(pos(2)) for "Delete the second figure." Continuing, we shift the utterance "Repeat." Then, we build action[1], aligned to the word "Repeat," followed by args[1][1], which is unaligned. Finally, we combine the two logical forms.
  • ...and 3 more figures
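
The projections described in the Figure 2 caption can be pictured as grouping derivations into equivalence classes. The following is a minimal sketch, not the authors' implementation: the `Anchored` record, the example words, and the specific logical forms are hypothetical, chosen only so that every derivation executes to mix(beaker2) as in the caption. Model B's projection drops the word alignments, and Model C's projection additionally replaces the argument with its denotation.

```python
from collections import defaultdict
from typing import NamedTuple


class Anchored(NamedTuple):
    """Hypothetical Model A derivation: a logical form plus word alignments."""
    action: str       # action predicate, e.g. "mix"
    arg: str          # argument sub-form, e.g. "args[1][1]" or "pos(2)"
    arg_value: str    # denotation of the argument in context, e.g. "beaker2"
    alignment: tuple  # (word, predicate) pairs anchoring words to predicates


def project_b(d: Anchored) -> tuple:
    """Model B view: discard the alignments, keep the logical form."""
    return (d.action, d.arg)


def project_c(d: Anchored) -> tuple:
    """Model C view: also collapse the argument to its denotation."""
    return (d.action, d.arg_value)


def classes(items, project):
    """Group derivations into equivalence classes under a projection."""
    groups = defaultdict(list)
    for d in items:
        groups[project(d)].append(d)
    return dict(groups)


# Illustrative derivations (invented for this sketch); all execute to mix(beaker2).
derivations = [
    Anchored("mix", "args[1][1]", "beaker2", (("mix", "mix"), ("it", "args[1][1]"))),
    Anchored("mix", "args[1][1]", "beaker2", (("mix", "mix"),)),
    Anchored("mix", "pos(2)", "beaker2", (("mix", "mix"),)),
]

n_a = len(set(derivations))                  # distinct anchored derivations
n_b = len(classes(derivations, project_b))   # distinct logical forms
n_c = len(classes(derivations, project_c))   # distinct collapsed forms

print(n_a, n_b, n_c)
```

In this toy setting the three anchored derivations fall into two logical forms and a single collapsed form, which illustrates why the projected models search a smaller space at the cost of distinguishing fewer hypotheses.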