
Order-Based Pre-training Strategies for Procedural Text Understanding

Abhilash Nandy, Yash Kulkarni, Pawan Goyal, Niloy Ganguly

TL;DR

Procedural text understanding requires tracking how entities change across sequential steps. The paper introduces order-based pre-training with Permutation Classification, Embedding Regression, and Skip-Clip to inject sequential supervision into transformers. On NPN-Cooking and ProPara, these methods outperform the baselines: Skip-Clip is strongest in-domain, while the Permutation Classification and Embedding Regression variants perform well cross-domain, sometimes surpassing GPT-3.5 in 1-shot settings. The work shows that explicitly encoding step order can enhance procedural reasoning and generalize beyond recipes to other procedural domains, and it acknowledges its limitations and ethical considerations.

Abstract

In this paper, we propose sequence-based pre-training methods to enhance procedural understanding in natural language processing. Procedural text, which contains sequential instructions for accomplishing a task, is difficult to understand because the attributes of entities change as the steps proceed. We focus on recipes, which are commonly represented as ordered instructions, and use this order as a supervision signal. Our work is one of the first to compare several 'order-as-supervision' transformer pre-training methods, including Permutation Classification, Embedding Regression, and Skip-Clip, and shows that these methods give improved results compared to the baselines and SoTA LLMs on two downstream Entity Tracking datasets: the NPN-Cooking dataset in the recipe domain and the ProPara dataset in the open domain. Our proposed methods address the non-trivial Entity Tracking Task, which requires predicting entity states across procedure steps and therefore requires understanding the order of those steps. These methods improve over the best baseline by 1.6% on NPN-Cooking and by 7-9% on ProPara across metrics.
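To make the order-as-supervision idea concrete, here is a minimal Python sketch of how one training example for Permutation Classification and Embedding Regression could be built from a recipe. It assumes the "predefined permutation set" is simply all n! orderings of an n-step recipe in lexicographic order (for 4 steps this gives 24 permutations with indices 0 to 23), and that the regression target is the Lehmer code of the chosen permutation; both are assumptions, not the paper's confirmed setup. The helper names `build_permutation_set`, `lehmer_code`, and `make_order_example` are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def build_permutation_set(num_steps: int) -> List[Tuple[int, ...]]:
    # Assumption: the predefined set is every ordering of the steps,
    # in lexicographic order. The paper's actual set may be a subset.
    return list(itertools.permutations(range(num_steps)))


def lehmer_code(perm: Sequence[int]) -> List[int]:
    # Entry i counts how many later elements are smaller than perm[i].
    return [
        sum(1 for later in perm[i + 1:] if later < perm[i])
        for i in range(len(perm))
    ]


def make_order_example(
    steps: Sequence[str],
    perm_set: List[Tuple[int, ...]],
    rng: random.Random,
) -> Tuple[List[str], int, List[int]]:
    # Pick a permutation at random and reorder the steps with it:
    # position k of the shuffled recipe holds original step perm[k].
    label = rng.randrange(len(perm_set))
    perm = perm_set[label]
    shuffled = [steps[j] for j in perm]
    # Classification target: the permutation index.
    # Regression target (hypothetical choice): its Lehmer code.
    return shuffled, label, lehmer_code(perm)


if __name__ == "__main__":
    recipe = [
        "Preheat the oven.",
        "Mix flour and eggs.",
        "Pour the batter into a pan.",
        "Bake for 30 minutes.",
    ]
    perm_set = build_permutation_set(len(recipe))
    shuffled, label, target = make_order_example(recipe, perm_set, random.Random(0))
    print(label, target)
    print(shuffled)
```

Under this assumption, index 23 in a 4-step set is the fully reversed order `(3, 2, 1, 0)`, whose Lehmer code is `[3, 2, 1, 0]`. A classification head would predict `label` and a regression head would predict `target` (possibly normalized). The Hamming-embedding variant is not sketched here.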

Paper Structure

This paper contains 21 sections, 2 figures, and 9 tables.

Figures (2)

  • Figure 1: Permutation Classification and Embedding Regression for a 4-step recipe. The recipe steps are reordered with a permutation chosen at random from a predefined permutation set and then fed to the transformer model. The Permutation Classification Task is to predict the index of the chosen permutation (23 in this example), and the Embedding Regression Task is to predict the corresponding Lehmer or Hamming embedding.
  • Figure 2: Skip-Clip model with a 6-step context and 3 target steps. The task is to rank the target steps according to their order in the recipe, using the scores produced by a scoring function and a hinge rank loss.
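The Skip-Clip objective can be sketched as a pairwise hinge ranking loss over target-step scores. This minimal Python version assumes, as in the Skip-Clip objective from video representation learning, that the target steps are listed in recipe order with the one nearest the context first, and that nearer targets should receive higher scores. The dot-product scorer and the names `dot_score`, `hinge_rank_loss`, and `skip_clip_loss` are hypothetical stand-ins; the paper's actual scoring function and context pooling are not specified here.

```python
from typing import Sequence


def dot_score(context: Sequence[float], target: Sequence[float]) -> float:
    # Hypothetical scoring function: dot product of a pooled context
    # embedding with one target-step embedding.
    return sum(c * t for c, t in zip(context, target))


def hinge_rank_loss(scores: Sequence[float], margin: float = 1.0) -> float:
    # scores[k] belongs to the k-th target step in recipe order
    # (k = 0 is closest to the context). Every nearer target should
    # outscore every farther one by at least `margin`; each violation
    # adds a linear penalty.
    loss = 0.0
    for i in range(len(scores)):
        for j in range(i + 1, len(scores)):
            loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss


def skip_clip_loss(
    context: Sequence[float],
    targets: Sequence[Sequence[float]],
    margin: float = 1.0,
) -> float:
    # Score each target against the context, then apply the ranking loss.
    scores = [dot_score(context, t) for t in targets]
    return hinge_rank_loss(scores, margin)


if __name__ == "__main__":
    context = [1.0, 0.0]
    in_order = [[2.0, 0.0], [1.0, 5.0], [0.0, 1.0]]  # scores 2, 1, 0
    print(skip_clip_loss(context, in_order))
    print(skip_clip_loss(context, in_order[::-1]))
```

With a margin of 1, correctly ordered scores such as `[2.0, 1.0, 0.0]` give zero loss, while the reversed order `[0.0, 1.0, 2.0]` is penalized on all three pairs (2 + 3 + 2 = 7). A pairwise formulation is used here because it directly encodes "nearer steps rank higher" without requiring absolute score targets.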