PizzaCommonSense: Learning to Model Commonsense Reasoning about Intermediate Steps in Cooking Recipes

Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

TL;DR

PizzaCommonsense introduces a dataset that grounds commonsense reasoning in procedural cooking by annotating intermediate step inputs and outputs for pizza recipes. The task reframes recipe following as predicting the explicit IO descriptions for each atomic action, evaluated across T5-family baselines and GPT-3.5/4 with prompting and fine-tuning. Results show substantial gaps relative to human performance, with GPT-3.5+FT delivering the strongest exact-match performance among AI models, while GPT-4+CoT remains competitive but not superior, underscoring the challenge of explicit, stepwise IO reasoning. The work highlights the need for better representations, preprocessing pipelines, and training paradigms to enable reliable, interpretable procedural understanding with potential applications in autonomous cooking agents and intelligent assistants.

Abstract

Understanding procedural texts, such as cooking recipes, is essential for enabling machines to follow instructions and reason about tasks, a key aspect of intelligent reasoning. In cooking, these instructions can be interpreted as a series of modifications to a food preparation. For a model to effectively reason about cooking recipes, it must accurately discern and understand the inputs and outputs of intermediate steps within the recipe. We present a new corpus of cooking recipes enriched with descriptions of intermediate steps that describe the input and output for each step. PizzaCommonsense serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit input-output descriptions to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-4 achieves only 26% human-evaluated preference for generations, leaving room for future improvements.
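To make the task concrete, the following is a minimal sketch of how a recipe annotated with per-step inputs and outputs might be represented and scored with exact match. The `Step` class, its field names, the `normalize` rule (lowercasing and whitespace collapsing), and the three-step recipe are all illustrative assumptions; they are not taken from the dataset or from the authors' evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One atomic recipe step (hypothetical schema, for illustration only)."""
    instruction: str   # sentence containing a single cooking action
    action: str        # the main cooking verb
    step_input: str    # comestibles consumed by the step
    step_output: str   # comestible produced by the step


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization rule)."""
    return " ".join(text.lower().split())


def exact_match(predicted: list[str], gold: list[str]) -> float:
    """Fraction of positions where the prediction equals the gold string."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


# Made-up example: the output of each step feeds the input of the next.
recipe = [
    Step("Mix flour, water and yeast.", "mix", "flour; water; yeast", "dough"),
    Step("Roll out the dough.", "roll", "dough", "pizza base"),
    Step("Bake the pizza base.", "bake", "pizza base", "baked pizza base"),
]

gold_outputs = [s.step_output for s in recipe]
predicted_outputs = ["Dough", "flat dough", "baked  pizza base"]
score = exact_match(predicted_outputs, gold_outputs)
```

Exact match is deliberately strict: "flat dough" is a plausible description of the second output but scores zero against "pizza base", which may help explain why a human-evaluated preference is reported alongside automatic metrics.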

Paper Structure (50 sections, 3 figures, 7 tables)

Figures (3)

  • Figure 1: A graphical depiction of the PizzaCommonsense underlying motivation. Models are required to learn knowledge about the input and output of each intermediate step and predict the correct sequencing of these comestibles given the corresponding instructions and cooking actions.
  • Figure 2: Our proposed pipeline to obtain PizzaCommonSense. Given a recipe selected from Recipe1M, we first apply POS tagging to identify the cooking actions and split the sentences so that each sentence contains only one main cooking action. The instructions and the identified cooking actions are formatted into a table, which becomes the HIT. The green box illustrates the annotation process, and the red box represents the training/inference phase.
  • Figure 3: Data collection interface on AMT.
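
The preprocessing described in the Figure 2 caption (identify cooking actions, split so that each sentence has one main action, format as a table) can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: a small hand-written verb lexicon (`COOKING_VERBS`) stands in for the POS tagger, and the function names and table columns are hypothetical.

```python
import re

# Assumption: a tiny lexicon replaces real POS tagging in this sketch.
COOKING_VERBS = {"mix", "knead", "roll", "spread", "add", "bake", "sprinkle"}
# Connectors that should not be left dangling at the end of a split clause.
CONNECTORS = {"and", "then"}


def split_into_steps(instruction: str) -> list[tuple[str, str]]:
    """Split an instruction into (clause, action) pairs with one action each."""
    tokens = re.findall(r"[a-z0-9]+", instruction.lower())
    steps: list[list[str]] = []
    for token in tokens:
        if token in COOKING_VERBS:
            steps.append([token])      # a cooking verb opens a new step
        elif steps:
            steps[-1].append(token)    # other words extend the current step
    result = []
    for words in steps:
        while words and words[-1] in CONNECTORS:
            words.pop()                # drop trailing "and" / "then"
        result.append((" ".join(words), words[0]))
    return result


def to_table(steps: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Build table rows; input and output are left blank for annotators."""
    return [
        {"instruction": text, "action": action, "input": "", "output": ""}
        for text, action in steps
    ]


rows = to_table(
    split_into_steps("Roll out the dough and spread the sauce, then bake for 10 minutes.")
)
```

A lexicon lookup cannot tell a verb from a noun ("the mix" versus "mix the flour"), which is one reason a POS tagger is the more robust choice for this step.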