Unsupervised Learning of Graph from Recipes

Aissatou Diallo, Antonis Bikakis, Luke Dickens, Anthony Hunter, Rob Miller

TL;DR

This work tackles unsupervised procedural understanding by converting cooking recipes into graphs that encode actions, ingredients, and locations, so that a system can reason about the sequence of steps. It introduces a self-supervised pipeline with a text-to-graph component (Entity Identifier and Graph Structure Encoder) and a graph-to-text component (Transformer-based Decoder). The pipeline is trained by decoding the graphs back into text and optimizing the joint loss $\mathcal{L}_{tot}= \mathcal{L}_{gse} + \mathcal{L}_{gen} + \lambda\|A\|_1$, where the last term is an L1 sparsity penalty on the learned adjacency matrix $A$. A key innovation is a continuous relaxation of the adjacency matrix via Sinkhorn-based sparsification, which produces sparse, discrete-like graphs while node embeddings are learned from a cooking-domain prior. The approach also includes a Recurrent Graph Embedding that captures temporal progression, letting the model build the graph incrementally as the recipe unfolds. Empirical results on the Now You're Cooking dataset and the English Flow Corpus show strong entity identification and competitive text↔graph performance, highlighting the potential of unsupervised graph learning for procedural knowledge extraction and reasoning in automated agents.
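The following minimal NumPy sketch makes the continuous relaxation concrete using standard Sinkhorn normalization. A square matrix of pairwise node scores is divided by a temperature `tau` and then alternately row- and column-normalized in log space. The result is a soft adjacency matrix whose rows and columns each sum to approximately 1. Lower temperatures push entries toward 0 or 1, which is what makes the relaxed graph look discrete. The function names, `tau`, and `n_iters` are illustrative assumptions, and the paper's exact Sinkhorn-based sparsification may differ from this textbook version.

```python
import numpy as np

def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis, keeping dimensions."""
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def sinkhorn(scores, tau=0.1, n_iters=50):
    """Map an n x n score matrix to an approximately doubly stochastic
    soft adjacency matrix via Sinkhorn normalization in log space.
    (Illustrative sketch; tau and n_iters are assumed hyperparameters.)"""
    log_a = np.asarray(scores, dtype=float) / tau  # smaller tau -> sharper entries
    for _ in range(n_iters):
        log_a = log_a - logsumexp(log_a, axis=1)  # normalize rows to sum to 1
        log_a = log_a - logsumexp(log_a, axis=0)  # normalize columns to sum to 1
    return np.exp(log_a)

# Usage: strong diagonal scores with a low temperature give a near-permutation.
scores = np.array([[5.0, 0.0, 0.0],
                   [0.0, 5.0, 0.0],
                   [0.0, 0.0, 5.0]])
A = sinkhorn(scores, tau=0.5)
print(np.round(A, 3))
```

With a strongly diagonal score matrix and a small temperature, the output is close to the identity, so each node is softly matched to a single partner. Working in log space avoids overflow when `scores / tau` is large.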

Abstract

Cooking recipes are one of the most readily available kinds of procedural text. They consist of natural language instructions that can be challenging to interpret. In this paper, we propose a model that identifies relevant information in recipes and generates a graph representing the sequence of actions in the recipe. Unlike other approaches, ours is unsupervised. We iteratively learn the graph structure and the parameters of a $\mathsf{GNN}$ that encodes the texts (text-to-graph), one sequence at a time. Supervision comes from decoding the graph back into text (graph-to-text) and comparing the generated text to the input. We evaluate the approach by comparing the identified entities with annotated datasets, measuring the difference between the input and output texts, and comparing our generated graphs with those produced by state-of-the-art methods.
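The self-supervision signal described above can be sketched as the joint objective $\mathcal{L}_{tot}= \mathcal{L}_{gse} + \mathcal{L}_{gen} + \lambda\|A\|_1$. This minimal NumPy sketch makes three assumptions. First, it treats $\mathcal{L}_{gse}$ as a precomputed scalar, because its form is not given here. Second, it assumes $\mathcal{L}_{gen}$ is the mean token-level cross-entropy between the decoder's logits and the token ids of the input text. Third, it takes $\|A\|_1$ to be the entrywise L1 norm of the learned adjacency matrix. All function names and the default `lam` are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Row-wise log-softmax, shifted by the max for numerical stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def generation_loss(logits, target_ids):
    """Assumed L_gen: mean cross-entropy of decoder logits (T x V)
    against the T token ids of the input text."""
    log_probs = log_softmax(logits)
    return -log_probs[np.arange(len(target_ids)), target_ids].mean()

def total_loss(l_gse, logits, target_ids, adjacency, lam=1e-3):
    """L_tot = L_gse + L_gen + lam * ||A||_1, with ||A||_1 the entrywise L1 norm.
    l_gse is passed in as a precomputed scalar (hypothetical interface)."""
    l_gen = generation_loss(logits, target_ids)
    sparsity = lam * np.abs(adjacency).sum()
    return l_gse + l_gen + sparsity

# Usage: uniform logits over a 4-token vocabulary give L_gen = log(4).
logits = np.zeros((3, 4))
target_ids = np.array([0, 1, 2])
adjacency = np.full((2, 2), 0.25)  # entrywise L1 norm = 1.0
print(total_loss(0.5, logits, target_ids, adjacency, lam=0.1))
```

The L1 term grows with the total edge weight, so raising `lam` trades reconstruction quality for sparser graphs.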

Paper Structure (34 sections, 4 equations, 4 figures, 5 tables, 1 algorithm)

Figures (4)

  • Figure 1: Example of output graph with the associated recipe. The identified entities are in bold.
  • Figure 2: Overview of the proposed model. The first part, in blue, encodes the procedures and identifies the relevant entities (see Figure 3 for details). These entities are passed to the second part, in red, which consists of two modules. The Graph Structure Encoder learns the adjacency matrix and the graph encoding (with cost $\mathcal{L}_{gse}$; see Figure 4). The decoder back-translates the graph into a recipe (with cost $\mathcal{L}_{gen}$).
  • Figure 3: Details of the Entity Identifier. The recipe is parsed such that each sentence contains only one action whereas multiple ingredients and locations are permitted.
  • Figure 4: We exploit the ability of a $\mathsf{GRU}$ to handle temporal dependencies, which increases the expressive power of the iteratively constructed partial graphs. The $\mathsf{GRU}$ has two inputs: the current input (the partial graph at $t$ concatenated with $\mathbf{h}_{s_t}$) and the previous state (the output of the $\mathsf{GRU}$ at $t-1$).
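The recurrence described for Figure 4 can be sketched with a plain GRU cell. In this minimal NumPy sketch, the input at each step is a readout of the partial graph at step $t$ concatenated with a sentence encoding $\mathbf{h}_{s_t}$. The hidden state carries information forward from step $t-1$. The readout, the random initialization, and all names are hypothetical choices for illustration, not the paper's implementation. Here the readout is one round of neighbour aggregation followed by mean pooling.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Plain GRU cell with update gate z, reset gate r, and candidate state:
    h_new = (1 - z) * h + z * h_tilde."""
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        # One weight block per gate: "z" (update), "r" (reset), "h" (candidate)
        self.W = {g: rng.uniform(-scale, scale, (hidden_dim, input_dim)) for g in "zrh"}
        self.U = {g: rng.uniform(-scale, scale, (hidden_dim, hidden_dim)) for g in "zrh"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrh"}

    def __call__(self, x, h):
        z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
        r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
        h_tilde = np.tanh(self.W["h"] @ x + self.U["h"] @ (r * h) + self.b["h"])
        return (1.0 - z) * h + z * h_tilde

def graph_readout(adjacency, node_feats):
    """Hypothetical readout: one round of neighbour aggregation, then mean pooling."""
    return (adjacency @ node_feats).mean(axis=0)

def recurrent_graph_embedding(partial_graphs, sentence_encodings, hidden_dim, seed=0):
    """Run the GRU over steps t = 1..T. The input at step t is the readout of the
    partial graph at t concatenated with the sentence encoding h_{s_t}."""
    node_dim = partial_graphs[0][1].shape[1]
    sent_dim = sentence_encodings[0].shape[0]
    cell = GRUCell(node_dim + sent_dim, hidden_dim, seed)
    h = np.zeros(hidden_dim)  # state before the first recipe step
    states = []
    for (adj, feats), h_s in zip(partial_graphs, sentence_encodings):
        x = np.concatenate([graph_readout(adj, feats), h_s])
        h = cell(x, h)  # previous state h comes from step t-1
        states.append(h)
    return np.stack(states)

# Usage: three steps whose partial graphs grow from 1 to 3 nodes.
rng = np.random.default_rng(1)
graphs = [(np.eye(n), rng.normal(size=(n, 4))) for n in (1, 2, 3)]
sentences = [rng.normal(size=3) for _ in range(3)]
states = recurrent_graph_embedding(graphs, sentences, hidden_dim=5)
print(states.shape)  # (3, 5)
```

Each update is a convex combination of the previous state and a tanh-bounded candidate. Starting from zeros, every entry of the state therefore stays in (-1, 1), however many recipe steps are processed.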