Box2Flow: Instance-based Action Flow Graphs from Videos

Jiatong Li, Kalliopi Basioti, Vladimir Pavlovic

TL;DR

Box2Flow is an instance-based method that predicts a step flow graph from a given procedural video: it extracts bounding boxes from the video, predicts pairwise edge probabilities between step pairs, and builds the flow graph with a spanning tree algorithm.

Abstract

A large number of procedural videos on the web show how to complete various tasks. These tasks can often be accomplished in different ways and step orderings: some steps can be performed simultaneously, while others must be completed in a specific order. Flow graphs can be used to illustrate the step relationships of a task. Current task-based methods try to learn a single flow graph for all available videos of a specific task. The extracted flow graphs tend to be too abstract, failing to capture detailed step descriptions. In this work, our aim is to learn accurate and rich flow graphs by extracting them from a single video. We propose Box2Flow, an instance-based method to predict a step flow graph from a given procedural video. In detail, we extract bounding boxes from videos, predict pairwise edge probabilities between step pairs, and build the flow graph with a spanning tree algorithm. Experiments on MM-ReS and YouCookII show that our method can extract flow graphs effectively.
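
The last stage of the pipeline, building a graph from pairwise edge probabilities with a spanning tree algorithm, can be illustrated with a minimal sketch. The abstract does not say which spanning tree algorithm is used or whether the edges are directed, so the code below assumes a symmetric probability matrix and uses Prim's algorithm to find a maximum spanning tree. The function name `maximum_spanning_tree` and the probability values are hypothetical and are not taken from the paper.

```python
def maximum_spanning_tree(prob):
    """Prim's algorithm on a symmetric matrix of edge probabilities.

    Assumption: prob[u][v] is the predicted probability of an edge
    between step u and step v. Returns a list of (u, v) tree edges.
    """
    n = len(prob)
    if n == 0:
        return []
    in_tree = {0}  # grow the tree starting from step 0
    edges = []
    while len(in_tree) < n:
        best = None
        # Pick the most probable edge that leaves the current tree.
        for u in in_tree:
            for v in range(n):
                if v in in_tree:
                    continue
                if best is None or prob[u][v] > best[2]:
                    best = (u, v, prob[u][v])
        edges.append((best[0], best[1]))
        in_tree.add(best[1])
    return edges


# Hypothetical pairwise edge probabilities for four steps.
prob = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.8],
    [0.2, 0.3, 0.0, 0.7],
    [0.1, 0.8, 0.7, 0.0],
]
tree = maximum_spanning_tree(prob)
print(tree)
```

A tree over n steps always has n - 1 edges, so every step is connected to the graph even when some of its pairwise probabilities are low. A directed variant would need a different algorithm than the one sketched here.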

Paper Structure (12 sections, 13 equations, 6 figures, 2 tables, 1 algorithm)

Figures (6)

  • Figure 1: (a), (c): Two recipes for making tacos that differ in ingredients, actions, and number of steps. (b), (d): The corresponding flow graphs of the two recipes.
  • Figure 2: Overview of our method. We first predict the edge probabilities for all step segment pairs, then create the flow graph from the probability matrix using a spanning tree algorithm.
  • Figure 3: An example of ground truth and predicted flow graphs for which recall and precision are high but the graphs are structurally very different. The recipe is Peanut Butter and Jelly Sandwich from MM-ReS.
  • Figure 4: An example from MM-ReS. The text-only model did not predict the graph correctly, while the image+text model did. The interpolated images are marked in blue.
  • Figure 5: An example from YouCookII. The predicted captions are in red in (a). Video captioning and the text model did not predict the graph correctly, while the video and video+text models did.
  • ...and 1 more figures
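
The point made in the Figure 3 caption, that edge-level recall and precision can be high while two graphs differ structurally, can be illustrated with a small sketch. The exact metric definitions used in the paper are not given in this section, so the code assumes the common set-based definition over directed edges. The function `edge_precision_recall` and both example graphs are hypothetical.

```python
def edge_precision_recall(predicted, ground_truth):
    """Set-based precision and recall over directed edges.

    Assumption: each graph is given as a collection of (source, target)
    step-index pairs, and an edge counts as correct only on exact match.
    """
    pred, gt = set(predicted), set(ground_truth)
    true_positives = len(pred & gt)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gt) if gt else 0.0
    return precision, recall


# Hypothetical ground truth: a strict chain of five steps.
gt_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
# Hypothetical prediction: step 4 hangs off step 2 instead of step 3,
# so steps 3 and 4 now appear as parallel branches.
pred_edges = [(0, 1), (1, 2), (2, 3), (2, 4)]

precision, recall = edge_precision_recall(pred_edges, gt_edges)
print(precision, recall)
```

Only one edge differs, so both scores stay at 0.75, yet the prediction turns a sequential ordering into a branching one. This is the kind of structural difference that edge-level scores alone do not expose.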