RecipeGen: A Benchmark for Real-World Recipe Image Generation

Ruoxuan Zhang, Hongxia Xie, Yi Yao, Jian-Yu Jiang-Lin, Bin Wen, Ling Lo, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng

TL;DR

This work addresses the lack of real-world data linking culinary goals, stepwise instructions, and corresponding images for recipe image generation. It introduces the RecipeGen Benchmark (RGB), a large-scale, real-world dataset of 21,944 recipes and 139,872 step images, collected using 158 keywords that cover diverse cuisines and techniques, to support text-to-image evaluation. The construction pipeline combines large-scale collection of user-generated recipes, GPT-4o-driven step merging and captioning, and human verification to ensure the faithfulness of goals and steps. On the goal faithfulness (GF) and step faithfulness (SF) metrics, RGB shows better alignment between instructions and visuals than prior datasets, supporting its use for training and evaluating illustrated-instruction models in real-world cooking scenarios. By covering long step sequences and broad regional diversity in recipe imagery, RGB is intended to support realistic, interactive culinary AI applications.

Abstract

Recipe image generation is an important challenge in food computing, with applications ranging from culinary education to interactive recipe platforms. However, there is currently no real-world dataset that comprehensively connects recipe goals, sequential steps, and corresponding images. To address this, we introduce RecipeGen, the first real-world goal-step-image benchmark for recipe generation, featuring diverse ingredients, varied recipe steps, multiple cooking styles, and a broad collection of food categories. The data is available at https://github.com/zhangdaxia22/RecipeGen.

Paper Structure

This paper contains 7 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: The distribution of region and dessert keywords in RecipeGen Benchmark.
  • Figure 2: Examples from RecipeGen. Each sample contains its goal, its steps, and the image corresponding to each step.
  • Figure 3: Quality Control Prompt for GPT-4o. We utilize GPT-4o to merge steps and generate captions.
  • Figure 4: Dataset Construction Procedure. We first analyze the characteristics of the dishes and select 158 keywords. Subsequently, we utilize GPT-4o to perform quality control by omitting irrelevant steps, merging adjacent simple actions, and generating captions. Finally, we calculate metrics and conduct human checks to ensure the usability of the dataset.
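
The goal-step-image structure described in the Figure 2 caption can be pictured as a small record type. The sketch below is illustrative only: the class names, field names, file names, and the example recipe are assumptions made for this example, not the schema of the released data. It also derives the average number of step images per recipe from the two dataset totals quoted in the TL;DR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One recipe step paired with one image (hypothetical field names)."""
    text: str
    image_path: str


@dataclass
class RecipeSample:
    """A goal-step-image sample; an assumed layout, not the released schema."""
    goal: str
    steps: List[Step] = field(default_factory=list)

    def is_aligned(self) -> bool:
        # A sample is usable only if it has steps and every step
        # carries both instruction text and an image reference.
        return bool(self.steps) and all(s.text and s.image_path for s in self.steps)


def mean_images_per_recipe(num_images: int, num_recipes: int) -> float:
    """Average number of step images per recipe."""
    return num_images / num_recipes


# Made-up placeholder sample, used only to exercise the types above.
sample = RecipeSample(
    goal="Tomato and egg stir-fry",
    steps=[
        Step("Beat the eggs.", "step_1.jpg"),
        Step("Stir-fry the tomatoes.", "step_2.jpg"),
    ],
)

# Totals taken from the TL;DR: 139,872 step images across 21,944 recipes.
average = mean_images_per_recipe(139_872, 21_944)
```

With the reported totals, `average` comes to roughly 6.4 step images per recipe, which is consistent with the summary's emphasis on long step sequences.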