RecipeGen: A Benchmark for Real-World Recipe Image Generation
Ruoxuan Zhang, Hongxia Xie, Yi Yao, Jian-Yu Jiang-Lin, Bin Wen, Ling Lo, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng
TL;DR
This work addresses the lack of real-world data linking culinary goals, stepwise instructions, and corresponding images for recipe image generation. It introduces the RecipeGen Benchmark (RGB), a large-scale, real-world dataset of 21,944 recipes and 139,872 step images, guided by 158 keywords to cover diverse cuisines and techniques and to support text-to-image evaluation. The construction pipeline combines large-scale collection of user-generated recipes, GPT-4o-driven step merging and captioning, and human verification to ensure faithfulness of goals and steps. On the GF and SF metrics, RGB shows better alignment between instructions and visuals than prior datasets, which supports its use for training and evaluating illustrated-instruction models in real-world cooking scenarios. Because it covers long step sequences and broad regional diversity in recipe imagery, RGB is intended to support realistic, interactive culinary AI applications.
Abstract
Recipe image generation is an important challenge in food computing, with applications ranging from culinary education to interactive recipe platforms. However, there is currently no real-world dataset that comprehensively connects recipe goals, sequential steps, and corresponding images. To address this, we introduce RecipeGen, the first real-world goal-step-image benchmark for recipe image generation, featuring diverse ingredients, varied recipe steps, multiple cooking styles, and a broad collection of food categories. The data are available at https://github.com/zhangdaxia22/RecipeGen.
