Real-Time Cooked Food Image Synthesis and Visual Cooking Progress Monitoring on Edge Devices

Jigyasa Gupta; Soumya Goyal; Anil Kumar; Ishan Jindal

TL;DR

This work tackles real-time, on-device synthesis of cooked-food images conditioned on a raw food image, the recipe, and the desired doneness. It introduces an edge-efficient FiLM-conditioned U-Net generator guided by sinusoidal recipe-state embeddings and trained with a domain-specific Culinary Image Similarity (CIS) metric, which yields temporally coherent visual progression and a stopping signal for cooking. Evaluation uses a new oven-based progression dataset (1708 sessions, 30 recipes), on which the proposed method achieves state-of-the-art FID/LPIPS with about 8.68M parameters and real-time inference, while CIS serves as both a robust on-device progress indicator and a training signal. The results demonstrate practical potential for intelligent kitchen appliances, offering interpretable, user-preference-driven visual feedback and a foundation for broader multimodal cooking intelligence.
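To make the conditioning idea above concrete, here is a minimal sketch of how a scalar cooking state could be turned into a sinusoidal embedding and then used for FiLM (feature-wise scale and shift). It is illustrative only: the Transformer-style embedding formula, the function names (`sinusoidal_embedding`, `linear`, `film`), the embedding size, and the toy projection weights are all assumptions, not details taken from the paper, and the summary does not say how recipe identity is combined with the cooking state.

```python
import math


def sinusoidal_embedding(value, dim=8, max_period=10000.0):
    """Transformer-style sinusoidal embedding of a scalar (assumed form).

    `value` stands in for a cooking state such as a doneness level.
    Returns `dim` numbers: sines first, then cosines.
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(value * f) for f in freqs] + [math.cos(value * f) for f in freqs]


def linear(vec, weights, bias):
    """Plain dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weights, bias)]


def film(features, gamma, beta):
    """FiLM: per-channel affine modulation, gamma * x + beta.

    `features` is a list of channels, each a flat list of activations.
    """
    return [[g * x + b for x in channel]
            for channel, g, b in zip(features, gamma, beta)]


# Toy usage: two feature channels modulated by a doneness level of 0.5.
embedding = sinusoidal_embedding(0.5, dim=4)
# Hypothetical projection weights; in a trained model these would be learned.
gamma = linear(embedding, [[0.1, 0.0, 0.5, 0.0], [0.0, 0.2, 0.0, 0.5]], [1.0, 1.0])
beta = linear(embedding, [[0.3, 0.0, 0.0, 0.0], [0.0, 0.3, 0.0, 0.0]], [0.0, 0.0])
modulated = film([[1.0, 2.0], [3.0, 4.0]], gamma, beta)
```

In a U-Net generator, a modulation of this kind would typically be applied to the feature maps of selected convolutional blocks, so that one lightweight network can render different doneness levels from the same raw image.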

Abstract

Synthesizing realistic cooked food images from raw inputs on edge devices is a challenging generative task, requiring models to capture complex changes in texture, color, and structure during cooking. Existing image-to-image generation methods often produce unrealistic results or are too resource-intensive for edge deployment. We introduce the first oven-based cooking-progression dataset with chef-annotated doneness levels and propose an edge-efficient, recipe- and cooking-state-guided generator that synthesizes realistic cooked food images conditioned on a raw food image. This formulation enables user-preferred visual targets rather than fixed presets. To ensure temporal consistency and culinary plausibility, we introduce a domain-specific \textit{Culinary Image Similarity (CIS)} metric, which serves both as a training loss and as a progress-monitoring signal. Our model outperforms existing baselines with significant reductions in FID scores (30\% improvement on our dataset; 60\% on public datasets)
