Shape-Preserving Generation of Food Images for Automatic Dietary Assessment
Guangzong Chen, Zhi-Hong Mao, Mingui Sun, Kangni Liu, Wenyan Jia
TL;DR
This paper tackles the data bottleneck in automatic dietary assessment by presenting a shape-preserving, conditional GAN that can generate realistic food images while retaining the shape of a reference and allowing category control. The architecture combines an encoder to extract shape features, a generator conditioned on a latent texture code and category label, and a discriminator with texture references, trained via adversarial and reconstruction losses. Key contributions include a compact three-component model that preserves food and container shapes, supports category conditioning, and demonstrates superior realism (FID) and shape fidelity (IoU) compared to baselines on multiple datasets. The method can augment training data for both food recognition and volume estimation, potentially improving the accuracy and scalability of image-based dietary assessment systems.
Abstract
Traditional dietary assessment methods rely heavily on self-reporting, which is time-consuming and prone to bias. Recent advancements in Artificial Intelligence (AI) have revealed new possibilities for dietary assessment, particularly through the analysis of food images. Recognizing foods and estimating food volumes from images are the two key procedures for automatic dietary assessment. However, both procedures require large amounts of training images labeled with food names and volumes, which are currently unavailable. Alternatively, recent studies have indicated that training images can be artificially generated using Generative Adversarial Networks (GANs). Nonetheless, convenient generation of large amounts of food images with known volumes remains a challenge with existing techniques. In this work, we present a simple GAN-based neural network architecture for conditional food image generation. The shapes of the food and container in the generated images closely resemble those in the reference input image. Our experiments demonstrate the realism of the generated images and the shape-preserving capabilities of the proposed framework.
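As a concrete illustration of how shape preservation can be quantified, the sketch below computes the Intersection over Union (IoU) between a binary food mask from the reference image and one from a generated image. It is a minimal example under stated assumptions, not the evaluation code of this work: the function name `mask_iou`, the nested-list mask representation, the toy masks, and the convention that two empty masks score 1.0 are all hypothetical choices made for illustration. How the masks are obtained (for example, from a segmentation model) is outside the scope of the sketch.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over Union of two equally sized binary masks.

    Masks are nested lists of 0/1 values (an assumption for this sketch;
    real pipelines would typically use array types instead).
    """
    same_rows = len(mask_a) == len(mask_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")

    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            a, b = bool(a), bool(b)
            intersection += a and b  # pixel belongs to both shapes
            union += a or b          # pixel belongs to at least one shape

    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Toy 3x4 masks: 1 marks food pixels, 0 marks background.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
generated = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(mask_iou(reference, generated))  # 3 shared pixels / 5 pixels in the union
```

An IoU close to 1 indicates that the generated food region overlaps the reference region almost exactly, while lower values indicate that the shape has drifted.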
