Shape-Preserving Generation of Food Images for Automatic Dietary Assessment

Guangzong Chen, Zhi-Hong Mao, Mingui Sun, Kangni Liu, Wenyan Jia

TL;DR

This paper tackles the data bottleneck in automatic dietary assessment by presenting a shape-preserving, conditional GAN that can generate realistic food images while retaining the shape of a reference and allowing category control. The architecture combines an encoder to extract shape features, a generator conditioned on a latent texture code and category label, and a discriminator with texture references, trained via adversarial and reconstruction losses. Key contributions include a compact three-component model that preserves food and container shapes, supports category conditioning, and demonstrates superior realism (FID) and shape fidelity (IoU) compared to baselines on multiple datasets. The method can augment training data for both food recognition and volume estimation, potentially improving the accuracy and scalability of image-based dietary assessment systems.
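
Shape fidelity measured as IoU can be illustrated with a minimal sketch. It assumes the food or container region of the reference image and of the generated image is already available as a binary mask; how those masks are obtained is not stated here, and the function name `mask_iou` and the example masks are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks (nested lists of 0/1)."""
    same_rows = len(mask_a) == len(mask_b)
    if not same_rows or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks are treated as identical (a convention chosen for this sketch).
    return 1.0 if union == 0 else intersection / union


# Hypothetical 3x4 masks: 1 marks a pixel inside the food/container region.
reference = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
generated = [
    [0, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
]
print(mask_iou(reference, generated))
```

An IoU of 1 means the two regions coincide exactly; values closer to 0 mean the generated shape has drifted from the reference.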

Abstract

Traditional dietary assessment methods rely heavily on self-reporting, which is time-consuming and prone to bias. Recent advancements in Artificial Intelligence (AI) have revealed new possibilities for dietary assessment, particularly through the analysis of food images. Recognizing foods and estimating food volumes from images are the key procedures for automatic dietary assessment. However, both procedures require large amounts of training images labeled with food names and volumes, which are currently unavailable. Alternatively, recent studies have indicated that training images can be artificially generated using Generative Adversarial Networks (GANs). Nonetheless, convenient generation of large amounts of food images with known volumes remains a challenge with existing techniques. In this work, we present a simple GAN-based neural network architecture for conditional food image generation. The shapes of the food and container in the generated images closely resemble those in the reference input image. Our experiments demonstrate the realism of the generated images and the shape-preserving capabilities of the proposed framework.

Paper Structure

This paper contains 14 sections, 3 equations, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Our network architecture includes three major components: encoder $E$, generator $G$, and discriminator $D$. The encoder produces shape-related features $f$ from the image $I^{\text{s}}$. The generator takes the features $f$, latent variable $z$, and category label $c$ as conditional inputs and creates the output image $y$. The discriminator is used to evaluate the realism of the output image. Loss functions $L_{\text{adv}}$ and $L_{\text{R}}$ are used for training the network.
  • Figure 2: The network structure of the encoder.
  • Figure 3: Image examples generated by our network using the VireoFood-172 dataset: The first column shows the original input images, and each subsequent column shows images created by varying the latent variable $z$ while keeping the input image in the first column of that row fixed.
  • Figure 4: Image examples generated by StyleGAN3: (a) with round-shaped containers and (b) with irregular-shaped containers.
  • Figure 5: Image examples generated by our model: The first column shows the input images, and the remaining columns show generated images.
  • ...and 4 more figures
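
The data flow described in the Figure 1 caption can be traced with a toy sketch. Everything below is a hand-written stand-in, not the paper's networks: the thresholding encoder, the constant-fill generator, the sigmoid discriminator, and the specific forms of $L_{\text{adv}}$ (non-saturating $-\log D(y)$) and $L_{\text{R}}$ (mean absolute difference to the input) are assumptions chosen only to show how $I^{\text{s}}$, $f$, $z$, $c$, and $y$ connect.

```python
import math
import random


def encoder(image, threshold=0.5):
    """Stand-in for E: reduce the input image I^s to shape features f (a binary mask)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


def generator(f, z, c, num_classes=3):
    """Stand-in for G: fill the shape f with a texture value that depends on z and c."""
    base = (c + 1) / (num_classes + 1)            # category-dependent intensity
    value = min(1.0, max(0.05, base + 0.1 * z))   # latent z perturbs the texture
    return [[value * m for m in row] for row in f]


def discriminator(y):
    """Stand-in for D: map an image to a realism score in (0, 1)."""
    pixels = [p for row in y for p in row]
    mean = sum(pixels) / len(pixels)
    return 1.0 / (1.0 + math.exp(-mean))


def adversarial_loss(score):
    """Generator-side loss -log D(y); an assumed form, not taken from the paper."""
    return -math.log(score)


def reconstruction_loss(y, target):
    """Mean absolute pixel difference; an assumed form, not taken from the paper."""
    diffs = [abs(a - b) for ry, rt in zip(y, target) for a, b in zip(ry, rt)]
    return sum(diffs) / len(diffs)


# Hypothetical 3x4 grayscale input image I^s.
image_s = [
    [0.0, 0.9, 0.8, 0.0],
    [0.7, 0.9, 0.9, 0.6],
    [0.0, 0.8, 0.7, 0.0],
]
rng = random.Random(0)
f = encoder(image_s)            # shape-related features
z = rng.gauss(0.0, 1.0)         # latent texture code
c = 1                           # category label
y = generator(f, z, c)          # output image
l_adv = adversarial_loss(discriminator(y))
l_r = reconstruction_loss(y, image_s)
print(l_adv, l_r)
```

In this toy version, resampling $z$ or changing $c$ alters only the fill value, so the nonzero region of $y$ always matches $f$. That mirrors, in a trivial way, the idea of varying texture and category while holding the shape fixed.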