Food Portion Estimation via 3D Object Scaling

Gautham Vinod; Jiangpeng He; Zeman Shao; Fengqing Zhu

TL;DR

This paper proposes a new framework that estimates both food volume and energy from 2D images by using 3D food models and a physical reference in the eating scene.

Abstract

Image-based methods for analyzing food have alleviated the user burden and biases associated with traditional methods. However, accurate portion estimation remains a major challenge due to the loss of 3D information in the 2D representation of foods captured by smartphone cameras or wearable devices. In this paper, we propose a new framework to estimate both food volume and energy from 2D images by leveraging 3D food models and a physical reference in the eating scene. Our method estimates the pose of the camera and of the food object in the input image and recreates the eating occasion by rendering an image of a 3D model of the food with the estimated poses. We also introduce a new dataset, SimpleFood45, which contains 2D images of 45 food items and associated annotations including food volume, weight, and energy. Our method achieves an average error of 31.10 kcal (17.67%) on this dataset, outperforming existing portion estimation methods. The dataset can be accessed at: https://lorenz.ecn.purdue.edu/~gvinod/simplefood45/ and the code can be accessed at: https://gitlab.com/viper-purdue/monocular-food-volume-3d
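The abstract summarizes accuracy as an average energy error in kcal plus a percentage. The exact metric definitions are not given in this summary, so the sketch below assumes the kcal figure is a mean absolute error (MAE) and the percentage is a mean absolute percentage error (MAPE) over food items. The function name `portion_errors` and the toy energies are hypothetical and are not SimpleFood45 data.

```python
from typing import Sequence, Tuple


def portion_errors(predicted: Sequence[float], ground_truth: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, MAPE) under assumed, hypothetical metric definitions.

    MAE  = mean(|pred - gt|), in the input unit (e.g. kcal)
    MAPE = 100 * mean(|pred - gt| / gt), in percent
    """
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(g <= 0 for g in ground_truth):
        raise ValueError("ground-truth values must be positive for MAPE")
    abs_errors = [abs(p - g) for p, g in zip(predicted, ground_truth)]
    mae = sum(abs_errors) / len(abs_errors)
    mape = 100.0 * sum(e / g for e, g in zip(abs_errors, ground_truth)) / len(abs_errors)
    return mae, mape


# Toy energies in kcal, made up for illustration (not SimpleFood45 values).
mae, mape = portion_errors([90.0, 210.0], [100.0, 200.0])
print(f"MAE = {mae:.2f} kcal, MAPE = {mape:.2f}%")  # MAE = 10.00 kcal, MAPE = 7.50%
```

A mean of per-item percentage errors generally differs from the MAE divided by the mean ground-truth energy, so which normalization the reported percentage uses should be checked against the paper itself.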

Paper Structure (16 sections, 15 equations, 3 figures, 4 tables)

Introduction
Related Works
Method
Object Detection and Segmentation Module
Pose Estimation Module
Camera Pose Estimation
Object Pose Estimation
Rendering Module
Volume Estimation
SimpleFood45 Dataset Collection
Experimental Results
Comparison With Other Methods
Generalization to Other Datasets
Ablation Analysis
Discussion
...and 1 more section

Figures (3)

Figure 1: Overview of the proposed method. The system is divided into three modules. The Object Detection and Segmentation Module takes the 2D image as input and outputs a segmentation mask. The Pose Estimation Module estimates the camera pose and the orientation and translation of the food. The Rendering Module loads the 3D model corresponding to the class label provided by the Object Detection and Segmentation Module and uses the pose parameters from the Pose Estimation Module to render an image of the food(s) in the input image. The sizes of the binary masks of the rendered image and the input image are compared to rescale the 3D model and obtain the estimated volume.
Figure 2: The result of rendering an image of an apple based on the estimated pose of the input image using the 3D model of the apple.
Figure 3: Image samples from the SimpleFood45 dataset. The samples feature different food types and different camera and object poses.
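The Figure 1 caption describes rescaling the 3D model by comparing the sizes of the rendered and input masks. The paper's exact rule is not reproduced in this summary; the sketch below assumes the mismatch is a single uniform scale factor s, so that projected area grows by about s² and volume by s³, giving V = V_model · (A_input / A_rendered)^(3/2). The s² relation is exact under orthographic projection and only approximate under perspective projection, where it holds best when the food's depth extent is small compared with its distance to the camera. The final step, converting volume to energy through a density (g/cm³) and an energy density (kcal/g), is likewise an assumption, and all function names and numeric values here are hypothetical.

```python
from typing import Sequence

Mask = Sequence[Sequence[int]]  # binary mask as rows of 0/1 pixel values


def mask_area(mask: Mask) -> int:
    """Count the foreground (non-zero) pixels in a binary mask."""
    return sum(1 for row in mask for px in row if px)


def rescaled_volume(model_volume_cm3: float, input_mask: Mask, rendered_mask: Mask) -> float:
    """Hypothetical rescaling rule: V = V_model * (A_input / A_rendered) ** 1.5.

    Assumes one uniform linear scale s, with silhouette area proportional to s**2
    and volume proportional to s**3 (exact only for orthographic projection).
    """
    a_input = mask_area(input_mask)
    a_rendered = mask_area(rendered_mask)
    if a_rendered == 0:
        raise ValueError("rendered mask is empty; check the estimated poses")
    return model_volume_cm3 * (a_input / a_rendered) ** 1.5


def volume_to_energy(volume_cm3: float, density_g_per_cm3: float, kcal_per_g: float) -> float:
    """Hypothetical conversion: energy (kcal) = volume * density * energy density."""
    return volume_cm3 * density_g_per_cm3 * kcal_per_g


# Toy masks: the rendered model covers 2x2 pixels, the food in the input image 4x4.
rendered = [[1] * 2 for _ in range(2)]
captured = [[1] * 4 for _ in range(4)]
volume = rescaled_volume(100.0, captured, rendered)  # area ratio 4 -> s = 2 -> volume x8
print(f"{volume:.1f} cm^3")  # 800.0 cm^3
# Made-up density and energy density, for illustration only.
print(f"{volume_to_energy(volume, 0.8, 0.5):.1f} kcal")  # 320.0 kcal
```

Under this assumed rule, raising the area ratio to the power 1.5 amplifies segmentation errors: a 10% overestimate of the input mask area inflates the volume estimate by about 15% (1.1^1.5 ≈ 1.154).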
