
MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Jinge Ma, Xiaoyan Zhang, Gautham Vinod, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

TL;DR

This paper introduces MFP3D, a framework for accurate food portion estimation from a single monocular image. Evaluated on the MetaFood3D dataset, it significantly improves portion estimation accuracy over existing methods.

Abstract

Food portion estimation is crucial for monitoring health and tracking dietary intake. Image-based dietary assessment, which analyzes images of eating occasions using computer vision techniques, is increasingly replacing traditional methods such as 24-hour recalls. However, accurately estimating nutritional content from images remains challenging because 3D information is lost when the scene is projected onto the 2D image plane. Existing portion estimation methods are difficult to deploy in real-world scenarios because they depend on specific requirements, such as physical reference objects, high-quality depth information, or multi-view images and videos. In this paper, we introduce MFP3D, a new framework for accurate food portion estimation using only a single monocular image. Specifically, MFP3D consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and concatenates features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that uses a deep regression model to estimate the food's volume and energy content from the extracted features. We evaluate MFP3D on the MetaFood3D dataset, where it significantly improves portion estimation accuracy over existing methods.
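The following minimal Python sketch shows how the feature extraction and regression stages fit together, using the notation of Figure 1 ($\delta_P$, $\delta_I$, $\varphi$). It assumes the point cloud from the 3D Reconstruction Module is already available. The encoders are deliberately toy stand-ins chosen only for illustration and are not the networks used in MFP3D: a PointNet-style shared linear layer with max pooling for $\delta_P$, a global RGB average for $\delta_I$, and a single linear layer for $\varphi$. All weights and the `params` dictionary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Point = Tuple[float, float, float]
Pixel = Tuple[float, float, float]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> Vector:
    # y_j = sum_i W[j][i] * x[i] + b[j]
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def delta_P(points: Sequence[Point], W: Sequence[Sequence[float]],
            b: Sequence[float]) -> Vector:
    """Toy point-cloud encoder (assumption): shared per-point layer + ReLU,
    then max pooling over points, so f_P does not depend on point order."""
    per_point = [relu(linear(p, W, b)) for p in points]
    return [max(column) for column in zip(*per_point)]


def delta_I(image: Sequence[Sequence[Pixel]]) -> Vector:
    """Toy image encoder (assumption): mean R, G, B over all pixels."""
    pixels = [px for row in image for px in row]
    n = len(pixels)
    return [sum(px[c] for px in pixels) / n for c in range(3)]


def phi(features: Sequence[float], W: Sequence[Sequence[float]],
        b: Sequence[float]) -> Vector:
    """Toy regression head (assumption): one linear layer -> (volume, energy)."""
    return linear(features, W, b)


def estimate_portion(points: Sequence[Point], image: Sequence[Sequence[Pixel]],
                     params: Dict[str, list]) -> Vector:
    f_P = delta_P(points, params["W_P"], params["b_P"])  # 3D features
    f_I = delta_I(image)                                 # 2D features
    fused = f_P + f_I                                    # concatenation [f_P; f_I]
    return phi(fused, params["W_phi"], params["b_phi"])


if __name__ == "__main__":
    # Hypothetical weights: 2-dim point feature + 3-dim image feature -> 2 outputs.
    params = {
        "W_P": [[1, 0, 0], [0, 1, 1]], "b_P": [0, 0],
        "W_phi": [[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]], "b_phi": [0, 0],
    }
    points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]
    image = [[(10, 20, 30), (30, 40, 50)]]
    print(estimate_portion(points, image, params))  # [3.0, 90.0]
```

Concatenating the two feature vectors before a shared regression head lets $\varphi$ weigh geometric cues from the point cloud against appearance cues from the RGB image. In practice, each toy function would be replaced by a learned deep network.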

Paper Structure

This paper contains 10 sections, 7 equations, 2 figures, 5 tables.

Figures (2)

  • Figure 1: An overview of the MFP3D framework: The input image $x_I$ goes through a three-stage pipeline for accurate portion estimation. In Stage 1, a 3D reconstructor is used to generate the point clouds from the input image. In Stage 2, the 3D features ($f_P$) of the point cloud and the 2D features ($f_I$) of the input image are extracted using networks $\delta_P$ and $\delta_I$, respectively. In Stage 3, these features are concatenated and passed through a regression network ($\varphi$) to estimate the food portion.
  • Figure 2: An overview of (a) ground truth point clouds (GTPCs), (b) normalized GTPCs, and (c) reconstructed point clouds used in our experiments.
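The captions do not specify how the GTPCs are normalized. The sketch below therefore assumes a common convention: translate the cloud so its centroid is at the origin, then scale it so the farthest point lies on the unit sphere. The function name `normalize_point_cloud` is hypothetical. Under this convention the normalization discards absolute size, which matters when the target is volume.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def normalize_point_cloud(points: Sequence[Point]) -> List[Point]:
    """Center at the centroid and scale into the unit sphere (assumed convention)."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in points]
    # Largest distance from the centroid becomes the scaling factor.
    radius = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if radius == 0.0:
        return centered  # degenerate cloud: all points coincide
    return [(x / radius, y / radius, z / radius) for x, y, z in centered]


if __name__ == "__main__":
    print(normalize_point_cloud([(2.0, 0.0, 0.0), (6.0, 0.0, 0.0)]))
    # [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```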