NGQA: A Nutritional Graph Question Answering Benchmark for Personalized Health-aware Nutritional Reasoning

Zheyuan Zhang, Yiyang Li, Nhi Ha Lan Le, Zehong Wang, Tianyi Ma, Vincent Galassi, Keerthiram Murugesan, Nuno Moniz, Werner Geyer, Nitesh V Chawla, Chuxu Zhang, Yanfang Ye

TL;DR

NGQA tackles the challenge of personalized nutritional reasoning by introducing a graph-based QA benchmark built from NHANES and FNDDS data. By modeling foods, health indicators, and user profiles as a knowledge graph, NGQA enables reasoning about whether a food is healthy for a given user, with explanations of key contributing nutrients. The framework defines three question complexities and three downstream tasks (-B, -ML, -TG), accompanied by a thorough experimental study across multiple Graph-RAG baselines and LLM backbones, revealing strengths and limitations of current approaches in domain-specific, health-aware contexts. The benchmark provides a complete codebase and emphasizes realistic evaluation, aiming to advance GraphQA research and personalized nutrition reasoning for real-world health applications.

Abstract

Diet plays a critical role in human health, yet tailoring dietary reasoning to individual health conditions remains a major challenge. Nutrition Question Answering (QA) has emerged as a popular method for addressing this problem. However, current research faces two critical limitations. On one hand, the absence of datasets involving user-specific medical information severely limits personalization. This challenge is further compounded by the wide variability in individual health needs. On the other hand, while large language models (LLMs), a popular solution for this task, demonstrate strong reasoning abilities, they struggle with the domain-specific complexities of personalized healthy dietary reasoning, and existing benchmarks fail to capture these challenges. To address these gaps, we introduce the Nutritional Graph Question Answering (NGQA) benchmark, the first graph question answering dataset designed for personalized nutritional health reasoning. NGQA leverages data from the National Health and Nutrition Examination Survey (NHANES) and the Food and Nutrient Database for Dietary Studies (FNDDS) to evaluate whether a food is healthy for a specific user, supported by explanations of the key contributing nutrients. The benchmark incorporates three question complexity settings and evaluates reasoning across three downstream tasks. Extensive experiments with LLM backbones and baseline models demonstrate that the NGQA benchmark effectively challenges existing models. In sum, NGQA addresses a critical real-world problem while advancing GraphQA research with a novel domain-specific benchmark.
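To make the graph framing concrete, the following is a minimal sketch of how a user, a health indicator, foods, and nutrient tags could be linked and queried. It is not the NGQA construction pipeline: the node names, the relation labels (`has_condition`, `should_limit`, `contains`), the `NutritionGraph` class, and the `judge_food` rule are all hypothetical, and the toy entries are illustrative rather than drawn from NHANES or FNDDS.

```python
from dataclasses import dataclass, field


@dataclass
class NutritionGraph:
    """Tiny typed graph: nodes carry a type, edges carry a relation label."""
    node_types: dict = field(default_factory=dict)  # node name -> node type
    edges: list = field(default_factory=list)       # (source, relation, target)

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def add_edge(self, source, relation, target):
        self.edges.append((source, relation, target))

    def neighbors(self, source, relation):
        """All targets reachable from `source` via one edge labeled `relation`."""
        return {t for s, r, t in self.edges if s == source and r == relation}


def judge_food(graph, user, food):
    """Return (is_healthy, conflicting_tags) for a user-food pair.

    Hypothetical rule: a food is flagged when it carries a nutrient tag
    that one of the user's health indicators says to limit.
    """
    conflicts = set()
    food_tags = graph.neighbors(food, "contains")
    for condition in graph.neighbors(user, "has_condition"):
        conflicts |= graph.neighbors(condition, "should_limit") & food_tags
    return len(conflicts) == 0, sorted(conflicts)


# Toy graph with made-up entries (assumption, not benchmark data).
g = NutritionGraph()
g.add_node("user_1", "user")
g.add_node("hypertension", "health_indicator")
g.add_node("canned_soup", "food")
g.add_node("oatmeal", "food")
g.add_node("high_sodium", "nutrient_tag")
g.add_node("high_fiber", "nutrient_tag")
g.add_edge("user_1", "has_condition", "hypertension")
g.add_edge("hypertension", "should_limit", "high_sodium")
g.add_edge("canned_soup", "contains", "high_sodium")
g.add_edge("oatmeal", "contains", "high_fiber")

print(judge_food(g, "user_1", "canned_soup"))  # (False, ['high_sodium'])
print(judge_food(g, "user_1", "oatmeal"))      # (True, [])
```

In this sketch the boolean plays the role of a yes/no judgment and the list of conflicting tags plays the role of a multi-label explanation of contributing nutrients; a free-text answer would have to be generated from the same subgraph by a language model.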

Paper Structure

This paper contains 33 sections, 12 figures, 20 tables.

Figures (12)

  • Figure 1: An Overview of NGQA Benchmark (a) along with a data showcase: (b) an example of the knowledge graph used for a standard level question and (c) the question and the answer of that question under the multi-label classification task (-ML) settings.
  • Figure 2: The NGQA benchmark construction process. Each stage shown in the figure is detailed in Section 3. For example, the "User Data Collection" block is introduced in Section 3.1 under the paragraph titled User Data Collection.
  • Figure 3: The illustration of different question levels and task levels.
  • Figure 4: Efficiency analysis of the five baseline methods across three tasks.
  • Figure 5: Retrieval quality of ToG vs. Plain across three types of questions on recall, precision and F1.
  • ...and 7 more figures