FoodTaxo: Generating Food Taxonomies with Large Language Models
Pascal Wullschleger, Majid Zarharan, Donnacha Daly, Marc Pouly, Jennifer Foster
TL;DR
FoodTaxo investigates automated taxonomy generation and completion for the food-technology domain using Large Language Models (LLMs). It covers two settings: seed-based completion, which extends an existing taxonomy, and seed-free generation, which builds a taxonomy from a set of known concepts. Both run in an iterative, bottom-up framework that combines prompting, retrieval, and NLI-based verification, with backtracking to enforce structural validity. Empirical results across five taxonomies show that the LLM-based methods can match or exceed state-of-the-art completion methods on several datasets and achieve competitive results under reference-free evaluation of the generated taxonomies, although placing non-leaf concepts remains challenging. The work points to scalable, plug-and-play taxonomy construction beyond the food domain, but non-leaf placement accuracy and other parts of the method need further refinement before the approach reaches production-level utility.
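To make the idea of iterative placement with verification and backtracking more concrete, the following Python sketch shows one possible shape of such a loop. It is not the authors' implementation. It assumes that the taxonomy is a tree stored as a child-to-parent dictionary, and that `propose_parents` and `verify` are hypothetical stand-ins for the LLM prompting/retrieval step and the NLI check. "Backtracking" is simplified here to rejecting a candidate parent and moving on to the next one whenever the candidate is unknown, would create a cycle, or fails verification.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical representation: each child maps to its single parent; the root has no entry.
Taxonomy = Dict[str, str]


def creates_cycle(taxonomy: Taxonomy, child: str, parent: str) -> bool:
    """Return True if attaching `child` under `parent` would create a cycle."""
    node = parent
    while node is not None:
        if node == child:
            return True
        node = taxonomy.get(node)  # walk up towards the root
    return False


def place_concepts(
    concepts: Iterable[str],
    taxonomy: Taxonomy,
    root: str,
    propose_parents: Callable[[str, Taxonomy], List[str]],  # hypothetical LLM + retrieval step
    verify: Callable[[str, str], bool],  # hypothetical NLI check: "child is a kind of parent"
) -> Taxonomy:
    """Insert concepts one at a time, trying ranked parent candidates in order."""
    result = dict(taxonomy)  # copy so the seed taxonomy is not mutated
    for concept in concepts:
        for candidate in propose_parents(concept, result):
            known = candidate == root or candidate in result
            # Backtrack to the next candidate if the parent is unknown,
            # would break the tree structure, or fails verification.
            if not known or creates_cycle(result, concept, candidate):
                continue
            if verify(concept, candidate):
                result[concept] = candidate
                break
        else:
            # No candidate survived: attach the concept directly under the root.
            result[concept] = root
    return result


# Toy usage with hand-written stubs in place of the LLM and the NLI model.
seed = {"dairy": "food", "cheese": "dairy"}


def propose(concept: str, tax: Taxonomy) -> List[str]:
    ranked = {"brie": ["brie", "cheese", "dairy"], "yogurt": ["cheese", "dairy"], "tofu": ["cheese"]}
    return ranked[concept]


def nli_stub(child: str, parent: str) -> bool:
    return (child, parent) in {("brie", "cheese"), ("yogurt", "dairy")}


taxonomy = place_concepts(["brie", "yogurt", "tofu"], seed, "food", propose, nli_stub)
print(taxonomy)
# prints {'dairy': 'food', 'cheese': 'dairy', 'brie': 'cheese', 'yogurt': 'dairy', 'tofu': 'food'}
```

Rejecting candidates before committing keeps the structure a valid tree after every insertion, so later steps never have to repair an invalid state. A fuller system might also undo earlier placements when a later concept cannot be placed, which this sketch does not attempt.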
Abstract
We investigate the utility of Large Language Models for automated taxonomy generation and completion specifically applied to taxonomies from the food technology industry. We explore the extent to which taxonomies can be completed from a seed taxonomy or generated without a seed from a set of known concepts, in an iterative fashion using recent prompting techniques. Experiments on five taxonomies using an open-source LLM (Llama-3), while promising, point to the difficulty of correctly placing inner nodes.
