NutriTransform: Estimating Nutritional Information From Online Food Posts
Thorsten Ruprechter, Marion Garaus, Ivo Ponocny, Denis Helic
TL;DR
NutriTransform estimates macro-nutrient content from short online post titles when explicit nutrition data is unavailable. It uses SentenceTransformer embeddings to map each title to semantically similar foods in a public USDA food database and aggregates their nutritional values, with the method tuned on a labeled recipe dataset. The approach achieves RMSE competitive with an API-based baseline and is applied to over 500,000 posts from Reddit's /r/food subreddit to uncover longitudinal dietary trends. The work provides a practical, scalable tool for inferring nutrition from minimal text and opens avenues for computational social science and health research.
Abstract
Deriving nutritional information from online food posts is challenging, particularly when users do not explicitly log the macro-nutrients of a shared meal. In this work, we present an efficient and straightforward approach to approximating macro-nutrients based solely on the titles of food posts. Our method combines a public food database from the U.S. Department of Agriculture with advanced text embedding techniques. We evaluate the approach on a labeled food dataset, demonstrating its effectiveness, and apply it to over 500,000 real-world posts from Reddit's popular /r/food subreddit to uncover trends in food-sharing behavior based on the estimated macro-nutrient content. Altogether, this work lays a foundation for researchers and practitioners aiming to estimate caloric and nutritional content using only text data.
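The retrieve-and-aggregate idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the food table and its values are placeholders (not USDA entries), the bag-of-words `embed` function stands in for a learned sentence-embedding model, and the similarity-weighted average over the top `k` matches is only one plausible aggregation rule. The `rmse` helper shows how estimates could be scored against labeled data.

```python
import math
import re
from collections import Counter

# Hypothetical food table: (description, kcal, protein_g, fat_g, carbs_g) per 100 g.
# The numbers are illustrative placeholders, not actual USDA entries.
FOODS = [
    ("grilled chicken breast", 165.0, 31.0, 3.6, 0.0),
    ("cheese pizza", 266.0, 11.0, 10.0, 33.0),
    ("chocolate cake", 371.0, 5.0, 15.0, 53.0),
    ("pepperoni pizza", 298.0, 13.0, 13.0, 32.0),
]
NUTRIENTS = ("kcal", "protein_g", "fat_g", "carbs_g")


def embed(text):
    """Stand-in embedding: lowercase token counts (assumption, not a learned model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def estimate_nutrients(title, k=2):
    """Similarity-weighted average of the k most similar foods (assumed aggregation)."""
    query = embed(title)
    scored = [(cosine(query, embed(desc)), values) for desc, *values in FOODS]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no food shares any token with the title
    return {
        name: sum(sim * values[i] for sim, values in top) / total
        for i, name in enumerate(NUTRIENTS)
    }


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    print(estimate_nutrients("Homemade pepperoni pizza", k=2))
```

In this sketch, `k` plays the role of a hyperparameter that would be tuned on labeled data. Swapping `embed` for a sentence-embedding model would let titles match foods that are semantically related even when they share no words.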
