Food for thought: How can machine learning help better predict and understand changes in food prices?
Kristina L. Kupferschmidt, James Requiema, Mya Simpson, Zohrah Varsallay, Ethan Jackson, Cody Kupferschmidt, Sara El-Shawa, Graham W. Taylor
TL;DR
The paper tackles predicting Canadian food price inflation and understanding how context and data curation affect forecast accuracy for the 2025 edition of Canada's Food Price Report (CFPR). It proposes a data-centric, human-in-the-loop forecasting framework that combines domain-expert and LLM-assisted data curation across multiple model families (statistical, deep learning, Transformer, foundation, and LLMs). Key results show that climate and geopolitical regressors often yield the strongest gains, ensembles of top-performing models outperform baselines, and LLM inputs can improve or harm stability depending on prompting and input design. The work demonstrates practical improvements for forecasting Canada's food prices and outlines directions for integrating context-aware prompting and routing strategies to further enhance accuracy.
Abstract
In this work, we address a lack of systematic understanding of fluctuations in food affordability in Canada. Canada's Food Price Report (CFPR) is an annual publication that predicts food inflation over the next calendar year. The published predictions are a collaborative effort between forecasting teams at Canadian universities, each employing its own approach: Dalhousie University, the University of British Columbia, the University of Saskatchewan, and the University of Guelph/Vector Institute. While the University of Guelph/Vector Institute forecasting team has leveraged machine learning (ML) in previous reports, the most recent editions (2024–2025) have also included a human-in-the-loop approach. For the 2025 report, this focus was expanded to evaluate several different data-centric approaches to improve forecast accuracy. In this study, we evaluate how different types of forecasting models perform when estimating food price fluctuations. We also examine the sensitivity of models that curate time series data representing key factors in food pricing.
