French wine: Combining multiple open data sources to map the expected harvest value
Martial Phélippé-Guinvarc'h
TL;DR
This work tackles the problem of mapping the harvest value of French wine by integrating multiple open data sources through a constrained optimization framework. It leverages official datasets (the vineyard registry and INAO quality-origin data) and auxiliary sources to recover surfaces by appellation and county, then converts them to expected harvest values using Olympic-average yields and official price scales. The approach yields a large, open harvest-value database and regionally interpretable maps, revealing substantial heterogeneity across appellations and regions, with practical implications for risk assessment and crop-insurance planning. The main limitation is data confidentiality; incorporating multi-year data and production volumes could support richer analyses.
Abstract
The purpose of this paper is to estimate a representative and detailed map of the harvest value of wine using structured and unstructured open data sources. With climate change and new environmental and ecological policies, wine producers are facing new challenges. The ability to model the evolution of these risks is strategic for both wine producers and researchers seeking to adapt. Many research projects require the values exposed to risk, for example to assess the economic impact of risks, to set crop insurance premiums, or to choose between agroecological solutions in a cost-benefit approach. The high spatial heterogeneity and complexity of wine characteristics make these production values harder to estimate and reinforce the need to improve the spatial assessment of expected harvest values. Structured, exhaustive and detailed historical data are collected by the customs services, but they are not open. To work around this, we combine the aggregated vineyard register data with data from the Public Body for Products of Official Quality and Origin. Several techniques are available to merge, combine or complete missing data. We chose optimization methods to re-estimate the area by appellation and by county, which can then be converted into expected harvest values using Olympic average yields by appellation and crop insurance prices. This approach captures the heterogeneity in production values faced by different vineyards, thereby facilitating further research on risk assessment in the wine industry.
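As an illustration of the conversion step described above, the following minimal sketch computes an Olympic average (the mean of the yearly yields after dropping the single highest and lowest values) and multiplies it by an area and a price. The function names, the units (hl/ha and EUR/hl) and all figures are assumptions made for this example; they are not taken from the paper's data or code.

```python
from statistics import mean


def olympic_average(yearly_yields):
    """Mean of the yields left after dropping one highest and one lowest value."""
    if len(yearly_yields) < 3:
        raise ValueError("need at least three yearly yields")
    trimmed = sorted(yearly_yields)[1:-1]  # discard the minimum and the maximum
    return mean(trimmed)


def expected_harvest_value(area_ha, yearly_yields_hl_per_ha, price_eur_per_hl):
    """Area x Olympic-average yield x price, for one appellation in one county."""
    return area_ha * olympic_average(yearly_yields_hl_per_ha) * price_eur_per_hl


# Hypothetical figures for a single appellation-county pair (illustration only).
yields = [52.0, 38.0, 47.0, 55.0, 45.0]  # five years of yields, assumed in hl/ha
value = expected_harvest_value(
    area_ha=120.0,
    yearly_yields_hl_per_ha=yields,
    price_eur_per_hl=150.0,
)
print(f"Olympic average yield: {olympic_average(yields)} hl/ha")
print(f"Expected harvest value: {value} EUR")
```

Dropping the extremes keeps a single exceptional year, such as a frost or hail event, from dominating the reference yield, which is why this kind of average is common in crop-insurance settings.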
