Privacy-Preserving Data Linkage Across Private and Public Datasets for Collaborative Agriculture Research

Osama Zafar, Rosemarie Santa Gonzalez, Gabriel Wilkins, Alfonso Morales, Erman Ayday

TL;DR

The paper tackles privacy risks in sharing environmental, pricing, and sales data for digital agriculture and proposes a privacy-preserving framework enabling secure linkage between private farmer-market data and public datasets. The core method deploys a centralized sandbox with a global PCA model $M_s$ trained on public data $D_s$, where private data are transformed to $O_i$ and protected by Laplacian noise to yield $C_i$ under $\epsilon$-LDP, allowing identification of matching farmers without exposing raw data. Researchers then query aggregates inside the sandbox and apply clustering on the DP-transformed space (e.g., $K$-means on the PCA-transformed data) to relate private pricing signals to public datasets such as food insecurity. Empirical evaluation on a Wisconsin Farmer's Market dataset demonstrates a privacy-utility trade-off with optimal $\epsilon$ values (e.g., $\epsilon = 25$ for Logistic Regression, $35$ for Naive Bayes and SVM) and shows the framework enables ML-driven insights for pricing, sales, and policy analysis, advancing secure data integration in digital agriculture.
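
As a rough illustration of the transform-then-perturb step described above, the following is a minimal NumPy sketch, not the authors' implementation. It fits a PCA model on public data ($D_s$), projects private records ($O_i$), and adds Laplace noise to obtain $C_i$. The clipping bound, the L1 sensitivity calculation, the function names, and the synthetic data are all assumptions made for this example; the summary does not state how the paper bounds sensitivity.

```python
import numpy as np

def fit_pca(public_data, n_components):
    """Fit PCA on public data; return (mean, components)."""
    mean = public_data.mean(axis=0)
    # Rows of vt are orthonormal principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(public_data - mean, full_matrices=False)
    return mean, vt[:n_components]

def transform(private_data, mean, components):
    """Project records onto the principal directions of the public model."""
    return (private_data - mean) @ components.T

def laplace_perturb(projected, epsilon, bound, rng):
    """Clip each coordinate to [-bound, bound], then add Laplace noise.

    Assumption: after clipping, two k-dimensional records differ by at most
    2 * bound per coordinate, so the L1 sensitivity is 2 * bound * k.
    """
    clipped = np.clip(projected, -bound, bound)
    k = clipped.shape[1]
    scale = (2.0 * bound * k) / epsilon
    return clipped + rng.laplace(0.0, scale, size=clipped.shape)

# Synthetic stand-ins for the public dataset D_s and one farmer's data D_i.
rng = np.random.default_rng(0)
D_s = rng.normal(size=(200, 5))
D_i = rng.normal(size=(10, 5))

mean, comps = fit_pca(D_s, n_components=2)   # plays the role of M_s
O_i = transform(D_i, mean, comps)            # transformed private data
C_i = laplace_perturb(O_i, epsilon=25.0, bound=3.0, rng=rng)  # noisy output
```

Smaller values of `epsilon` give a larger noise scale and therefore stronger privacy at the cost of utility, which is the trade-off the evaluation explores.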

Abstract

Digital agriculture leverages technology to enhance crop yield, disease resilience, and soil health, playing a critical role in agricultural research. However, it raises privacy concerns such as adverse pricing, price discrimination, higher insurance costs, and manipulation of resources, deterring farm operators from sharing data due to potential misuse. This study introduces a privacy-preserving framework that addresses these risks while allowing secure data sharing for digital agriculture. Our framework enables comprehensive data analysis while protecting privacy. It allows stakeholders to harness research-driven policies that link public and private datasets. The proposed algorithm achieves this by: (1) identifying similar farmers based on private datasets, (2) providing aggregate information like time and location, (3) determining trends in price and product availability, and (4) correlating trends with public policy data, such as food insecurity statistics. We validate the framework with real-world Farmer's Market datasets, demonstrating its efficacy through machine learning models trained on linked privacy-preserved data. The results support policymakers and researchers in addressing food insecurity and pricing issues. This work significantly contributes to digital agriculture by providing a secure method for integrating and analyzing data, driving advancements in agricultural technology and development.
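
Steps (1) through (3) can be pictured with a small clustering sketch: group farmers by their transformed records, then report only per-cluster aggregates. The code below is a plain Lloyd's-algorithm K-means in NumPy under assumed inputs; the `points` and `prices` arrays are synthetic placeholders, and the paper's actual clustering setup may differ.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct input points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they match the returned centers.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1), centers

# Hypothetical transformed records for 40 farmers in two well-separated groups.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0.0, 0.1, size=(20, 2)),
    rng.normal(10.0, 0.1, size=(20, 2)),
])
# Hypothetical per-farmer prices; only cluster-level means are reported.
prices = np.concatenate([np.full(20, 2.0), np.full(20, 3.0)])

labels, centers = kmeans(points, k=2)
cluster_mean_price = {j: float(prices[labels == j].mean()) for j in range(2)}
```

Cluster-level aggregates such as `cluster_mean_price` are the kind of quantity that could then be compared against public indicators like food insecurity statistics.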

Paper Structure (11 sections, 3 equations, 9 figures, 2 tables)

Figures (9)

  • Figure 1: High-level architecture of the framework: the researcher sends a trained PCA model to farmers, who then transform their data and share it back with the researcher.
  • Figure 2: Detailed architecture of the framework.
  • Figure 3: PCA Transformed Farmer's Market Dataset Clustering.
  • Figure 4: Price of Potatoes in 2018.
  • Figure 5: Insecurity level measured in population (percentage) over time.
  • ...and 4 more figures