Privacy-Preserving Data Sharing in Agriculture: Enforcing Policy Rules for Secure and Confidential Data Synthesis

Anantaa Kotal, Lavanya Elluri, Deepti Gupta, Varun Mandalapu, Anupam Joshi

TL;DR

The paper tackles the challenge of sharing agricultural big data while complying with privacy regulations such as the EU GDPR, the EU Code of Conduct on agricultural data sharing, and the EU AI Act. It proposes a policy-enforced, privacy-preserving data generation framework that extracts regulatory rules using deontic logic, converts them into machine-enforceable constraints, and applies those constraints to a tabular-data GAN framework (PriveTAB) with $t$-closeness. Empirical results on the ITM4Impact dataset show high fidelity and downstream utility with modest accuracy losses, and reduced susceptibility to attribute inference and re-identification attacks when policy enforcement is applied. The work offers a practical approach to regulation-compliant data sharing in agriculture and suggests avenues for extension to health and other sectors.

Abstract

Big Data empowers the farming community with the information needed to optimize resource usage, increase productivity, and enhance the sustainability of agricultural practices. The use of Big Data in farming requires the collection and analysis of data from various sources such as sensors, satellites, and farmer surveys. While Big Data can provide the farming community with valuable insights and improve efficiency, there is significant concern regarding the security of this data as well as the privacy of the participants. Privacy regulations, such as the EU GDPR, the EU Code of Conduct on agricultural data sharing by contractual agreement, and the proposed EU AI law, have been created to address the issue of data privacy and provide specific guidelines on when and how data can be shared between organizations. To make confidential agricultural data widely available for Big Data analysis without violating the privacy of the data subjects, we consider privacy-preserving methods of data sharing in agriculture. Deep learning-based synthetic data generation has been proposed for privacy-preserving data sharing. However, there is a lack of compliance with documented data privacy policies in such privacy-preserving efforts. In this study, we propose a novel framework for enforcing privacy policy rules in privacy-preserving data generation algorithms. We explore several available agricultural codes of conduct, extract knowledge related to the privacy constraints in data, and use the extracted knowledge to define privacy bounds in a privacy-preserving generative model. We use our framework to generate synthetic agricultural data and present experimental results that demonstrate the utility of the synthetic dataset in downstream tasks. We also show that our framework can evade potential threats and secure data based on applicable regulatory policy rules.
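
The TL;DR names $t$-closeness as the privacy bound that the extracted policy rules constrain. The following is a minimal sketch of a $t$-closeness check on a small table, not the authors' implementation. It assumes categorical sensitive attributes and uses total variation distance in place of the Earth Mover's Distance (the two coincide when all distinct values are equally far apart). The `POLICY_T` mapping from policy risk levels to thresholds, the column names, and the sample rows are all hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Return the empirical probability of each distinct value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_class_distance(rows, quasi_identifiers, sensitive):
    """Largest distance between any equivalence class and the whole table."""
    overall = distribution([row[sensitive] for row in rows])
    classes = defaultdict(list)
    for row in rows:
        # Rows sharing the same quasi-identifier values form one class.
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive])
    return max(
        total_variation(distribution(values), overall)
        for values in classes.values()
    )


def satisfies_t_closeness(rows, quasi_identifiers, sensitive, t):
    """True if every equivalence class is within distance t of the table."""
    return max_class_distance(rows, quasi_identifiers, sensitive) <= t


# Hypothetical mapping from a policy-derived risk level to a threshold t.
POLICY_T = {"high_risk": 0.1, "low_risk": 0.5}

# Hypothetical sample rows; "crop" plays the role of the sensitive attribute.
rows = [
    {"region": "north", "farm_size": "small", "crop": "maize"},
    {"region": "north", "farm_size": "small", "crop": "wheat"},
    {"region": "south", "farm_size": "large", "crop": "maize"},
    {"region": "south", "farm_size": "large", "crop": "maize"},
]

worst = max_class_distance(rows, ["region", "farm_size"], "crop")
for level, t in POLICY_T.items():
    ok = satisfies_t_closeness(rows, ["region", "farm_size"], "crop", t)
    print(f"{level}: t={t}, max distance={worst}, satisfied={ok}")
```

In this sketch a stricter policy level simply selects a smaller `t`, so the same table can pass for a low-risk attribute and fail for a high-risk one. A generative model would need to enforce the bound during training or sampling rather than check it afterwards.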

Paper Structure

This paper contains 23 sections, 2 equations, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: Overview of Potential Threats.
  • Figure 2: EU Code of Conduct vs. AI Act
  • Figure 3: Framework for Privacy-Preserving Data Generation in Agriculture with Privacy Policy Enforcement
  • Figure 4: Comparison of t-SNE Projections of Original and Privacy-Preserving Synthetic Data
  • Figure 5: Comparison of CDFs for Original vs. Synthetic Data in a Low-Risk Attribute
  • ...and 5 more figures