P3LS: Partial Least Squares under Privacy Preservation
Du Nguyen Duy, Ramin Nikzad-Langerodi
TL;DR
The paper tackles privacy barriers in cross-organizational data analytics for value chains. It proposes Privacy-Preserving Partial Least Squares (P3LS), a federated, SVD-based PLS algorithm that uses removable random masks generated by a trusted authority to enable vertical data integration while protecting each data contributor. The authors integrate process data across three hypothetical value-chain partners and report improved prediction of several process-related key performance indicators (KPIs). They also show that P3LS and standard PLS model components are numerically equivalent on simulated data, and they provide a privacy analysis of P3LS. Finally, the paper introduces a metric for the relevance of each participant's contributed data to the modeling task, supporting fair data-sharing incentives and governance.
Abstract
Modern manufacturing value chains require intelligent orchestration of processes across company borders in order to maximize profits while fostering social and environmental sustainability. However, the implementation of integrated, systems-level approaches for data-informed decision-making along value chains is currently hampered by privacy concerns associated with cross-organizational data exchange and integration. Here we propose Privacy-Preserving Partial Least Squares (P3LS) regression, a novel federated learning technique that enables cross-organizational data integration and process modeling with privacy guarantees. P3LS involves a singular value decomposition (SVD) based PLS algorithm and employs removable, random masks generated by a trusted authority in order to protect the privacy of the data contributed by each data holder. We demonstrate the capability of P3LS to vertically integrate process data along a hypothetical value chain consisting of three parties and to improve the prediction performance on several process-related key performance indicators. Furthermore, we show the numerical equivalence of P3LS and PLS model components on simulated data and provide a thorough privacy analysis of P3LS. Finally, we propose a mechanism for determining the relevance of the contributed data to the problem being addressed, thus creating a basis for quantifying the contribution of participants.
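To make the idea of removable random masks concrete, the following is a minimal sketch of an SVD computed on masked, vertically partitioned data. It is not the P3LS protocol itself. It assumes that the masks are random orthogonal matrices (one common way to obtain masks that can be removed exactly), and the names `random_orthogonal` and `masked_svd` are hypothetical. For brevity, the roles of trusted authority, data holders, and aggregator are collapsed into a single function.

```python
import numpy as np


def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q


def masked_svd(blocks, rng):
    """SVD of X = [X_1 | X_2 | ...] computed only from masked blocks.

    Assumption (illustrative, not taken from the paper): the masks are
    orthogonal matrices A (rows) and B (columns).
    """
    n = blocks[0].shape[0]
    widths = [b.shape[1] for b in blocks]
    p = sum(widths)

    # "Trusted authority": generate the row mask A and the column mask B.
    A = random_orthogonal(n, rng)
    B = random_orthogonal(p, rng)

    # Each party masks its own columns with its slice of the rows of B.
    offsets = np.cumsum([0] + widths)
    masked = [
        A @ Xi @ B[lo:hi, :]
        for Xi, lo, hi in zip(blocks, offsets[:-1], offsets[1:])
    ]

    # "Aggregator": sees only the sum of masked blocks, which equals A @ X @ B.
    U_m, s, Vt_m = np.linalg.svd(sum(masked), full_matrices=False)

    # Remove the masks: X = (A.T @ U_m) @ diag(s) @ (B @ Vt_m.T).T
    U = A.T @ U_m
    V = B @ Vt_m.T
    return U, s, V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical parties holding different variables for the same samples.
    parts = [rng.standard_normal((20, k)) for k in (3, 4, 2)]
    U, s, V = masked_svd(parts, rng)
    X = np.hstack(parts)
    print(np.allclose((U * s) @ V.T, X))
    print(np.allclose(s, np.linalg.svd(X, compute_uv=False)))
```

Because orthogonal masks leave the singular values unchanged and only rotate the singular vectors, the unmasked factors reproduce the SVD of the unmasked data up to floating-point error. This is the kind of numerical equivalence that the abstract reports for P3LS and PLS model components.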
