New Solutions Based on the Generalized Eigenvalue Problem for the Data Collaboration Analysis

Yuta Kawakami, Yuichi Takano, Akira Imakura

TL;DR

The paper addresses confidential cross-institution data analysis by refining Data Collaboration Analysis (DCA). It optimizes the collaborative function column by column and casts that optimization as a generalized eigenvalue problem. It introduces a weighting scheme and a QR+SVD solution path that solves the resulting eigenproblem efficiently, and it analyzes the computational-complexity trade-offs between solution methods. Experiments on the Mice, QSAR, Gene Expression, and CIFAR-10 datasets show higher predictive accuracy than prior DCA methods, particularly when weighting is combined with a kernel SVC. The work demonstrates scalable, accurate confidential data analysis across diverse data types. It also outlines future directions, including nonlinear abstractions and faster large-scale deployment.
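The QR+SVD path can be illustrated with a small sketch. It rests on an assumed column-wise, MCCA-style formulation, which is not necessarily the authors' exact objective or weighting.

Under that assumption, each institution i contributes an intermediate anchor representation A_i of size r × m_i with full column rank. A column g = [g_1; …; g_c] of the collaborative functions then solves the generalized eigenproblem M g = λ B g, where:

- M = [A_1, …, A_c]^T [A_1, …, A_c]
- B = blockdiag(A_1^T A_1, …, A_c^T A_c)

The top-k eigenvectors are kept. Substituting A_i = Q_i R_i (thin QR) and h_i = R_i g_i turns this into an ordinary eigenproblem for [Q_1, …, Q_c]^T [Q_1, …, Q_c], so a single SVD suffices. The function names `collaboration_gep` and `collaboration_qr_svd` are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh

# Assumed MCCA-style formulation (illustrative, not necessarily the paper's):
# maximize g^T M g subject to g^T B g = 1, i.e. solve M g = lam B g.


def collaboration_gep(anchors, k):
    """Direct route: form M and block-diagonal B, then call a generalized eigensolver.

    anchors: list of intermediate anchor representations A_i (r x m_i),
    each assumed to have full column rank so that B is positive definite.
    """
    stacked = np.hstack(anchors)                       # [A_1, ..., A_c]
    M = stacked.T @ stacked                            # all blocks A_i^T A_j
    B = np.zeros_like(M)
    offset = 0
    for A in anchors:                                  # diagonal blocks A_i^T A_i
        m = A.shape[1]
        B[offset:offset + m, offset:offset + m] = A.T @ A
        offset += m
    vals, vecs = eigh(M, B)                            # eigenvalues in ascending order
    return vals[::-1][:k], vecs[:, ::-1][:, :k]        # keep the k largest


def collaboration_qr_svd(anchors, k):
    """QR+SVD route: thin QR of each A_i, one SVD of the stacked Q factors."""
    qs, rs = zip(*(np.linalg.qr(A) for A in anchors))  # A_i = Q_i R_i (reduced QR)
    _, s, Vt = np.linalg.svd(np.hstack(qs), full_matrices=False)
    H = Vt[:k].T                                       # top-k right singular vectors h
    blocks, offset = [], 0
    for R in rs:                                       # recover g_i = R_i^{-1} h_i
        m = R.shape[0]
        blocks.append(np.linalg.solve(R, H[offset:offset + m]))
        offset += m
    return s[:k] ** 2, np.vstack(blocks)               # eigenvalues of M g = lam B g
```

On random full-rank anchors, both functions should return the same eigenvalues. They should also return the same B-normalized eigenvectors up to column signs.

A plausible reason to prefer one route over the other: the direct route forms (Σ m_i) × (Σ m_i) matrices. The QR+SVD route never forms B and works on the r × Σ m_i matrix of orthonormal factors. Which one is cheaper depends on r, c, and the m_i.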

Abstract

In recent years, data have been accumulating across many institutions, and confidential data analysis has attracted attention as a result. Such analysis improves analytical accuracy by sharing data among multiple institutions while protecting sensitive information. Among these methods, Data Collaboration Analysis (DCA) stands out for its low computational cost and communication load, enabling data sharing and analysis across institutions while safeguarding confidential information. However, the existing optimization problems for determining the required collaborative functions have two drawbacks. The optimal collaborative representation is often a zero matrix, and the process by which solutions are derived is difficult to understand. This research addresses these issues by splitting the matrices into column vectors to formulate the optimization problem and by proposing a solution method based on the generalized eigenvalue problem. In addition, we show how to construct collaborative functions more effectively through weighting and through the selection of efficient algorithms suited to specific situations. Experiments on real-world datasets show that the proposed formulation and solution of the collaborative function optimization problem achieve higher predictive accuracy than existing methods.
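The following derivation shows why splitting the collaborative function into columns and adding a normalization constraint rules out the zero solution. It assumes an MCCA-style objective for one column g_i of each institution's collaborative matrix. Here Ã_i is institution i's intermediate representation of the shared anchor data, and c is the number of institutions. This is an illustrative assumption; the paper's exact objective and weighting may differ.

```latex
\begin{aligned}
&\min_{g_1,\dots,g_c}\ \sum_{i<j} \bigl\lVert \tilde{A}_i g_i - \tilde{A}_j g_j \bigr\rVert_2^2
  \quad \text{s.t.} \quad \sum_{i=1}^{c} \bigl\lVert \tilde{A}_i g_i \bigr\rVert_2^2 = 1, \\[4pt]
&\sum_{i<j} \bigl\lVert \tilde{A}_i g_i - \tilde{A}_j g_j \bigr\rVert_2^2
  = c \sum_{i=1}^{c} \bigl\lVert \tilde{A}_i g_i \bigr\rVert_2^2
  - \Bigl\lVert \sum_{i=1}^{c} \tilde{A}_i g_i \Bigr\rVert_2^2
  = c - g^{\top} M g, \\[4pt]
&g = \begin{bmatrix} g_1 \\ \vdots \\ g_c \end{bmatrix},\quad
 \tilde{A} = [\tilde{A}_1,\dots,\tilde{A}_c],\quad
 M = \tilde{A}^{\top}\tilde{A},\quad
 B = \operatorname{blkdiag}\bigl(\tilde{A}_1^{\top}\tilde{A}_1,\dots,\tilde{A}_c^{\top}\tilde{A}_c\bigr), \\[4pt]
&\max_{g}\ g^{\top} M g \ \ \text{s.t.}\ \ g^{\top} B g = 1
 \;\Longrightarrow\; M g = \lambda B g .
\end{aligned}
```

The constraint g^T B g = 1 makes the zero vector infeasible, and each generalized eigenpair attains objective value c − λ. Under this assumed formulation, the eigenvectors of the k largest generalized eigenvalues, which are mutually B-orthogonal, therefore give k columns of the collaborative functions.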

Paper Structure

This paper contains 31 sections, 21 equations, 26 figures.

Figures (26)

  • Figure 1: Institutions Variation
  • Figure 2: Anchor Data Variation
  • Figure 3: Dimension Number Variation
  • Figure 5: Institutions Variation
  • Figure 6: Anchor Data Variation
  • ...and 21 more figures