Data Collaboration Analysis with Orthonormal Basis Selection and Alignment
Keiyu Nosaka, Yamato Suetake, Yuichi Takano, Akiko Yoshise
TL;DR
This work identifies a practical instability in existing Data Collaboration (DC) basis alignment caused by the choice of target basis, and introduces Orthonormal Data Collaboration (ODC), which enforces orthonormal secret and target bases. By reducing alignment to the Orthogonal Procrustes problem, ODC obtains a closed-form solution and achieves orthogonal concordance, so downstream performance is invariant to the target basis. The approach yields substantial computational speedups (up to about 100x) and preserves DC's one-shot communication and semi-honest privacy model, with robust performance across a variety of tasks and anchor constructions. Empirical results demonstrate ODC’s speed, stability, and favorable privacy-utility trade-offs relative to Imakura-DC, Kawakami-DC, differential privacy baselines, and federated learning. The work also offers practical deployment guidance, including anchor design strategies and governance considerations for cross-sector collaborations.
Abstract
Data Collaboration (DC) enables multiple parties to jointly train a model by sharing only linear projections of their private datasets. The core challenge in DC is to align the bases of these projections without revealing each party's secret basis. While existing theory suggests that any target basis spanning the common subspace should suffice, in practice, the choice of basis can substantially affect both accuracy and numerical stability. We introduce Orthonormal Data Collaboration (ODC), which enforces orthonormal secret and target bases, thereby reducing alignment to the classical Orthogonal Procrustes problem, which admits a closed-form solution. We prove that the resulting change-of-basis matrices achieve orthogonal concordance, aligning all parties' representations up to a shared orthogonal transform and rendering downstream performance invariant to the target basis. Computationally, ODC reduces the alignment complexity from \(O(\min\{a(cl)^2, a^2c\})\) to \(O(acl^2)\), and empirical evaluations show up to \(100\times\) speed-ups with equal or better accuracy across benchmarks. ODC preserves DC's one-round communication pattern and privacy assumptions, providing a simple and efficient drop-in improvement to existing DC pipelines.
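The closed-form alignment mentioned in the abstract can be illustrated with a toy sketch. The Orthogonal Procrustes problem asks for the orthogonal matrix Q minimizing the Frobenius norm of AQ - B, and its standard solution is Q = UV^T, where USV^T is the singular value decomposition of A^T B. The code below is not the authors' implementation: the helper name `orthogonal_procrustes`, the problem sizes, the noise-free setup, and the choice of the first party's anchor representation as the target are all assumptions made for illustration. The symbols `a`, `l`, and `c` are read here as the number of anchor rows, the reduced dimension, and the number of parties, which is also an assumption since the abstract does not define them.

```python
import numpy as np

def orthogonal_procrustes(A, B):
    """Return the orthogonal Q minimizing ||A @ Q - B||_F (hypothetical helper)."""
    # Closed form: with the SVD A^T B = U S V^T, the minimizer is Q = U V^T.
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

rng = np.random.default_rng(0)
# Illustrative sizes (assumptions): anchor rows, raw features, reduced dim, parties.
a, m, l, c = 200, 30, 5, 3

X_anchor = rng.standard_normal((a, m))  # shared anchor data
# Orthonormal basis of the common subspace.
F_common, _ = np.linalg.qr(rng.standard_normal((m, l)))

# Each party keeps a secret orthonormal basis F_common @ R_i (same subspace,
# private orthogonal R_i) and shares only its projected anchor data.
anchors = []
for _ in range(c):
    R_i, _ = np.linalg.qr(rng.standard_normal((l, l)))
    anchors.append(X_anchor @ F_common @ R_i)

# Toy target: the first party's anchor representation (an assumption).
target = anchors[0]

# One l-by-l Procrustes solve per party: forming A_i^T B costs O(a l^2),
# so c parties cost O(a c l^2) plus small l-by-l SVDs.
G = [orthogonal_procrustes(A_i, target) for A_i in anchors]
aligned = [A_i @ G_i for A_i, G_i in zip(anchors, G)]

for i, (A_i, G_i) in enumerate(zip(aligned, G)):
    print(i, np.allclose(G_i.T @ G_i, np.eye(l)), np.allclose(A_i, target))
```

In this noise-free setting every change-of-basis matrix is orthogonal and every party's anchor representation maps exactly onto the target. Replacing `target` with `target @ R` for any orthogonal `R` would rotate all aligned representations by the same `R`, which is the sense in which the alignment holds up to a shared orthogonal transform.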
