Binary Linear Tree Commitment-based Ownership Protection for Distributed Machine Learning
Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu
TL;DR
This work tackles ownership verification and computational integrity in distributed machine learning (DML) by introducing BLTC-DOP, a vector commitment scheme that attaches a concise, watermarked proof to the weight vector $\mathbf{W}$ and records certificates on a distributed ledger. The core mechanism, a Binary Linear Tree Commitment, enables sub-linear proof updates and aggregation via inner-product arguments, while watermarking thwarts forgery. Theoretical analysis establishes correctness, soundness, aggregability, and maintainability, and experiments show BLTC-DOP outperforms SNARK-based schemes in aggregation efficiency, with practical verification and update costs in DML settings. The approach offers a scalable, tamper-resistant framework for model ownership protection that integrates cryptographic commitments with blockchain-based auditing.
Abstract
Distributed machine learning enables parallel training on large datasets by delegating computing tasks to multiple workers. Although distributed training reduces cost, disseminating the final model weights often leads to conflicts over model ownership, because workers struggle to substantiate their involvement in the training computation. To address these ownership issues and to guard against accidental failures and malicious attacks, verifying the computational integrity and effectiveness of workers is crucial in distributed machine learning. In this paper, we propose a novel binary linear tree commitment-based ownership protection model that ensures computational integrity with limited overhead and concise proofs. Because parameters are updated frequently during training, our commitment scheme introduces a maintainable tree structure to reduce the cost of updating proofs. Unlike SNARK-based verifiable computation, our model achieves efficient proof aggregation by leveraging inner product arguments. Furthermore, proofs of model weights are watermarked with worker identity keys to prevent commitments from being forged or duplicated. The performance analysis and comparison with SNARK-based hash commitments validate the efficacy of our model in preserving computational integrity within distributed machine learning.
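
To make the idea of a maintainable tree structure concrete, the following minimal sketch shows why a binary tree over the weight vector allows a proof to be refreshed in logarithmic rather than linear time when one weight changes. It is an illustration under stated assumptions, not the BLTC construction itself: it substitutes a plain SHA-256 hash tree for the paper's inner-product-based commitment, omits watermarking and aggregation, and assumes the vector length is a power of two. The names `TreeCommitment`, `update`, `prove`, and `verify` are hypothetical.

```python
import hashlib


def _h(left: bytes, right: bytes) -> bytes:
    # Hash two child digests into a parent digest.
    return hashlib.sha256(left + right).digest()


def _leaf(value: float) -> bytes:
    # Hash a single weight (stand-in for committing to one vector entry).
    return hashlib.sha256(repr(value).encode()).digest()


class TreeCommitment:
    """Hash-tree stand-in for a tree commitment (hypothetical, not BLTC)."""

    def __init__(self, weights):
        n = len(weights)
        assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
        self.n = n
        # Heap layout: node 1 is the root, leaves occupy n .. 2n-1.
        self.nodes = [b""] * (2 * n)
        for i, w in enumerate(weights):
            self.nodes[n + i] = _leaf(w)
        for i in range(n - 1, 0, -1):
            self.nodes[i] = _h(self.nodes[2 * i], self.nodes[2 * i + 1])
        self.rehashes = 0  # counts internal nodes recomputed by update()

    @property
    def root(self) -> bytes:
        return self.nodes[1]

    def update(self, index: int, value: float) -> None:
        # Only the ancestors of the changed leaf are recomputed: log2(n) nodes.
        i = self.n + index
        self.nodes[i] = _leaf(value)
        i //= 2
        while i >= 1:
            self.nodes[i] = _h(self.nodes[2 * i], self.nodes[2 * i + 1])
            self.rehashes += 1
            i //= 2

    def prove(self, index: int):
        # Opening proof: the sibling digest at every level, leaf to root.
        i = self.n + index
        path = []
        while i > 1:
            path.append(self.nodes[i ^ 1])
            i //= 2
        return path


def verify(root: bytes, index: int, value: float, path, n: int) -> bool:
    # Recompute the root from the claimed value and the sibling path.
    node = _leaf(value)
    i = n + index
    for sibling in path:
        node = _h(node, sibling) if i % 2 == 0 else _h(sibling, node)
        i //= 2
    return node == root
```

With eight weights, changing one entry recomputes three internal nodes instead of rebuilding all seven, and an opening proof for any position consists of three sibling digests. The scheme described in the abstract pursues the same update pattern while additionally supporting proof aggregation, which a plain hash tree does not provide.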
