CrowdAL: Towards a Blockchain-empowered Active Learning System in Crowd Data Labeling
Shaojie Hou, Yuandou Wang, Zhiming Zhao
TL;DR
This work addresses the challenge of combining Active Learning with crowdsourced data labeling while maintaining data quality and worker privacy. It proposes CrowdAL, a blockchain-enabled framework that uses smart contracts for transparent label aggregation and tamper-proof incentives, complemented by zero-knowledge proofs (ZKPs) to protect worker privacy. Key contributions include a two-component architecture (a contract factory and an AL server), on-chain majority voting for label aggregation, model-driven, truth-based reward distribution, and a Groth16-based privacy protocol with a commit-nullify scheme. Preliminary experiments on a local Ethereum network demonstrate feasibility but reveal non-trivial gas costs for the ZKP components, which motivates optimization before real-world deployment.
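To make the aggregation and incentive steps concrete, the following is a minimal off-chain sketch in Python of majority voting followed by a reward split. It is an illustration under stated assumptions, not the CrowdAL contract code: the function names `majority_vote` and `split_reward`, the tie-breaking rule, and the equal-share payout are all hypothetical, and the summary does not specify the actual reward formula.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label among the workers' submissions.

    labels: dict mapping worker id -> submitted label.
    Ties are broken by taking the smallest label (an assumption made
    here only so that the result is deterministic).
    """
    counts = Counter(labels.values())
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def split_reward(labels, reference_label, budget):
    """Split `budget` equally among workers whose label matches the reference.

    The equal split is a hypothetical rule. Integer division is used
    because smart-contract arithmetic is typically integer-only; any
    remainder is simply left undistributed in this sketch.
    """
    winners = sorted(w for w, label in labels.items() if label == reference_label)
    if not winners:
        return {}
    share = budget // len(winners)
    return {w: share for w in winners}


# Example: three workers label one sample.
votes = {"w1": "cat", "w2": "dog", "w3": "cat"}
aggregated = majority_vote(votes)
rewards = split_reward(votes, aggregated, budget=100)
```

Here the aggregated label itself is used as the reference for rewards. In a model-driven scheme the reference could instead come from the trained model; that choice is left open in this sketch.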
Abstract
Active Learning (AL) is a machine learning technique in which the model selectively queries the most informative data points for labeling by human experts. Integrating AL with crowdsourcing leverages crowd diversity to enhance data labeling but introduces challenges in reaching consensus and preserving privacy. This poster presents CrowdAL, a blockchain-empowered crowd AL system designed to address these challenges. CrowdAL uses blockchain to provide transparency and a tamper-proof incentive mechanism, relies on smart contracts to evaluate crowd workers' performance and aggregate labeling results, and employs zero-knowledge proofs to protect worker privacy.
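As an illustration of how a model might pick the "most informative" data points, the sketch below ranks unlabeled samples by predictive entropy and selects the top `k`. Entropy-based uncertainty sampling is one common AL query strategy and is assumed here for illustration only; the abstract does not state which strategy CrowdAL uses, and the names `entropy` and `select_queries` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_queries(predictions, k):
    """Return the ids of the k samples the model is least certain about.

    predictions: dict mapping sample id -> list of class probabilities
    produced by the current model.
    """
    ranked = sorted(predictions, key=lambda s: entropy(predictions[s]), reverse=True)
    return ranked[:k]


# Example: the model is confident about "a" and least sure about "b".
preds = {"a": [0.98, 0.02], "b": [0.5, 0.5], "c": [0.7, 0.3]}
queries = select_queries(preds, k=2)
```

In a crowd setting, the selected samples would then be published as labeling tasks for workers rather than sent to a single expert.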
