Mixture of Experts based Multi-task Supervised Learning from Crowds
Tao Han, Huaixuan Shi, Xinyi Ding, Xiao Ma, Huamao Gu, Yili Fang
TL;DR
This work reframes truth inference in crowdsourcing as a multi-task supervised learning problem, introducing the Multi-task Supervised Learning-from-Crowds (MLC) paradigm and a Mixture of Experts based variant (MMLC) that models worker behavior at the item-feature level rather than relying on latent ground truth. It proposes two aggregation strategies: MMLC-owf, which identifies an oracle worker in the learned spectral space to generate ground truth, and MMLC-df, which fills sparse crowdsourced data with MMLC predictions to improve downstream truth inference. Empirical results on three diverse datasets show that MMLC-owf achieves state-of-the-art accuracy, closely approaching a theoretical upper bound obtained with optimal clustering, while MMLC-df consistently improves existing truth-inference methods through data filling. The proposed framework offers a principled way to capture feature-level worker behavior, improve label quality, and handle sparse crowdsourcing data robustly, with practical implications for scalable, accurate aggregation in crowdsourcing systems.
Abstract
Existing truth inference methods in crowdsourcing aim to map redundant labels and items to the ground truth. They treat the ground truth as hidden variables and use statistical or deep learning-based worker behavior models to infer it. However, worker behavior models that rely on hidden ground-truth variables overlook workers' behavior at the item feature level, leading to imprecise characterizations and degrading the quality of truth inference. This paper proposes a new paradigm of multi-task supervised learning from crowds, which eliminates the need to model items' ground truth in worker behavior models. Within this paradigm, we propose a worker behavior model at the item feature level called Mixture of Experts based Multi-task Supervised Learning from Crowds (MMLC). Two truth inference strategies are proposed within MMLC. The first strategy, named MMLC-owf, uses clustering methods in the worker spectral space to identify the projection vector of the oracle worker. The labels generated from this vector are then taken as the inferred truth. The second strategy, called MMLC-df, employs the MMLC model to fill the crowdsourced data, which can enhance the effectiveness of existing truth inference methods. Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and MMLC-df enhances the quality of existing truth inference methods.
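The data-filling idea behind MMLC-df can be illustrated with a minimal sketch. This is not the authors' implementation: the `predict` callable is a hypothetical stand-in for a trained worker behavior model (here a hard-coded lookup table), and plain majority voting is assumed as the downstream truth inference method. The sketch only shows the two-step flow of filling missing worker labels with model predictions and then aggregating the dense label matrix.

```python
from collections import Counter

MISSING = None  # marker: this worker did not label this item


def fill_missing(labels, predict):
    """Return a dense copy of `labels` (items x workers).

    Observed labels are kept; each missing entry is replaced by
    predict(item_index, worker_index), a hypothetical stand-in for
    a trained worker behavior model.
    """
    return [
        [lab if lab is not MISSING else predict(i, w) for w, lab in enumerate(row)]
        for i, row in enumerate(labels)
    ]


def majority_vote(labels):
    """Aggregate each item's labels by plurality vote.

    Missing entries are ignored; ties go to the smallest label.
    An item with no labels at all stays MISSING.
    """
    truths = []
    for row in labels:
        counts = Counter(lab for lab in row if lab is not MISSING)
        if not counts:
            truths.append(MISSING)
            continue
        best = max(counts.values())
        truths.append(min(lab for lab, c in counts.items() if c == best))
    return truths


# Toy sparse crowdsourced data: 3 items, 3 workers, binary labels.
sparse_labels = [
    [1, MISSING, MISSING],
    [MISSING, 0, 1],
    [0, 0, MISSING],
]

# Assumed per-(item, worker) predictions standing in for model output.
assumed_predictions = [
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
]


def predict(item, worker):
    return assumed_predictions[item][worker]


filled_labels = fill_missing(sparse_labels, predict)
inferred_sparse = majority_vote(sparse_labels)
inferred_filled = majority_vote(filled_labels)
```

In this toy example the aggregate for the first two items changes once the matrix is filled, which shows how filled entries can influence a downstream aggregator. Whether the change is an improvement depends entirely on the quality of the model supplying the predictions.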
