(Im)possibility of Collective Intelligence
Krikamol Muandet
TL;DR
This work reframes learning across heterogeneous environments as a choice problem on a hypothesis space. It imposes three primitive axioms ($\text{PO}$, $\text{IIH}$, and $\text{IR}$) together with a collective intelligence (CI) requirement. It proves that, with at least $3$ hypotheses and $n\ge 2$ environments, the only rational algorithm compatible with these axioms is empirical risk minimization (ERM) on a single environment, which yields a fundamental impossibility result for CI. The analysis connects to Arrow’s impossibility theorem and shows how informational incomparability across environments limits out-of-distribution generalization, federated learning, and multi-modal learning. The paper also discusses practical implications and potential escape routes, such as relaxing internal consistency or permitting information sharing under privacy and governance constraints.
Abstract
Modern applications of AI involve training and deploying machine learning models across heterogeneous and potentially massive environments. The emerging diversity of data not only brings new possibilities for advancing AI systems, but also restricts the extent to which information can be shared across environments due to pressing concerns such as privacy, security, and equity. Based on a novel characterization of learning algorithms as choice correspondences on a hypothesis space, this work provides a minimum requirement, in terms of intuitive and reasonable axioms, under which the only rational learning algorithm in heterogeneous environments is empirical risk minimization (ERM) that unilaterally learns from a single environment without information sharing across environments. Our (im)possibility result underscores the fundamental trade-off that any algorithm must face in order to achieve Collective Intelligence (CI), i.e., the ability to learn across heterogeneous environments. Ultimately, collective learning in heterogeneous environments is inherently hard because, in critical areas of machine learning such as out-of-distribution generalization, federated/collaborative learning, algorithmic fairness, and multi-modal learning, it can be infeasible to make meaningful comparisons of model predictive performance across environments.
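To make the idea of a learning algorithm as a choice correspondence concrete, the following is a minimal sketch rather than the paper's formal construction. It assumes a finite hypothesis set and a hypothetical table `risks[e][h]` of empirical risks with made-up numbers; the function names `erm_single_environment` and `pooled_erm` are illustrative and do not come from the paper. The sketch contrasts ERM on one environment with a pooled rule that sums risks across environments, and shows that rescaling the units of one environment's risk can change the pooled choice but not the single-environment choice.

```python
def erm_single_environment(risks, env):
    """Choice set of ERM on one environment: the hypotheses (by index)
    that minimize the empirical risk of environment `env` only."""
    row = risks[env]
    best = min(row)
    return {h for h, r in enumerate(row) if r == best}


def pooled_erm(risks):
    """Choice set of a pooled rule (an assumption for illustration):
    the hypotheses that minimize the sum of risks over all environments."""
    totals = [sum(col) for col in zip(*risks)]
    best = min(totals)
    return {h for h, t in enumerate(totals) if t == best}


# Hypothetical risks: 2 environments (rows), 3 hypotheses (columns).
risks = [
    [0.30, 0.10, 0.25],  # environment 0
    [0.05, 0.40, 0.20],  # environment 1
]

# Same data, but environment 0 reports its risk in units 10 times larger.
rescaled = [[10 * r for r in risks[0]], risks[1]]

single_before = erm_single_environment(risks, env=0)
single_after = erm_single_environment(rescaled, env=0)
pooled_before = pooled_erm(risks)
pooled_after = pooled_erm(rescaled)

print(single_before, single_after)  # single-environment choice is unchanged
print(pooled_before, pooled_after)  # pooled choice depends on the units
```

In this toy setting the pooled rule implicitly treats risk values from different environments as comparable, which is exactly the kind of cross-environment comparison the abstract describes as potentially infeasible. The single-environment rule avoids that comparison, at the cost of ignoring every other environment.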
