(Im)possibility of Collective Intelligence

Krikamol Muandet

TL;DR

This work reframes learning across heterogeneous environments as a choice problem on a hypothesis space and imposes primitive axioms—$\text{PO}$, $\text{IIH}$, and $\text{IR}$—plus a collective intelligence requirement to study cross-environment learning. It proves that, when there are at least $3$ hypotheses and $n\ge 2$ environments, the only rational algorithm compatible with these axioms is empirical risk minimization (ERM) that optimizes a single environment, yielding a fundamental CI impossibility. The analysis connects to Arrow’s impossibility theorem and reveals the role of informational incomparability in limiting cross-environment generalization, federated learning, and multi-modal settings. The paper further discusses practical implications and potential escape routes, such as relaxing internal consistency or permitting information sharing under privacy and governance constraints, to enable more collective learning outcomes in practice.

Abstract

Modern applications of AI involve training and deploying machine learning models across heterogeneous and potentially massive environments. Emerging diversity of data not only brings about new possibilities to advance AI systems, but also restricts the extent to which information can be shared across environments due to pressing concerns such as privacy, security, and equity. Based on a novel characterization of learning algorithms as choice correspondences on a hypothesis space, this work provides a minimum requirement in terms of intuitive and reasonable axioms under which the only rational learning algorithm in heterogeneous environments is an empirical risk minimization (ERM) that unilaterally learns from a single environment without information sharing across environments. Our (im)possibility result underscores the fundamental trade-off that any algorithm will face in order to achieve Collective Intelligence (CI), i.e., the ability to learn across heterogeneous environments. Ultimately, collective learning in heterogeneous environments is inherently hard because, in critical areas of machine learning such as out-of-distribution generalization, federated/collaborative learning, algorithmic fairness, and multi-modal learning, it can be infeasible to make meaningful comparisons of model predictive performance across environments.
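
To make the comparability issue concrete, the following is a minimal sketch, not taken from the paper. It assumes a toy risk table over three hypotheses and two environments, and a hypothetical `scale` parameter that changes the units in which each environment reports its risk. Under these assumptions, a rule that averages risks across environments changes its choice when one environment is rescaled, whereas ERM on a single environment does not.

```python
# Hypothetical risk table (an assumption for illustration):
# each hypothesis maps to its risks in (environment 0, environment 1).
RISKS = {
    "h1": (0.10, 0.60),
    "h2": (0.30, 0.30),
    "h3": (0.50, 0.20),
}


def average_risk_choice(risks, scale=(1.0, 1.0)):
    """Choose the hypotheses minimizing the mean of rescaled per-environment risks."""
    def score(h):
        return sum(s * r for s, r in zip(scale, risks[h])) / len(scale)

    best = min(score(h) for h in risks)
    return {h for h in risks if abs(score(h) - best) < 1e-12}


def single_env_erm(risks, env, scale=(1.0, 1.0)):
    """Choose the hypotheses minimizing the (rescaled) risk of one environment only."""
    def score(h):
        return scale[env] * risks[h][env]

    best = min(score(h) for h in risks)
    return {h for h in risks if abs(score(h) - best) < 1e-12}


if __name__ == "__main__":
    # Averaging depends on the units used by each environment.
    print(average_risk_choice(RISKS))                     # original units
    print(average_risk_choice(RISKS, scale=(10.0, 1.0)))  # environment 0 rescaled
    # Single-environment ERM ignores the rescaling.
    print(single_env_erm(RISKS, env=0))
    print(single_env_erm(RISKS, env=0, scale=(10.0, 1.0)))
```

The rescaling here is only one simple way to model incomparable performance measures; the paper's formal axioms are not reproduced in this sketch.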

Paper Structure

This paper contains 26 sections, 6 theorems, 20 equations, 4 figures.

Key Result

Proposition 1

Let $\{(H_{\omega},B,\mathbb{A}_{\xi})\,:\, \omega\in\Omega, \xi\in\Xi\}$ be a collection of learning structures for some index sets $\Omega$ and $\Xi$. Suppose that $\mathbb{A}_\xi$ is a risk minimizer (RM), i.e., $\mathbb{A}_\xi(\mathcal{H}) = \arg\min_{h\in\mathcal{H}} r_\xi(h)$ for some real-valued risk functional $r_\xi:H_\omega\to\mathbb{R}$, $\xi\in\Xi$. Then, $\{(H_{\omega},B,\mathbb{A}_{\xi})\,:\, \omega\in\Omega, \xi\in\Xi\}$ satisfies internal consistency.
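
The claim can be checked by brute force on a small finite example. The sketch below is an illustration, not the paper's proof. It assumes that internal consistency amounts to the standard properties $\alpha$ (contraction) and $\beta$ (expansion) from choice theory, that $B$ contains every nonempty subset of a three-element hypothesis class, and that the risk values are arbitrary toy numbers. The helper names (`risk_minimizer`, `second_best`, `satisfies_alpha`, `satisfies_beta`) are hypothetical.

```python
from itertools import combinations

# Toy risk values (an assumption); the tie makes property beta nontrivial.
RISK = {"h1": 0.2, "h2": 0.2, "h3": 0.5}


def nonempty_subsets(items):
    """Yield every nonempty subset of `items` as a frozenset."""
    items = list(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def risk_minimizer(risk):
    """Return a choice correspondence that picks all minimizers of `risk`."""
    def choose(hyps):
        best = min(risk[h] for h in hyps)
        return frozenset(h for h in hyps if risk[h] == best)
    return choose


def second_best(risk):
    """A deliberately inconsistent rule: pick the second-lowest-risk hypothesis."""
    def choose(hyps):
        ranked = sorted(hyps, key=lambda h: (risk[h], h))
        return frozenset({ranked[1] if len(ranked) > 1 else ranked[0]})
    return choose


def satisfies_alpha(choose, universe):
    """Alpha: if h is chosen from F and h lies in G, a subset of F, h is chosen from G."""
    for big in nonempty_subsets(universe):
        for small in nonempty_subsets(big):
            if any(h in small and h not in choose(small) for h in choose(big)):
                return False
    return True


def satisfies_beta(choose, universe):
    """Beta: if G is a subset of F and some choice from G is chosen from F, all are."""
    for big in nonempty_subsets(universe):
        for small in nonempty_subsets(big):
            from_small, from_big = choose(small), choose(big)
            if from_small & from_big and not from_small <= from_big:
                return False
    return True


if __name__ == "__main__":
    erm = risk_minimizer(RISK)
    print(satisfies_alpha(erm, RISK), satisfies_beta(erm, RISK))
    print(satisfies_alpha(second_best(RISK), RISK))
```

The `second_best` rule is included only as a contrast: it picks `h2` from the full class but `h3` from the subclass containing `h2` and `h3`, which is the kind of contraction inconsistency that a risk minimizer cannot exhibit.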

Figures (4)

  • Figure 1: A learning structure $(H,B,\mathbb{A})$. For each $\mathcal{H}\in B$, the set $\mathbb{A}(\mathcal{H})$ consists of the optimal hypotheses $h^*$ learned by the algorithm $\mathbb{A}$. The hypothesis space $\mathscr{H}$ is determined by the learning problem at hand, whereas the learning structure $(H,B,\mathbb{A})$ is a design choice.
  • Figure 2: A two-stage model of machine learning. We model a typical machine learning pipeline as a two-stage choice. Given a hypothesis space $\mathscr{H}$, one must first come up with a model and a learning algorithm. The model induces a collection of hypothesis classes $H_{\omega}$ parametrized by some hyperparameter $\omega\in\Omega$, while the learning algorithm $\mathbb{A}_{\xi}$ is a choice correspondence parametrized by $\xi\in\Xi$. The algorithm $\mathbb{A}_\xi$ prescribes a series of instructions that are executed to choose the best solutions from the hypothesis class $H_{\omega}$ or any subset thereof. The result of this design process is a learning structure $(H_{\omega},B,\mathbb{A}_{\xi})$, where $B$ is a collection of nonempty subsets of $H_{\omega}$ for which $\mathbb{A}_{\xi}(\mathcal{H})\subseteq \mathcal{H}$ and $\mathbb{A}_{\xi}(\mathcal{H})\neq\emptyset$ for all $\mathcal{H}\in B$. A model selection structure $(\text{LS}(\Omega,\Xi),C,\mathbb{S})$ consists of a collection of learning structures (leftmost solid oval), a collection of subsets of $\text{LS}(\Omega,\Xi)$ (dashed circles), and a model selection procedure $\mathbb{S}$. In Stage I, the model selection procedure $\mathbb{S}$ chooses the learning structure $(H_{\omega_{\diamond}},B,\mathbb{A}_{\xi_{\diamond}})$ from a subset $\mathcal{E}$ in $C$ (green solid circle). For example, one may choose the best learning structure either by manually setting the values of the hyperparameters (i.e., $C$ is a collection of singletons) or by adopting data-dependent model selection procedures (i.e., $C$ is composed of nontrivial subsets). In Stage II, the choice is then delegated to the learning algorithm $\mathbb{A}_{\xi_{\diamond}}$, which chooses from the model class $H_{\omega}$ or any subset thereof. The final outcome of this process is a set of optimal solutions denoted in the figure by $\bigstar$.
  • Figure 3: (a) Property $\alpha$ is violated when a contraction of the hypothesis class, e.g., by fixing the values of some parameters of the model with the same hyperparameters, can change the behaviour of $\mathbb{A}_\xi$. Here, $\mathbb{A}_{\xi}$ chooses one hypothesis from $\mathcal{F}$ but a different one from $\mathcal{G} \subset \mathcal{F}$, although $\mathcal{G}$ still contains the hypothesis chosen from $\mathcal{F}$. (b) Property $\beta$ is violated when an expansion of the hypothesis class, e.g., by adding more parameters of the model with the same hyperparameters, can change how the optimal hypotheses are chosen by $\mathbb{A}_{\xi}$. Here, $\mathbb{A}_{\xi}$ chooses two hypotheses from $\mathcal{G}$, but drops one of them from its optimal choice when choosing from $\mathcal{F} \supset \mathcal{G}$.
  • Figure 4: In heterogeneous environments, the task of machine learners is to design an aggregation rule that takes a risk profile $(r_1,r_2,\ldots,r_n)$ representing the performance measures of hypotheses across $n$ environments and produces a learning structure $(H,B,\mathbb{A}_{\mathbf{r}})$. For each hypothesis class $\mathcal{H}\in B$, the algorithm $\mathbb{A}_\mathbf{r}$ is implemented to choose the best hypotheses from $\mathcal{H}$. The lower arrow in the figure represents the deployment process.
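
The objects named in the captions above can be written down as a small data type. The following is a minimal sketch under stated assumptions: hypotheses are plain strings, risks are dictionaries, and the names `LearningStructure`, `is_valid`, and `dictatorial_rule` are hypothetical and do not come from the paper. The validity check encodes the condition from the Figure 2 caption, namely that $\mathbb{A}(\mathcal{H})\subseteq\mathcal{H}$ and $\mathbb{A}(\mathcal{H})\neq\emptyset$ for all $\mathcal{H}\in B$. The aggregation rule shown is the single-environment ERM described in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Mapping, Sequence

Hypothesis = str
Risk = Mapping[Hypothesis, float]
Algorithm = Callable[[FrozenSet[Hypothesis]], FrozenSet[Hypothesis]]


@dataclass(frozen=True)
class LearningStructure:
    """A triple (H, B, A): hypothesis class, admissible subclasses, algorithm."""
    H: FrozenSet[Hypothesis]
    B: Sequence[FrozenSet[Hypothesis]]
    A: Algorithm

    def is_valid(self) -> bool:
        """Check that every member of B is a nonempty subset of H and that
        A returns a nonempty subset of each member of B."""
        return all(
            bool(cls) and cls <= self.H
            and bool(self.A(cls)) and self.A(cls) <= cls
            for cls in self.B
        )


def dictatorial_rule(profile: Sequence[Risk], H, B, env: int) -> LearningStructure:
    """Aggregation rule that maps a risk profile to ERM on one environment."""
    risk = profile[env]

    def choose(cls):
        best = min(risk[h] for h in cls)
        return frozenset(h for h in cls if risk[h] == best)

    return LearningStructure(H=frozenset(H), B=list(B), A=choose)


# Toy risk profile (r_1, r_2) over three hypotheses; values are assumptions.
PROFILE = [
    {"h1": 0.1, "h2": 0.3, "h3": 0.5},
    {"h1": 0.6, "h2": 0.3, "h3": 0.2},
]
H = frozenset({"h1", "h2", "h3"})
B = [H, frozenset({"h1", "h2"}), frozenset({"h2"})]

if __name__ == "__main__":
    structure = dictatorial_rule(PROFILE, H, B, env=1)
    print(structure.is_valid())
    print(sorted(structure.A(H)))
```

Other aggregation rules can be plugged in by returning a different `choose` function, which keeps the risk profile, the learning structure, and the deployed algorithm as separate pieces, mirroring the arrows in Figure 4.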

Theorems & Definitions (21)

  • Definition 1
  • Definition 2
  • Remark 1
  • Example 1: Kernel machines
  • Example 2: Deep learning
  • Definition 3: Internal consistency
  • Remark 2
  • Proposition 1
  • Proof of Proposition 1
  • Remark 3
  • ...and 11 more