Mitigating Biases in Collective Decision-Making: Enhancing Performance in the Face of Fake News

Axel Abels; Elias Fernandez Domingos; Ann Nowé; Tom Lenaerts

TL;DR

This work studies the biases that undermine collective judgment about fake news, focusing on headlines that involve sensitive attributes. It builds a large dataset of human judgments and evaluates static and adaptive aggregation algorithms (e.g., WMV, EXP4, MetaCMAB, ExpertiseTree) to measure improvements in accuracy and reductions in bias. The study finds that adaptive aggregation, especially ExpertiseTree, can mitigate framing effects and group biases while enabling collective intelligence that often surpasses the best individual expert, particularly in larger groups. These findings have practical implications for improving crowdsourced fact-checking and for developing fair, robust decision-makers in the presence of biased information. The work also provides methods and data to spur future research on bias mitigation through machine intelligence in social-scale decision tasks.

Abstract

Individual and social biases undermine the effectiveness of human advisers by inducing judgment errors which can disadvantage protected groups. In this paper, we study the influence these biases can have in the pervasive problem of fake news by evaluating human participants' capacity to identify false headlines. By focusing on headlines involving sensitive characteristics, we gather a comprehensive dataset to explore how human responses are shaped by their biases. Our analysis reveals recurring individual biases and their permeation into collective decisions. We show that demographic factors, headline categories, and the manner in which information is presented significantly influence errors in human judgment. We then use our collected data as a benchmark problem on which we evaluate the efficacy of adaptive aggregation algorithms. In addition to their improved accuracy, our results highlight the interactions between the emergence of collective intelligence and the mitigation of participant biases.

Paper Structure (50 sections, 3 equations, 11 figures, 4 tables)

Overview
Results
Influence of Aggregation Algorithms on Collective Decision-Making Processes
Accuracy
Round-by-Round Analysis of Instantaneous Regret
Win Percentage
Dissecting Participant Biases
Demographic Differences in Performance
Framing Effect
Group Biases
Collective Decision-Making Models Mitigate Biases
Reduced Framing Effects
Reduced Group Biases
Discussion
Collective Intelligence through Online Machine Learning
...and 35 more sections

Figures (11)

Figure 1: Overview of the collective decision-making problem studied in this work. Participants in the experiment (identified here by the colors green, red, and blue) were presented with a sequence of headlines and asked to estimate the likelihood that each was true. These diverging opinions are then aggregated to reach a collective decision. The aggregation is iteratively optimized by comparing it to a ground truth. Different aggregation methods weigh opinions differently; see the algorithms section for details. Example weights for the three participants are given for each approach. A majority vote values all experts uniformly, resulting in even bars. EXP4 (Auer et al., 2002) concentrates its weight on a single participant, whose opinion is followed. MetaCMAB (Abels et al., 2023) distributes weights more evenly, correlating them with performance to enhance the collective decision-making process. ExpertiseTree (Abels et al., 2023, PMLR vol. 202) similarly distributes weights, but learns distinct models for each headline category if this proves beneficial.
Figure 2: Performance as a function of the number of participants. Shaded areas around the lines in panels a-d represent $95\%$ confidence intervals. a, Accuracy (i.e., the proportion of correctly identified headlines, see Methods) of each algorithm for each group size. b, Terminal instantaneous regret (see Methods) for each group size: the difference in performance between the best group member and the algorithm for the final decision. Negative values in b indicate that the collective outperforms the single best group member. c and d, Instantaneous regret as learning progresses for groups of size $4$ and $36$, respectively: the difference in performance between the best group member and the algorithm for the decision at that time (see Methods). These two plots show the point at which algorithms surpass the single best group member. e, Algorithmic improvement on expertise for different algorithms and group sizes ($N$). Within each heatmap, group members are ranked by accuracy, and the x-axis indicates the member's performance quantile: the lower the quantile, the more accurate the group member. For each row within a heatmap, each cell represents the proportion of cases in which the respective algorithm surpasses the corresponding group member's performance. The left-most cell in each heatmap corresponds to the best group member; non-zero values for this cell suggest the emergence of collective intelligence.
Figure 3: Participants' accuracy as a function of their demographic information and the type of headline they respond to. The panels, from left to right, depict the accuracy in responding to each headline category as a function of three demographic variables: age group (split at the median participant age of $35$), ethnicity (partitioned into majority and minority ethnicities, see the demographic histograms figure), and gender. Fitting GEEs reveals statistically significant differences (Bonferroni-adjusted p-values $p'<0.05$) for age-related and gender-related questions. Specifically, older participants showed slightly lower accuracy on age-related questions, and male participants showed lower accuracy on gender-related questions.
Figure 4: Effects of framing on participants' responses. a-c, The quadrants represent false stereotypes (Q1), positive framing effects (Q2), common knowledge (Q3), and negative framing effects (Q4). a, Each point represents a distinct headline; the x-coordinate shows the average response to its original form, and the y-coordinate shows the average response to its altered form. b, The x and y coordinates indicate MetaCMAB's model predictions for the original and altered headlines, respectively. c, Similarly, the x and y coordinates indicate ExpertiseTree's model predictions for the original and altered headlines, respectively. d, Response distributions for the highlighted points (i), (ii), (iii), and (iv) in panels a-c, given as histograms. Each point's headline and response distribution are shown in its original (orange) and altered (blue) form.
Figure 5: Error rates categorized by sensitive attributes in headlines. Brackets indicate significant differences (*, **, and *** denoting p-values below $0.05$, $0.01$, and $0.001$, respectively) between pairs of boxplots, as identified by Kruskal-Wallis H-tests (ethnicity: $H=33.995$, $p<0.001$, $df=3$, $\eta^2=0.408$; age: $H=34.167$, $p<0.001$, $df=3$, $\eta^2=0.410$; gender: $H=10.421$, $p=0.0153$, $df=3$, $\eta^2=0.098$) followed by Dunn's tests (see Methods).
...and 6 more figures
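
The weighting schemes contrasted in Figure 1 and the instantaneous regret plotted in Figure 2 can be illustrated with a minimal sketch. It is not the authors' implementation: the opinions, the weight vectors, the function names, and the choice to combine opinions as a weighted average thresholded at 0.5 are all assumptions made for illustration. In particular, the three weight vectors only mimic the shapes described in the caption (uniform, concentrated on one participant, correlated with performance); they are not produced by majority vote, EXP4, or MetaCMAB themselves.

```python
import numpy as np


def aggregate(opinions, weights):
    """Return the weighted-average probability and the resulting True/False decision.

    Assumption: opinions are probabilities that a headline is true, and the
    collective decision is "true" when the weighted average reaches 0.5.
    """
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize so the weights sum to 1
    p = float(np.dot(weights, np.asarray(opinions, dtype=float)))
    return p, p >= 0.5


def instantaneous_regret(best_member_correct, algorithm_correct):
    """Best member's performance minus the algorithm's, for a single decision.

    Negative values mean the collective did better than the best member.
    """
    return float(best_member_correct) - float(algorithm_correct)


# Hypothetical opinions of three participants on one headline that is true.
opinions = [0.9, 0.2, 0.3]
truth = True

uniform = [1.0, 1.0, 1.0]      # even bars, as for a majority vote
single = [1.0, 0.0, 0.0]       # all weight on one participant
performance = [0.6, 0.1, 0.3]  # made-up weights correlated with past accuracy

for name, w in [("uniform", uniform), ("single", single), ("performance", performance)]:
    p, decision = aggregate(opinions, w)
    # Assume the first participant is the best group member and answered correctly.
    regret = instantaneous_regret(True, decision == truth)
    print(f"{name}: p={p:.3f}, decision={decision}, regret={regret}")
```

In this toy case the uniform weighting is pulled below 0.5 by the two less accurate participants, while the two non-uniform weightings follow the accurate participant. This is the kind of difference the example weights in Figure 1 are meant to convey.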
