Confidence-weighted integration of human and machine judgments for superior decision-making
Felipe Yáñez, Xiaoliang Luo, Omar Valerio Minero, Bradley C. Love
TL;DR
The paper asks whether humans can meaningfully contribute to decisions when machines, including LLMs, outperform them. It proposes a confidence-weighted logistic regression framework that integrates judgments from any number of teammates, extending prior Bayesian approaches with a simple, fast, and interpretable method. On two benchmarks, noisy ImageNet16H object recognition and BrainBench neuroscience forecasting, the authors show that well-calibrated confidence and diversity among teammates yield complementarity, with human–machine teams outperforming either party alone. The approach generalizes to arbitrary sets of agents and offers a practical path to productive human–machine collaboration in perceptual and knowledge-intensive tasks. The results are supported by leave-one-out cross-validation (LOOCV) and by accessible data and code.
Abstract
Large language models (LLMs) can surpass humans in certain forecasting tasks. What role does this leave for humans in the overall decision process? One possibility is that humans, despite performing worse than LLMs, can still add value when teamed with them. A human and machine team can surpass each individual teammate when team members' confidence is well-calibrated and team members diverge in which tasks they find difficult (i.e., calibration and diversity are needed). We simplified and extended a Bayesian approach to combining judgments using a logistic regression framework that integrates confidence-weighted judgments for any number of team members. Using this straightforward method, we demonstrated its effectiveness in both image classification and neuroscience forecasting tasks. Combining human judgments with one or more machines consistently improved overall team performance. Our hope is that this simple and effective strategy for integrating the judgments of humans and machines will lead to productive collaborations.
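The idea of combining confidence-weighted judgments with logistic regression can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-alternative task, encodes each teammate's judgment as one signed-confidence feature (positive for option 1, negative for option 0), and fits the weights by plain gradient descent. The trial values, the function names, and the encoding are all hypothetical choices made for illustration; the toy data were hand-built so that the human and the machine err on different trials and tend to be less confident when wrong.

```python
import math

# Hypothetical toy trials (NOT data from the paper):
# (human_choice, human_conf, machine_choice, machine_conf, truth)
TRIALS = [
    (1, 0.9, 1, 0.8, 1),
    (0, 0.8, 0, 0.9, 0),
    (1, 0.3, 0, 0.9, 0),  # human wrong with low confidence
    (0, 0.2, 1, 0.9, 1),  # human wrong with low confidence
    (1, 0.9, 0, 0.2, 1),  # machine wrong with low confidence
    (0, 0.9, 1, 0.3, 0),  # machine wrong with low confidence
    (1, 0.4, 1, 0.7, 1),
    (0, 0.3, 0, 0.8, 0),
    (1, 0.2, 0, 0.8, 0),  # human wrong with low confidence
    (0, 0.3, 1, 0.7, 1),  # human wrong with low confidence
]


def signed(choice, conf):
    """Assumed encoding: +conf if the teammate picks option 1, -conf otherwise."""
    return conf if choice == 1 else -conf


def predict_proba(w, b, x):
    """Probability that option 1 is correct under the logistic model."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, epochs=5000):
    """Fit one weight per teammate plus a bias by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(preds, truth):
    return sum(int(p == t) for p, t in zip(preds, truth)) / len(truth)


# One signed-confidence feature per teammate; more teammates add more columns.
X = [[signed(hc, hf), signed(mc, mf)] for hc, hf, mc, mf, _ in TRIALS]
y = [t[4] for t in TRIALS]
w, b = fit_logistic(X, y)

human_acc = accuracy([t[0] for t in TRIALS], y)
machine_acc = accuracy([t[2] for t in TRIALS], y)
team_acc = accuracy([int(predict_proba(w, b, x) >= 0.5) for x in X], y)
print(f"human={human_acc:.2f} machine={machine_acc:.2f} team={team_acc:.2f}")
print(f"weights={w} bias={b}")
```

In this toy setting the team accuracy is computed on the same trials used for fitting, so it only illustrates the mechanism: the fitted combination can follow whichever teammate is more confident on a given trial. A realistic evaluation would score held-out trials, for example with leave-one-out cross-validation.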
