No-regret incentive-compatible online learning under exact truthfulness with non-myopic experts

Junpei Komiyama, Nishant A. Mehta, Ali Mortazavi

TL;DR

This work resolves the open problem of designing a no-regret, exactly truthful online forecasting mechanism for non-myopic experts by introducing FPL-ELF for the full-information setting and FPL-ELF-$\varepsilon$ for bandit feedback. Viewing the mechanism as an instance of Follow the Perturbed Leader whose noise comes from random walks with loss-dependent perturbations, the authors obtain $\tilde{O}(\sqrt{TN})$ regret with full information and $\tilde{O}(T^{2/3}N^{1/3})$ regret with bandit feedback, while satisfying online incentive compatibility under belief independence. The key technical tools are new tail bounds for Poisson binomial distributions and a lead-pack analysis that controls how often the leader changes. Together these give the first exactly truthful no-regret mechanisms for online forecasting with non-myopic experts. The authors also discuss possible extensions to multiple outcomes and decoupled exploration, and conjecture lower bounds on the achievable regret. The results are relevant to forecasting competitions and to mechanism design in adversarial settings where strategic experts must not be able to gain by misreporting their beliefs.

Abstract

We study an online forecasting setting in which, over $T$ rounds, $N$ strategic experts each report a forecast to a mechanism, the mechanism selects one forecast, and then the outcome is revealed. In any given round, each expert has a belief about the outcome, but the expert wishes to select its report so as to maximize the total number of times it is selected. The goal of the mechanism is to obtain low belief regret: the difference between its cumulative loss (based on its selected forecasts) and the cumulative loss of the best expert in hindsight (as measured by the experts' beliefs). We consider exactly truthful mechanisms for non-myopic experts, meaning that truthfully reporting its belief strictly maximizes the expert's subjective probability of being selected in any future round. Even in the full-information setting, it is an open problem to obtain the first no-regret exactly truthful mechanism in this setting. We develop the first no-regret mechanism for this setting via an online extension of the Independent-Event Lotteries Forecasting Competition Mechanism (I-ELF). By viewing this online I-ELF as a novel instance of Follow the Perturbed Leader (FPL) with noise based on random walks with loss-dependent perturbations, we obtain $\tilde{O}(\sqrt{T N})$ regret. Our results are fueled by new tail bounds for Poisson binomial random variables that we develop. We extend our results to the bandit setting, where we give an exactly truthful mechanism obtaining $\tilde{O}(T^{2/3} N^{1/3})$ regret; this is the first no-regret result even among approximately truthful mechanisms.
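
The abstract describes the mechanism only at a high level, so the following is a minimal sketch of the general idea rather than the paper's FPL-ELF algorithm. It assumes the ELF-style lottery probabilities from the forecasting-competition literature, a quadratic scoring rule normalized to $[0,1]$, binary outcomes, and uniform tie-breaking; the function names `quadratic_score`, `elf_probabilities`, and `run_lottery_leader` are hypothetical. In each round the mechanism selects the expert with the most lottery wins so far, then runs one lottery whose winning probabilities depend on that round's scores.

```python
import random


def quadratic_score(report, outcome):
    """Quadratic (Brier-style) score in [0, 1]; assumed here, not taken from the paper."""
    return 1.0 - (report - outcome) ** 2


def elf_probabilities(reports, outcome):
    """ELF-style lottery probabilities (assumed form): 1/n plus a score advantage term."""
    n = len(reports)
    scores = [quadratic_score(r, outcome) for r in reports]
    total = sum(scores)
    probs = []
    for s in scores:
        others_mean = (total - s) / (n - 1)
        # Clamp at 0 to guard against tiny negative values from rounding.
        probs.append(max(0.0, 1.0 / n + (s - others_mean) / n))
    return probs


def run_lottery_leader(report_rounds, outcomes, rng):
    """Select the current lottery-win leader each round, then run that round's lottery."""
    n = len(report_rounds[0])
    wins = [0] * n
    selections = []
    for reports, outcome in zip(report_rounds, outcomes):
        # Selection happens before the outcome is revealed; ties are broken uniformly.
        best = max(wins)
        leaders = [i for i, w in enumerate(wins) if w == best]
        selections.append(rng.choice(leaders))
        # After the outcome is revealed, one expert wins this round's lottery.
        probs = elf_probabilities(reports, outcome)
        winner = rng.choices(range(n), weights=probs, k=1)[0]
        wins[winner] += 1
    return selections, wins


if __name__ == "__main__":
    rng = random.Random(0)
    outcomes = [rng.randint(0, 1) for _ in range(200)]
    # Three hypothetical experts with fixed reports in every round.
    report_rounds = [[0.9, 0.5, 0.1] for _ in outcomes]
    selections, wins = run_lottery_leader(report_rounds, outcomes, rng)
    print("lottery wins per expert:", wins)
```

In this sketch each expert's win count is a sum of independent Bernoulli draws with round-dependent success probabilities, which is a Poisson binomial random variable. This is one way to read the abstract's description of random-walk noise with loss-dependent perturbations.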

Paper Structure

This paper contains 33 sections, 30 theorems, 140 equations, 4 figures, 5 algorithms.

Key Result

Theorem 1

FPL-ELF is Online IC-BI.

Figures (4)

  • Figure 1: Illustration of the separation lemma, which is formalized in Lemma \ref{lem_operation} in Appendix \ref{sec_poisson}. Black dots represent the Poisson binomial parameters in non-decreasing order. We move two parameters $\theta_{\downarrow},\theta_{\uparrow}$ by the same distance until one of them hits the floor ($l_{\mathrm{Poi}}$) or the ceiling ($u_{\mathrm{Poi}}$).
  • Figure 2: Illustration of Lemma \ref{lem_operation}. We move the two parameters that were originally $\theta_\downarrow$ and $\theta_\uparrow$ (green dots) in the direction of the blue arrows until one of them hits $l_{\mathrm{Poi}}$ or $u_{\mathrm{Poi}}$. The two nodes move exactly the same distance, so the sum of the parameters is preserved.
  • Figure : FPL-ELF
  • Figure : FPL-ELF-$\varepsilon$

Theorems & Definitions (59)

  • Definition 1: Belief Independence [witkowski2023incentive]
  • Definition 2: Incentive Compatibility under Belief Independence [witkowski2023incentive]
  • Definition 3: Online Mechanism
  • Definition 4: Online Incentive Compatibility under Belief Independence
  • Definition 5: Bandit Mechanism
  • Definition 6: Bandit Online Incentive Compatibility under Belief Independence
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • ...and 49 more