To Spend or to Gain: Online Learning in Repeated Karma Auctions

Damien Berriaud, Ezzat Elokda, Devansh Jalota, Emilio Frazzoli, Marco Pavone, Florian Dörfler

TL;DR

This work tackles online learning in repeated karma-based resource auctions, where an artificial currency is redistributed each period. It develops adaptive karma pacing, an online strategy resembling dual gradient ascent, and proves that it achieves asymptotic optimality for a single bidder, induces convergent learning when all bidders adopt it, and forms an approximate Nash equilibrium in large-population, parallel auctions. The analysis addresses challenges unique to currency gains and redistribution, including non-truthfulness upon losing, budget balance under karma, and non-uniqueness of stationary multipliers, which require a novel relaxed dual with a shifted projection. The results provide principled, scalable bidding rules for practical karma mechanisms, with welfare implications suggesting efficient allocation without external money and robustness to heterogeneity across agents. The work also charts future directions: alternative karma redistribution schemes, and adaptive step-size strategies to mitigate the vanishing-box problem and enhance convergence in realistic settings.

Abstract

Recent years have seen a surge of artificial currency-based mechanisms in contexts where monetary instruments are deemed unfair or inappropriate, e.g., in allocating food donations to food banks, course seats to students, and, more recently, even for traffic congestion management. Yet the applicability of these mechanisms remains limited in repeated auction settings, as it is challenging for users to learn how to bid an artificial currency that has no value outside the auctions. Indeed, users must jointly learn the value of the currency in addition to how to spend it optimally. Moreover, in the prominent class of karma mechanisms, in which artificial karma payments are redistributed to users at each time step, users do not only spend karma to obtain public resources but also gain karma for yielding them. For this novel class of karma auctions, we propose an adaptive karma pacing strategy that learns to bid optimally, and show that this strategy a) is asymptotically optimal for a single user bidding against competing bids drawn from a stationary distribution; b) leads to convergent learning dynamics when all users adopt it; and c) constitutes an approximate Nash equilibrium as the number of users grows. Our results require a novel analysis in comparison to adaptive pacing strategies in monetary auctions, since we depart from the classical assumption that the currency has known value outside the auctions, and consider that the currency is both spent and gained through the redistribution of payments.

Paper Structure

This paper contains 54 sections, 10 theorems, 113 equations, 4 figures, 2 algorithms.

Key Result

theorem 1

There exists a constant $C \in \mathbb{R}_+$ such that the average expected regret of an agent $i \in \mathcal{N}$ for following strategy $K$ in the stationary competition setting satisfies Moreover, for suitably chosen parameters, strategy $K$ asymptotically converges to an $O(\hat{\varepsilon})$-neighborhood of the optimal expected cost with the benefit of hindsight, i.e.,

Figures (4)

  • Figure 1: Schematic representation of repeated resource allocation using karma.
  • Figure 2: Numerical validation of the main theorems. Figures \ref{fig:P_vs_H_no_redistrib} and \ref{fig:P_vs_H_with_redistrib} show the convergence of costs to the minimum with the benefit of hindsight, without and with budget gains due to payment redistribution. Figures \ref{fig:cv_mu_no_redistrib} and \ref{fig:cv_mu_with_redistrib} show the convergence of multipliers under simultaneous learning, without and with budget gains.
  • Figure 3: Numerical validation of the absolute continuity of valuations and Assumption \ref{ass:G-sim-learn-hitting-time}, and potential solution for the vanishing box problem (Figure \ref{fig:fixed_budget}). Figure \ref{fig:hit_time} shows that Assumption \ref{ass:G-sim-learn-hitting-time-4} is satisfied in practice. Figures \ref{fig:non_cont_v_perfs_vs_H} and \ref{fig:non_cont_v_cv_mu} respectively show that the results of Theorems \ref{thm:G_stat_comp} and \ref{thm:G_sim_lear_cv} hold also when the distribution of valuations is not continuous.
  • Figure 4: Comparison of different strategies over an individual episode. By "Best in Hindsight" we refer to the solution of Equation \ref{equ:G_lowerbound_cost_H}, which allows for temporary violations of the budget throughout the episode. By "Strategy $K$" and "Strategy $A$" we refer to Algorithms \ref{alg:G} and \ref{alg:A} respectively. By "Strategy $A$ using $g_t$ and $b_t \propto \frac{v_t}{1+\mu_t}$" we refer to the extension of the adaptive strategy proposed in \cite{balseiro_learning_2019} for classical monetary settings to handle budget increases. It places bids using $b_{i,t} = \min\{ \Delta v_{i,t} / (1 + \mu_{i,t}) , k_{i,t} \}$ and uses the multiplier update $\mu_{i,t+1} = P_{[0, \overline{\mu} ]} \left( \mu_{i,t} + \epsilon ( z_{i,t} - g_{i,t} - \rho_i) \right).$ Subfigure $(a)$ shows that both variations of Strategy $A$ perform suboptimally compared to Strategy $K$, and subfigure $(b)$ explains their performance. On the one hand, Strategy $A$ simply does not take gains into account and converges to a stationary multiplier that only depletes the initial budget. On the other hand, the altered Strategy $A$ aims to deplete both the initial budget and the gains by the end of the episode, but it cannot do so because the denominator $1 + \mu_{i,t}$ in the bid formula never drops below one, so the bid can never exceed $\Delta v_{i,t}$.
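
The bid rule and multiplier update quoted in the Figure 4 caption for the extended Strategy $A$ can be sketched in a few lines of Python. This is a minimal illustration of the baseline only, not the paper's Strategy $K$, whose update is not given in this summary. The function and argument names are hypothetical. Reading $z_{i,t}$ as the karma paid, $g_{i,t}$ as the karma gained through redistribution, $\rho_i$ as a target expenditure rate, and $\Delta$ as a scaling constant is an assumption, since the caption does not define these symbols.

```python
def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def bid(valuation, mu, karma, delta=1.0):
    """Bid rule from the caption: b = min(delta * v / (1 + mu), k).

    Assumption: `delta` is a fixed scaling constant, `karma` the current budget.
    """
    return min(delta * valuation / (1.0 + mu), karma)


def update_multiplier(mu, payment, gain, rho, step, mu_max):
    """Update from the caption: mu' = P_[0, mu_max](mu + step * (z - g - rho)).

    Assumption: `payment` is z, `gain` is g, `rho` is the target rate.
    """
    return project(mu + step * (payment - gain - rho), 0.0, mu_max)


# One illustrative step with made-up numbers.
mu = 0.5
b = bid(valuation=2.0, mu=mu, karma=10.0)
mu_next = update_multiplier(mu, payment=1.0, gain=0.0, rho=0.25,
                            step=0.1, mu_max=2.0)
```

Because the projection keeps the multiplier in $[0, \overline{\mu}]$, the factor $1 / (1 + \mu_{i,t})$ is at most one, so `bid` never returns more than `delta * valuation`. This is the limitation the caption points to: the rule cannot raise bids enough to also spend the karma gained from redistribution.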

Theorems & Definitions (10)

  • theorem 1: Asymptotic Optimality under Stationary Competition
  • theorem 2: Convergence under Simultaneous Learning
  • theorem 3: Approximate Nash Equilibrium
  • theorem 4: Asymptotic Optimality under Stationary Competition
  • theorem 5: Convergence under Simultaneous Learning
  • theorem 6: $\varepsilon$-Nash Equilibrium
  • lemma 1: Upper bound on the Expected Mean Squared Error
  • lemma 2: Expected Mean Squared Error with Unilateral Deviation
  • proposition 1
  • proposition 2