Dueling over Multiple Pieces of Dessert
Simina Brânzei, Reed Phillips
TL;DR
This work studies repeated two-player cake cutting in which Alice, as the Stackelberg leader, partitions the cake and Bob chooses a piece in each round. It reveals sharp learnability boundaries: with arbitrary measurable partitions, Alice cannot achieve strongly sublinear regret even against a myopic Bob; but when she is restricted to at most $k$ cuts, the learning landscape becomes tractable and yields distinct regret regimes depending on whether Bob's learning rate is public or private and on his strategic sophistication (myopic vs non-myopic). The results provide upper and lower bounds, tight up to polylog factors, for $k=2$ and $k\ge3$ cuts, including regimes with public $\alpha$-regret budgets and adaptive strategies, and extend to a corollary in the Robertson-Webb query model on finding $\varepsilon$-Stackelberg allocations via $O(k/\varepsilon)$ queries. Collectively, the findings illuminate the fundamental trade-offs between partitioning flexibility and learnability in repeated Stackelberg-style cake cutting, with implications for online learning in strategic division problems and for query complexity in classical cake-cutting models.
Abstract
We study the dynamics of repeated fair division between two players, Alice and Bob, where Alice partitions a cake into two subsets and Bob chooses his preferred one over $T$ rounds. Alice aims to minimize her regret relative to the Stackelberg value, that is, the maximum utility she could achieve if she knew Bob's private valuation. We show that if Alice uses arbitrary measurable partitions, achieving strongly sublinear regret is impossible; she suffers $\Omega\Bigl(\frac{T}{\log^2 T}\Bigr)$ regret even against a myopic Bob. However, when Alice uses at most $k$ cuts, the learning landscape becomes tractable. We analyze Alice's performance based on her knowledge of Bob's strategic sophistication (his regret budget). When Bob's learning rate is public, we establish a hierarchy of polynomial regret bounds determined by $k$ and Bob's regret budget. In contrast, when this learning rate is private, Alice can universally guarantee $O\Bigl(\frac{T}{\log T}\Bigr)$ regret, but any attempt to secure a polynomial rate $O(T^{\beta})$ (for $\beta < 1$) leaves her vulnerable to incurring strictly linear regret against some Bob. Finally, as a corollary of our online learning dynamics, we characterize the randomized query complexity of finding approximate Stackelberg allocations with a constant number of cuts in the Robertson-Webb model.
