Cascading Bandits With Feedback

R Sri Prakash, Nikhil Karamchandani, Sharayu Moharir

TL;DR

This work studies cascade bandits for edge inference, where each arm corresponds to an inference model with an associated accuracy and error probability, and each action is an ordering of the models in the cascade, chosen to maximize binary rewards. It proves that the optimal static policy orders arms by increasing error probability, and analyzes four policies: Explore-then-Commit (EC), Action Elimination (AE), Lower Confidence Bound (LCB), and Thompson Sampling (TS). The main results show that EC and AE incur logarithmic regret, $\Omega(\log T)$, because they commit to a fixed cascade order after exploration, while LCB and TS adapt continuously to observed feedback and achieve constant $O(1)$ regret. Simulations on synthetic edge-inference settings corroborate the theory, highlighting the practical advantage of adaptive strategies for efficient edge inference under uncertainty.

Abstract

Motivated by the challenges of edge inference, we study a variant of the cascade bandit model in which each arm corresponds to an inference model with an associated accuracy and error probability. We analyse four decision-making policies, namely Explore-then-Commit, Action Elimination, Lower Confidence Bound (LCB), and Thompson Sampling, and provide sharp theoretical regret guarantees for each. Unlike in classical bandit settings, Explore-then-Commit and Action Elimination incur suboptimal regret because they commit to a fixed ordering after the exploration phase, limiting their ability to adapt. In contrast, LCB and Thompson Sampling continuously update their decisions based on observed feedback, achieving constant $O(1)$ regret. Simulations corroborate these theoretical findings, highlighting the crucial role of adaptivity for efficient edge inference under uncertainty.
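
To make the idea of continuously updating the ordering concrete, here is a minimal Python sketch of an LCB-style ordering rule. It is not the paper's algorithm: the function name `lcb_order`, its inputs, and the Hoeffding-style bonus $\sqrt{2 \log t / n_i}$ are all assumptions chosen for illustration. The sketch only shows how an ordering could be recomputed each round from running error counts, so that no ordering is ever fixed permanently.

```python
import math


def lcb_order(error_counts, pull_counts, t):
    """Order arms by a lower confidence bound on their error probability.

    error_counts[i]: observed errors of arm i (hypothetical bookkeeping)
    pull_counts[i]:  number of observations of arm i
    t:               current round, t >= 1
    The confidence bonus below is a generic Hoeffding-style choice,
    assumed here for illustration only.
    """
    indices = []
    for errors, pulls in zip(error_counts, pull_counts):
        if pulls == 0:
            # Unobserved arms receive the most optimistic index.
            indices.append(0.0)
        else:
            mean = errors / pulls
            bonus = math.sqrt(2.0 * math.log(t) / pulls)
            indices.append(max(0.0, mean - bonus))
    # Arms that look least error-prone (optimistically) come first.
    return sorted(range(len(indices)), key=lambda i: indices[i])


# Illustrative counts only; these are not results from the paper.
order = lcb_order([300, 50, 200], [1000, 1000, 1000], t=3000)
```

Because the index is recomputed from the latest counts in every round, the ordering can change whenever new feedback arrives, in contrast to a policy that commits after an exploration phase.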

Paper Structure

This paper contains 10 sections, 18 theorems, 67 equations, 3 figures, 4 algorithms.

Key Result

Theorem 1

The optimal static policy orders the arms in increasing order of their error probabilities $p_i$.
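
As a small illustration of the statement of Theorem 1, the following Python sketch sorts arm indices by increasing error probability. The function name `optimal_static_order` and the error probabilities are hypothetical and chosen only for this example; in the bandit setting the $p_i$ are unknown and must be learned.

```python
def optimal_static_order(error_probs):
    """Return arm indices sorted by increasing error probability p_i."""
    return sorted(range(len(error_probs)), key=lambda i: error_probs[i])


# Hypothetical error probabilities for four models (illustrative values).
p = [0.30, 0.05, 0.20, 0.10]
order = optimal_static_order(p)
print(order)  # [1, 3, 2, 0]
```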

Figures (3)

  • Figure 1: Cascade ML models with scoring modules
  • Figure 2: Comparison of regret for different policies
  • Figure 3: Comparison of cumulative regret of different policies

Theorems & Definitions (37)

  • Theorem 1
  • Remark 1
  • Lemma 1
  • Theorem 2
  • proof
  • Theorem 3
  • Theorem 4
  • proof
  • Lemma 2
  • Lemma 3
  • ...and 27 more