Bi-Criteria Optimization for Combinatorial Bandits: Sublinear Regret and Constraint Violation under Bandit Feedback

Vaneet Aggarwal; Shweta Jain; Subham Pokhriyal; Christopher John Quinn

TL;DR

The paper tackles bi-criteria optimization in combinatorial multi-armed bandits (CMAB) with bandit feedback. It introduces a black-box offline-to-online framework that converts $(\alpha,\beta,\delta,\texttt{N})$-resilient bi-criteria offline algorithms into online CMAB strategies. The resulting online algorithm achieves sublinear regret and cumulative constraint violation (CCV), both bounded by $\mathcal{O}(\delta^{2/3} \texttt{N}^{1/3} T^{2/3} \log^{1/3}(T))$. It needs only the $\texttt{N}$ oracle calls made by the offline algorithm and tolerates $\epsilon$-level oracle noise through the $\delta$-resilience parameter. The framework is demonstrated on three canonical problems (Submodular Cover, Submodular Cost Submodular Cover, and Fair Submodular Maximization), showing how offline resilience translates into online guarantees without relying on problem-specific structure. This broadens the practical applicability of online bi-criteria optimization under bandit feedback to budgeted, fair, and utility-constrained settings, and offers a versatile tool for turning offline approximation guarantees into scalable online policies for a wide range of combinatorial optimization problems.
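A back-of-the-envelope calculation shows where the $T^{2/3}$ rate can come from. It is a heuristic sketch, not the paper's proof, and it rests on assumptions the summary does not state: (i) an explore-then-commit scheme in which each of the $\texttt{N}$ oracle calls is answered by playing the queried set $m$ times and averaging; (ii) objective and constraint feedback in $[0,1]$, so each exploration round costs at most a constant; (iii) $\delta$-resilience meaning that oracle errors of at most $\epsilon$ degrade the $(\alpha,\beta)$ guarantee by at most $\delta\epsilon$ in both criteria. Regret and CCV are measured against the $(\alpha,\beta)$ benchmark.

```latex
\begin{aligned}
&\epsilon \approx \sqrt{\log(T)/m} \quad \text{(Hoeffding, union bound over the } \texttt{N} \text{ queries)}\\
&\mathrm{Regret}(T),\ \mathrm{CCV}(T) \lesssim \underbrace{\texttt{N}\,m}_{\text{exploration}} + \underbrace{\delta\,\epsilon\,T}_{\text{commit}}
   = \texttt{N}\,m + \delta T \sqrt{\log(T)}\, m^{-1/2}\\
&\frac{d}{dm}\Big(\texttt{N}\,m + \delta T \sqrt{\log(T)}\, m^{-1/2}\Big) = 0
   \;\Longrightarrow\; m^\star = \Big(\frac{\delta T}{2\texttt{N}}\Big)^{2/3} \log^{1/3}(T)\\
&\Longrightarrow\; \mathrm{Regret}(T),\ \mathrm{CCV}(T) = \mathcal{O}\big(\delta^{2/3}\,\texttt{N}^{1/3}\,T^{2/3}\log^{1/3}(T)\big)
\end{aligned}
```

Substituting $m^\star$ makes both terms scale as $\delta^{2/3}\texttt{N}^{1/3}T^{2/3}\log^{1/3}(T)$, which matches the stated bound up to constant factors.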

Abstract

In this paper, we study bi-criteria optimization for combinatorial multi-armed bandits (CMAB) with bandit feedback. We propose a general framework that transforms discrete bi-criteria offline approximation algorithms into online algorithms with sublinear regret and cumulative constraint violation (CCV) guarantees. Our framework requires the offline algorithm to provide an $(\alpha, \beta)$-bi-criteria approximation ratio with $\delta$-resilience and to use $\texttt{N}$ oracle calls to evaluate the objective and constraint functions. We prove that the proposed framework achieves sublinear regret and CCV, with both bounds scaling as $\mathcal{O}\left(\delta^{2/3} \texttt{N}^{1/3} T^{2/3} \log^{1/3}(T)\right)$. Crucially, the framework treats the $\delta$-resilient offline algorithm as a black box, enabling flexible integration of existing approximation algorithms into the CMAB setting. To demonstrate its versatility, we apply our framework to several combinatorial problems, including submodular cover, submodular cost submodular cover, and fair submodular maximization. These applications highlight the framework's broad utility in adapting offline guarantees to online bi-criteria optimization under bandit feedback.
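To make the black-box idea concrete, here is a minimal Python sketch of one way an offline algorithm can be run online: an explore-then-commit wrapper that answers each oracle query by playing the queried set $m$ times and averaging the objective and constraint feedback, then commits to the offline output for the remaining rounds. The wrapper design, the heuristic choice $m \propto (\delta T/\texttt{N})^{2/3}\log^{1/3}(T)$, and all names (`explore_then_commit`, `samples_per_query`, `best_feasible_singleton`, the Bernoulli environment, the threshold 0.5) are illustrative assumptions, not the paper's algorithm or API.

```python
import math
import random
from typing import Callable, FrozenSet, List, Tuple

Action = FrozenSet[int]
# Value oracle handed to the offline algorithm: S -> (objective estimate, constraint estimate).
Oracle = Callable[[Action], Tuple[float, float]]


def samples_per_query(T: int, n_calls: int, delta: float) -> int:
    """Heuristic m ~ (delta*T/N)^(2/3) * log(T)^(1/3), capped so that N*m <= T."""
    m = (delta * T / n_calls) ** (2 / 3) * math.log(T) ** (1 / 3)
    return max(1, min(math.ceil(m), T // n_calls))


def explore_then_commit(offline_alg: Callable[[Oracle], Action],
                        play: Callable[[Action], Tuple[float, float]],
                        T: int, n_calls: int, delta: float) -> List[Action]:
    """Run a black-box offline algorithm online; return the action played in each round."""
    m = samples_per_query(T, n_calls, delta)
    played: List[Action] = []
    calls = 0

    def oracle(S: Action) -> Tuple[float, float]:
        # Answer one oracle query by playing S for m rounds and averaging
        # the bandit feedback for the objective and for the constraint.
        nonlocal calls
        calls += 1
        assert calls <= n_calls, "offline algorithm exceeded its oracle budget N"
        f_sum = g_sum = 0.0
        for _ in range(m):
            f, g = play(S)
            played.append(S)
            f_sum += f
            g_sum += g
        return f_sum / m, g_sum / m

    chosen = offline_alg(oracle)                  # exploration: the offline algorithm runs here
    played.extend([chosen] * (T - len(played)))   # commit: replay its output until round T
    return played


def best_feasible_singleton(n: int, threshold: float) -> Callable[[Oracle], Action]:
    """Hypothetical toy offline algorithm: query each singleton once (N = n) and return the
    highest-objective one whose estimated constraint value is at least `threshold`."""
    def alg(oracle: Oracle) -> Action:
        scores = [(oracle(frozenset({i})), i) for i in range(n)]
        feasible = [(f, i) for (f, g), i in scores if g >= threshold]
        # If nothing looks feasible, fall back to the singleton with the best constraint value.
        pool = feasible or [(g, i) for (f, g), i in scores]
        return frozenset({max(pool)[1]})
    return alg


if __name__ == "__main__":
    rng = random.Random(0)
    obj_means, con_means = [0.3, 0.6, 0.8], [0.9, 0.7, 0.2]

    def play(S: Action) -> Tuple[float, float]:
        (i,) = S
        # Bernoulli bandit feedback for the objective and the constraint of arm i.
        return float(rng.random() < obj_means[i]), float(rng.random() < con_means[i])

    actions = explore_then_commit(best_feasible_singleton(3, 0.5), play,
                                  T=10_000, n_calls=3, delta=1.0)
    print(len(actions), sorted(actions[-1]))
```

In this toy instance, arm 2 has the highest objective but violates the constraint, so a correctly working wrapper should commit to arm 1. The wrapper only sees the offline algorithm through `oracle`, which mirrors the black-box requirement; the cap `N*m <= T` keeps exploration inside the horizon.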
