A Refined Analysis of UCBVI

Simone Drago; Marco Mussi; Alberto Maria Metelli

TL;DR

This work refines the analysis of UCBVI for finite-horizon, tabular RL by deriving tighter constants for both Chernoff-Hoeffding and Bernstein-Freedman exploration bonuses. The authors show that the same $\widetilde{\mathcal{O}}(\sqrt{HSAT})$ regret rate can be achieved with substantially smaller constants, and they demonstrate that the improved constants translate into meaningful empirical gains relative to the original UCBVI and the MVP algorithm. Theoretical results are complemented by numerical validation in illustrative environments and the RiverSwim benchmark, where the BF-I variant often yields the lowest regret. By preserving the same asymptotic rate while reducing constants, the refined UCBVI remains a practical and competitive choice for finite-horizon tabular RL.

Abstract

In this work, we provide a refined analysis of the UCBVI algorithm (Azar et al., 2017), improving both the bonus terms and the regret analysis. Additionally, we compare our version of UCBVI with both its original version and the state-of-the-art MVP algorithm. Our empirical validation demonstrates that improving the multiplicative constants in the bounds has significant positive effects on the empirical performance of the algorithms.
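To make the role of the multiplicative constants concrete, the following is a minimal sketch of one optimistic backward-induction pass with a Chernoff-Hoeffding-style bonus. It is an illustration under stated assumptions, not the authors' implementation: the function name `optimistic_backup`, the constant `c`, and the logarithmic factor `log(S * A * H / delta)` are hypothetical placeholders, and neither the original nor the refined constants from the paper are reproduced here. Rewards are assumed known and bounded in [0, 1].

```python
import numpy as np

def optimistic_backup(counts, next_counts, reward, H, delta, c=1.0):
    """One optimistic value-iteration pass with a Hoeffding-style bonus (sketch).

    counts:      (S, A) visit counts N(s, a)
    next_counts: (S, A, S) transition counts N(s, a, s')
    reward:      (S, A) known rewards in [0, 1] (assumption)
    c:           hypothetical bonus constant; NOT the paper's value
    """
    S, A = counts.shape
    n = np.maximum(counts, 1)                 # avoid division by zero
    p_hat = next_counts / n[:, :, None]       # empirical transition model
    log_term = np.log(S * A * H / delta)      # placeholder log factor (assumption)
    bonus = c * H * np.sqrt(log_term / n)     # shrinks as 1 / sqrt(N(s, a))

    Q = np.zeros((H + 1, S, A))
    V = np.zeros((H + 1, S))                  # V[H] = 0 is the terminal value
    for h in range(H - 1, -1, -1):
        # Empirical Bellman backup plus bonus, clipped at the maximum return H.
        Q[h] = np.minimum(reward + p_hat @ V[h + 1] + bonus, H)
        Q[h][counts == 0] = H                 # unvisited pairs stay fully optimistic
        V[h] = Q[h].max(axis=1)
    return Q[:H], V[:H]
```

In this sketch the bonus scales linearly with `c`, so a smaller constant leaves fewer entries saturated at the clipping level `H` and lets the optimistic estimates approach the empirical values after fewer visits. This is one plausible reading of why tighter constants can matter empirically even when the asymptotic rate is unchanged.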
