Towards More Efficient, Robust, Instance-adaptive, and Generalizable Sequential Decision making

Zhiyong Wang

TL;DR

The primary goal of this Ph.D. study is to develop provably efficient and practical algorithms for data-driven sequential decision-making under uncertainty for both general reinforcement learning and bandits.

Abstract

The primary goal of my Ph.D. study is to develop provably efficient and practical algorithms for data-driven sequential decision-making under uncertainty. My work focuses on reinforcement learning (RL), multi-armed bandits, and their applications, including recommendation systems, computer networks, video analytics, and large language models (LLMs). Sequential decision-making methods, such as bandits and RL, have demonstrated remarkable success, ranging from outperforming human players in complex games like Atari and Go to advancing robotics, recommendation systems, and LLM fine-tuning. Despite these successes, many established algorithms rely on idealized models that can fail under model misspecification or adversarial perturbations, particularly in settings where accurate prior knowledge of the underlying model class is unavailable or where malicious users operate within dynamic systems. These challenges are pervasive in real-world applications, where robust and adaptive solutions are critical. Furthermore, while worst-case guarantees provide theoretical reliability, they often fail to capture instance-dependent performance; algorithms that adapt to the difficulty of the specific instance can be more efficient and practical. Another key challenge lies in generalizing to new, unseen environments, a crucial requirement for deploying these methods in dynamic and unpredictable settings. To address these limitations, my research aims to develop more efficient, robust, instance-adaptive, and generalizable sequential decision-making algorithms for both general reinforcement learning and bandits.
