Towards Cost Sensitive Decision Making

Yang Li; Junier Oliva

TL;DR

This work develops a model-based approach in which a deep generative model captures the dependencies among features and imputes the unobserved ones, together with hierarchical RL algorithms that solve both types of Active-Acquisition POMDP (AA-POMDP).

Abstract

Many real-world situations allow for the acquisition of additional relevant information when making decisions with limited or uncertain data. However, traditional RL approaches either require all features to be acquired beforehand (e.g., in an MDP) or regard some of them as missing data that cannot be acquired (e.g., in a POMDP). In this work, we consider RL models that may actively acquire features from the environment to improve decision quality and certainty, while automatically balancing the cost of the feature acquisition process against the reward of the task decision process. We propose the Active-Acquisition POMDP (AA-POMDP) and identify two types of acquisition process for different application domains. To assist the agent in the actively-acquired, partially-observed environment and to alleviate the exploration-exploitation dilemma, we develop a model-based approach in which a deep generative model captures the dependencies among features and imputes the unobserved ones. The imputations essentially represent the beliefs of the agent. Equipped with the dynamics model, we develop hierarchical RL algorithms to solve both types of AA-POMDP. Empirical results demonstrate that our approach achieves considerably better performance than existing POMDP-RL solutions.
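
The trade-off between acquisition cost and task reward described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the function name `acquire_and_score`, the additive per-feature cost model, and the use of `None` to mark unobserved features are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class AcquisitionStep:
    """Result of one hypothetical acquisition-then-decision step."""
    observed: List[Optional[float]]  # None marks features that were not acquired
    net_reward: float                # task reward minus total acquisition cost


def acquire_and_score(
    features: Sequence[float],
    acquire_idx: Sequence[int],
    costs: Sequence[float],
    task_reward: float,
) -> AcquisitionStep:
    """Reveal only the requested features and charge their cost.

    Assumes (for illustration) that costs are additive and that each
    feature is charged at most once, even if requested repeatedly.
    """
    chosen = set(acquire_idx)
    # Features outside `chosen` stay hidden from the agent.
    observed = [x if i in chosen else None for i, x in enumerate(features)]
    total_cost = sum(costs[i] for i in chosen)
    return AcquisitionStep(observed=observed, net_reward=task_reward - total_cost)


# Example: acquire features 0 and 2 out of three, then receive a task reward of 1.0.
step = acquire_and_score(
    features=[0.5, -1.0, 2.0],
    acquire_idx=[0, 2],
    costs=[0.1, 0.2, 0.3],
    task_reward=1.0,
)
```

In this sketch the agent sees only the acquired entries of `step.observed`; a belief model such as the generative imputer mentioned in the abstract would be responsible for filling in the `None` entries.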

Paper Structure (25 sections, 10 equations, 7 figures, 2 tables)

Introduction
Active Acquisition Partially-Observed Markov Decision Process
Methods
Partially Observed Sequence Modeling
Belief State Estimation
Cost Sensitive Reinforcement Learning
Implementation
Related Works
Experiments
Conclusion
Partially Observed Set Models for Sequences (POSS)
Experiment
Fully Observed
Random Acquisition
Batch Acquisition
...and 10 more sections

Figures (7)

Figure 1: Graphical model for modeling a trajectory with 3 time steps. The dashed arrows indicate the inference process, and the solid arrows indicate the generation process.
Figure 2: Illustrations of the batch acquisition process and the sequential acquisition process. The dashed lines indicate the update of the belief. The red arrows represent the hierarchical policy execution processes.
Figure 3: CartPole results.
Figure 4: Sepsis results.
Figure 5: Training curve.
...and 2 more figures
