Towards Cost Sensitive Decision Making

Yang Li, Junier Oliva

TL;DR

This work proposes a model-based approach in which a deep generative model captures the dependencies among features and imputes the unobserved ones, and develops hierarchical RL algorithms to solve both types of Active-Acquisition POMDPs (AA-POMDPs).

Abstract

Many real-world situations allow for acquiring additional relevant information when making decisions with limited or uncertain data. However, traditional RL approaches either require all features to be acquired beforehand (e.g., in an MDP) or regard some of them as missing data that cannot be acquired (e.g., in a POMDP). In this work, we consider RL models that may actively acquire features from the environment to improve decision quality and certainty, while automatically balancing the cost of the feature acquisition process against the reward of the task decision process. We propose the Active-Acquisition POMDP (AA-POMDP) and identify two types of acquisition process, batch and sequential, for different application domains. To assist the agent in the actively acquired, partially observed environment and alleviate the exploration-exploitation dilemma, we develop a model-based approach in which a deep generative model captures the dependencies among the features and imputes the unobserved ones. The imputations essentially represent the agent's beliefs. Equipped with this dynamics model, we develop hierarchical RL algorithms to solve both types of AA-POMDP. Empirical results demonstrate that our approach achieves considerably better performance than existing POMDP-RL solutions.
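
The trade-off between acquisition cost and task reward can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the function names, the per-feature cost list, and the simple additive objective (task reward minus the cost of each acquired feature) are all hypothetical.

```python
from typing import List, Optional, Sequence, Set


def masked_observation(
    features: Sequence[float], acquired: Set[int]
) -> List[Optional[float]]:
    """Return what the agent sees: unacquired features are hidden as None."""
    return [x if i in acquired else None for i, x in enumerate(features)]


def net_return(
    task_rewards: Sequence[float],
    acquisitions: Sequence[Set[int]],
    feature_costs: Sequence[float],
) -> float:
    """Sum of task rewards minus the cost of every feature acquired per step.

    Assumes (hypothetically) that costs are additive and fixed per feature.
    """
    total = 0.0
    for reward, acquired in zip(task_rewards, acquisitions):
        total += reward - sum(feature_costs[i] for i in acquired)
    return total


# Two time steps: acquire feature 0 first, then features 0 and 1.
episode_return = net_return([1.0, 1.0], [{0}, {0, 1}], [0.1, 0.5])
visible = masked_observation([0.1, 0.2, 0.3], {0, 2})
```

Under this toy objective, acquiring a feature only pays off when the improvement in task reward it enables exceeds its cost, which is the balance the agent is expected to learn.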

Paper Structure (25 sections, 10 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Graphical model for modeling a trajectory with 3 time steps. The dashed arrows indicate the inference process, and the solid arrows indicate the generation process.
  • Figure 2: Illustrations of the batch acquisition process and the sequential acquisition process. The dashed lines indicate the update of the belief. The red arrows represent the hierarchical policy execution processes.
  • Figure 3: CartPole results.
  • Figure 4: Sepsis results.
  • Figure 5: Training curve.
  • ...and 2 more figures