Optimal Sensor and Actuator Selection for Factored Markov Decision Processes: Complexity, Approximability and Algorithms

Jayanth Bhargav, Mahsa Ghasemi, Shreyas Sundaram

TL;DR

This paper formulates the problem of selecting a set of sensors for fMDPs (under a budget) to maximize the infinite-horizon discounted return provided by the optimal policy, and shows that it is NP-hard to approximate this problem to within any non-trivial factor.

Abstract

Factored Markov Decision Processes (fMDPs) are a class of Markov Decision Processes (MDPs) in which the states (and actions) can be factored into a set of state (and action) variables and can be encoded compactly using a factored representation. In this paper, we consider a setting where the state of the fMDP is not directly observable, and the agent relies on a set of potential sensors to gather information. Each sensor has a selection cost and the designer must select a subset of sensors under a limited budget. We formulate the problem of selecting a set of sensors for fMDPs (under a budget) to maximize the infinite-horizon discounted return provided by the optimal policy. We show the fundamental result that it is NP-hard to approximate this problem to within any non-trivial factor. Our inapproximability results for optimal sensor selection also extend to a general class of Partially Observable MDPs (POMDPs). We then study the dual problem of budgeted actuator selection (at design-time) to maximize the expected return under the optimal policy. Again, we show that it is NP-hard to approximate this problem to within any non-trivial factor. Furthermore, with explicit examples, we show the failure of greedy algorithms for both the sensor and actuator selection problems and provide insights into the factors that cause these problems to be challenging. Despite this, through extensive simulations, we show the practical effectiveness and near-optimal performance of the greedy algorithm for actuator and sensor selection in many real-world and randomly generated instances.
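
The abstract refers to a greedy algorithm for budgeted selection but does not spell it out here. The following is a minimal sketch of one common greedy variant, not the authors' implementation. The names `greedy_select`, `toy_value`, and `toy_costs` are hypothetical. In the paper's setting the value oracle would be the infinite-horizon discounted return of the optimal policy for the chosen sensor (or actuator) set, which itself requires solving the resulting decision process; here it is replaced by a hand-written toy function. The toy instance is constructed for illustration only (it is not the paper's example) and shows how greedy can miss a pair of items that are valuable only together.

```python
from typing import Callable, Dict, FrozenSet, Set


def greedy_select(
    costs: Dict[str, float],
    budget: float,
    value: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Repeatedly add the affordable item with the largest positive marginal gain.

    `value` is an assumed black-box oracle that scores a candidate set.
    """
    selected: Set[str] = set()
    remaining = budget
    current = value(frozenset())
    while True:
        best, best_gain = None, 0.0
        for item, cost in costs.items():
            if item in selected or cost > remaining:
                continue  # already chosen or no longer affordable
            gain = value(frozenset(selected | {item})) - current
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            break  # nothing affordable improves the value
        selected.add(best)
        remaining -= costs[best]
        current += best_gain
    return selected


def toy_value(chosen: FrozenSet[str]) -> float:
    """Toy oracle: 'a' and 'b' are useful only jointly; 'c' helps a little alone."""
    joint = 10.0 if {"a", "b"} <= chosen else 0.0
    solo = 1.0 if "c" in chosen else 0.0
    return joint + solo


toy_costs = {"a": 1.0, "b": 1.0, "c": 1.0}
greedy_choice = greedy_select(toy_costs, budget=2.0, value=toy_value)
print(greedy_choice, toy_value(frozenset(greedy_choice)))
print(toy_value(frozenset({"a", "b"})))
```

With a budget of 2, greedy first takes `c` (the only item with a positive marginal gain), after which neither `a` nor `b` alone adds anything, so it stops at value 1 while the set `{a, b}` achieves 10. This kind of complementarity between items is one plausible reason greedy selection carries no general guarantee, consistent with the inapproximability result stated in the abstract.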

Paper Structure (19 sections, 7 theorems, 13 equations, 5 figures, 1 table, 1 algorithm)


Key Result

Lemma 1

For the MDP $\Bar{\mathcal{M}}$ defined in Example 1, the following holds for any $\gamma\in (0,1)$:

Figures (5)

  • Figure 1: Sensors deployed in an environment where a team of mobile robots is collectively performing tasks.
  • Figure 2: A distributed micro-grid network [islam2021control].
  • Figure 3: State transition diagram of $\Bar{\mathcal{M}}$.
  • Figure 4: Reduction from SetCover to fMDP-SS/fMDP-AS: The reward of an MDP in Layer 1 depends on its own states and actions. The rewards of the MDPs in Layers 2 and 3 depend on the states of all the MDPs in Layer 1.
  • Figure 5: Empirical evaluation of the greedy algorithm for the fMDP sensor and actuator selection problems.

Theorems & Definitions (21)

  • Remark 1
  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • ...and 11 more