Learning Explainable and Better Performing Representations of POMDP Strategies

Alexander Bork, Debraj Chakraborty, Kush Grover, Jan Kretinsky, Stefanie Mohr

TL;DR

This work presents a method to learn an automaton representation of a strategy using a modification of the L*-algorithm. The resulting automaton is dramatically smaller, and thus more explainable, than the tabular representation of the strategy.

Abstract

Strategies for partially observable Markov decision processes (POMDPs) typically require memory. One way to represent this memory is via automata. We present a method to learn an automaton representation of a strategy using a modification of the L*-algorithm. Compared to the tabular representation of a strategy, the resulting automaton is dramatically smaller and thus also more explainable. Moreover, in the learning process, our heuristics may even improve the strategy's performance. In contrast to approaches that synthesize an automaton directly from the POMDP, thereby solving it, our approach is incomparably more scalable.

Paper Structure (24 sections, 11 figures, 7 tables, 1 algorithm)

Figures (11)

  • Figure 1: Running example: POMDP
  • Figure 1: Example strategy table for the running-example POMDP. It contains only observation sequences of length at most 2.
  • Figure 2: Depiction of the FSC learning framework
  • Figure 3: FSC representing the example strategy table.
  • Figure 4: Running example - initial table
  • ...and 6 more figures

Theorems & Definitions (9)

  • Definition 1: MDP
  • Definition 2: POMDP
  • Definition 3: Strategy
  • Definition 4: Finite-State Controller
  • Definition 5: Strategy Table
  • Definition 6: Output Query (OQ)
  • Definition 7: Equivalence Query (EQ)
  • Definition 8: Learning Table
  • Definition 9: Learned FSC
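
To make the contrast between a strategy table and a finite-state controller (FSC) concrete, the following is a minimal Python sketch. It is not the paper's implementation and does not perform any L*-style learning. The observations ("clear", "wall"), the actions ("forward", "turn"), the toy strategy, and the FSC class are all hypothetical and chosen only for illustration. The sketch assumes a Mealy-style controller in which each (memory state, observation) pair yields an action and a successor state; the paper's formal definition may differ in detail.

```python
from itertools import product

# Hypothetical observations, not taken from the paper.
OBSERVATIONS = ("clear", "wall")


def build_strategy_table(max_len):
    """Tabular strategy: one entry per observation sequence up to max_len.

    Toy rule (an assumption for illustration): play "turn" once "wall"
    has been observed at any point, otherwise play "forward".
    """
    table = {}
    for length in range(1, max_len + 1):
        for seq in product(OBSERVATIONS, repeat=length):
            table[seq] = "turn" if "wall" in seq else "forward"
    return table


class FSC:
    """Mealy-style finite-state controller (illustrative sketch)."""

    def __init__(self, initial, transitions):
        # transitions maps (state, observation) -> (action, next_state)
        self.initial = initial
        self.transitions = transitions

    def action(self, observations):
        """Return the action chosen after reading the observation sequence."""
        state = self.initial
        act = None
        for obs in observations:
            act, state = self.transitions[(state, obs)]
        return act


# Two memory states suffice: q0 = "no wall seen yet", q1 = "wall seen".
controller = FSC(
    initial="q0",
    transitions={
        ("q0", "clear"): ("forward", "q0"),
        ("q0", "wall"): ("turn", "q1"),
        ("q1", "clear"): ("turn", "q1"),
        ("q1", "wall"): ("turn", "q1"),
    },
)

table = build_strategy_table(max_len=3)

# The controller reproduces every entry of the table.
agrees = all(controller.action(seq) == act for seq, act in table.items())
```

In this toy setting the table grows exponentially with the length of the observation sequences it covers, while the controller stays at two memory states and four transitions. This is the kind of size gap the abstract refers to, shown here only on an invented example.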