A Model-free Biomimetics Algorithm for Deterministic Partially Observable Markov Decision Process

Yide Yu; Yue Liu; Xiaochen Yuan; Dennis Wong; Huijie Li; Yan Ma

TL;DR

The paper addresses decision making under uncertainty in DET-POMDPs, a deterministic subclass of POMDPs in which many states map to the same observation, a setting that is difficult for model-free methods. It introduces BIOMAP, a biomimetic algorithm inspired by desert ants that builds a Compact Vector Graph and uses an MDP-Graph-Automaton framework to transform a DET-POMDP into a fully observable MDP, so that a stable policy can be derived with shortest-path methods. The work formalizes the problem, analyzes the Cognitive Fog bias through Q-value variance, and provides a Boundary Arbiter to prevent unbounded exploration. These pieces are combined into a four-phase BIOMAP pipeline with complexity guarantees. Experiments on a Masking Cliff Walking environment show BIOMAP matching or outperforming several model-based solvers, which indicates robustness to environmental deception and potential for reliable deployment in domains prone to DET-POMDP problems.
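To make the "graph plus shortest path" idea concrete, here is a minimal sketch and not the BIOMAP algorithm itself. It assumes a hypothetical 2x3 grid with one cliff cell, assumes that each node can be keyed by its displacement vector from the start (loosely analogous to desert-ant path integration), and uses plain breadth-first search as the shortest-path method. The names `build_vector_graph`, `shortest_plan`, and `toy_step` are illustrative and do not come from the paper.

```python
from collections import deque

# Action name -> (row, column) displacement. Illustrative action set, not from the paper.
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def build_vector_graph(env_step, start=(0, 0)):
    """Explore a deterministic environment, keying each node by its displacement from the start."""
    graph = {}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node in graph:
            continue
        graph[node] = {}
        for name in ACTIONS:
            nxt = env_step(node, name)  # None means the move is not allowed
            if nxt is not None:
                graph[node][name] = nxt
                frontier.append(nxt)
    return graph

def shortest_plan(graph, start, goal):
    """Breadth-first search over the explored graph; returns action names, or None if unreachable."""
    parents = {start: None}
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            plan = []
            while parents[node] is not None:
                node, action = parents[node]
                plan.append(action)
            return plan[::-1]
        for action, nxt in graph[node].items():
            if nxt not in parents:
                parents[nxt] = (node, action)
                frontier.append(nxt)
    return None

def toy_step(node, action, rows=2, cols=3, cliff=frozenset({(1, 1)})):
    """Hypothetical 2x3 grid with one cliff cell; displacement vectors coincide with grid offsets."""
    r, c = node[0] + ACTIONS[action][0], node[1] + ACTIONS[action][1]
    if 0 <= r < rows and 0 <= c < cols and (r, c) not in cliff:
        return (r, c)
    return None

graph = build_vector_graph(toy_step)
plan = shortest_plan(graph, (0, 0), (1, 2))
print(plan)
```

Because every node carries a unique displacement key, two cells that look the same to the agent still become distinct graph nodes, and the planning step reduces to an ordinary search over a fully observable graph.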

Abstract

The Partially Observable Markov Decision Process (POMDP) is a mathematical framework for modeling decision-making under uncertainty, where the agent's observations are incomplete and the underlying system dynamics are probabilistic. Solving POMDPs within the model-free paradigm is challenging because the agent has inherent difficulty in accurately identifying and distinguishing between states and observations. We define one such difficult problem as the DETerministic Partially Observable Markov Decision Process (DET-POMDP) problem, a specific setting of POMDP in which states and observations are in a many-to-one relationship. The state is obscured, and its relationship to the observation is not apparent to the agent, which makes it difficult for the agent to infer the state from observations. To address this problem, we convert a DET-POMDP into a fully observable MDP using a model-free biomimetics algorithm called BIOMAP. BIOMAP is based on the MDP Graph Automaton framework, which distinguishes authentic environmental information from fraudulent data and thereby enhances the agent's ability to develop stable policies for DET-POMDPs. The experimental results highlight the superior capabilities of BIOMAP in maintaining operational effectiveness and environmental reparability in the presence of environmental deceptions when compared with existing POMDP solvers. This research opens up new avenues for deploying reliable POMDP-based systems in fields that are particularly susceptible to DET-POMDP problems.
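The many-to-one relationship between states and observations can be illustrated with a small sketch. The corridor, the observation labels, and the function names below are assumptions made for illustration and are not taken from the paper: transitions are deterministic, yet every non-goal cell emits the same observation, so a single observation cannot identify the state.

```python
from collections import defaultdict

# Hypothetical 1x4 corridor: states are cell indices 0..3, and cell 3 is the goal.
STATES = [0, 1, 2, 3]
GOAL = 3

def observe(state):
    """Many-to-one observation: all non-goal cells look identical to the agent."""
    return "goal" if state == GOAL else "corridor"

def step(state, action):
    """Deterministic transition: action is -1 (left) or +1 (right), clipped to the corridor."""
    return min(max(state + action, 0), GOAL)

# Group states by the observation they emit to expose the aliasing.
aliases = defaultdict(list)
for s in STATES:
    aliases[observe(s)].append(s)

print(dict(aliases))
```

Here three distinct states share the observation `"corridor"`, so an agent that conditions only on the current observation would treat them as one state even though the best action sequence differs in length from each.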
