Neural Value Iteration
Yang You, Ufuk Çakır, Alex Schutz, Robert Skilton, Nick Hawes
TL;DR
This work tackles the scalability barrier of offline POMDP planning by exploiting the piecewise-linear and convex (PWLC) structure of the value function to represent it as a finite set of neural networks, each encoding an α-vector. It introduces the Finite Network Controller (FNC) and Neural Value Iteration (NVI), which perform Bellman backups on neural α-vectors and thereby enable high-performance planning in domains with hundreds of millions of states (e.g., RockSample$(20,20)$). Empirically, NVI matches or outperforms existing offline methods on challenging benchmarks, producing near-optimal policies with compact representations and without discretizing the full belief space. The results suggest a viable path toward deep offline POMDP planning that scales to large, real-world problems while preserving the theoretical foundations of value iteration.
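To make "a finite set of neural networks, each encoding an α-vector" concrete, the sketch below treats each α-vector as a function $α_i(s)$ computed by a small network, and evaluates $V(b) = \max_i \mathbb{E}_{s \sim b}[α_i(s)]$ by averaging over particles sampled from the belief. This is a minimal illustration under stated assumptions, not the authors' implementation: the `TinyAlphaNet` architecture, the feature-vector state encoding, the particle belief, and the helper `belief_value` are all hypothetical, and training the networks is not shown.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]             # hypothetical: a state encoded as a feature vector
AlphaFn = Callable[[State], float]  # one "neural alpha-vector": s -> alpha(s)


class TinyAlphaNet:
    """Hypothetical one-hidden-layer MLP standing in for one alpha-vector.

    Weights are random here; how NVI actually trains them is not shown.
    """

    def __init__(self, n_in: int, n_hidden: int, rng: random.Random) -> None:
        self.w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
        self.bias = 0.0

    def __call__(self, s: State) -> float:
        # tanh hidden layer followed by a linear read-out to a scalar alpha(s)
        hidden = [math.tanh(sum(w * x for w, x in zip(row, s))) for row in self.w1]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.bias


def belief_value(particles: List[State], alphas: List[AlphaFn]) -> Tuple[float, int]:
    """Return (V(b), index of the maximizing alpha-network).

    V(b) = max_i E_{s~b}[alpha_i(s)], estimated by averaging alpha_i over
    particles s_1..s_N assumed to be sampled from the belief b.
    """
    n = len(particles)
    scores = [sum(alpha(s) for s in particles) / n for alpha in alphas]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


if __name__ == "__main__":
    rng = random.Random(0)
    alphas = [TinyAlphaNet(n_in=3, n_hidden=8, rng=rng) for _ in range(4)]
    particles = [[rng.random() for _ in range(3)] for _ in range(100)]
    v, i = belief_value(particles, alphas)
    print(f"V(b) ~= {v:.3f} via alpha-network {i}")
```

In this sketch the cost of one value query grows with the number of networks and particles rather than with $|S|$, which is the kind of decoupling from the state-space size that motivates a neural representation of α-vectors.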
Abstract
The value function of a POMDP is piecewise-linear and convex (PWLC) and can be represented as a finite set of hyperplanes, known as $α$-vectors. Most state-of-the-art offline POMDP solvers follow the point-based value iteration scheme, which performs Bellman backups on $α$-vectors at reachable belief points until convergence. However, since each $α$-vector is $|S|$-dimensional, these methods quickly become intractable for large-scale problems due to the prohibitive computational cost of Bellman backups. In this work, we demonstrate that the PWLC property allows a POMDP's value function to be represented alternatively as a finite set of neural networks. This insight enables a novel POMDP planning algorithm called \emph{Neural Value Iteration}, which combines the generalization capability of neural networks with the classical value iteration framework. Our approach achieves near-optimal solutions even in extremely large POMDPs that are intractable for existing offline solvers.
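For reference, the PWLC property means $V(b) = \max_{α \in Γ} \sum_{s} α(s)\, b(s)$ for a finite set $Γ$ of $α$-vectors. The sketch below shows one classical point-based Bellman backup at a single belief $b$ for a small tabular POMDP, the step whose cost the abstract identifies as the bottleneck. It is a standard textbook formulation, not the paper's neural backup; the function names, the dense list-of-lists model layout, and the Tiger example model are illustrative assumptions.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def point_based_backup(b: Vector, gamma_set: List[Vector], T, O, R, discount: float) -> Vector:
    """One point-based Bellman backup at belief b (tabular, dense).

    Assumed layout: T[a][s][s2] = P(s2 | s, a); O[a][s2][o] = P(o | s2, a);
    R[a][s] = immediate reward. Returns the new alpha-vector maximizing b . alpha.
    """
    n_s, n_a, n_o = len(b), len(T), len(O[0][0])
    best_alpha, best_val = None, float("-inf")
    for a in range(n_a):
        alpha_a = list(R[a])
        for o in range(n_o):
            # Project every alpha through (a, o):
            # g(s) = sum_{s2} T(s, a, s2) * O(a, s2, o) * alpha(s2)
            projections = [
                [sum(T[a][s][s2] * O[a][s2][o] * alpha[s2] for s2 in range(n_s)) for s in range(n_s)]
                for alpha in gamma_set
            ]
            # Keep the projection that scores best at this particular belief.
            g = max(projections, key=lambda proj: dot(b, proj))
            alpha_a = [x + discount * gs for x, gs in zip(alpha_a, g)]
        val = dot(b, alpha_a)
        if val > best_val:
            best_alpha, best_val = alpha_a, val
    return best_alpha


# Illustrative model: the classic Tiger POMDP (not taken from the paper).
# States: 0 = tiger-left, 1 = tiger-right.
# Actions: 0 = listen, 1 = open-left, 2 = open-right.
# Observations: 0 = hear-left, 1 = hear-right.
STAY = [[1.0, 0.0], [0.0, 1.0]]
RESET = [[0.5, 0.5], [0.5, 0.5]]
TIGER_T = [STAY, RESET, RESET]
HEAR = [[0.85, 0.15], [0.15, 0.85]]
NOISE = [[0.5, 0.5], [0.5, 0.5]]
TIGER_O = [HEAR, NOISE, NOISE]
TIGER_R = [[-1.0, -1.0], [-100.0, 10.0], [10.0, -100.0]]
```

For example, `point_based_backup([0.5, 0.5], [[0.0, 0.0]], TIGER_T, TIGER_O, TIGER_R, 0.95)` selects the listen action and returns `[-1.0, -1.0]`. The projection step touches every $(s, s')$ pair for every $α \in Γ$, so a single backup in this dense form costs $O(|A|\,|O|\,|Γ|\,|S|^2)$, which is why explicit $|S|$-dimensional $α$-vectors become impractical when $|S|$ reaches hundreds of millions.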
