An Active Parameter Learning Approach to The Identification of Safe Regions
Aneesh Raghavan, Karl H Johansson
TL;DR
The paper addresses the identification of safe regions in an unknown environment by actively learning region-specific trust probabilities $p_j$ through visits to Voronoi centers. It models the oracle's responses as Bernoulli random variables and uses a large deviations framework to quantify the estimation error, formulating a finite-horizon stochastic control problem that is then relaxed to tractable single-step optimization problems with closed-form solutions. An active parameter-learning algorithm is combined with a threshold-based classification rule, and the authors prove that the safe regions are identified with a finite number of visits to unsafe regions. A numerical example demonstrates the method’s ability to classify cells as safe or unsafe along the resulting exploration path. Future work includes characterizing convergence rates and extending the observation model beyond Bernoulli to more general exponential-family forms.
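For readers unfamiliar with the large deviations framework mentioned above, the following is the standard Cramér-type statement for the sample mean of Bernoulli observations. It is a textbook result given as background, not necessarily the exact cost function used in the paper; the symbols $X_{j,i}$, $\hat{p}_{j,n}$, $n$, and $I_j$ are notation assumed here for illustration.

```latex
% Assumed notation: X_{j,1}, ..., X_{j,n} are i.i.d. Bernoulli(p_j) responses
% collected over n visits to cell j, with 0 < p_j < 1.
\[
  \hat{p}_{j,n} = \frac{1}{n} \sum_{i=1}^{n} X_{j,i},
  \qquad
  \lim_{n \to \infty} \frac{1}{n} \log \mathbb{P}\bigl( \hat{p}_{j,n} \ge x \bigr) = - I_j(x)
  \quad \text{for } p_j < x < 1,
\]
\[
  I_j(x) = x \log \frac{x}{p_j} + (1 - x) \log \frac{1 - x}{1 - p_j}.
\]
```

The rate function $I_j(x)$ is the Kullback-Leibler divergence between Bernoulli($x$) and Bernoulli($p_j$), so the probability of a given estimation error decays exponentially in the number of visits to the cell.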
Abstract
We consider the problem of identifying safe regions in the environment of an autonomous system. The environment is divided into a finite collection of Voronoi cells, with each cell having a representative, the Voronoi center. The extent to which each region is considered to be safe by an oracle is captured through a trust distribution. The trust placed by the oracle, conditioned on the region, is modeled through a Bernoulli distribution whose parameter depends on the region. The parameters are unknown to the system. However, when the agent visits a given region, it receives a binary-valued random response from the oracle indicating whether the oracle trusts the region or not. The objective is to design a path for the agent such that, by traversing the centers of the cells, the agent is eventually able to label each cell safe or unsafe. To this end, we formulate an active parameter learning problem with the objective of minimizing visits to, or stays in, potentially unsafe regions. The active learning problem is formulated as a finite-horizon stochastic control problem in which the cost function is derived using the large deviations principle (LDP). The challenges associated with a dynamic programming approach to solving the problem are analyzed. Subsequently, the optimization problem is relaxed to obtain single-step optimization problems for which closed-form solutions are obtained. Using these solutions, we propose an algorithm for the active learning of the parameters. A relationship between the trust distributions and the label of a cell is defined, and subsequently a classification algorithm is proposed to identify the safe regions. We prove that the algorithm identifies the safe regions with a finite number of visits to unsafe regions. We demonstrate the algorithm through an example.
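The observation model described in the abstract can be illustrated with a minimal Python sketch: each visit to a cell yields a Bernoulli response, the trust parameter is estimated by the empirical mean, and a cell is labeled by comparing the estimate to a threshold. This is not the paper's algorithm. The function names, the threshold of 0.5, and the margin of 0.1 are hypothetical choices made only for illustration, and the sketch visits every cell equally often instead of choosing the path actively.

```python
import random


def simulate_visits(p_true, n_visits, rng):
    """Draw n_visits binary oracle responses, each equal to 1 with probability p_true."""
    return [1 if rng.random() < p_true else 0 for _ in range(n_visits)]


def estimate_trust(responses):
    """Empirical mean of the binary responses, used as the estimate of the trust parameter."""
    return sum(responses) / len(responses)


def classify(p_hat, threshold=0.5, margin=0.1):
    """Label a cell from its estimate.

    threshold and margin are hypothetical values, not taken from the paper.
    Estimates within the margin of the threshold are left undecided.
    """
    if p_hat >= threshold + margin:
        return "safe"
    if p_hat <= threshold - margin:
        return "unsafe"
    return "undecided"


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical trust parameters for three Voronoi cells.
    true_params = {"cell_A": 0.9, "cell_B": 0.2, "cell_C": 0.55}
    for name, p_true in true_params.items():
        responses = simulate_visits(p_true, 200, rng)
        p_hat = estimate_trust(responses)
        print(name, round(p_hat, 3), classify(p_hat))
```

Uniform sampling as above spends as many visits in unsafe cells as in safe ones. The active formulation in the paper instead chooses which cell to visit next so that visits to potentially unsafe regions are kept small.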
