OffRIPP: Offline RL-based Informative Path Planning
Srikar Babu Gadipudi, Srujan Deolasee, Siva Kailas, Wenhao Luo, Katia Sycara, Woojun Kim
TL;DR
This work addresses informative path planning (IPP) under resource constraints by introducing OffRIPP, an offline RL-based IPP framework that trains solely on pre-collected datasets to maximize information gain, with no environment interaction during training. OffRIPP uses batch-constrained reinforcement learning with a behavior policy approximator and a Q-function, and combines a GP-based environment model with graph attention to propagate information across a PRM-structured search space. The framework can be plugged into existing online IPP methods (e.g., CAtNIPP and the method of Vashisth et al., 2024) and demonstrates superior performance and fast planning in both 2D light-intensity and 3D fruit identification tasks, including a real-robot experiment. These results illustrate the practical value of offline RL for safe, cost-effective IPP deployment in real-world robotics and lay groundwork for extending to multi-agent IPP scenarios.
Abstract
Informative path planning (IPP) is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment while adhering to resource constraints. Reinforcement learning (RL) has been shown to be effective for IPP; however, it requires environment interactions, which are risky and expensive in practice. To address this problem, we propose an offline RL-based IPP framework that optimizes information gain without requiring real-time interaction during training. The framework offers safety and cost-efficiency by avoiding interaction, while retaining the key advantages of RL: superior performance and fast computation during execution. It leverages batch-constrained reinforcement learning to mitigate extrapolation errors, enabling the agent to learn from pre-collected datasets generated by arbitrary algorithms. We validate the framework through extensive simulations and real-world experiments. The numerical results show that our framework outperforms the baselines, demonstrating the effectiveness of the proposed approach.
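To make the batch-constrained idea concrete, the following is a minimal sketch of one common discrete-action variant, not the OffRIPP implementation itself. It assumes the agent has a Q-value and an estimated behavior-policy probability for each candidate action (for example, each neighboring node in the search graph). Actions the dataset rarely took are masked out before the greedy choice, which is how extrapolation error on unseen actions is limited. The function name `batch_constrained_action` and the threshold `tau` are hypothetical and chosen for illustration only.

```python
import numpy as np

def batch_constrained_action(q_values, behavior_probs, tau=0.3):
    """Pick the highest-Q action among those the behavior policy supports.

    q_values:       estimated Q-value per candidate action (assumed given)
    behavior_probs: estimated probability of each action under the data-collecting policy
    tau:            hypothetical threshold on probability relative to the most likely action
    """
    q_values = np.asarray(q_values, dtype=float)
    behavior_probs = np.asarray(behavior_probs, dtype=float)

    # Keep an action only if it is at least `tau` times as likely as the
    # most likely action under the approximated behavior policy.
    allowed = behavior_probs / behavior_probs.max() >= tau

    # Disallowed actions get -inf so they can never win the argmax.
    masked_q = np.where(allowed, q_values, -np.inf)
    return int(np.argmax(masked_q))

# Action 1 has the highest Q-value but is almost never seen in the data,
# so it is filtered out and the choice falls to action 2.
q = [1.0, 5.0, 2.0]
pi_b = [0.6, 0.02, 0.38]
print(batch_constrained_action(q, pi_b))
```

Setting `tau` to 0 recovers ordinary greedy Q-learning action selection, while values close to 1 make the agent imitate the most frequent action in the dataset.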
