Model-based Policy Optimization using Symbolic World Model
Andrey Gorodetskiy, Konstantin Mironov, Aleksandr Panov
TL;DR
The paper addresses sample inefficiency in robotics by introducing a transformer-generated symbolic world model for model-based policy optimization. A collection of per-coordinate symbolic expressions, refined with BFGS, enables short synthetic rollouts that train a SAC policy, yielding better data efficiency than model-free baselines and several MBRL methods on simulated tasks. The approach emphasizes interpretability of the dynamics and demonstrates performance gains on continuous control problems, while acknowledging scalability and inference challenges for high-dimensional systems. Overall, it suggests a promising direction for combining symbolic regression with MBRL to achieve data-efficient, interpretable control in robotics, with future work aimed at scalability and integration with more dynamic modeling components.
Abstract
The application of learning-based control methods in robotics presents significant challenges. One is that model-free reinforcement learning algorithms use observation data with low sample efficiency. A prevalent way to address this challenge is model-based reinforcement learning, which employs a model of the environment dynamics. We suggest approximating the transition dynamics with symbolic expressions generated via symbolic regression. A symbolic model of a mechanical system has fewer parameters than a neural network approximation, which can potentially lead to higher accuracy and better extrapolation. We use the symbolic dynamics model to generate trajectories in model-based policy optimization, improving the sample efficiency of the learning algorithm. We evaluate our approach on various tasks in simulated environments, where it demonstrates superior sample efficiency compared to model-free and model-based baseline methods.
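To make the idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy pendulum-like system, hand-written expression skeletons with one expression per state coordinate (in the paper these come from a transformer), and that the BFGS refinement tunes the numeric constants of those skeletons. The names `true_step`, `symbolic_step`, `fit_constants`, `short_rollouts`, and the random stand-in policy are all hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

DT, G = 0.05, 9.81  # assumed time step and gravity for the toy system


def true_step(s, a):
    """Toy ground-truth dynamics (assumption), used only to produce data."""
    theta, omega = s[..., 0], s[..., 1]
    return np.stack([theta + DT * omega,
                     omega + DT * (-G * np.sin(theta) + 2.0 * a)], axis=-1)


def symbolic_step(s, a, c):
    """Assumed skeletons, one expression per coordinate, with free constants c."""
    theta, omega = s[..., 0], s[..., 1]
    return np.stack([theta + c[0] * omega,
                     omega + c[1] * np.sin(theta) + c[2] * a], axis=-1)


def fit_constants(s, a, s_next):
    """Refine the constants by minimizing one-step prediction error with BFGS."""
    def loss(c):
        return np.mean((symbolic_step(s, a, c) - s_next) ** 2)
    return minimize(loss, x0=np.zeros(3), method="BFGS").x


def short_rollouts(starts, policy, c, horizon):
    """Branch short synthetic rollouts from start states using the symbolic model."""
    s, out_s, out_a, out_next = starts, [], [], []
    for _ in range(horizon):
        a = policy(s)
        s_next = symbolic_step(s, a, c)
        out_s.append(s)
        out_a.append(a)
        out_next.append(s_next)
        s = s_next
    return np.concatenate(out_s), np.concatenate(out_a), np.concatenate(out_next)


rng = np.random.default_rng(0)

# "Real" transitions collected from the environment (here: the toy system).
S = rng.uniform([-np.pi, -2.0], [np.pi, 2.0], size=(200, 2))
A = rng.uniform(-1.0, 1.0, size=200)
c_fit = fit_constants(S, A, true_step(S, A))  # should approach [DT, -DT*G, 2*DT]


# Stand-in for the SAC actor: uniformly random actions.
def random_policy(s):
    return rng.uniform(-1.0, 1.0, size=len(s))


# Synthetic transitions that would be added to the policy's training buffer.
s_buf, a_buf, s_next_buf = short_rollouts(S[:16], random_policy, c_fit, horizon=5)
```

In a full model-based policy optimization loop, the synthetic transitions in `s_buf`, `a_buf`, and `s_next_buf` would be mixed with real data to update the SAC policy. The fitted constants remain directly readable, which is the interpretability benefit the summary refers to.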
