Reward-Centered ReST-MCTS: A Robust Decision-Making Framework for Robotic Manipulation in High Uncertainty Environments

Xibai Wang

TL;DR

The paper tackles robust decision-making for robotic manipulation under high uncertainty, where traditional MCTS struggles because it relies on final-step rewards. It introduces Reward-Centered ReST-MCTS, which adds intermediate rewards through a Rewarding Center that combines rule-based validation, heuristic guidance, and neural estimation. The Q-function update becomes $Q(s, a) = R_c(s) + \gamma \sum_{s'} P(s'|s,a) V(s')$, with the intermediate reward defined as $R_c(s) = \alpha R_{rule}(s) + \beta R_{heuristic}(s) + \gamma R_{neural}(s)$. The approach refines the search in real time, reduces error propagation, and achieves a 2–4% accuracy gain over baselines (Chain-of-Thought prompting and Vanilla ReST-MCTS) while keeping runtimes feasible and remaining robust across varying levels of uncertainty. The authors suggest the framework may apply beyond robotics to other high-uncertainty planning domains and could be extended with meta-learning, retrieval-augmented search, and Bayesian reward estimation.
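To make the two formulas in the TL;DR concrete, the following is a minimal sketch rather than the paper's implementation. The function names, the toy state names, and all numeric values are hypothetical. The summary uses $\gamma$ both as the discount factor and as the weight on the neural term; the sketch assumes these are separate quantities and names them `discount` and `w_neural`.

```python
from typing import Dict


def rewarding_center(r_rule: float, r_heuristic: float, r_neural: float,
                     alpha: float, beta: float, w_neural: float) -> float:
    """R_c(s) = alpha * R_rule(s) + beta * R_heuristic(s) + w_neural * R_neural(s).

    w_neural stands in for the weight written as gamma in the summary
    (assumed here to be distinct from the discount factor).
    """
    return alpha * r_rule + beta * r_heuristic + w_neural * r_neural


def q_update(r_c: float, transitions: Dict[str, float],
             values: Dict[str, float], discount: float) -> float:
    """Q(s, a) = R_c(s) + discount * sum_{s'} P(s'|s,a) * V(s').

    transitions maps each successor state s' to P(s'|s,a);
    values maps each successor state s' to V(s').
    """
    expected_value = sum(p * values[s_next] for s_next, p in transitions.items())
    return r_c + discount * expected_value


# Toy numbers (hypothetical): the rule check passes, the heuristic is
# lukewarm, and the neural estimate is fairly confident.
r_c = rewarding_center(1.0, 0.5, 0.8, alpha=0.5, beta=0.25, w_neural=0.25)

# Hypothetical successor states for a grasp action.
q = q_update(r_c,
             transitions={"grasped": 0.75, "slipped": 0.25},
             values={"grasped": 1.0, "slipped": 0.0},
             discount=0.5)
```

With these toy inputs the intermediate reward contributes most of the Q-value, which illustrates how a partial reward can influence action selection before any final outcome is observed.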

Abstract

Monte Carlo Tree Search (MCTS) has emerged as a powerful tool for decision-making in robotics, enabling efficient exploration of large search spaces. However, traditional MCTS methods struggle in environments characterized by high uncertainty and noisy data due to their reliance on final-step reward evaluation. The lack of intermediate feedback during search often results in suboptimal decision-making and computational inefficiencies. This paper introduces Reward-Centered ReST-MCTS, a novel framework that enhances MCTS by incorporating intermediate reward shaping. The core of our approach is the Rewarding Center, which refines search trajectories by dynamically assigning partial rewards using rule-based validation, heuristic guidance, and neural estimation. By integrating these mechanisms, our method enables real-time optimization of search paths, mitigating the effects of error propagation. We evaluate Reward-Centered ReST-MCTS in robotic manipulation tasks under high uncertainty, demonstrating consistent improvements in decision accuracy. Compared to baseline methods, including Chain-of-Thought (CoT) prompting and Vanilla ReST-MCTS, our framework achieves a 2-4% accuracy improvement while maintaining computational feasibility. Ablation studies confirm the effectiveness of intermediate feedback in search refinement, particularly in pruning incorrect decision paths early. Furthermore, robustness tests show that our method retains high performance across varying levels of uncertainty.
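The early-pruning idea mentioned in the abstract can be illustrated with a short sketch. The abstract does not state the actual pruning rule, so the fixed threshold, the function name `expand_with_pruning`, and the toy scores below are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def expand_with_pruning(candidates: Sequence[str],
                        partial_reward: Callable[[str], float],
                        threshold: float) -> List[str]:
    """Keep only candidates whose intermediate reward reaches the threshold.

    Survivors are returned best-first, so a search loop would expand the
    most promising branch next. The threshold rule is a hypothetical
    stand-in for whatever criterion the Rewarding Center applies.
    """
    scored = [(partial_reward(c), c) for c in candidates]
    kept = [(score, c) for score, c in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]


# Hypothetical intermediate rewards for three candidate actions.
toy_scores = {"push": 0.4, "grasp": 0.9, "drop": 0.1}

survivors = expand_with_pruning(list(toy_scores), toy_scores.__getitem__,
                                threshold=0.3)
```

Here the low-scoring `drop` branch is discarded before any rollout is spent on it, which is the kind of saving intermediate feedback is meant to provide. A final-reward-only search would have to simulate that branch to completion before learning it was poor.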
