Optimized Monte Carlo Tree Search for Enhanced Decision Making in the FrozenLake Environment

Esteban Aldana Guerra

TL;DR

An optimized MCTS implementation for the FrozenLake environment achieves higher rewards and success rates and faster convergence than the baseline methods, especially in environments with inherent randomness.

Abstract

Monte Carlo Tree Search (MCTS) is a powerful algorithm for solving complex decision-making problems. This paper presents an optimized MCTS implementation applied to the FrozenLake environment, a classic reinforcement learning task characterized by stochastic transitions. The optimization leverages cumulative reward and visit count tables along with the Upper Confidence Bound for Trees (UCT) formula, resulting in efficient learning in a slippery grid world. We benchmark our implementation against other decision-making algorithms, including MCTS with Policy and Q-Learning, and perform a detailed comparison of their performance. The results demonstrate that our optimized approach effectively maximizes rewards and success rates while minimizing convergence time, outperforming baseline methods, especially in environments with inherent randomness.
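To make the role of the cumulative reward and visit count tables concrete, the following is a minimal sketch of UCT-based action selection over two such tables. It is an illustration under stated assumptions, not the paper's implementation: the names `Q`, `N`, `uct_score`, `select_action`, and `update`, the exploration constant `c = sqrt(2)`, and the choice to key the tables by (state, action) pairs are all hypothetical. The score uses the standard UCT form, mean reward plus c * sqrt(ln(parent visits) / visits).

```python
import math
from collections import defaultdict

# Hypothetical tables keyed by (state, action); names are illustrative only.
Q = defaultdict(float)  # cumulative reward collected through each (state, action)
N = defaultdict(int)    # number of times each (state, action) was visited


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCT value: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # untried actions are always preferred
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_action(state, actions, Q, N, c=math.sqrt(2)):
    """Pick the action with the highest UCT score in the given state."""
    parent_visits = sum(N[(state, a)] for a in actions)
    return max(
        actions,
        key=lambda a: uct_score(Q[(state, a)], N[(state, a)], parent_visits, c),
    )


def update(path, reward, Q, N):
    """Back up an episode's reward along the visited (state, action) pairs."""
    for state, action in path:
        N[(state, action)] += 1
        Q[(state, action)] += reward
```

In a FrozenLake-style setting, `actions` would typically be the four move ids (assumed here to be 0 to 3), and `update` would be called once per simulated episode with the terminal reward. Because transitions are slippery, the same (state, action) pair can lead to different outcomes, which is why the sketch averages cumulative reward over visits rather than storing a single outcome.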

Paper Structure

This paper contains 15 sections, 1 equation, 3 figures, 1 table, 1 algorithm.

Figures (3)

  • Figure 1: Average Reward per Episode (Smoothed) Comparison
  • Figure 2: Convergence Rate (Smoothed Steps per Episode) Comparison
  • Figure 3: Success Rate per Episode Comparison