Issues with Measuring Task Complexity via Random Policies in Robotic Tasks

Reabetswe M. Nkhumise, Mohamed S. Talamali, Aditya Gilra

TL;DR

This work addresses the challenge of quantifying task difficulty in non-tabular reinforcement learning by evaluating RWG-based statistics and information-theoretic metrics PIC and POIC on a family of structurally related robotic reaching tasks with dense and sparse rewards. The authors construct a controlled framework using 1- and 2-link manipulators to test known complexity relationships and train SAC agents to observe real learning difficulty, comparing these results with PIC/POIC measurements. They find that PIC and POIC can misorder task hardness (e.g., a 2-link dense task appearing easier by PIC while empirical RL shows it as harder), suggesting that RWG-based metrics can be misleading in certain task settings and that training dynamics and exploration are not captured by these measures. The study highlights the need for more reliable and interpretable task-complexity metrics for non-tabular robotics, and proposes directions such as incorporating inductive biases, dynamic RWG during learning, and alternative complexity measures to better capture the true difficulty of robotic tasks.

Abstract

Reinforcement learning (RL) has enabled major advances in fields such as robotics and natural language processing. A key challenge in RL is measuring task complexity, which is essential for creating meaningful benchmarks and designing effective curricula. While there are numerous well-established metrics for assessing task complexity in tabular settings, relatively few exist in non-tabular domains. These include (i) statistical analysis of the performance of random policies via Random Weight Guessing (RWG), and (ii) the information-theoretic metrics Policy Information Capacity (PIC) and Policy-Optimal Information Capacity (POIC), both of which rely on RWG. In this paper, we evaluate these methods on progressively more difficult robotic manipulation setups with known relative complexity, under both dense and sparse reward formulations. Our empirical results reveal that measuring complexity remains nuanced. Specifically, under the same reward formulation, PIC suggests that a two-link robotic arm setup is easier than a single-link setup, which contradicts the robotic control and empirical RL perspective whereby the two-link setup is inherently more complex. Likewise, for the same setup, POIC estimates that tasks with sparse rewards are easier than those with dense rewards. Thus, we show that both PIC and POIC contradict typical understanding and empirical results from RL. These findings highlight the need to move beyond RWG-based metrics towards metrics that more reliably capture task complexity in non-tabular RL, with our task framework as a starting point.
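
To make the RWG dependence of PIC concrete, the sketch below estimates PIC as the mutual information between policy parameters and episodic return, I(R; θ) = H(R) − E_θ[H(R | θ)], from a matrix of returns collected with randomly sampled policies. This is a minimal illustration and not the authors' implementation: the function names, the histogram discretisation, and the default `n_bins=10` are assumptions made here.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution; empty bins are skipped."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def estimate_pic(returns, n_bins=10):
    """Histogram-based estimate of I(R; theta) = H(R) - E_theta[H(R | theta)].

    returns: array of shape (n_policies, n_episodes); row n holds the episodic
    returns of the n-th randomly sampled policy (hypothetical layout).
    """
    returns = np.asarray(returns, dtype=float)
    lo, hi = returns.min(), returns.max()
    if hi == lo:
        # Every return is identical, so the return carries no information.
        return 0.0
    edges = np.linspace(lo, hi, n_bins + 1)
    # One return histogram per policy, all sharing the same bin edges.
    counts = np.stack([np.histogram(row, bins=edges)[0] for row in returns])
    conditional = counts / counts.sum(axis=1, keepdims=True)  # p(R | theta_n)
    marginal = conditional.mean(axis=0)  # p(R), sampled policies weighted equally
    mean_conditional_entropy = np.mean([entropy(row) for row in conditional])
    return entropy(marginal) - mean_conditional_entropy
```

With this estimator, two policies that each always obtain a different return give ln 2 nats, while policies whose return distributions are identical give 0. The estimate depends on the number of bins and on how many policies and episodes are sampled, so values are only comparable under a fixed sampling budget.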

Paper Structure (19 sections, 20 equations, 9 figures, 6 tables, 1 algorithm)

Figures (9)

  • Figure 1: Illustration of manipulators.
  • Figure 2: Learning curves of the SAC algorithm across the six tasks. The left panel depicts agent performance in dense-reward settings, while the right panel depicts sparse-reward settings. To accommodate wide and varying ranges of steps, results are plotted on a logarithmic scale to enhance interpretability. For the 2-link arm with sparse rewards, SAC results are presented both with Hindsight Experience Replay (HER; Andrychowicz et al., 2017) augmentation (SAC+HER) and without it. The results are obtained via evaluation of each task over 5 runs.
  • Figure 3: Performance distribution plots for the tasks: (a) 1-link(L=1.0m), (b) 1-link(L=1.65m), (c) 2-link arms with dense rewards, and (d) 2-link arm with sparse rewards. The left column shows a log-scale histogram of the mean performances $M_{n}$ of the random policies. The middle column depicts mean performance curves in black, i.e., mean performance $M_{n}$ vs rank $R_{n}$; all the cumulative rewards of the policies $s_{a,n,e}$ across the trials are shown as red dots behind the black curve. The right column displays plots of standard deviation $\sqrt{V_{n}}$ vs mean performance $M_{n}$ (often referred to as the variance distribution). The plots were made using $10^{4}$ random policies.
  • Figure 4: 2D scatter plots of normalised scores (performance), computed using min-max scaling (Equation \ref{min-max}) over the returns of untrained random policies. The normalised scores are plotted against PIC, POIC, and the variance of returns, along with the entropies of the optimality variable and the cumulative reward (return) variable.
  • Figure 5: Performance distribution plots for the 1-link(L=1.0m) and 1-link(L=1.65m) arms with sparse rewards. The performance is normalised using min-max scaling so that the tasks share the same range. Note that $a$ in $S_{a,n,e}$ denotes the neural network architecture, which is the same across all panels.
  • ...and 4 more figures
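
The quantities named in the Figure 3 to 5 captions (per-policy mean $M_{n}$, variance $V_{n}$, rank $R_{n}$, and min-max normalised scores) can be computed from a score matrix as sketched below. This is an illustrative sketch under stated assumptions rather than the paper's code: the function names and the `(n_policies, n_episodes)` layout are hypothetical, the population variance is used because the captions do not say which estimator the authors chose, and the normalisation is the standard min-max formula (x − min) / (max − min).

```python
import numpy as np


def rwg_statistics(scores):
    """Per-policy statistics for a fixed architecture.

    scores: array of shape (n_policies, n_episodes) with the cumulative reward
    of each random policy in each episode (hypothetical layout).
    Returns (mean, variance, order), where mean and variance are sorted by
    ascending mean and order[r] is the index of the policy at rank r.
    """
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean(axis=1)  # M_n
    variance = scores.var(axis=1)  # V_n (population variance, an assumption)
    order = np.argsort(mean)  # rank R_n corresponds to the position in this order
    return mean[order], variance[order], order


def min_max_normalise(x):
    """Scale values to [0, 1]; a constant input maps to all zeros."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)
```

Plotting the sorted means against their rank gives the middle-column curve, and plotting `np.sqrt(variance)` against the means gives the right-column variance distribution. Applying `min_max_normalise` per task puts tasks with different reward scales on a common range.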