When Should a Robot Think? Resource-Aware Reasoning via Reinforcement Learning for Embodied Robotic Decision-Making

Jun Liu; Pu Zhao; Zhenglun Kong; Xuan Shen; Peiyan Dong; Fan Yang; Lin Cui; Hao Tang; Geng Yuan; Wei Niu; Wenbin Zhang; Xue Lin; Gaowen Liu; Yanzhi Wang; Dong Huang

Abstract

Embodied robotic systems increasingly rely on large language model (LLM)-based agents to support high-level reasoning, planning, and decision-making during interactions with the environment. However, invoking LLM reasoning introduces substantial computational latency and resource overhead, which can interrupt action execution and reduce system reliability. Excessive reasoning may delay actions, while insufficient reasoning often leads to incorrect decisions and task failures. This raises a fundamental question for embodied agents: when should the agent reason, and when should it act? In this work, we propose RARRL (Resource-Aware Reasoning via Reinforcement Learning), a hierarchical framework for resource-aware orchestration of embodied agents. Rather than learning low-level control policies, RARRL learns a high-level orchestration policy that operates at the agent's decision-making layer. This policy enables the agent to adaptively determine whether to invoke reasoning, which reasoning role to employ, and how much computational budget to allocate based on current observations, execution history, and remaining resources. Extensive experiments, including evaluations with empirical latency profiles derived from the ALFRED benchmark, show that RARRL consistently improves task success rates while reducing execution latency and enhancing robustness compared with fixed or heuristic reasoning strategies. These results demonstrate that adaptive reasoning control is essential for building reliable and efficient embodied robotic agents.
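The three decisions named in the abstract (whether to reason, which role to use, and how much budget to spend) can be pictured as a single structured action. The sketch below is illustrative only: the names `AgentState`, `OrchestrationAction`, and `placeholder_policy`, the role labels, and the budget units are all assumptions and do not come from the paper. The hand-written rule stands in for the learned RL policy purely to show the interface; it is not the authors' method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentState:
    """Hypothetical summary of what the orchestration policy observes."""
    observation: str         # encoded current observation (assumed format)
    recent_failures: int     # crude stand-in for execution history
    remaining_budget: float  # remaining compute budget, arbitrary units


@dataclass(frozen=True)
class OrchestrationAction:
    """Hypothetical high-level action: act directly, or reason with a role and budget."""
    invoke_reasoning: bool
    role: Optional[str] = None  # assumed role labels, e.g. "planner" or "verifier"
    budget: float = 0.0


def placeholder_policy(state: AgentState) -> OrchestrationAction:
    """Hand-written stand-in for a learned policy; illustrates the interface only."""
    # Act directly when no budget remains or nothing has gone wrong recently.
    if state.remaining_budget <= 0.0 or state.recent_failures == 0:
        return OrchestrationAction(invoke_reasoning=False)
    # Otherwise pick a role and cap the allocation at the remaining budget.
    role = "planner" if state.recent_failures >= 2 else "verifier"
    budget = min(state.remaining_budget, float(state.recent_failures))
    return OrchestrationAction(invoke_reasoning=True, role=role, budget=budget)
```

For example, `placeholder_policy(AgentState("kitchen", 3, 2.0))` chooses to reason with the assumed "planner" role and spends the full remaining budget of 2.0, while a state with no recent failures yields a direct-execution action.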

Paper Structure (33 sections, 5 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Related Work
The Proposed Method
Robot Model
Task Definition
State
Action Space
State Transition and Latency Modeling
Reward Function
Adaptive Reasoning Control
Architecture Overview
RL-Based Orchestration
Execution and Reasoning Interface
Training Signal and Abstraction Note
Problem Formulation
...and 18 more sections

Figures (5)

Figure 1: Overview of the proposed single-agent embodied architecture. A reinforcement learning (RL) policy operates at the decision-making layer to regulate when the agent should act directly and when to invoke expensive LLM-based reasoning modules under resource constraints. The reward signal is used during training to update the orchestration policy based on task outcome and execution latency.
Figure 2: Decision process and training pipeline of the proposed orchestration policy. The embodied agent observes the current task state, execution history, and remaining computational budget, which are encoded and provided to a reinforcement learning-based policy. At each decision step, the policy determines whether to execute a high-level action directly or to invoke an LLM-based reasoning module. Execution feedback, including task outcome and execution latency, is used to compute rewards and update the orchestration policy during training.
Figure 3: Illustration of a representative multi-step embodied robotic task. The agent performs navigation, inspection, and delivery stages, during which reasoning demands vary. Complex stages benefit from high-level reasoning, while routine stages favor direct execution, motivating adaptive reasoning orchestration.
Figure 4: Performance ceiling analysis. Each point shows mean task success over 5 random seeds with standard deviation. Execution and reasoning strength jointly determine the attainable performance ceiling, while adaptive orchestration enables closer approach to this ceiling.
Figure 5: Robustness to latency uncertainty. Task success rate under increasing execution and reasoning latency variability. Points show mean performance over 5 random seeds, with error bars indicating standard deviation. The proposed orchestration policy degrades more gracefully than heuristic strategies as latency uncertainty increases.
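The captions for Figures 1 and 2 state that the training reward is computed from task outcome and execution latency. The paper's actual reward function is not reproduced in this excerpt, so the following is only a minimal sketch under an assumed linear form: a success bonus minus a latency penalty. The function name `orchestration_reward` and the `latency_weight` parameter are hypothetical.

```python
def orchestration_reward(success: bool, latency_s: float, latency_weight: float = 0.1) -> float:
    """Assumed reward shape: success bonus minus a linear latency penalty.

    This is an illustrative stand-in, not the reward defined in the paper.
    """
    if latency_s < 0.0:
        raise ValueError("latency_s must be non-negative")
    success_term = 1.0 if success else 0.0
    return success_term - latency_weight * latency_s
```

Under this assumed form, a successful episode that takes longer earns less reward than an equally successful faster one, which is the trade-off an orchestration policy would need to balance when deciding whether to invoke reasoning.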
