Energy-Aware Reinforcement Learning for Robotic Manipulation of Articulated Components in Infrastructure Operation and Maintenance

Xiaowen Tao; Yinuo Wang; Haitao Ding; Yuanyang Qi; Ziyu Song


TL;DR

This work addresses the challenge of energy-efficient robotic manipulation of articulated infrastructure components for long-term operation and maintenance. It introduces an articulation-agnostic perception pipeline coupled with a CMDP-based reinforcement learning controller, trained with a Lagrangian-constrained Soft Actor-Critic to explicitly regulate actuation energy. The approach demonstrates consistent energy savings (roughly $16$–$30\%$) and shorter manipulation trajectories (roughly $16$–$32\%$ fewer steps) across door, drawer, and valve tasks, while maintaining high task success rates. The combination of functional, part-guided perception and energy-aware constrained optimization offers a scalable path toward sustainable, autonomous infrastructure O&M systems with strong reliability guarantees.

Abstract

With the growth of intelligent civil infrastructure and smart cities, operation and maintenance (O&M) increasingly requires safe, efficient, and energy-conscious robotic manipulation of articulated components, including access doors, service drawers, and pipeline valves. However, existing robotic approaches either focus primarily on grasping or target object-specific articulated manipulation, and they rarely incorporate explicit actuation energy into multi-objective optimisation, which limits their scalability and suitability for long-term deployment in real O&M settings. Therefore, this paper proposes an articulation-agnostic and energy-aware reinforcement learning framework for robotic manipulation in intelligent infrastructure O&M. The method combines part-guided 3D perception, weighted point sampling, and PointNet-based encoding to obtain a compact geometric representation that generalises across heterogeneous articulated objects. Manipulation is formulated as a Constrained Markov Decision Process (CMDP), in which actuation energy is explicitly modelled and regulated via a Lagrangian-based constrained Soft Actor-Critic scheme. The policy is trained end-to-end under this CMDP formulation, enabling effective articulated-object operation while satisfying a long-horizon energy budget. Experiments on representative O&M tasks demonstrate 16%-30% reductions in energy consumption, 16%-32% fewer steps to success, and consistently high success rates, indicating a scalable and sustainable solution for infrastructure O&M manipulation.
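As a rough illustration of the weighted point sampling step mentioned in the abstract, the sketch below draws a fixed-size subset of a point cloud while favouring points that fall on the segmented part. The function name, the `part_weight` and `background_weight` parameters, and sampling with replacement are all assumptions made for this example; the summary does not state the actual weighting scheme.

```python
import random


def weighted_point_sample(points, part_mask, n_samples,
                          part_weight=4.0, background_weight=1.0, seed=0):
    """Draw n_samples points, favouring points on the segmented part.

    points            : list of (x, y, z) tuples
    part_mask         : list of bools, True where a point lies on the part
    part_weight       : hypothetical sampling weight for part points
    background_weight : hypothetical sampling weight for all other points
    """
    if len(points) != len(part_mask):
        raise ValueError("points and part_mask must have the same length")
    if not points:
        raise ValueError("point cloud is empty")

    # One weight per point, chosen by the segmentation mask.
    weights = [part_weight if on_part else background_weight
               for on_part in part_mask]

    rng = random.Random(seed)
    # Sampling with replacement keeps the output size fixed, which a
    # PointNet-style encoder with a fixed input size would need.
    indices = rng.choices(range(len(points)), weights=weights, k=n_samples)
    return [points[i] for i in indices]
```

Setting `background_weight=0.0` restricts the sample to the part alone, while equal weights reduce the routine to uniform sampling.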


Paper Structure (24 sections, 1 theorem, 33 equations, 6 figures, 7 tables, 1 algorithm)


Introduction
Related Work
Robotic Manipulation in Infrastructure Operation and Maintenance
Articulated Object Manipulation Methods
Functional Perception and Part-guided Manipulation
Energy-aware and Constrained Reinforcement Learning
Methodology
Overall Framework
Part-Guided Articulation-Agnostic Perception
Energy-Aware Constrained Reinforcement Learning Formulation
Optimisation Strategy and Theoretical Properties
Implementation
Environment Setup
RL MDP Details
Model Architecture Details
...and 9 more sections

Key Result

Proposition 1

Suppose that $J_r(\pi)$ and $J_c(\pi)$ are bounded and Lipschitz-continuous with respect to the policy parameters, and that the actor and dual step sizes $\eta_\pi$ and $\eta_\lambda$ are chosen sufficiently small. Then the stochastic primal--dual updates associated with eq:actor_loss and eq:lambda_ In particular, the average constraint violation vanishes as $T\to\infty$, and any limit point of $\
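The dual half of a primal-dual scheme of this kind can be sketched in a few lines. This is the standard projected dual-ascent rule for a Lagrangian-constrained policy, not necessarily the exact update in the paper: `energy_budget`, the per-step energy signal, and the function names are assumptions for illustration, while `eta_lambda` stands in for the dual step size $\eta_\lambda$.

```python
def update_multiplier(lmbda, mean_episode_energy, energy_budget, eta_lambda):
    """One projected dual-ascent step on the Lagrange multiplier.

    The multiplier grows when the measured energy cost exceeds the budget
    and shrinks otherwise; the max(0, .) projection keeps it non-negative.
    """
    violation = mean_episode_energy - energy_budget
    return max(0.0, lmbda + eta_lambda * violation)


def penalised_reward(reward, step_energy, lmbda):
    """Reward seen by the actor: task reward minus the weighted energy cost."""
    return reward - lmbda * step_energy


if __name__ == "__main__":
    # Hypothetical numbers: the policy starts over budget and slowly improves.
    lmbda = 0.0
    for mean_energy in [14.0, 13.0, 11.5, 10.2, 9.8]:
        lmbda = update_multiplier(lmbda, mean_energy,
                                  energy_budget=10.0, eta_lambda=0.05)
        print(f"mean energy {mean_energy:5.1f} -> lambda {lmbda:.3f}")
```

A small `eta_lambda` makes the multiplier change slowly relative to the actor, which matches the proposition's requirement that both step sizes be sufficiently small.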

Figures (6)

Figure 1: Representative real-world O&M scenarios: (a) cabinets with doors, widely present in industrial workshops, power distribution facilities, device rooms, and hazardous-material storage; (b) drawers and trays, commonly used in warehousing, logistics storage, maintenance workshops, and data-center equipment modules; and (c) pipeline valves, frequently encountered in oil and gas transportation, process-control stations, industrial automation lines, and water-treatment infrastructure. These scenarios illustrate the practical articulated components routinely operated in civil and infrastructure O&M environments.
Figure 2: Overview of the proposed energy-aware, articulation-agnostic, end-to-end RL manipulation framework. We integrate RGB-D part segmentation, masked point-cloud sampling, and PointNet-based visual encoding with robot proprioceptive states. A constrained SAC controller enforces an explicit energy constraint via a Lagrangian mechanism, allowing the policy to achieve effective articulated-object manipulation while regulating actuation energy consumption.
Figure 3: Tasks in simulated environments.
Figure 4: A subset of the objects from the three categories used in training and evaluation. No evaluation object appears in the training set, so all evaluation objects are unseen by the pretrained models.
Figure 5: Total energy cost during training for the three tasks. At each checkpoint, the box spans the interquartile range (IQR) from the first quartile (Q1) to the third quartile (Q3); the horizontal line inside the box marks the mean total energy cost per episode for Constrained SAC and SAC; and the whiskers extend from Q1 and Q3 to the smallest and largest values, respectively. The overlaid lines trace how the corresponding means evolve over training.
...and 1 more figure

Theorems & Definitions (2)

Definition 1: Energy-feasible policy
Proposition 1: Lyapunov decrease and asymptotic constraint satisfaction
