Object Manipulation in Marine Environments using Reinforcement Learning

Ahmed Nader, Muhayy Ud Din, Mughni Irfan, Irfan Hussain

TL;DR

The paper addresses object manipulation at a dock under marine disturbances by training a Soft Actor-Critic (SAC) deep reinforcement learning (DRL) agent in a static PyBullet environment and evaluating it in the MBZIRC maritime simulator under WMO sea-state conditions. The approach uses continuous-action control for a USV-mounted robotic arm with a dense reward structure that guides reaching, grasping, and lifting, formalized in the objective $\pi^* = \arg\max_{\pi} \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^t \left(R(s_t, a_t, s_{t+1}) + \alpha H(\pi(\cdot \mid s_t))\right)\right]$. Results show ~93% training success and up to 80% success under sea-state 2 disturbances, indicating that static-environment training can generalize to certain dynamic marine conditions. The work demonstrates a viable path toward marine intervention with robotic manipulation, while noting limitations related to non-static disturbances and calling for future validation in drifting scenarios and real-sea experiments.
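To make the objective concrete, the following minimal sketch evaluates the bracketed sum for a single finite trajectory, i.e. the discounted sum of reward plus entropy bonus. The function name `soft_return`, the sample rewards and entropies, and the values chosen for γ and α are illustrative assumptions and are not taken from the paper; the infinite sum is truncated at the trajectory length.

```python
from typing import Sequence


def soft_return(rewards: Sequence[float],
                entropies: Sequence[float],
                gamma: float = 0.99,
                alpha: float = 0.2) -> float:
    """Discounted sum of (reward + alpha * policy entropy) over one trajectory.

    rewards[t]   stands for R(s_t, a_t, s_{t+1})
    entropies[t] stands for H(pi(. | s_t))
    gamma, alpha are assumed values, not the paper's hyperparameters.
    """
    if len(rewards) != len(entropies):
        raise ValueError("rewards and entropies must have the same length")
    total = 0.0
    discount = 1.0  # gamma ** t, updated incrementally
    for reward, entropy in zip(rewards, entropies):
        total += discount * (reward + alpha * entropy)
        discount *= gamma
    return total


# Hypothetical 3-step trajectory, for illustration only.
example = soft_return([1.0, 0.0, 2.0], [0.5, 0.5, 0.5], gamma=0.5, alpha=0.2)
print(f"soft return: {example:.3f}")
```

A larger α weights the entropy term more heavily, which rewards policies that keep their action distribution broad; α = 0 recovers the ordinary discounted return.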

Abstract

Performing intervention tasks in the maritime domain is crucial for safety and operational efficiency. The unpredictable and dynamic marine environment makes intervention tasks such as object manipulation extremely challenging. This study proposes a robust solution for object manipulation from a dock in the presence of disturbances caused by sea waves. To tackle this challenging problem, we apply a deep reinforcement learning (DRL) based algorithm called Soft Actor-Critic (SAC). SAC employs an actor-critic framework; the actor learns a policy that minimizes an objective function while the critic evaluates the learned policy and provides feedback to guide the actor-learning process. We trained the agent using the PyBullet dynamic simulator and tested it in a realistic simulation environment called the MBZIRC maritime simulator. This simulator allows the simulation of different wave conditions according to the World Meteorological Organization (WMO) sea state code. Simulation results demonstrate a high success rate in retrieving the objects from the dock. The trained agent achieved an 80 percent success rate when applied in the simulation environment in the presence of waves characterized by sea state 2, according to the WMO sea state code.
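The critic's feedback can be illustrated with the soft Bellman target that the critic is regressed toward in the standard SAC formulation. The sketch below is a scalar, framework-free version. The use of two target critics with a minimum, the function name `soft_q_target`, and all numeric values are assumptions drawn from common SAC practice, not details confirmed by this abstract.

```python
def soft_q_target(reward: float,
                  next_q1: float,
                  next_q2: float,
                  next_log_prob: float,
                  done: bool,
                  gamma: float = 0.99,
                  alpha: float = 0.2) -> float:
    """Soft Bellman target for one transition.

    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a' | s'))

    next_q1, next_q2 : target-critic estimates at the next state and a
                       next action sampled from the current policy
    next_log_prob    : log-probability of that sampled next action
    """
    # Entropy-regularized value of the next state.
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    # No bootstrapping past a terminal transition.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value


# Hypothetical transition, for illustration only.
y = soft_q_target(reward=1.0, next_q1=2.0, next_q2=3.0,
                  next_log_prob=-1.0, done=False, gamma=0.9, alpha=0.5)
print(f"critic target: {y:.2f}")
```

In a full implementation the critic would minimize the squared error between its prediction and this target, and the actor would then be updated to prefer actions the critic scores highly while keeping the policy entropy high.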

Paper Structure (12 sections, 10 equations, 6 figures, 3 tables)

This paper contains 12 sections, 10 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 2: The actor-critic framework, showing the actor and critic neural networks and the environment. The actor encapsulates the learned policy, while the critic evaluates this policy and provides feedback to guide the actor toward an optimal path.
  • Figure 3: Testing setup scene featuring an Unmanned Surface Vehicle (USV) equipped with the Oberon7 arm and its gripper, and the object to be picked.
  • Figure 4: The figure depicts the dynamic interaction between the agent, the simulator, and the inverse kinematics solver at each time step, highlighting the process and sequence of command exchanges.
  • Figure 5: The reward improvement during training. The original reward is shown by the grey line, whereas the smoothed reward, obtained with a window of size 5, is shown by the blue line.
  • Figure 6: The change in the distance from the gripper to the object versus time. It shows how the agent approaches the object despite the wave-induced disturbances.
  • ...and 1 more figure
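The smoothing mentioned in the Figure 5 caption can be sketched as a moving average over the per-episode reward. The caption gives only the window size (5); the trailing alignment of the window, the handling of the first few points, and the function name `moving_average` are assumptions made for illustration.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Trailing moving average.

    Each output point is the mean of the current value and up to
    `window - 1` preceding values; the first points average over
    however many samples are available so far.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed: List[float] = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the sample that just left the window.
            running_sum -= values[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


# Hypothetical reward curve, for illustration only.
raw_rewards = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(moving_average(raw_rewards, window=5))
```

A trailing window keeps the smoothed curve causal, so each point depends only on episodes already completed; a centered window would give a curve with less lag but needs future samples.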