Accelerated Multi-objective Task Learning using Modified Q-learning Algorithm

Varun Prakash Rajamohan, Senthil Kumar Jagatheesaperumal

TL;DR

This work addresses efficient task learning for robotic cleaning in grid-based environments by introducing Q-SD, a Q-learning variant that adds a scaled Euclidean distance penalty to the update rule to bias actions toward shorter transitions. The method is evaluated on table-cleaning tasks partitioned into 3x3 and 4x4 grids with stationary objects, showing up to 86% and 59% task success and distance reductions of 8.61% and 6.7%, respectively, depending on the distance-scaling factor. A key insight is the trade-off between task learning and movement minimization governed by the scaling parameter $s$, where appropriate values yield faster learning and shorter trajectories, while excessive scaling degrades performance. The work demonstrates the potential of distance-aware reinforcement learning for practical robotic applications and outlines future directions including deeper RL, dynamic environments, and mobile manipulation to extend scalability and robustness.

Abstract

Robots find extensive applications in industry. In recent years, the influence of robots has also increased rapidly in domestic scenarios. The Q-learning algorithm aims to maximise the reward for reaching the goal. This paper proposes a modified version of the Q-learning algorithm, known as Q-learning with scaled distance metric (Q-SD). This algorithm enhances task learning and makes task completion more meaningful. A robotic manipulator (agent) applies the Q-SD algorithm to the task of table cleaning. Using Q-SD, the agent acquires the sequence of steps necessary to accomplish the task while minimising the manipulator's movement distance. We partition the table into grids of two different dimensions: the first is a $3\times3$ grid and the second is a $4\times4$ grid. Using the Q-SD algorithm, the maximum success rates obtained in these two environments were 86% and 59%, respectively. Moreover, compared to the conventional Q-learning algorithm, Q-SD reduced the average distance moved by the agent in these two environments by 8.61% and 6.7%, respectively.
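
To make the idea of a scaled distance penalty concrete, the following is a minimal sketch of one tabular update step. The exact Q-SD update rule is not reproduced in this summary, so the form used here (subtracting `scale * distance` from the reward inside the usual temporal-difference target) is an assumption, as are the function name `q_sd_update`, the dictionary-based Q-table, and the default values of `alpha`, `gamma`, and `scale`.

```python
def q_sd_update(q_table, state, action, reward, next_state, distance,
                alpha=0.1, gamma=0.9, scale=0.5):
    """One tabular update with a scaled distance penalty (illustrative form).

    q_table  : dict mapping state -> dict mapping action -> Q-value
    distance : Euclidean distance moved in this transition
    scale    : distance-scaling factor; scale = 0 gives standard Q-learning
    """
    # Greedy value of the next state (0.0 if it has no recorded actions).
    best_next = max(q_table[next_state].values(), default=0.0)
    # Assumed target: reward minus the scaled distance, plus the discounted bootstrap.
    target = reward - scale * distance + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state table, used only to exercise the function.
q = {"s0": {"near": 0.0, "far": 0.0}, "s1": {"stay": 0.0}}
near_value = q_sd_update(q, "s0", "near", reward=1.0, next_state="s1", distance=1.0)
far_value = q_sd_update(q, "s0", "far", reward=1.0, next_state="s1", distance=2.0)
```

Under this assumed form, two actions with equal reward end up with different values: the shorter move is preferred, and a larger `scale` widens the gap. This mirrors the trade-off noted in the TL;DR, where an overly large scaling factor can let the distance term dominate the task reward.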

Paper Structure

This paper contains 7 sections, 4 equations, 9 figures, 7 tables, and 2 algorithms.

Figures (9)

  • Figure 1: $3\times3$ Grid with objects placed at center ($G5$).
  • Figure 2: Simulation environment for the $3\times3$ grid scenario, built in CoppeliaSim [rohmer2013v].
  • Figure 3: $4\times4$ Grid with objects placed at center ($G6, G7, G10, G11$).
  • Figure 4: Simulation environment for the $4\times4$ grid scenario, built in CoppeliaSim [rohmer2013v].
  • Figure 5: Agent-Environment interface for the Q-SD algorithm.
  • ...and 4 more figures
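
The cell labels in the captions above ($G5$ as the centre of the $3\times3$ grid; $G6, G7, G10, G11$ as the centre of the $4\times4$ grid) are consistent with 1-indexed, row-major numbering. The sketch below assumes that numbering, along with unit-sized square cells; the helper names `cell_center`, `central_cells`, and `move_distance` are hypothetical and are not taken from the paper.

```python
import math


def cell_center(label, n, cell_size=1.0):
    """Centre (x, y) of cell G<label> in an n x n grid, assuming row-major labels from 1."""
    row, col = divmod(label - 1, n)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)


def central_cells(n):
    """Labels of the central cell (odd n) or the central 2 x 2 block (even n)."""
    mid = [n // 2] if n % 2 else [n // 2 - 1, n // 2]
    return [r * n + c + 1 for r in mid for c in mid]


def move_distance(a, b, n, cell_size=1.0):
    """Euclidean distance between the centres of cells G<a> and G<b>."""
    (x1, y1), (x2, y2) = cell_center(a, n, cell_size), cell_center(b, n, cell_size)
    return math.hypot(x2 - x1, y2 - y1)


centre_3x3 = central_cells(3)
centre_4x4 = central_cells(4)
corner_to_centre = move_distance(1, 5, 3)  # diagonal move of one cell
```

With these assumptions, `central_cells` reproduces the object locations named in the captions, and `move_distance` gives the kind of per-move Euclidean distance that a distance-aware update would penalise.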