Optimizing Robotic Manipulation with Decision-RWKV: A Recurrent Sequence Modeling Approach for Lifelong Learning

Yujian Dong, Tianyu Wu, Chaoyang Song

TL;DR

This paper introduces the Decision-RWKV (DRWKV) model and evaluates it on the D4RL dataset within the OpenAI Gym environment and on the D'Claw platform, covering both single-task tests and lifelong learning scenarios and showing that the model handles multiple subtasks efficiently.

Abstract

Models based on the Transformer architecture have seen widespread application across fields such as natural language processing, computer vision, and robotics, with large language models like ChatGPT revolutionizing machine understanding of human language and demonstrating impressive memory and reproduction capabilities. Traditional machine learning algorithms struggle with catastrophic forgetting, which is detrimental to the diverse and generalized abilities required for robotic deployment. This paper investigates the Receptance Weighted Key Value (RWKV) framework, known for its advanced capabilities in efficient and effective sequence modeling, and its integration with the Decision Transformer and experience replay architectures. It focuses on potential performance enhancements in sequence decision-making and lifelong robotic learning tasks. We introduce the Decision-RWKV (DRWKV) model and conduct extensive experiments using the D4RL dataset within the OpenAI Gym environment and on the D'Claw platform to assess the DRWKV model's performance in single-task tests and lifelong learning scenarios, showcasing its ability to handle multiple subtasks efficiently. The code for all algorithms, training, and image rendering in this study is open-sourced at https://github.com/ancorasir/DecisionRWKV.

Paper Structure

This paper contains 21 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: An overview of our algorithm architecture. For a single task i, we employ RWKV blocks for sequential decision-making learning. During continual learning across multiple tasks, we maintain the model's memory capabilities across various tasks by incorporating a replay buffer to utilize the experience replay method.
  • Figure 2: The structure of the DRWKV model, where A represents action, referring to the robot's motion information; S represents state, indicating the current state information; R stands for return, which is the reward information obtained. We introduce DRWKV by utilizing the RWKV block, which incorporates Time-mix and Channel-mix, as a token-mixing module instead of the self-attention module in DT.
  • Figure 3: Simulation Experimental Environment Overview. We utilize the D4RL dataset for single-task offline reinforcement learning and the D'Claw dataset for lifelong robot learning tasks.
  • Figure 4: Training loss of the different decision models as a function of the number of update steps, averaged over three random seeds.
  • Figure 5: Memory consumption of the different decision models as a function of input sequence length.
  • ...and 2 more figures
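
The experience replay idea in the Figure 1 caption can be illustrated with a minimal Python sketch. It is not taken from the authors' repository: the class `TaskReplayBuffer`, the function `mixed_batch`, the reservoir-sampling retention rule, and the `replay_fraction` parameter are all hypothetical choices made for illustration, and the excerpt does not say how the paper's replay buffer selects or mixes samples.

```python
import random
from collections import defaultdict


class TaskReplayBuffer:
    """Hypothetical buffer that keeps a bounded sample of trajectories per task."""

    def __init__(self, per_task_capacity, seed=0):
        self.per_task_capacity = per_task_capacity
        self._store = defaultdict(list)   # task_id -> kept trajectories
        self._seen = defaultdict(int)     # task_id -> trajectories offered so far
        self._rng = random.Random(seed)

    def add(self, task_id, trajectory):
        # Reservoir sampling (an assumption, not the paper's stated rule):
        # every trajectory offered for a task is equally likely to be kept.
        self._seen[task_id] += 1
        bucket = self._store[task_id]
        if len(bucket) < self.per_task_capacity:
            bucket.append(trajectory)
        else:
            j = self._rng.randrange(self._seen[task_id])
            if j < self.per_task_capacity:
                bucket[j] = trajectory

    def size(self, task_id):
        return len(self._store[task_id])

    def sample(self, n, exclude_task=None):
        # Draw n stored trajectories (with replacement) from all other tasks.
        pool = [
            traj
            for task, bucket in self._store.items()
            if task != exclude_task
            for traj in bucket
        ]
        if not pool:
            return []
        return [self._rng.choice(pool) for _ in range(n)]


def mixed_batch(task_id, current_data, buffer, batch_size, replay_fraction, rng):
    """Build one training batch mixing current-task and replayed trajectories."""
    n_replay = int(batch_size * replay_fraction)
    replayed = buffer.sample(n_replay, exclude_task=task_id)
    # If the buffer is still empty, the batch is filled from the current task.
    n_current = batch_size - len(replayed)
    current = [rng.choice(current_data) for _ in range(n_current)]
    return current + replayed
```

In this sketch, trajectories from a finished task are pushed into the buffer with `add`, and each batch for the next task is built with `mixed_batch`, so the sequence model keeps seeing a fixed share of earlier-task data while it learns the new task.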