Task-Agnostic Learning to Accomplish New Tasks
Xianqi Zhang, Xingtao Wang, Xu Liu, Wenrui Wang, Xiaopeng Fan, Debin Zhao
TL;DR
The paper addresses generalization to novel task compositions in robotics by separating task knowledge from task-specific optimization. It introduces Task-Agnostic Learning (TAL), a four-stage framework that (1) builds a knowledge graph through task-agnostic exploration, (2) learns task-agnostic action features with an Action Feature Extractor, (3) generates candidate action sets via a Candidate Action Generator, and (4) plans actions using an Action Proposal Network. In a virtual indoor scene, TAL outperforms offline RL and imitation-learning baselines, reaching an average success rate of 45.78% on Dataset-I and 29.96% on Dataset-II. These results suggest that learning fragmented, task-agnostic knowledge and composing it at planning time can enable generalization to unseen task configurations in robotic manipulation, potentially reducing the need for task-specific rewards or expert demonstrations.
Abstract
Reinforcement Learning (RL) and Imitation Learning (IL) have made great progress in robotic decision-making in recent years. However, the performance of these methods deteriorates markedly on new tasks that must be completed through new combinations of actions. RL methods are constrained by their reliance on reward functions and by distribution shift, while IL methods are limited by expert demonstrations that do not cover new tasks. In contrast, humans can easily complete such tasks using fragmented knowledge learned from task-agnostic experience. Inspired by this observation, this paper proposes a task-agnostic learning method (TAL for short) that learns fragmented knowledge from task-agnostic data alone in order to accomplish new tasks. TAL consists of four stages. First, task-agnostic exploration is performed to collect data from interactions with the environment, and the collected data is organized as a knowledge graph. Second, an action feature extractor is proposed and trained on the knowledge graph data to learn task-agnostic fragmented knowledge. Third, a candidate action generator is designed, which applies the action feature extractor to a new task to generate multiple candidate action sets. Finally, an action proposal network is designed to produce probabilities for the actions in a new task according to environmental information. These probabilities are then used to generate order information for selecting the actions to be executed from the multiple candidate action sets, which together form the plan. Experiments on a virtual indoor scene show that the proposed method outperforms state-of-the-art offline RL and IL methods by more than 20%.
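The final stage described above, in which action probabilities are used to select and order actions from several candidate action sets, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `rank_candidate_sets` and `build_plan`, the mean-probability scoring rule, and the descending-probability ordering are all assumptions chosen for illustration, and the probabilities are passed in as a plain dictionary rather than produced by a trained network.

```python
from typing import Dict, List, Sequence


def rank_candidate_sets(
    candidate_sets: Sequence[Sequence[str]],
    action_probs: Dict[str, float],
) -> List[List[str]]:
    """Sort non-empty candidate action sets by mean action probability.

    Assumption: a set's score is the mean probability of its actions.
    The source does not specify how candidate sets are compared.
    """
    def score(actions: Sequence[str]) -> float:
        return sum(action_probs.get(a, 0.0) for a in actions) / len(actions)

    non_empty = [list(s) for s in candidate_sets if s]
    return sorted(non_empty, key=score, reverse=True)


def build_plan(
    candidate_sets: Sequence[Sequence[str]],
    action_probs: Dict[str, float],
) -> List[str]:
    """Pick the best-scoring candidate set and order its actions.

    Assumption: actions are executed in descending probability order,
    a hypothetical stand-in for the paper's "order information".
    """
    ranked = rank_candidate_sets(candidate_sets, action_probs)
    if not ranked:
        return []
    best = ranked[0]
    return sorted(best, key=lambda a: action_probs.get(a, 0.0), reverse=True)


# Hypothetical action names and probabilities, for illustration only.
example_sets = [
    ["pick_cup", "open_drawer"],
    ["pick_cup", "place_cup", "open_drawer"],
]
example_probs = {"open_drawer": 0.9, "pick_cup": 0.6, "place_cup": 0.3}
example_plan = build_plan(example_sets, example_probs)
```

With these made-up inputs, the two-action set scores higher on average than the three-action set, so the plan is drawn from it and ordered by probability. A real system would recompute the probabilities from environmental information as the scene changes, which this sketch does not model.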
