Dynamic Task Control Method of a Flexible Manipulator Using a Deep Recurrent Neural Network
Kento Kawaharazuka, Toru Ogawa, Cota Nabeshima
TL;DR
Dynamic Task Execution Network (DTXNET) addresses the challenge of controlling flexible, underactuated manipulators by learning state and task transitions from both actuator data and images, enabling real-time, torque-based control for dynamic tasks with sparse cues. The method combines a deep recurrent (LSTM) model with backpropagation through time to optimize control sequences over a horizon $T_{control}$, using a loss $L_{control}$ that emphasizes accurate task-state timing. Multiple input/output configurations are explored, with the Type $3^+$ setup (Joint State + Image as input, with robot state output) delivering the best performance in predicting both motion and task signals. Experiments on a Wadaiko drumming task demonstrate improved timing and movement fidelity over random control, validating DTXNET’s ability to manage dynamic tasks for flexible robots and suggesting broad applicability to other soft-robot control problems. The approach offers a flexible, data-driven alternative to precise modeling for dynamic, sparse-event manipulation in soft robotics, with potential for extension to diverse sensors and tasks.
Abstract
The flexible body has advantages over the rigid body in terms of environmental contact thanks to its underactuation. On the other hand, when applying conventional control methods to realize dynamic tasks with the flexible body, there are two difficulties: accurate modeling of the flexible body and the derivation of intermediate postures to achieve the tasks. Learning-based methods are considered to be more effective than accurate modeling, but they require explicit intermediate postures. To solve these two difficulties at the same time, we developed a real-time task control method with a deep recurrent neural network named Dynamic Task Execution Network (DTXNET), which acquires the relationship among the control command, robot state including image information, and task state. Once the network is trained, only the target event and its timing are needed to realize a given task. To demonstrate the effectiveness of our method, we applied it to the task of Wadaiko (traditional Japanese drum) drumming as an example, and verified the best configuration of DTXNET.
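The idea of optimizing a control sequence by backpropagating a task loss through a recurrent predictor can be illustrated with a minimal sketch. This is not DTXNET itself: the trained LSTM is replaced by a hypothetical scalar recurrence $h_{t+1} = \tanh(a h_t + b u_t)$ with fixed, made-up coefficients `a` and `b`, the predicted task state is taken to be $h_t$ directly, and a plain squared error stands in for $L_{control}$. The function names, the learning rate, and the step count are likewise assumptions chosen for illustration. The target sequence is zero everywhere except at the desired event timing, mirroring the sparse "target event and its timing" input described above.

```python
import math


def rollout(u, h0=0.0, a=0.5, b=0.5):
    """Roll the toy recurrence forward: h[t+1] = tanh(a*h[t] + b*u[t]).

    A scalar stand-in for a trained recurrent predictor (assumption);
    a and b are arbitrary illustrative coefficients.
    """
    h = [h0]
    for u_t in u:
        h.append(math.tanh(a * h[-1] + b * u_t))
    return h  # length len(u) + 1


def loss_and_grad(u, target, a=0.5, b=0.5):
    """Return the squared-error loss and its gradient w.r.t. each control u[t].

    The gradient is computed by backpropagation through time over the
    whole horizon, here T = len(u).
    """
    h = rollout(u, a=a, b=b)
    T = len(u)
    loss = sum((h[t + 1] - target[t]) ** 2 for t in range(T))
    grad_u = [0.0] * T
    dh_next = 0.0  # gradient arriving at h[t+1] from later time steps
    for t in reversed(range(T)):
        dh = 2.0 * (h[t + 1] - target[t]) + dh_next  # total dL/dh[t+1]
        dpre = dh * (1.0 - h[t + 1] ** 2)            # back through tanh
        grad_u[t] = dpre * b                         # dL/du[t]
        dh_next = dpre * a                           # pass on to h[t]
    return loss, grad_u


def optimize_controls(target, steps=500, lr=0.2):
    """Gradient descent on the control sequence, starting from zeros."""
    u = [0.0] * len(target)
    for _ in range(steps):
        _, grad = loss_and_grad(u, target)
        u = [u_t - lr * g_t for u_t, g_t in zip(u, grad)]
    return u


if __name__ == "__main__":
    # Sparse task cue: one event at step 4 of an 8-step horizon.
    target = [0.0, 0.0, 0.0, 0.0, 0.8, 0.0, 0.0, 0.0]
    u_opt = optimize_controls(target)
    print("optimized controls:", [round(x, 3) for x in u_opt])
    print("final loss:", loss_and_grad(u_opt, target)[0])
```

In this sketch only the control sequence is updated while the predictor's coefficients stay fixed, which reflects the setting where the network has already been trained and is used purely for control. In a real-time loop, one would typically apply the first optimized command, shift the horizon by one step, and repeat.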
