Neural Style Transfer with Twin-Delayed DDPG for Shared Control of Robotic Manipulators

Raul Fernandez-Fernandez; Marco Aggravi; Paolo Robuffo Giordano; Juan G. Victores; Claudio Pacchierotti

TL;DR

This work extends Neural Style Transfer to robotic manipulation by encoding Content as trajectory structure and Style as motion quality, and then using a TD3-based policy to execute style-transferred motions. The NPST3 framework integrates an autoencoder-based loss network with a TD3 controller to produce velocity commands that apply a target style to a given Content trajectory, enabling both offline autonomous operation and online teleoperation. A four-emotion style set (angry, happy, calm, sad) is learned from human demonstrations, and a 73-subject study demonstrates that the transferred motions convey recognizable emotions. The approach broadens NST's applicability to continuous control, showing potential for personalized, expressive robot behavior across a range of applications while highlighting directions for extension to more robot types and motion modalities.

Abstract

Neural Style Transfer (NST) refers to a class of algorithms able to manipulate an element, most often images, to adopt the appearance or style of another one. Each element is defined as a combination of Content and Style: the Content can be conceptually defined as the what and the Style as the how of said element. In this context, we propose a custom NST framework for transferring a set of styles to the motion of a robotic manipulator, e.g., the same robotic task can be carried out in an angry, happy, calm, or sad way. An autoencoder architecture extracts and defines the Content and the Style of the target robot motions. A Twin Delayed Deep Deterministic Policy Gradient (TD3) network generates the robot control policy using the loss defined by the autoencoder. The proposed Neural Policy Style Transfer TD3 (NPST3) alters the robot motion by introducing the trained style. Such an approach can be implemented either offline, for carrying out autonomous robot motions in dynamic environments, or online, for adapting at runtime the style of a teleoperated robot. The considered styles can be learned online from human demonstrations. We carried out an evaluation with human subjects enrolling 73 volunteers, asking them to recognize the style behind some representative robotic motions. Results show a good recognition rate, proving that it is possible to convey different styles to a robot using this approach.
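The link between the autoencoder loss and the TD3 reward can be illustrated with a minimal sketch. It follows the description in the Figure 2 caption, where the style loss ($L_{st}$) is added to the constraint losses ($L_{p}$, $L_{ep}$, $L_{v}$) and the inverse of the total is used as the reward. The function name `npst3_reward`, the unweighted sum, and the small `eps` guard against division by zero are assumptions made for illustration; the paper may weight or combine the terms differently.

```python
def npst3_reward(l_st, l_p, l_ep, l_v, eps=1e-8):
    """Hypothetical reward: inverse of the overall loss L.

    l_st          -- style loss from the autoencoder (loss network)
    l_p, l_ep, l_v -- constraint losses named in the Figure 2 caption
    eps           -- assumed guard so a zero loss does not divide by zero
    """
    # Assumption: the overall loss is a plain, unweighted sum of the terms.
    total_loss = l_st + l_p + l_ep + l_v
    # A lower loss gives a higher reward for the TD3 policy.
    return 1.0 / (total_loss + eps)


if __name__ == "__main__":
    # Illustrative values only, not taken from the paper.
    print(npst3_reward(1.0, 0.5, 0.25, 0.25))
```

Using the inverse means the reward grows without bound as the loss approaches zero, which is why this sketch adds the `eps` term.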

Paper Structure (17 sections, 6 equations, 7 figures, 1 table)

Introduction
Background and Preliminaries
Style Transfer
Deep Reinforcement Learning
Framework
Inputs
Autoencoder network: the loss network
Constraints
TD3 Policy network: the execution network
Outputs
Training
Experiments
Subjects
Methods and Task
Results
...and 2 more sections

Figures (7)

Figure 1: Using Neural Style Transfer (NST), we can alter a robotic motion (the Content) according to the Style of a pre-recorded human demonstration. A given teleoperated robotic motion can be carried out in, e.g., an angry, happy, calm, or sad way.
Figure 2: Framework for the NPST3 algorithm. The Style, Content, and Generated motion trajectories are the inputs of the autoencoder network. The loss obtained with the autoencoder network ($L_{st}$) is added to the constraint losses ($L_{p}$, $L_{ep}$, $L_{v}$) to obtain the overall loss $L$. The inverse (1/s) of this loss is the reward used to train the TD3 Policy network, which takes the Content and Generated trajectories as input. The output of the TD3 network is a 3D Cartesian velocity vector, which is executed by the robot end effector. Finally, the Generated trajectory is updated by adding the new position reached by the robot, while the Content trajectory is updated using the positions defined by the user. The Content trajectory can be defined online via teleoperation or offline via a preplanned motion.
Figure 3: Cartesian trajectories of the selected Styles: A) anger/annoyance, B) happiness/joy, C) calm/acceptance, and D) sadness/grief. Axis units are mm.
Figure 4: Cartesian motions generated with the NPST3 algorithm. The trajectory at the top A) depicts the Content motion. The transferred Styles are: B) anger/annoyance, C) happiness/joy, D) calm/acceptance, and E) sadness/grief. The style trajectories are extracted from the right hand of a human demonstrator using a Vicon optical motion capture system.
Figure 5: Wheel of emotion. Taking the center of the wheel as the origin of a standard Cartesian coordinate system that grows towards "Intense" (top) and "Pleasant" (right), we can assign a coordinate to each emotion in the wheel, e.g., "pleased" can be at (14, 3) and "tired" at (-15, 1). Doing so, we can calculate the average of the subjects' answers and place it in the wheel (bold). Figure inspired by Foxcroft et al. (Foxcroft2014) and based on the model proposed by Russell et al. (Rusell1980).
...and 2 more figures
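The averaging described in the Figure 5 caption can be sketched as follows. Each answer is mapped to its (pleasant, intense) coordinate on the wheel and the coordinates are averaged per axis. Only the two coordinates quoted in the caption are used; the dictionary `WHEEL`, the function `mean_wheel_position`, and the sample answers are hypothetical and are not data from the study.

```python
from statistics import fmean

# Only the two example coordinates given in the Figure 5 caption:
# x grows towards "Pleasant" (right), y grows towards "Intense" (top).
WHEEL = {
    "pleased": (14, 3),
    "tired": (-15, 1),
}


def mean_wheel_position(answers, wheel=WHEEL):
    """Average the wheel coordinates of a list of emotion labels."""
    xs = [wheel[label][0] for label in answers]
    ys = [wheel[label][1] for label in answers]
    return (fmean(xs), fmean(ys))


if __name__ == "__main__":
    # Hypothetical answers from two subjects, for illustration only.
    print(mean_wheel_position(["pleased", "tired"]))
```

The resulting point is not itself an emotion label; it is a position that can be drawn on the wheel and compared with the region of the intended style.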
