Development and Evaluation of a Learning-based Model for Real-time Haptic Texture Rendering

Negin Heravi, Heather Culbertson, Allison M. Okamura, Jeannette Bohg

TL;DR

We present a deep learning-based action-conditional model for haptic texture rendering. It uses data from a vision-based tactile sensor (GelSight) to render the appropriate surface conditioned on the user's action in real time, and it can render previously unseen textures from a single GelSight image of their surface.

Abstract

Current Virtual Reality (VR) environments lack the rich haptic signals that humans experience during real-life interactions, such as the sensation of texture during lateral movement on a surface. Adding realistic haptic textures to VR environments requires a model that generalizes to variations of a user's interaction and to the wide variety of existing textures in the world. Current methodologies for haptic texture rendering exist, but they usually develop one model per texture, resulting in low scalability. We present a deep learning-based action-conditional model for haptic texture rendering and evaluate its perceptual performance in rendering realistic texture vibrations through a multi-part human user study. This model is unified over all materials and uses data from a vision-based tactile sensor (GelSight) to render the appropriate surface conditioned on the user's action in real time. For rendering texture, we use a high-bandwidth vibrotactile transducer attached to a 3D Systems Touch device. The results of our user study show that our learning-based method creates high-frequency texture renderings with comparable or better quality than state-of-the-art methods without the need for learning a separate model per texture. Furthermore, we show that the method is capable of rendering previously unseen textures using a single GelSight image of their surface.

Paper Structure (20 sections, 2 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: High-level overview of our learning-based real-time texture generation model. First, a generative action-conditional model takes as input a GelSight image of the material as well as the user's force and planar velocity. An image encoder followed by a texture encoder processes the texture information from the GelSight image into a texture representation vector, while a force, speed, and action encoder processes the user's force and velocity information into an action representation. The acceleration predictor module then outputs the magnitude of the spectral content of the acceleration induced on this material by this action. The model is trained in two stages. We first train the image encoder followed by a classification module on the proxy task of texture classification, and then use this encoder's frozen weights when training the rest of the modules. During inference, we use the predicted DFT magnitude of acceleration from our generative model to construct the temporally rendered acceleration signal using the Single Pass Spectrogram Inversion (SPSI) algorithm.
  • Figure 2: Comparison of our generated acceleration signal and the ground truth recordings for a few materials in the HaTT dataset. NN is the predicted generation of our neural network-based model while Real is the ground truth acceleration readings.
  • Figure 3: Experimental setup. Participants sat at a table in front of a monitor that displayed the virtual environment and held a 3D Systems Touch stylus augmented with a Haptuator behind a divider to their right. The divider blocked the participant's view of their hands and the real textures. Using this setup, participants interacted with virtual (on the left side of the Touch device here) and/or real (on the right side of the Touch device here) surfaces and answered questions shown on the screen using a keyboard. To mask the sound of the device and the environment, participants wore headphones playing white noise.
  • Figure 4: RGB and GelSight images of the 12 textures from the Penn Haptic Texture Toolkit [Culbertson2014OneHD] used and modeled in our user study. RGB images from [Culbertson2014OneHD].
  • Figure 5: Similarity rating comparisons between our proposed method (NN), the piece-wise Auto Regressive baseline (AR), and the real materials (Real). The box shows the range from the first to the third quartile of the ratings across all participants. The red lines indicate the medians. The whiskers extend the box by 1.5 times the inter-quartile range (IQR), following Tukey's definition of boxplots. Our method shows similarity scores close to those of the piece-wise AR baseline during rendering. We make this observation based on the comparison of Real vs. NN and Real vs. AR scores. Using a two-sided Wilcoxon test, we cannot conclude that a statistically significant difference exists between our model and the baseline AR model. As expected, the virtual textures are rated more similar to each other than to the real material. This is because the virtual models only model acceleration and do not capture other aspects of the real texture. Using a Bonferroni-adjusted significance level of 0.002, we found the distributions of similarity ratings of Real vs. NN to be statistically significantly different from those of AR vs. NN for all materials except Rough Plastic, Wax Paper, and Whiteboard.
  • ...and 3 more figures
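
The box-plot conventions named in the Figure 5 caption (quartile box, whiskers at 1.5 times the IQR, a Bonferroni-adjusted significance level) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the function names `tukey_box_stats` and `bonferroni_level`, the sample ratings, the quartile interpolation method, and the comparison count of 25 are hypothetical and are not taken from the paper. The whiskers are assumed to end at the most extreme data points that still lie inside the 1.5 x IQR fences, which is a common plotting convention.

```python
from statistics import quantiles


def tukey_box_stats(ratings):
    """Quartiles, Tukey whisker ends, and outliers for a list of ratings."""
    data = sorted(ratings)
    # Quartiles via linear interpolation; other quartile conventions exist.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    whisker_low = min(x for x in data if x >= lower_fence)
    whisker_high = max(x for x in data if x <= upper_fence)
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": whisker_low,
        "whisker_high": whisker_high,
        "outliers": outliers,
    }


def bonferroni_level(alpha, n_comparisons):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical similarity ratings, NOT data from the user study.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 12]
stats = tukey_box_stats(ratings)

# Hypothetical comparison count; the source does not state the real one.
adjusted = bonferroni_level(0.05, 25)
```

With a family-wise level of 0.05, a per-test threshold near 0.002 arises when the correction divides by a count in the mid-twenties; the actual number of comparisons behind the caption's value is not given in this excerpt.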