Learning Visually Interpretable Oscillator Networks for Soft Continuum Robots from Video

Henrik Krauss, Johann Licher, Naoya Takeishi, Annika Raatz, Takehisa Yairi

TL;DR

This work tackles the interpretability gap in data-driven soft continuum robot dynamics learned from video. It introduces the Attention Broadcast Decoder (ABCD), a plug-and-play autoencoder module that outputs pixel-precise attention maps per latent and decouples static background, enabling direct on-image visualization when coupled with 2D oscillator networks. By adding an attention-coupling mechanism, the approach provides physically meaningful visualization of latent dynamics (masses, stiffness, forces) on the robot image, and discovers chain-structured oscillators for multi-segment SCRs. Empirically, ABCD improves multi-step prediction accuracy and enables smooth latent-space extrapolation, while maintaining a compact, physically interpretable model suitable for control and extension to 3D or multi-camera setups.
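To make the idea of per-latent attention maps with a decoupled static background concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes (hypothetically) that the decoder produces one logit map per latent plus one for the background, that the maps are normalized with a per-pixel softmax, and that the final image is the attention-weighted sum of per-latent layers and a static background layer. The function name `compose_with_attention` and all shapes are illustrative.

```python
import numpy as np

def compose_with_attention(logits, latent_layers, background):
    """Blend per-latent image layers and a static background.

    logits:        (n + 1, H, W) raw scores; the last channel is the background.
    latent_layers: (n, H, W) image contribution decoded for each latent.
    background:    (H, W) static background image.
    Assumption: a per-pixel softmax over channels (not confirmed by the source).
    """
    # Numerically stable softmax over the channel axis: weights sum to 1 per pixel.
    shifted = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    attention = weights / weights.sum(axis=0, keepdims=True)

    # Stack latent layers with the background and take the weighted sum.
    layers = np.concatenate([latent_layers, background[None]], axis=0)
    image = (attention * layers).sum(axis=0)
    return attention, image

# Toy usage with random data standing in for decoder outputs.
rng = np.random.default_rng(0)
n, H, W = 4, 8, 8
logits = rng.normal(size=(n + 1, H, W))
latent_layers = rng.uniform(size=(n, H, W))
background = rng.uniform(size=(H, W))
attention, image = compose_with_attention(logits, latent_layers, background)
```

Under this assumption, each `attention[i]` can be overlaid on the robot image to show where latent `i` contributes, and pixels dominated by the last channel are explained by the static background alone.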

Abstract

Data-driven learning of soft continuum robot (SCR) dynamics from high-dimensional observations offers flexibility but often lacks physical interpretability, while model-based approaches require prior knowledge and can be computationally expensive. We bridge this gap with two contributions. (1) We introduce the Attention Broadcast Decoder (ABCD), a plug-and-play module for autoencoder-based latent dynamics learning that generates pixel-accurate attention maps localizing each latent dimension's contribution while filtering static backgrounds. (2) By coupling these attention maps to 2D oscillator networks, we enable direct on-image visualization of learned dynamics (masses, stiffness, and forces) without prior knowledge. We validate our approach on single- and double-segment SCRs, demonstrating that ABCD-based models significantly improve multi-step prediction accuracy: 5.7x error reduction for Koopman operators and 3.5x for oscillator networks on the two-segment robot. The learned oscillator network autonomously discovers a chain structure of oscillators. Unlike standard methods, ABCD models enable smooth latent space extrapolation beyond training data. This fully data-driven approach yields compact, physically interpretable models suitable for control applications.
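As an illustration of what a chain-structured 2D oscillator network with masses, stiffness, and forces might look like, here is a small NumPy sketch. It is a generic damped mass-spring chain, not the model learned in the paper: the equation of motion m q'' + d q' + K q = f, the semi-implicit Euler integrator, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def chain_stiffness(k):
    """Stiffness matrix of a chain of oscillators.

    k[0] anchors oscillator 0 to a fixed base; k[i] couples oscillators i-1 and i.
    """
    n = len(k)
    K = np.zeros((n, n))
    K[0, 0] += k[0]
    for i in range(1, n):
        K[i - 1, i - 1] += k[i]
        K[i, i] += k[i]
        K[i - 1, i] -= k[i]
        K[i, i - 1] -= k[i]
    return K

def step(q, v, m, K, d, f, dt):
    """One semi-implicit Euler step of m*q'' + d*q' + K q = f.

    q, v, f: (n, 2) positions, velocities, forces in the image plane.
    m: (n,) masses; d: scalar damping (assumed uniform for simplicity).
    """
    a = (f - K @ q - d * v) / m[:, None]
    v_next = v + dt * a
    q_next = q + dt * v_next
    return q_next, v_next

def energy(q, v, m, K):
    """Kinetic plus elastic energy of the network."""
    return 0.5 * np.sum(m[:, None] * v**2) + 0.5 * np.sum(q * (K @ q))

# Toy usage: three oscillators released from a random displacement, no actuation.
rng = np.random.default_rng(0)
m = np.ones(3)
K = chain_stiffness([10.0, 5.0, 5.0])
q0 = rng.normal(size=(3, 2))
v0 = np.zeros((3, 2))
f = np.zeros((3, 2))

q, v = q0, v0
for _ in range(2000):
    q, v = step(q, v, m, K, d=0.5, f=f, dt=0.01)
```

The tridiagonal `K` is what a chain structure looks like in matrix form: each oscillator is coupled only to its neighbors, which mirrors the serial arrangement of segments in a multi-segment SCR.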

Paper Structure

This paper contains 18 sections, 14 equations, 7 figures.

Figures (7)

  • Figure 1: Main contributions of this study: (1) A newly proposed attention broadcast decoder (ABCD) is used as a plug-and-play module for autoencoder-based Koopman and oscillator dynamics learning of a soft continuum robot. (2) Attention maps inside the ABCD are coupled to 2D oscillator dynamics so that the oscillator network (here, masses, stiffnesses, and actuation forces) can be visualized on the original, reconstructed, or predicted future images.
  • Figure 2: (a) The attention broadcast decoder (ABCD) integrated as a plug-and-play module in an autoencoder setup for image-reconstruction learning. (b) The attention processor within the ABCD, which retrieves attention maps and attended latents before they are spatially broadcast and decoded.
  • Figure 3: Attention maps for the 1-segment and 2-segment robots for Koopman and oscillator networks using the ABCD. Two attention maps are shown per image for the Koopman models, to achieve visual compactness.
  • Figure 4: On-image 2D oscillator networks for the 1-segment and 2-segment robots (left) and latent space visualization (right). A validation state (current) is shown together with a future state 20 steps ahead. The oscillator networks, which accurately capture the SCR dynamics, are shown with their masses, stiffness, and current actuation forces. Oscillator positions are scaled by a factor of 1.5 around their shared mean position, and higher-resolution images are used for visual clarity.
  • Figure 5: Multi-step reconstruction error over 0.5s for 1-segment (top) and 2-segment (bottom) robots. Shaded regions indicate standard error of the mean over 50 validation trajectories. The ABCD improves multi-step reconstruction accuracy for the more complex 2-segment system for both Koopman and oscillator network dynamics.
  • ...and 2 more figures
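The multi-step error curves with a standard-error band described for Figure 5 can be computed along the following lines. This is a hedged sketch only: it rolls out a linear, Koopman-style latent model z_{t+1} = A z_t + B u_t and measures error in latent space, whereas the figure reports reconstruction error. The matrices, the perturbed model `A_model`, and the function names are hypothetical stand-ins, and the numbers it produces are not the paper's results.

```python
import numpy as np

def rollout(A, B, z0, u):
    """Predict latents z_1..z_T from z0 under z_{t+1} = A z_t + B u_t.

    u: (T, m) control inputs. Returns an array of shape (T, n).
    """
    z = z0
    out = []
    for u_t in u:
        z = A @ z + B @ u_t
        out.append(z)
    return np.stack(out)

def multistep_error(pred, true):
    """Per-step mean squared error and its standard error of the mean.

    pred, true: (N, T, n) for N trajectories. Returns two arrays of shape (T,).
    """
    err = ((pred - true) ** 2).mean(axis=-1)               # (N, T)
    mean = err.mean(axis=0)
    sem = err.std(axis=0, ddof=1) / np.sqrt(err.shape[0])  # SEM over trajectories
    return mean, sem

# Toy usage: 50 synthetic validation trajectories, 20 prediction steps.
rng = np.random.default_rng(0)
N, T, n, m = 50, 20, 4, 2
A_true = 0.95 * np.eye(n)
B_true = 0.1 * rng.normal(size=(n, m))
A_model = A_true + 0.01 * rng.normal(size=(n, n))  # slightly wrong model
z0 = rng.normal(size=(N, n))
u = rng.normal(size=(N, T, m))

true = np.stack([rollout(A_true, B_true, z0[i], u[i]) for i in range(N)])
pred = np.stack([rollout(A_model, B_true, z0[i], u[i]) for i in range(N)])
mean_err, sem_err = multistep_error(pred, true)
```

Plotting `mean_err` with a band of `mean_err ± sem_err` against the prediction step gives a curve of the same kind as the shaded plots in Figure 5.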