Creative synthesis of kinematic mechanisms

Jiong Lin, Jialong Ning, Judah Goldfeder, Hod Lipson

TL;DR

This work reframes planar kinematic synthesis as a cross-domain image-generation problem, proposing a shared-latent VAE that jointly models curves and mechanism images to enable bidirectional synthesis and analysis. A new dataset of paired RGB images for $1$-DOF planar linkages, including complex multi-loop mechanisms like Jansen’s, supports scalable learning across simple and complex structures. Empirical results on three dataset families demonstrate that image-based representations can unify trajectory-to-mechanism and mechanism-to-trajectory generation, with ViT-based decoders achieving higher fidelity and color-augmented inputs enhancing performance. The approach lays groundwork for data-driven mechanism design and robotics, highlighting both practical potential and avenues for improvement through larger datasets and richer representations (e.g., video).

Abstract

In this paper, we formulate the problem of kinematic synthesis for planar linkages as a cross-domain image generation task. We develop a planar linkages dataset using RGB image representations, covering a range of mechanisms: from simple types such as crank-rocker and crank-slider to more complex eight-bar linkages like Jansen's mechanism. A shared-latent variational autoencoder (VAE) is employed to explore the potential of image generative models for synthesizing unseen motion curves and simulating novel kinematics. By encoding the drawing speed of trajectory points as color gradients, the same architecture also supports kinematic synthesis conditioned on both trajectory shape and velocity profiles. We validate our method on three datasets of increasing complexity: a standard four-bar linkage set, a mixed set of four-bar and crank-slider mechanisms, and a complex set including multi-loop mechanisms. Preliminary results demonstrate the effectiveness of image-based representations for generative mechanical design, showing that mechanisms with revolute and prismatic joints, and potentially cams and gears, can be represented and synthesized within a unified image generation framework.
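The idea of encoding drawing speed as a color gradient can be illustrated with a small rasterizer. The following is only a sketch under stated assumptions, not the authors' rendering pipeline: the function name `render_speed_colored_curve`, the blue-to-red color map, the 64-pixel canvas, and the use of point-to-point distance as a speed proxy are all hypothetical choices made for illustration. It assumes the trajectory is closed and sampled at uniform time steps (for example, uniform crank angle).

```python
import numpy as np

def render_speed_colored_curve(points, size=64):
    """Rasterize a closed 2-D trajectory into an RGB image whose color encodes speed.

    points: (N, 2) array sampled at uniform time steps.
    Assumed (hypothetical) color map: slow = blue, fast = red.
    """
    points = np.asarray(points, dtype=float)
    # Speed proxy: distance to the next sample, treating the curve as closed.
    step = np.linalg.norm(np.roll(points, -1, axis=0) - points, axis=1)
    span = step.max() - step.min()
    t = (step - step.min()) / span if span > 0 else np.zeros_like(step)

    # Map coordinates to pixel indices with a single scale to keep the aspect ratio.
    lo = points.min(axis=0)
    scale = (points.max(axis=0) - lo).max()
    scale = scale if scale > 0 else 1.0
    pix = np.round((points - lo) / scale * (size - 1)).astype(int)

    img = np.zeros((size, size, 3), dtype=np.float32)
    rows = size - 1 - pix[:, 1]   # flip y so that +y points up in the image
    cols = pix[:, 0]
    img[rows, cols, 0] = t        # red channel grows with speed
    img[rows, cols, 2] = 1.0 - t  # blue channel shrinks with speed
    return img

# Example: an ellipse traversed at uniform parameter rate, so the speed varies along the curve.
theta = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
img = render_speed_colored_curve(np.stack([2.0 * np.cos(theta), np.sin(theta)], axis=1))
```

This sketch draws only the sampled points, not the segments between them, so sparsely sampled curves would appear dotted. Two trajectories with the same shape but different timing produce images that differ only in color, which is what lets a single image model condition on both shape and velocity profile.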

Paper Structure

This paper contains 8 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: We propose an image-based generative framework for kinematic synthesis and analysis. The model supports both tasks with the same architecture and representation. Trajectory speed is encoded using color gradients (left), while link and joint types are encoded using predefined colors (right).
  • Figure 2: Overview of our shared-latent VAE framework for cross-domain kinematic synthesis and analysis. Curve and mechanism images ($C$, $M$) are encoded into latent embeddings ($Z_c$, $Z_m$), which are aligned in a shared latent space. During training, both reconstruction ($\hat{C}$, $\hat{M}$) and cross-domain prediction ($\hat{C}_m$, $\hat{M}_c$) are supervised via back-propagation. At inference time, the model enables synthesis (from $C$ to $M'$) and analysis (from $M$ to $C'$) through feed-forward decoding. Following MAE (He et al., 2022), we adopt an asymmetric ViT encoder–decoder design.
  • Figure 3: Examples of dataset construction. Left: Example mechanisms (4-bar mechanism and Jansen’s linkage) built with 2 and 5 triangle layers, with their corresponding sequences of link connections. Right: For each complexity level (T2–T5), two representative graphs from different isomorphism classes are shown, along with rendered mechanism examples from the dataset.
  • Figure 4: Qualitative results. Columns: (1) Curve to synthesized mechanism, compared with ground-truth mechanism; (2) Mechanism to calculated curve, shown beside the ground-truth curve; (3) Curve-to-mechanism-to-curve, compared with ground-truth curve. Rows list ViT-tiny and ViT-base (Dosovitskiy et al., 2020) models on three datasets.
  • Figure 5: Samples from the latent space, shown in $3\times 3$ grids. From left to right: ViT-base with $\beta=10$ (curves, mechanisms) and CNN with $\beta=1$ (curves, mechanisms).
  • ...and 1 more figures
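
The training signals named in the Figure 2 caption (reconstruction, cross-domain prediction, and latent alignment) can be combined into a single objective. The sketch below is one plausible form, written under assumptions rather than taken from the paper: the mean-squared-error terms, the explicit alignment penalty between the two latent means, and the names `shared_latent_loss` and `align_weight` are hypothetical. Only the $\beta$ weight on the KL term is grounded in the text, which reports models trained with $\beta=1$ and $\beta=10$.

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), averaged over the batch."""
    return np.mean(0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=1))

def mse(a, b):
    return np.mean((a - b) ** 2)

def shared_latent_loss(C, M, C_hat, M_hat, C_hat_m, M_hat_c,
                       mu_c, logvar_c, mu_m, logvar_m,
                       beta=1.0, align_weight=1.0):
    """Assumed objective for a shared-latent VAE over curve (C) and mechanism (M) images.

    C_hat, M_hat     : within-domain reconstructions
    C_hat_m, M_hat_c : cross-domain predictions (curve from Z_m, mechanism from Z_c)
    """
    recon = mse(C_hat, C) + mse(M_hat, M)
    cross = mse(C_hat_m, C) + mse(M_hat_c, M)
    kl = kl_to_standard_normal(mu_c, logvar_c) + kl_to_standard_normal(mu_m, logvar_m)
    align = mse(mu_c, mu_m)  # hypothetical term pulling paired embeddings together
    return recon + cross + beta * kl + align_weight * align

# Example with flattened stand-in "images" and a 3-dimensional latent space.
rng = np.random.default_rng(0)
C, M = rng.random((4, 8)), rng.random((4, 8))
zeros, ones = np.zeros((4, 3)), np.ones((4, 3))
perfect = shared_latent_loss(C, M, C, M, C, M, zeros, zeros, zeros, zeros)
loss_b1 = shared_latent_loss(C, M, C, M, C, M, ones, zeros, ones, zeros, beta=1.0)
loss_b10 = shared_latent_loss(C, M, C, M, C, M, ones, zeros, ones, zeros, beta=10.0)
```

With perfect reconstructions and a standard-normal posterior, every term vanishes. Raising `beta` increases only the KL contribution, which is the trade-off between reconstruction fidelity and latent regularity that a $\beta$ sweep explores.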