DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations

Seong Ho Pahng, Guoye Guan, Benjamin Fefferman, Sahand Hormoz

TL;DR

DiffeoMorph introduces an end-to-end differentiable framework for learning distributed morphogenesis rules that drive a population of agents to form target 3D shapes. It pairs an SE(3)-equivariant graph-based force model with a novel 3D Zernike moment–based shape-matching loss that is permutation-, point-count-, and rotation-invariant while remaining sensitive to chirality. The rotation alignment between predicted and target shapes is handled in a bilevel setup, with an inner optimization over a unit quaternion solved via implicit differentiation to enable efficient end-to-end training. The approach demonstrates robust morphogenesis to simple and complex geometries (ellipsoids, crescents, Stanford bunny) under minimal spatial cues and noise, highlighting potential applications in developmental biology, swarm robotics, and programmable matter.

Abstract

Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without central control. How such distributed control gives rise to precise global patterns remains a central question not only in developmental biology but also in distributed robotics, programmable matter, and multi-agent learning. Here, we introduce DiffeoMorph, an end-to-end differentiable framework for learning a morphogenesis protocol that guides a population of agents to morph into a target 3D shape. Each agent updates its position and internal state using an attention-based SE(3)-equivariant graph neural network, based on its own internal state and signals received from other agents. To train this system, we introduce a new shape-matching loss based on the 3D Zernike polynomials, which compares the predicted and target shapes as continuous spatial distributions, not as discrete point clouds, and is invariant to agent ordering, number of agents, and rigid-body transformations. To enforce full SO(3) invariance -- invariance to rotations combined with sensitivity to reflections -- we include an alignment step that optimally rotates the predicted Zernike spectrum to match the target before computing the loss. This results in a bilevel problem, with the inner loop optimizing a unit quaternion for the best alignment and the outer loop updating the agent model. We compute gradients through the alignment step using implicit differentiation. We perform systematic benchmarking to establish the advantages of our shape-matching loss over other standard distance metrics for shape comparison tasks. We then demonstrate that DiffeoMorph can form a range of shapes -- from simple ellipsoids to complex morphologies -- using only minimal spatial cues.
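
The implicit-differentiation step mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's method: it replaces the unit quaternion with a single angle `phi`, replaces the Zernike spectral overlap with a made-up objective `M(phi; theta) = cos(phi - theta) - 0.1 * phi**2`, and treats `theta` as a stand-in for the outer (model) parameters. All function names are hypothetical. The point is only that, once the inner problem has converged, the sensitivity of its solution to `theta` follows from the stationarity condition and does not require backpropagating through the inner iterations.

```python
import math

# Toy inner objective (an assumption, not the paper's overlap):
#   M(phi; theta) = cos(phi - theta) - 0.1 * phi**2, maximized over phi.

def inner_grad(phi, theta):
    """First derivative of M with respect to phi (the stationarity residual)."""
    return -math.sin(phi - theta) - 0.2 * phi

def inner_hess(phi, theta):
    """Second derivative of M with respect to phi."""
    return -math.cos(phi - theta) - 0.2

def solve_inner(theta, phi0=0.0, iters=50):
    """Inner loop: Newton iterations on inner_grad(phi, theta) = 0."""
    phi = phi0
    for _ in range(iters):
        phi -= inner_grad(phi, theta) / inner_hess(phi, theta)
    return phi

def implicit_dphi_dtheta(theta):
    """Implicit function theorem applied to g(phi, theta) = dM/dphi = 0:
    dphi*/dtheta = -(dg/dtheta) / (dg/dphi), evaluated at the inner solution."""
    phi = solve_inner(theta)
    dg_dtheta = math.cos(phi - theta)  # partial derivative of inner_grad in theta
    return -dg_dtheta / inner_hess(phi, theta)

def finite_difference_dphi_dtheta(theta, eps=1e-5):
    """Reference value obtained by re-solving the inner problem twice."""
    return (solve_inner(theta + eps) - solve_inner(theta - eps)) / (2 * eps)

if __name__ == "__main__":
    theta = 0.7
    print(implicit_dphi_dtheta(theta), finite_difference_dphi_dtheta(theta))
```

The two printed values should agree closely. In the setting described in the abstract, the scalar second derivative would become a Hessian over the quaternion parameters, and the unit-norm constraint would have to be handled as well; both are omitted here.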

Paper Structure

This paper contains 24 sections, 54 equations, 9 figures, 1 table, 1 algorithm.

Figures (9)

  • Figure 1: Overview of the morphogenesis model and shape optimization of DiffeoMorph. (a) Agents sense their neighbors using an attention-based mechanism that uses distances to neighbors as well as their internal states. The force model evolves the agents' positions and internal states without direct access to the positions of neighbors. The final evolved shape is compared to a target shape using their spectra given by the 3D Zernike moments. A bilevel optimization procedure aligns the spectra by learning the unit quaternion (inner optimization) and updates model parameters to minimize the shape-matching loss (outer optimization). (b) As the outer optimization proceeds, the simulation produces the desired target shape.
  • Figure 2: Spectral alignment and shape optimization by matching 3D Zernike moments. (a) A 3D shape represented as a point cloud can be expressed as a linear combination of the 3D Zernike polynomials, composed of radial and angular parts. The expansion coefficients $\mathbf{C} {\coloneq} \{c_{n\ell m} \}$ are the Zernike moments. (b) The spectra of two shapes, given by the Zernike moments, can be aligned by solving for the unit quaternion that maximizes the spectral overlap $\mathcal{M}$. When the optimization converges at point 2, the corresponding unit quaternion $\mathbf{q}_2$ rotates, via the Wigner--D matrix $D$, the spectrum of the standing bunny in the Initial box to align with that of the toppled bunny in the Target box. Spatially rotating the point cloud using the corresponding spatial rotation matrix yields an orientation identical to the target. (c) The same experiment is performed on weighted point clouds, with orange and blue indicating weight values of 1 and 2, respectively. Applying the spatial rotation corresponding to the optimized unit quaternion flips the crescent by $180^\circ$, placing the regions with weight 1 and weight 2 on the correct arms to match the target configuration. (d, e) To isolate the behavior of the shape-matching loss, we bypass the simulation step and directly optimize the input point cloud $\mathbf{X}$ in (d), and $\mathbf{X}$ together with its weights $\boldsymbol{\omega}$ in (e). Solving the resulting bilevel optimization problem successfully morphs both dense and sparse ellipsoidal point clouds, each with a different initial orientation, into the target shapes. Importantly, the optimized point clouds retain the global orientations of their respective initial ellipsoids. This shows that the optimization matches morphology only, not absolute orientation, highlighting the rotation invariance of the loss.
  • Figure 3: Benchmarking the proposed loss. (a) Distances between the original bunny point cloud (Self) and its geometrically perturbed variants are computed using standard losses for shape comparison. For optimal transport--based losses (Earth Mover's and Gromov--Wasserstein), the distances between identical shapes (Self vs. Self) are nonzero due to the probabilistic relaxation introduced by the entropic regularization, which is required to make them differentiable. (b) Behavior of each loss summarized based on (a). Our loss is the only distance metric satisfying all desired properties. (c) Visualizations of point clouds learned through the direct shape optimization setup of Fig. 2 using losses satisfying the three invariance properties. We compute higher-order spectra cumulatively: the bispectrum includes the power spectrum, and the trispectrum includes both the power spectrum and bispectrum. The correct head direction is recovered only when training with our loss. (d) Runtime analyses. Gromov--Wasserstein distance scales poorly with the number of points and the weakening of regularization, captured by decreasing $\epsilon$. In contrast, the runtime of the spectra-based losses is unaffected by the number of points, since the summation over points during the projection step is vectorized. However, computing the trispectrum is slower than our loss because it requires enumerating valid quartets of angular degrees $(\ell_1, \ell_2, \ell_3, \ell_4)$ to capture fourth-order coupling. The spectral alignment step in our loss is faster than this enumeration process, even when vectorized, resulting in the best overall runtime performance.
  • Figure 4: Visualization of morphogenesis trajectories from trained models. (a) The ellipsoid, crescent, and bunny serve as representative shapes with successive stages of symmetry breaking. Colored regions indicate “organizer cells,” within which cells share the same gene expression pattern distinct from the rest. When introducing an additional group of organizer cells, each new group is assigned a different pattern and placed orthogonally to the preexisting group(s). Models---trained at noise magnitudes of $0.05\epsilon$ (ellipsoid and crescent) and $0.03\epsilon$ (bunny)---maintain robust morphogenesis trajectories, preserving overall geometries even under higher noise levels. (b) Shifting the organizer regions, while keeping the coordinates of cells fixed, results in a new morphogenesis whose global orientation is shifted accordingly. (c) Generalization under added noise with and without organizer shifts. Increasing noise levels leads to higher test losses, consistent with trajectories shown in (a), while shifting the organizers causes only a moderate increase across all shapes.
  • Figure 5: Visualization of the evolution of internal states during morphogenesis. (a) Expression patterns of representative genes and polarity (shown as arrows). The top row corresponds to $t\! = \!0.05$. As morphogenesis proceeds, gene expression levels develop spatial variation, jointly encoding distinct regions of the shape, while polarity vectors evolve coherently with these domains. (b) UMAP visualization of gene expression trajectories (left); gray lines connect the same agent across time points to track its evolution. Final-time embeddings (highlighted as points enclosed by black boundary lines) are clustered using the Leiden algorithm, and the resulting cluster identities are mapped onto the final spatial shape (insets). Distinct clusters mark different spatial domains.
  • ...and 4 more figures
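
The captions above describe turning an optimized unit quaternion into a spatial rotation of the point cloud. A minimal sketch of that conversion follows, assuming the common scalar-first convention `q = (w, x, y, z)`; the paper's own convention and implementation are not given here, and the helper names `quat_to_rotmat`, `det3`, and `rotate` are hypothetical. The sketch also shows why a search over unit quaternions stays sensitive to chirality: every quaternion maps to a matrix with determinant +1, so a mirror image (determinant -1) can never be produced by the alignment.

```python
import math

def quat_to_rotmat(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix.
    The input is normalized first, so any nonzero quaternion is accepted."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def rotate(m, p):
    """Apply a 3x3 matrix to a single 3D point."""
    return [sum(m[r][k] * p[k] for k in range(3)) for r in range(3)]

if __name__ == "__main__":
    # A 90-degree rotation about the z-axis maps the x-axis onto the y-axis.
    quarter_turn = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
    print(rotate(quat_to_rotmat(quarter_turn), [1.0, 0.0, 0.0]))

    # Any quaternion gives determinant +1; a mirror plane gives -1.
    mirror = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]
    print(det3(quat_to_rotmat((0.3, -1.2, 0.5, 2.0))), det3(mirror))
```

Note that `q` and `-q` produce the same matrix, so an alignment over unit quaternions has two equivalent optima for every rotation.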