Physical Embodiment Enables Information Processing Beyond Explicit Sensing in Active Matter

Diptabrata Paul, Nikola Milosevic, Nico Scherf, Frank Cichos

TL;DR

This study investigates how physical embodiment enables information processing in active matter systems without explicit sensing. Using online reinforcement learning to control self-thermophoretic microswimmers, the authors show that embodied dynamics encode information about hidden hydrodynamic perturbations, allowing navigation in both inert and flow-perturbed environments. In an inert environment, the learned policy converges to a simple radial strategy, whereas under flow perturbations the agent adopts counteractive, vortex-like control that opposes the local flow and generalizes to reversed and time-varying flows. The work demonstrates morphological computation as a practical route to autonomous microscale navigation and highlights potential applications in autonomous microrobotics and bio-inspired computation where conventional sensing is challenging or impossible.

Abstract

Living microorganisms have evolved dedicated sensory machinery to detect environmental perturbations, processing these signals through biochemical networks to guide behavior. Replicating such capabilities in synthetic active matter remains a fundamental challenge. Here, we demonstrate that synthetic active particles can adapt to hidden hydrodynamic perturbations through physical embodiment alone, without explicit sensing mechanisms. Using reinforcement learning to control self-thermophoretic particles, we show that they learn navigation strategies to counteract unobserved flow fields by exploiting information encoded in their physical dynamics. Remarkably, particles successfully navigate perturbations that are not included in their state inputs, revealing that embodied dynamics can serve as an implicit sensing mechanism. This discovery establishes physical embodiment as a computational resource for information processing in active matter, with implications for autonomous microrobotic systems and bio-inspired computation.

Paper Structure

This paper contains 10 sections, 3 equations, 4 figures.

Figures (4)

  • Figure 1: Experimental realization of reinforcement learning with a microswimmer. A. The setup consists of a dilute solution of light-activated microswimmer agents in a physically perturbed environment. Perturbations arise from periodic heating of a thin ($50\,\mathrm{nm}$) Au film with a focused heating laser, generating hydrodynamic flow fields (black streamlines) that the agents must navigate to reach the target. B. The microswimmer is controlled through self-thermophoretic motion of an AuNP-coated melamine formaldehyde (AuMF) particle ($R = 1.09\,\mu\mathrm{m}$) driven by asymmetric laser illumination (heating spot displaced by $\delta$ from the particle center). C. The agent has five discrete control actions: heating the particle at one of four positions on its circumference ($\{a_{\uparrow}, a_{\downarrow}, a_{\leftarrow}, a_{\rightarrow}\}$) or no heating ($a_{\circ}$). D. Displacement vectors in the $x$-$y$ plane for one of the actions at an incident laser power $\text{P}_0 = 0.15\,\mathrm{mW}$; the mean displacement of $0.92\,\mu\mathrm{m}$ is marked by the vertical dashed black line. The corresponding angular distribution of the displacement vectors is fitted to a normal distribution with $\theta_{\text{FWHM}} = 100.62^{\circ}$, indicated by the solid black lines. E. The environmental perturbation, generated by periodic heating of a thin Au film at frequency $f = 2.5\,\mathrm{kHz}$ and heating power $\text{P} = 1.30\,\mathrm{mW}$, is a clockwise hydrodynamic flow field ($\vec{u}$) arising from thermo-osmotic and thermo-viscous effects, traced with $250\,\mathrm{nm}$ AuNPs.
  • Figure 2: Learning in an inert environment. A. Trajectories of the microswimmer agent at an average speed of $v = 6.1\,\mu\mathrm{m/s}$ show that progressive training leads to more deterministic motion towards the target position, indicated by the green circle. B. The corresponding policy, characterized by the Shannon entropy difference ($\Delta H$) evaluated relative to the agent's initial state, evolves towards a more deterministic policy. The corresponding expected velocity field $\langle \vec{v}_{\text{A}}\rangle_{\text{inert}}$, represented by the streamlines, evolves towards a radial field. C. The episode length decreases with progressive training and is fitted to an exponentially decaying function, $\propto \exp(-t/\tau_c)$, with characteristic convergence time $\tau_c = 8.96$ episodes ($\approx 4000$ steps). The solid line shows the moving mean, the dashed line the corresponding fit, and the shaded region the corresponding standard deviation. D. The path efficiency calculated from the policy increases with progressive training, reaching its highest value, $\eta_d = 0.76 \pm 0.11$, at the end of training after 150 episodes. The gray vertical dashed lines in C and D mark the episodes whose policies are shown in B.
  • Figure 3: Learning in a flow-perturbed environment. A. With progressive training, the agent's trajectories in the flow-perturbed environment (shown as the background) evolve from circling around the perturbed region to successfully navigating to the target. B. (top) The policy after $60$ training episodes, characterized by the entropy difference ($\Delta H$) relative to the agent's initial state. The corresponding expected velocity field $\langle \vec{v}_{\text{A}}\rangle_{\text{flow}}$ is represented by the black streamlines. (bottom) Net velocity field ($\vec{v}_{\text{net}}$) of the microswimmer agent, computed by adding $\langle \vec{v}_{\text{A}}\rangle_{\text{flow}}$ to the experimentally measured hydrodynamic flow field ($\vec{u}$), $\vec{v}_{\text{net}} = \langle \vec{v}_{\text{A}}\rangle_{\text{flow}} + \vec{u}$. The resulting pattern exhibits a vortex-like structure centered near the target position (green circle). C. The policies in the inert and flow-perturbed environments are characterized by the relative angle between $\langle \vec{v}_{\text{A}}\rangle$ and $\vec{u}$. (top) For the inert environment, the histogram yields a mean relative angle of $0.48\pi$, corresponding to a radially inward policy. (bottom) In the flow-perturbed case, the mean angle shifts to $0.65\pi$, reflecting counteractive motion against the flow in the perturbed region. D. The net velocity $\vec{v}_{\text{net}}$ in both the inert and flow-perturbed environments is decomposed into tangential ($\vec{v}_{\phi}$) and radial ($\vec{v}_r$) components, with the target located at the origin. The histogram of $|\vec{v}_{\phi}|$ reveals a lower magnitude for the flow-perturbed policy ($|\vec{v}_{\phi}|_{\text{mean}}^{\text{flow}} \approx 15.70\,\mu\mathrm{m}$) than for the inert-environment policy ($|\vec{v}_{\phi}|_{\text{mean}}^{\text{inert}} \approx 17.67\,\mu\mathrm{m}$). The lower value for the flow-perturbed policy is attributed to the counteractive response of the trained agent. E. For the flow-perturbed policy, the mean relative angle between $\langle \vec{v}_{\text{A}} \rangle$ and $\vec{u}$, extracted from the corresponding histogram, increases from $0.49\pi$ when $|\vec{u}| \approx |\vec{v}|$ to $0.65\pi$ when $|\vec{u}| \approx 4|\vec{v}|$, indicating a stronger counteractive response to stronger perturbations.
  • Figure 4: Learned response to a counter-clockwise flow and a dynamic flow field. A. (top) Policy of an agent trained for $60$ episodes under a counter-clockwise flow-field perturbation, characterized by the entropy difference ($\Delta H$) from the initial state. The corresponding expected velocity field $\langle \vec{v}_{\text{A}}\rangle_{\text{flow}}$ is represented by the black streamlines. (bottom) The resulting net velocity field ($\vec{v}_{\text{net}} = \langle \vec{v}_{\text{A}}\rangle_{\text{flow}} + \vec{u}$) forms a vortex-like pattern around the target position (green circle), indicating successful navigation. B. An agent trained in a dynamic flow-perturbed environment learns an effective policy adapted to time-varying perturbations. Time-series snapshots show trajectories of agents starting at (top) the upper-left and (bottom) the lower-right positions (red circles), successfully navigating to the target.
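Figure 1D reports the angular spread of a single action's displacement as a full width at half maximum. Assuming the fitted "normal distribution" is an ordinary Gaussian in the displacement angle, the FWHM converts to a standard deviation as below; this is a worked conversion for orientation, not a value reported by the authors.

```latex
\theta_{\text{FWHM}} = 2\sqrt{2\ln 2}\,\sigma_\theta \approx 2.355\,\sigma_\theta
\quad\Longrightarrow\quad
\sigma_\theta = \frac{100.62^{\circ}}{2.355} \approx 42.7^{\circ}
```

A spread of roughly $\pm 43^{\circ}$ around the intended direction is consistent with the need for a learned, closed-loop policy rather than open-loop steering.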
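Figures 2B, 3B, and 4A characterize the policy by an entropy difference $\Delta H$ relative to the initial state. The sketch below shows one plausible reading, assuming that at each state the policy is a categorical distribution over the five actions of Figure 1C, that the untrained policy is uniform, and that $\Delta H = H(\pi) - H(\pi_0)$ in nats. The action labels, the uniform initial policy, and the example probabilities are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence

# Hypothetical labels mirroring Figure 1C: up, down, left, right, no heating.
ACTIONS = ("up", "down", "left", "right", "off")


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete action distribution."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (limit of p*log p as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_change(probs: Sequence[float], initial: Sequence[float]) -> float:
    """Delta H = H(current policy) - H(initial policy) at a single state."""
    return shannon_entropy(probs) - shannon_entropy(initial)


if __name__ == "__main__":
    uniform = [1.0 / len(ACTIONS)] * len(ACTIONS)  # assumed untrained policy
    trained = [0.85, 0.05, 0.05, 0.05, 0.0]        # hypothetical trained state
    print(f"H0 = {shannon_entropy(uniform):.3f} nats")
    print(f"dH = {entropy_change(trained, uniform):.3f} nats")
```

Under these assumptions $\Delta H$ is zero for an untrained state and becomes increasingly negative as the policy concentrates on one action, which matches the "more deterministic" reading of Figure 2B.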
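Figure 2C fits the episode length to $\propto \exp(-t/\tau_c)$. A minimal way to extract $\tau_c$ is linear least squares on $\ln L(t) = \ln A - t/\tau_c$. This sketch assumes a pure exponential with no constant offset; the authors' actual fitting procedure is not specified here, and the data in the usage block are synthetic, noise-free values generated only to exercise the fit.

```python
import math
from typing import Sequence, Tuple


def fit_exponential_decay(episodes: Sequence[float],
                          lengths: Sequence[float]) -> Tuple[float, float]:
    """Fit L(t) = A * exp(-t / tau) by linear least squares on log(L).

    Returns (A, tau). Assumes all lengths are positive and that the decay
    has no constant offset (a simplifying assumption).
    """
    if len(episodes) != len(lengths) or len(episodes) < 2:
        raise ValueError("need at least two matching data points")
    ys = [math.log(v) for v in lengths]
    n = len(episodes)
    mean_t = sum(episodes) / n
    mean_y = sum(ys) / n
    s_ty = sum((t - mean_t) * (y - mean_y) for t, y in zip(episodes, ys))
    s_tt = sum((t - mean_t) ** 2 for t in episodes)
    slope = s_ty / s_tt  # slope of log(L) versus t, equal to -1 / tau
    if slope >= 0.0:
        raise ValueError("data are not decaying")
    intercept = mean_y - slope * mean_t
    return math.exp(intercept), -1.0 / slope


if __name__ == "__main__":
    # Synthetic data generated with tau = 8.96 episodes; NOT measured lengths.
    ts = list(range(60))
    ls = [1000.0 * math.exp(-t / 8.96) for t in ts]
    amp, tau = fit_exponential_decay(ts, ls)
    print(f"A = {amp:.1f} steps, tau_c = {tau:.2f} episodes")
```

Fitting in log space keeps the sketch dependency-free; with noisy data it weights short episodes more heavily than a direct nonlinear fit would, so a nonlinear least-squares routine may be preferable in practice.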
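Figure 3 combines three per-point operations: the net velocity $\vec{v}_{\text{net}} = \langle \vec{v}_{\text{A}}\rangle + \vec{u}$ (panel B), the relative angle between $\langle \vec{v}_{\text{A}}\rangle$ and $\vec{u}$ (panel C), and the split of $\vec{v}_{\text{net}}$ into radial and tangential components with the target at the origin (panel D). The sketch below implements these at a single grid point. The function names, the sign conventions (radial positive outward, tangential positive counter-clockwise), and the example vectors are hypothetical choices for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float]


def net_velocity(v_agent: Vec, u_flow: Vec) -> Vec:
    """v_net = <v_A> + u at one grid point."""
    return (v_agent[0] + u_flow[0], v_agent[1] + u_flow[1])


def relative_angle(a: Vec, b: Vec) -> float:
    """Unsigned angle between two 2D vectors, in [0, pi] radians."""
    cross = a[0] * b[1] - a[1] * b[0]
    dot = a[0] * b[0] + a[1] * b[1]
    return abs(math.atan2(cross, dot))


def polar_components(position: Vec, v: Vec) -> Tuple[float, float]:
    """Split v into (radial, tangential) components about the origin (target).

    Radial is positive pointing away from the target; tangential is positive
    in the counter-clockwise direction.
    """
    r = math.hypot(position[0], position[1])
    if r == 0.0:
        raise ValueError("components are undefined at the target itself")
    e_r = (position[0] / r, position[1] / r)
    e_phi = (-e_r[1], e_r[0])  # e_r rotated by +90 degrees
    v_r = v[0] * e_r[0] + v[1] * e_r[1]
    v_phi = v[0] * e_phi[0] + v[1] * e_phi[1]
    return v_r, v_phi


if __name__ == "__main__":
    pos = (10.0, 0.0)    # hypothetical point to the right of the target
    u = (0.0, -4.0)      # clockwise flow about the origin at this point
    v_a = (-3.0, 2.0)    # hypothetical inward, flow-opposing agent velocity
    v_net = net_velocity(v_a, u)
    print("angle / pi =", relative_angle(v_a, u) / math.pi)
    print("(v_r, v_phi) =", polar_components(pos, v_net))
```

In this toy point the agent velocity makes an angle above $\pi/2$ with the flow, the counteractive regime described in Figure 3C, and the net velocity still has an inward (negative) radial component.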