Residual RL–MPC for Robust Microrobotic Cell Pushing Under Time-Varying Flow

Yanda Yang, Sambeeta Das

Abstract

Contact-rich micromanipulation in microfluidic flow is challenging because small disturbances can break pushing contact and induce large lateral drift. We study planar cell pushing with a magnetic rolling microrobot that tracks a waypoint-sampled reference curve under time-varying Poiseuille flow. We propose a hybrid controller that augments a nominal model predictive controller (MPC) with a learned residual policy trained by soft actor-critic (SAC). The policy outputs a bounded 2D velocity correction that is contact-gated, so residual actions are applied only during robot–cell contact, preserving reliable approach behavior and stabilizing learning. All methods share the same actuation interface and speed envelope for a fair comparison. Experiments show improved robustness and tracking accuracy over pure MPC and PID under nonstationary flow, with generalization from a clover training curve to unseen circle and square trajectories. A residual-bound sweep identifies an intermediate correction limit as the best trade-off, which we use in all benchmarks.

Paper Structure

This paper contains 38 sections, 22 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Task overview and evaluation curves under time-varying background flow. A magnetic rolling microrobot pushes a cell along a waypoint-sampled reference curve while compensating for drift induced by Poiseuille flow. Snapshots illustrate contact-rich pushing and typical tracking behavior under disturbance.
  • Figure 2: Local cross-track error (CTE) definition at step $k$. A local unit tangent $\hat{\mathbf{t}}_k$ is computed from the segment $\mathbf{w}_{i_k}\!\rightarrow\!\mathbf{w}_{(i_k+1)\bmod N}$, and the corresponding unit normal is $\hat{\mathbf{n}}_k$. The signed CTE is $e_k$, whose magnitude $|e_k|$ is used as the tracking error and for large-error termination.
  • Figure 3: Contact-gated residual RL on top of MPC. The nominal MPC produces a planar velocity $\mathbf{u}^{\mathrm{mpc}}_k$. In parallel, the residual policy $\pi_\theta$ maps the observation $\mathbf{o}_k$ to a bounded action $\mathbf{a}_k\in[-1,1]^2$, which is scaled to a residual velocity $\Delta\mathbf{u}_k$ and gated by the contact indicator $\mathbb{I}_{\mathrm{ct}}(k)$, yielding the gated residual $\widetilde{\Delta\mathbf{u}}_k$. The final command is composed as $\mathbf{u}_k=\mathbf{u}^{\mathrm{mpc}}_k+\widetilde{\Delta\mathbf{u}}_k$ and clipped to a shared speed envelope $v_{\max}$ before being applied to the simulator.
  • Figure 4: Poiseuille flow model used as a drift disturbance. The flow is aligned with a fixed channel axis $\hat{\mathbf{d}}$ and has a parabolic speed profile across the channel width $2R$. We randomize only the centerline speed $u_k$ over time.
  • Figure 5: Training task snapshot (clover curve) under time-varying background flow. The cell (green) is pushed by the microrobot (blue) to follow the reference curve (orange). The current waypoint is shown with a marker and reach region. The cell and robot trajectories are overlaid to illustrate tracking behavior under drift.
  • ...and 3 more figures
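
The quantities named in the captions of Figures 2 to 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `delta_max` scaling parameter, the `center` point on the channel centerline, the left-hand sign convention for the normal, the multiplicative form of the contact gate, the zero flow outside the channel, and the magnitude-based clipping to $v_{\max}$ are all hypothetical choices made here for illustration.

```python
import numpy as np


def signed_cte(p, waypoints, i_k):
    """Signed cross-track error e_k of point p relative to the segment
    w[i_k] -> w[(i_k + 1) mod N] (Figure 2)."""
    w = np.asarray(waypoints, dtype=float)
    a, b = w[i_k], w[(i_k + 1) % len(w)]
    t_hat = (b - a) / np.linalg.norm(b - a)   # local unit tangent
    n_hat = np.array([-t_hat[1], t_hat[0]])   # unit normal (assumed: left of tangent)
    return float(n_hat @ (np.asarray(p, dtype=float) - a))


def poiseuille_flow(p, d_hat, center, R, u_center):
    """Parabolic flow velocity at p for a channel of width 2R aligned with
    the unit axis d_hat (Figure 4). `center` is any point on the centerline
    (assumed); flow is set to zero outside the channel (assumed)."""
    d_hat = np.asarray(d_hat, dtype=float)
    n_hat = np.array([-d_hat[1], d_hat[0]])   # cross-channel direction
    r = n_hat @ (np.asarray(p, dtype=float) - np.asarray(center, dtype=float))
    profile = max(0.0, 1.0 - (r / R) ** 2)    # 1 at the centerline, 0 at the walls
    return u_center * profile * d_hat


def compose_command(u_mpc, action, in_contact, delta_max, v_max):
    """Contact-gated residual on top of the nominal MPC velocity (Figure 3)."""
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)  # a_k in [-1, 1]^2
    du = delta_max * a                                       # assumed linear scaling
    du_gated = du if in_contact else np.zeros(2)             # assumed multiplicative gate
    u = np.asarray(u_mpc, dtype=float) + du_gated
    speed = np.linalg.norm(u)
    if speed > v_max:                                        # assumed magnitude clipping
        u = u * (v_max / speed)
    return u


# Example: unit-square waypoints, flow along +x, residual applied only in contact.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
e_k = signed_cte((0.5, 0.2), square, 0)
drift = poiseuille_flow((5.0, 1.0), (1.0, 0.0), (0.0, 0.0), R=2.0, u_center=4.0)
u_free = compose_command((1.0, 0.0), (0.0, 1.0), False, delta_max=0.5, v_max=2.0)
u_push = compose_command((1.0, 0.0), (0.0, 1.0), True, delta_max=0.5, v_max=2.0)
```

With the gate closed, `u_free` equals the nominal MPC velocity, which reflects the stated design goal that the approach phase is left untouched; the residual only changes the command once `in_contact` is true.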