ReSteer: Quantifying and Refining the Steerability of Multitask Robot Policies

Zhenyang Chen, Alan Tian, Liquan Wang, Benjamin Joffe, Yingyan Celine Lin, Yuxiao Chen, Siddharth Karamcheti, Danfei Xu

Abstract

Despite strong multi-task pretraining, existing policies often exhibit poor task steerability. For example, a robot may fail to respond to a new instruction "put the bowl in the sink" when moving towards the oven, executing "close the oven", even though it can complete both tasks when executed separately. We propose ReSteer, a framework to quantify and improve task steerability in multitask robot policies. We conduct an exhaustive evaluation of state-of-the-art policies, revealing a common lack of steerability. We find that steerability is associated with limited overlap among training task trajectory distributions, and introduce a proxy metric to measure this overlap from policy behavior. Building on this insight, ReSteer improves steerability via three components: (i) a steerability estimator that identifies low-steerability states without full-rollout evaluation, (ii) a steerable data generator that synthesizes motion segments from these states, and (iii) a self-refinement pipeline that improves policy steerability using the generated data. In simulation on LIBERO, ReSteer improves steerability by 11% over 18k rollouts. In real-world experiments, we show that improved steerability is critical for interactive use, enabling users to instruct robots to perform any task at any time. We hope this work motivates further study on quantifying steerability and data collection strategies for large robot policies.

Paper Structure (37 sections, 36 equations, 16 figures, 4 tables, 1 algorithm)

Figures (16)

  • Figure 1: Daily manipulation is diverse and time-critical, and often requires interruptible behavior, with users revising their intent after execution begins. This demands interactive, steerable policies that can switch behaviors from any intermediate state in response to a new task prompt. We propose ReSteer, a framework to quantify and improve the steerability of multitask robot policies.
  • Figure 2: Steerability evaluation. We run an original task $i$, then at sampled timesteps $t$ we switch the language prompt to a target task $j$ and measure whether the policy completes $j$ from that state. Repeating this over all sampled states and all target tasks yields a steerability matrix (time$\times$target task), whose average success rate gives the overall steerability score for task $i$.
  • Figure 3: We propose an online learning framework to improve the steerability of multitask policies. The framework comprises three components. First, we propose a CMI-based state sampling strategy that prioritizes data collection at the least steerable states, improving the sample efficiency of data generation. Second, we introduce a stage-aware steering data generation pipeline that synthesizes cross-task steering motions ($s^t_{task1} \rightarrow s^t_{task2}$), thereby expanding the set of steerable states, $\mathcal{S}^{\mathrm{steer}}_{i\leftrightarrow j}$. Third, we develop a self-refining behavior cloning (SRBC) scheme that finetunes the policy on successful steered trajectories under different instructions $l$ and consequently increases the steerability coverage ratio $\mathrm{SCR}_{i \leftrightarrow j}(\pi)$.
  • Figure 4: Illustration of the steerability improvement afforded by ReSteer. Left: the bidirectionally steerable set $\mathcal{S}^{\mathrm{steer}}_{i\leftrightarrow j}$ (green) occupies only a small subset of the policy-induced feasible states $\mathcal{S}_i$ and $\mathcal{S}_j$. Middle: stage-aware steering data generation ($\mathcal{S}^{\text{gen}}_i$ and $\mathcal{S}^{\text{gen}}_j$) introduces instruction-contrasting transitions, expanding the steerable region. Right: self-refining behavior cloning (SRBC) further enlarges steerability within the same feasible state space.
  • Figure 5: Evaluating Steerability on LIBERO-Goal. For each source task (x-axis), we sample intermediate execution states and measure the average success rate of switching to each of the other nine target tasks under the corresponding language prompt (bars). ReSteer achieves the highest steerability across all ten tasks and consistently outperforms strong baselines such as CAST (Glossop et al., 2025), demonstrating the effectiveness of our two-stage data generation pipeline.
  • ...and 11 more figures
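
The steerability evaluation described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `switch_succeeds` callback, and the toy success rule are all hypothetical stand-ins for real policy rollouts. Each matrix entry records whether switching to target task $j$ at timestep $t$ succeeded, and the score for the source task is the mean over all entries.

```python
from statistics import mean
from typing import Callable, List, Sequence


def steerability_matrix(
    timesteps: Sequence[int],
    target_tasks: Sequence[str],
    switch_succeeds: Callable[[int, str], bool],
) -> List[List[float]]:
    """Build a (time x target task) matrix of 0/1 switch outcomes.

    `switch_succeeds(t, j)` is a hypothetical callback standing in for a
    rollout: run the source task, switch the prompt to task j at timestep t,
    and report whether the policy then completes j.
    """
    return [
        [1.0 if switch_succeeds(t, j) else 0.0 for j in target_tasks]
        for t in timesteps
    ]


def steerability_score(matrix: List[List[float]]) -> float:
    """Average success rate over all sampled timesteps and target tasks."""
    return mean(value for row in matrix for value in row)


def toy_rollout(t: int, target: str) -> bool:
    # Hypothetical rule for illustration only: early switches always succeed,
    # late switches succeed for one target task and fail for the other.
    return t < 20 or target == "put the bowl in the sink"


# Source task (assumed): "close the oven"; targets are the other tasks.
timesteps = [0, 10, 20, 30]
targets = ["put the bowl in the sink", "turn on the stove"]

matrix = steerability_matrix(timesteps, targets, toy_rollout)
score = steerability_score(matrix)
print(matrix)
print(f"steerability score: {score:.2f}")
```

In a real evaluation the callback would wrap a simulator or robot rollout, and each entry would typically be averaged over several trials rather than taken from a single binary outcome.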