Encoding Predictability and Legibility for Style-Conditioned Diffusion Policy

Adrien Jacquet Crétides, Mouad Abrini, Hamed Rahimi, Mohamed Chetouani

Abstract

Striking a balance between efficiency and transparent motion is a core challenge in human-robot collaboration, as highly expressive movements often incur unnecessary time and energy costs. In collaborative environments, legible motion gives a human observer a better understanding of the robot's actions, increasing safety and trust. However, legible behaviors produce exaggerated, sub-optimal trajectories that are redundant in low-ambiguity scenarios where the robot's goal is already obvious. To address this trade-off, we propose Style-Conditioned Diffusion Policy (SCDP), a modular framework that constrains the trajectory generation of a pre-trained diffusion model toward either legibility or efficiency based on the environment's configuration. Our method uses a post-training pipeline that freezes the base policy and trains a lightweight scene encoder and conditioning predictor to modulate the diffusion process. At inference time, an ambiguity detection module activates the appropriate conditioning, prioritizing expressive motion only for ambiguous goals and reverting to efficient paths otherwise. We evaluate SCDP on manipulation and navigation tasks; the results show that it enhances legibility in ambiguous settings while preserving optimal efficiency when legibility is unnecessary, all without retraining the base policy.

Paper Structure (29 sections, 11 equations, 5 figures, 2 tables)

This paper contains 29 sections, 11 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Style-Conditioned Diffusion Policy is an offline imitation learning framework that allows for motion conditioning depending on the environment’s context. In ambiguous scenes (top), SCDP produces intent-expressive motion to resolve goal ambiguity. When ambiguity is low (bottom), it prioritizes task efficiency, avoiding sub-optimal and exaggerated trajectories.
  • Figure 2: (a) The predictor module is integrated via a post-training pipeline where the base Diffusion Policy weights remain frozen. By training the lightweight MLP on a subset of expressive demonstrations, the module learns to specifically compensate for the residuals between the style-specific trajectories and the general paths the base model was originally trained to reproduce. (b) The conditioning from the predictor is only applied to the bottleneck of the diffusion U-Net using FiLM to denoise $X$ at each timestep $t$.
  • Figure 3: (a) The environment state is passed through the ambiguity detection module to determine if the scene is spatially ambiguous and decide which conditioning should be used. (b) Visualization of the ellipse of ambiguity used for the scene's classification. The scene is labeled as spatially ambiguous when $g^-$ falls inside the elliptical boundary.
  • Figure 4: Visualization of SCDP and baselines' inferences in ambiguous (top) and non-ambiguous (bottom) scene configurations for the navigation task. While Diffusion Policy captures the entire data distribution and Legibility Diffuser collapses on the most legible mode, SCDP constrains its generation depending on the scene configuration.
  • Figure 5: Real-world deployment of SCDP on a Franka Emika Panda robot for the Block Reach task. The image sequence (left to right) illustrates the model successfully generating an exaggerated, intent-expressive trajectory toward the target blue object to resolve spatial ambiguity relative to the distractor pink object.
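
The captions for Figures 2 and 3 mention two mechanisms that a short sketch can make concrete: FiLM conditioning and the ellipse-based ambiguity test. The Python below is an illustration under stated assumptions, not the authors' implementation. FiLM is shown in its standard form (a per-channel scale and shift). The ellipse geometry is an assumption: the captions do not say how the ellipse is built, so the sketch places its foci at the robot start and the target goal and sets the major axis with a hypothetical `stretch` factor. The function names `film`, `inside_ellipse`, and `select_style` are illustrative only.

```python
import numpy as np


def film(features, gamma, beta):
    """Standard FiLM: scale and shift each feature channel.

    features: array of shape (channels, length)
    gamma, beta: arrays of shape (channels,), assumed to come from a
    conditioning predictor (not modeled here).
    """
    return gamma[:, None] * features + beta[:, None]


def inside_ellipse(point, focus_a, focus_b, major_axis):
    """Return True if `point` lies inside or on the ellipse.

    A point is inside an ellipse when the sum of its distances to the
    two foci does not exceed the length of the major axis.
    """
    point = np.asarray(point, dtype=float)
    focus_a = np.asarray(focus_a, dtype=float)
    focus_b = np.asarray(focus_b, dtype=float)
    total = np.linalg.norm(point - focus_a) + np.linalg.norm(point - focus_b)
    return bool(total <= major_axis)


def select_style(start, goal, distractor, stretch=1.2):
    """Pick a conditioning style from the scene layout.

    ASSUMPTION: foci at `start` and `goal`, major axis equal to
    `stretch` times the start-goal distance. The source only states
    that the scene is ambiguous when the distractor goal falls inside
    the elliptical boundary.
    """
    start = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    major_axis = stretch * np.linalg.norm(goal - start)
    if inside_ellipse(distractor, start, goal, major_axis):
        return "legible"
    return "efficient"


if __name__ == "__main__":
    # Distractor close to the straight path: treated as ambiguous.
    print(select_style((0.0, 0.0), (1.0, 0.0), (0.5, 0.1)))
    # Distractor far from the path: treated as unambiguous.
    print(select_style((0.0, 0.0), (1.0, 0.0), (0.5, 2.0)))
```

In this sketch the returned style label would decide which conditioning vector produces `gamma` and `beta` for the bottleneck features. How SCDP actually maps the scene to those parameters is described only at the level of the captions above.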