Robotics: March 2026 Week 10

Mar 2 – Mar 8, 2026 · 86 papers analyzed · 3 breakthroughs

Summary

86 unique papers analyzed (2026-03-02 to 2026-03-08). 3 breakthroughs: (1) 2603.01452 proves via Theorem 5.3 that scaling the number of tasks makes model-based RL asymptotically more sample-efficient than model-free approaches for multi-task humanoid control (EfficientZero-Multitask achieves SOTA on HumanoidBench); (2) 2603.03818 demonstrates that modern pretrained VLA models (Pi0, etc.) are surprisingly resistant to catastrophic forgetting in continual learning, achieving >87% SR on LIBERO-Spatial with minimal negative backward transfer; (3) 2603.05449 introduces RealWonder, the first real-time action-conditioned video generation system to use physics simulation as a bridge between 3D physical actions and video prediction, achieving a ~10x speedup over CogVideoX. Key trends: multi-scale memory for long-horizon tasks, hyperbolic geometry for manipulation pretraining, and bimanual dexterous grasping with synthetic data.

Key Takeaway

Week 10's standout insight: task scaling and pretrained VLAs together may dramatically simplify two hard robotics problems (sample efficiency and continual learning) — the field may be closer to practical generalist robots than the current data-centric narrative suggests.

Breakthroughs (3)

1. Scaling Tasks, Not Samples: Mastering Humanoid Control through Multi-Task Model-Based Reinforcement Learning

Why Novel: Challenges the dominant paradigm that scaling model parameters or offline datasets is the path to generalist robots. Instead, proves that scaling the number of tasks provides a structural advantage unique to model-based RL (MBRL), where dynamics invariance allows shared world model learning.

Impact: Establishes task scaling as a new research axis for embodied AI, potentially redirecting effort from data collection to task diversity engineering for humanoid learning systems.
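
The shared-dynamics structure described above can be sketched in a few lines (an illustrative toy, not the paper's architecture; all module names and shapes here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

class SharedWorldModel:
    """Toy multi-task MBRL sketch: one dynamics model shared by all tasks,
    plus a small per-task reward head. Because every task's rollouts update
    the same shared dynamics, effective sample efficiency grows with the
    number of tasks -- the structural advantage the paper formalizes."""

    def __init__(self, state_dim, action_dim, n_tasks):
        # Shared linear dynamics: s' ~ A @ [s; a]  (stand-in for a learned net)
        self.A = rng.normal(0, 0.1, (state_dim, state_dim + action_dim))
        # One lightweight reward head per task
        self.reward_heads = [rng.normal(0, 0.1, state_dim) for _ in range(n_tasks)]

    def predict(self, state, action, task_id):
        next_state = self.A @ np.concatenate([state, action])
        reward = float(self.reward_heads[task_id] @ next_state)
        return next_state, reward

    def update_dynamics(self, state, action, next_state, lr=0.01):
        # Gradient step on squared prediction error -- shared across ALL tasks
        x = np.concatenate([state, action])
        err = self.A @ x - next_state
        self.A -= lr * np.outer(err, x)

model = SharedWorldModel(state_dim=4, action_dim=2, n_tasks=8)
s, a = np.ones(4), np.ones(2)
s_next, r = model.predict(s, a, task_id=3)
```

The key design point is that `update_dynamics` is task-agnostic: data from any of the 8 tasks improves predictions for all of them.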

2. Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning

Why Novel: Prior continual learning work in robotics focused on small BC models trained from scratch; this is the first systematic study of forgetting in large pretrained VLAs, revealing that large-scale pretraining itself is a powerful regularizer against forgetting.

Impact: Simplifies deployment of robot learning systems: practitioners can sequentially finetune large pretrained VLAs without complex continual learning machinery, unlocking rapid skill accumulation.
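
As a concrete way to read claims like "minimal negative backward transfer", here is the standard average-forgetting metric from the continual learning literature, applied to a made-up success-rate matrix in the spirit of the paper's finding (the numbers are illustrative, not from the paper):

```python
def average_forgetting(acc):
    """acc[i][j] = success rate on task j after finetuning on task i
    (0-indexed; acc is a T x T matrix, upper triangle unused).
    Forgetting of task j = (best SR ever observed on j) - (final SR on j);
    average over all tasks except the most recent one."""
    T = len(acc)
    drops = []
    for j in range(T - 1):  # the last task cannot have been forgotten yet
        best = max(acc[i][j] for i in range(j, T))
        drops.append(best - acc[T - 1][j])
    return sum(drops) / len(drops)

# Hypothetical numbers: a pretrained VLA finetuned on three tasks in
# sequence retains almost all performance on the earlier tasks.
sr = [
    [0.92, 0.00, 0.00],
    [0.90, 0.88, 0.00],
    [0.89, 0.87, 0.91],
]
print(average_forgetting(sr))  # small value => little forgetting
```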

3. RealWonder: Real-Time Physical Action-Conditioned Video Generation

Why Novel: Previous video generation methods conditioned on 2D controls or drag trajectories; RealWonder is the first to ground generation in true 3D physical actions via physics-sim intermediates, achieving real-time throughput while maintaining physical plausibility.

Impact: Opens a practical path to physics-grounded world models for robotics, enabling fast sim-to-real evaluation loops and interactive AR/VR robot simulation without expensive physics rendering.
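
A rough skeleton of the action → physics-state → video-conditioning pipeline described above might look like the following (the point-mass integrator and the embedding stub are placeholders of my own, not RealWonder's components):

```python
import numpy as np

def rollout_physics(pos, vel, actions, dt=0.05, g=9.81):
    """Tiny point-mass Euler integrator standing in for a physics simulator:
    turns a sequence of 3D force actions into a sequence of 3D states."""
    states = []
    for a in actions:
        vel = vel + dt * (np.asarray(a) - np.array([0.0, 0.0, g]))
        pos = pos + dt * vel
        states.append(pos.copy())
    return np.stack(states)

def condition_video_model(states):
    """Placeholder for the video generator: map each 3D state to a dummy
    per-frame conditioning code so the pipeline's shape is visible."""
    return np.tanh(states @ np.ones((3, 16)))  # (T, 16) conditioning codes

actions = [[0.0, 0.0, 9.81]] * 10            # hover: exactly cancel gravity
states = rollout_physics(np.zeros(3), np.zeros(3), actions)
codes = condition_video_model(states)
```

The design point is the intermediate: the video model never sees raw 3D actions, only simulator states, which is what keeps the generated frames physically plausible.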

Trends

  • Task diversity as a scaling axis: Multiple papers challenge the samples-first paradigm, instead arguing that task breadth is the key lever for generalizable robot learning.

  • Memory-augmented VLAs: Growing evidence that episodic and semantic memory architectures are needed for multi-stage real-world manipulation beyond single-observation policies.

  • Pretrained VLAs simplify lifelong learning: Large-scale pretraining appears to confer forgetting resistance, potentially collapsing a complex research problem into a finetuning recipe.

  • Physics-grounded world models: Physics simulation as an intermediate representation is gaining traction for bridging 3D actions with learned visual models.

  • Non-Euclidean representation learning: Hyperbolic embeddings are emerging as a tool for hierarchical spatial perception in manipulation.

Notable Papers (6)

1. Hyperbolic Multiview Pretraining for Robotic Manipulation

HyperMVP uses hyperbolic geometry for 3D-aware visual pretraining, better capturing hierarchical structural relations among scene embeddings and improving multi-task RLBench performance over Euclidean pretraining methods.
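
The geometric intuition: in the Poincaré ball, distances grow rapidly toward the boundary, giving hierarchies room to fan out. A minimal version of the standard Poincaré distance (the general formula, not HyperMVP's specific loss):

```python
import numpy as np

def poincare_distance(u, v, eps=1e-9):
    """Geodesic distance in the Poincare ball (||x|| < 1):
    d(u, v) = arcosh(1 + 2*||u-v||^2 / ((1-||u||^2) * (1-||v||^2)))"""
    u, v = np.asarray(u, float), np.asarray(v, float)
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return np.arccosh(1 + 2 * sq / max(denom, eps))

origin = np.zeros(2)
mid = np.array([0.5, 0.0])
near_boundary = np.array([0.95, 0.0])
# Distances blow up near the boundary: descendants in a hierarchy can be
# pushed outward while remaining close to their parent along the geodesic.
assert poincare_distance(origin, near_boundary) > poincare_distance(origin, mid)
```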

2. UltraDexGrasp: Learning Universal Dexterous Grasping for Bimanual Robots with Synthetic Data

First unified framework for bimanual dexterous grasping that autonomously selects between single-hand, two-hand, and tool-assisted strategies using synthetic training data, with strong sim-to-real transfer.

3. MEM: Multi-Scale Embodied Memory for Vision Language Action Models

Augments VLA models with multi-scale memory (long-term semantic + short-term episodic) enabling complex multi-stage manipulation tasks that require remembering events across occlusions and recipe-style sequencing.
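
One way to picture the long-term semantic / short-term episodic split (a toy sketch; the names and capacities here are assumptions, not MEM's actual design):

```python
from collections import deque

class MultiScaleMemory:
    """Illustrative two-tier memory for a policy: a bounded episodic buffer
    of recent observations, plus a durable key-value semantic store that
    survives occlusions (e.g. 'onion' -> 'in the pot')."""

    def __init__(self, episodic_capacity=8):
        self.episodic = deque(maxlen=episodic_capacity)  # recent raw obs
        self.semantic = {}                               # durable facts

    def observe(self, obs, facts=None):
        self.episodic.append(obs)  # oldest obs is evicted when full
        if facts:
            self.semantic.update(facts)  # e.g. object -> last known location

    def recall(self, key):
        # Semantic recall still works after the observation left the buffer
        return self.semantic.get(key)

mem = MultiScaleMemory(episodic_capacity=2)
mem.observe("frame_0", facts={"onion": "in the pot"})
mem.observe("frame_1")
mem.observe("frame_2")  # frame_0 is now evicted from episodic memory
```

Even after `frame_0` has been evicted, `mem.recall("onion")` still returns its last known location, which is the behavior needed for recipe-style sequencing across occlusions.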

4. RoboCasa365: A Large-Scale Simulation Framework for Training and Benchmarking Generalist Robots

Introduces 365 everyday household manipulation tasks across 2,500 diverse kitchen configurations, providing the field's most comprehensive benchmark for evaluating generalist robot learning.

5. SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

Introduces a SEGA gated attention module that compresses long-horizon observation histories into a fixed-size recurrent state, resolving the performance degradation diffusion policies suffer over long horizons and achieving SOTA on 50-task benchmarks.
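
The fixed-size history compression can be sketched as a generic gated recurrent update (a stand-in of my own; SEGA's actual attention mechanism is not detailed here):

```python
import numpy as np

rng = np.random.default_rng(1)

def gated_compress(observations, state_dim=8):
    """Fold an arbitrary-length observation history into a fixed-size state:
    h <- g * h + (1 - g) * proj(obs), with a sigmoid gate g per step.
    Memory cost is O(state_dim), independent of the horizon length."""
    obs_dim = observations.shape[1]
    W_proj = rng.normal(0, 0.1, (state_dim, obs_dim))
    W_gate = rng.normal(0, 0.1, (state_dim, obs_dim))
    h = np.zeros(state_dim)
    for o in observations:
        g = 1.0 / (1.0 + np.exp(-(W_gate @ o)))   # sigmoid gate in (0, 1)
        h = g * h + (1.0 - g) * np.tanh(W_proj @ o)
    return h

h_short = gated_compress(rng.normal(size=(10, 4)))
h_long = gated_compress(rng.normal(size=(1000, 4)))
assert h_short.shape == h_long.shape == (8,)  # state size is horizon-independent
```

This is why the diffusion policy's cost no longer grows with the episode: the denoiser conditions on `h`, not on the raw history.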

6. PTLD: Sim-to-real Privileged Tactile Latent Distillation for Dexterous Manipulation

Distills privileged tactile simulation signals into a visuomotor policy via latent space alignment, enabling dexterous in-hand reorientation without real-world tactile sensors.

Honorable Mentions

  • ACE-Brain-0: Spatial Intelligence as a Shared Scaffold for Universal Embodiments
  • CMoE: Contrastive Mixture of Experts for Motion Control and Terrain Adaptation of Humanoid Robots
  • Non-Markovian Long-Horizon Robot Manipulation via Keyframe Chaining
  • Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons
  • RoboMME: Benchmarking and Understanding Memory for Robotic Generalist Policies