
Robotics: January 2026 Week 2

Jan 8 – Jan 14, 2026 · 68 papers analyzed · 3 breakthroughs

Summary

Week 2 (Jan 8–14): 3 breakthroughs from 68 papers. (1) 2601.06748 introduces TT-VLA, test-time RL adaptation of VLA models without retraining; (2) 2601.06286 (Walk the PLANC) achieves agile humanoid locomotion on constrained footholds via CLF-based rewards; (3) 2601.07821 (FARL) provides failure-aware RL with a world-model safety critic and self-recovery. VLA architectures and humanoid locomotion dominate this week.

Key Takeaway

VLA adaptation and humanoid agility are converging with world-model-based safety for real-world deployment.

Breakthroughs (3)

1. On-the-Fly VLA Adaptation via Test-Time Reinforcement Learning

Why Novel: First method enabling online adaptation of Vision-Language-Action models during deployment without retraining. Uses dense progress-based reward and value-free PPO for test-time updates.

Key Innovations:

  • Progress-based dense reward derived from task completion signals
  • Value-free PPO enabling stable online updates (see the sketch after this list)
  • Requires no access to training data or pre-training during adaptation
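
The paper's exact value-free estimator is not reproduced here; the sketch below gives one plausible reading of the recipe, with hypothetical names (`policy`, `progress_fn`, and the trajectory layout are stand-ins, not the paper's API): per-step deltas of a task-progress estimate serve as the dense reward, and rollout-whitened rewards replace a learned value baseline inside a standard clipped PPO update.

```python
import torch

def test_time_update(policy, optimizer, trajectory, progress_fn, clip_eps=0.2):
    """One value-free PPO-style update on a single deployment rollout.

    Assumes `policy(obs)` returns a torch.distributions object with
    per-action log-probs (e.g., Independent(Normal(...), 1)) and that
    `progress_fn` yields a scalar task-completion estimate in [0, 1]
    per observation (e.g., from a learned progress model).
    """
    obs, actions, old_logp = trajectory  # collected under the current policy
    with torch.no_grad():
        # Dense reward: per-step increase in estimated task progress.
        rewards = progress_fn(obs[1:]) - progress_fn(obs[:-1])
        # Baseline-free advantage: whiten rewards over the rollout
        # (one plausible "value-free" construction; the paper's may differ).
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

    ratio = torch.exp(policy(obs[:-1]).log_prob(actions) - old_logp)
    # Clipped surrogate objective, as in standard PPO.
    loss = -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the update needs only rollouts gathered at deployment time, neither the training data nor a value network has to ship with the model.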

Evidence:

  • TT-VLA framework with progress-based reward formulation
  • Value-free PPO derivation for test-time updates
  • Adaptation results across manipulation benchmarks

Impact: Enables VLA deployment adaptation in-the-wild without costly retraining, critical for real-world generalization.

2. Walk the PLANC: Physics-Guided RL for Agile Humanoid Locomotion on Constrained Footholds

Why Novel: Marries a reduced-order stepping planner with RL through CLF-based rewards, generating dynamically consistent references for locomotion on constrained footholds.

Key Innovations:

  • LIP-based foothold optimization provides CoM and timing references
  • CLF-based rewards guide RL training toward physically consistent motion (see the sketch after this list)
  • Handles stepping stones, beams, and planks with agile transitions
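
As a concrete reading of the CLF-based reward, the sketch below implements the textbook CLF decrease condition on the CoM tracking error; the quadratic weight `P`, the rate `LAMBDA`, and the finite-difference estimate of V̇ are illustrative choices, not values from the paper.

```python
import numpy as np

LAMBDA = 4.0              # illustrative CLF convergence rate
P = np.diag([50.0, 5.0])  # illustrative quadratic weight on [pos, vel] error

def clf_reward(x, x_ref, x_prev, x_ref_prev, dt):
    """Reward the policy for making V = e^T P e decay at rate >= LAMBDA.

    `x` is the measured CoM state [pos, vel]; `x_ref` is the reference
    from a LIP-based foothold planner. V_dot is estimated by finite
    differences between consecutive control steps.
    """
    e = x - x_ref
    e_prev = x_prev - x_ref_prev
    V = e @ P @ e
    V_prev = e_prev @ P @ e_prev
    V_dot = (V - V_prev) / dt
    # CLF condition: V_dot + LAMBDA * V <= 0. Penalize only violations.
    violation = max(V_dot + LAMBDA * V, 0.0)
    return float(np.exp(-violation))  # in (0, 1]; 1 when the condition holds
```

Rewarding satisfaction of V̇ + λV ≤ 0, rather than raw tracking error, lets the policy deviate transiently as long as the error contracts toward the planner's reference.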

Evidence:

  • LIP-based parallel foothold planner
  • CLF reward formulation
  • Real-world stepping stone traversal

Impact: Bridges the gap between model-based planning and learned policies for precise foothold locomotion.

3. Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation

Why Novel: First framework combining world-model-based safety critic with learned recovery policy for intervention-free real-world RL fine-tuning.

Key Innovations:

  • World-model safety critic predicts intervention likelihood (see the sketch after this list)
  • Recovery policy trained offline for self-correction
  • Couples offline pre-training with safe online adaptation
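
A minimal sketch of how these pieces could compose at decision time, assuming hypothetical interfaces (`world_model.rollout`, `safety_critic`, and the threshold are illustrative stand-ins, not the paper's API):

```python
import torch

FAIL_THRESHOLD = 0.5  # illustrative intervention-likelihood cutoff

@torch.no_grad()
def select_action(obs, task_policy, recovery_policy, world_model,
                  safety_critic, horizon=10):
    """Pick the task action unless the imagined future looks unsafe.

    The safety critic scores a rollout imagined under the task policy;
    above the threshold, control hands over to the offline-trained
    recovery policy (self-recovery instead of human intervention).
    """
    action = task_policy(obs)
    imagined = world_model.rollout(obs, task_policy, horizon)  # latent states
    p_fail = safety_critic(imagined).max()  # worst predicted failure prob
    if p_fail > FAIL_THRESHOLD:
        action = recovery_policy(obs)  # steer back toward a safe region
    return action
```

Gating on imagined futures rather than the current state lets the system hand over to the recovery policy before a failure becomes unrecoverable.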

Evidence:

  • FARL framework with safety critic formulation
  • Recovery policy training procedure
  • Real-world manipulation experiments reporting recovery rates

Impact: Enables autonomous real-world RL without human intervention, addressing a key barrier to deployment.

Trends

  • VLA architectures continue rapid iteration, with a focus on deployment adaptation and long-horizon reasoning

  • Humanoid locomotion papers emphasize constrained footholds and perceptive control

  • Increasing integration of world models with policy learning for safety and planning

Notable Papers (5)

1. ActiveVLA: Injecting Active Perception into Vision-Language-Action Models

Coarse-to-fine perception loop with active viewpoint selection for precise 3D manipulation.

2. UniBiDex: A Unified Teleoperation Framework for Robotic Bimanual Dexterous Manipulation

Unified VR and leader-follower teleoperation with null-space coordination for dual-arm dexterity.

3. Generate, Transfer, Adapt: Learning Functional Dexterous Grasping from a Single Human Demonstration

CorDex generates diverse grasps from a single demonstration via a correspondence-based data engine.

4. PALM: Progress-Aware Policy Learning via Affordance Reasoning

Four structured affordance cues (Global, Local, Spatial, Dynamic) for long-horizon manipulation.

5. Hiking in the Wild: A Scalable Perceptive Parkour Framework for Humanoids

End-to-end policy with a mixture-of-experts (MoE) backbone for humanoid hiking in unstructured terrain.

Honorable Mentions

  • SceneFoundry: Generating Interactive Infinite 3D Worlds
  • Teaching Robots Like Dogs: Learning Agile Navigation from Human Social Cues