Computer Vision: January 2026 Week 2

Jan 8 – Jan 14, 2026 · 112 papers analyzed · 3 breakthroughs

Summary

Week 2 (Jan 8-14): 3 breakthroughs from 112 papers. (1) 2601.09212 (COOL-SD) provides theoretical grounding for speculative decoding in AR image generation with an annealed relaxation; (2) 2601.09881 (TMD) decouples video diffusion into a semantic backbone plus a recurrent flow head for few-step generation; (3) 2601.05722 shows that video diffusion can generate high-quality 3D characters from a single image. AR acceleration and video diffusion control are the week's themes.

Key Takeaway

AR and diffusion paradigms are racing on inference speed; 3DGS is becoming commodity infrastructure.

Breakthroughs (3)

1. Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation

Why Novel: First theoretical grounding of speculative decoding for AR image generation. Derives near-tight TV distance bounds and introduces COOL-SD with annealed relaxation.

Key Innovations:

  • Theoretical upper bound on TV distance for speculative decoding
  • Annealed relaxation schedule for acceptance probability
  • 2-3x speedup on AR image models without quality loss

Evidence:

  • TV distance bound for speculative decoding
  • COOL-SD algorithm with annealing
  • Speedup vs quality tradeoff results

Impact: Makes AR image generation competitive with diffusion on inference speed.
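
To make the acceleration mechanism concrete, here is a minimal Python sketch of speculative decoding over discrete image tokens with a relaxed acceptance rule that is annealed toward the exact test as generation proceeds. The convex-combination relaxation, the linear schedule, and all function names are illustrative assumptions, not the COOL-SD formulation or the paper's bound.

```python
import numpy as np

def relaxed_accept_prob(p_target, p_draft, lam):
    # Standard speculative-decoding acceptance ratio min(1, p_target / p_draft),
    # blended with unconditional acceptance. lam = 1 recovers the exact test,
    # lam = 0 always accepts (fast but lossy). The blend is a placeholder, not COOL-SD.
    exact = min(1.0, p_target / max(p_draft, 1e-12))
    return lam * exact + (1.0 - lam)

def annealed_lambda(step, total_steps, lam_start=0.5, lam_end=1.0):
    # Hypothetical linear annealing: start relaxed, finish with the exact test.
    t = step / max(total_steps - 1, 1)
    return lam_start + t * (lam_end - lam_start)

def speculative_decode(draft_model, target_model, prefix, total_tokens, k=4, seed=0):
    # draft_model / target_model: callables mapping a token prefix (list[int])
    # to a next-token probability vector (np.ndarray summing to 1).
    rng = np.random.default_rng(seed)
    tokens = list(prefix)
    while len(tokens) < total_tokens:
        lam = annealed_lambda(len(tokens), total_tokens)
        # Draft k tokens cheaply with the small model.
        ctx, proposals, q_probs = list(tokens), [], []
        for _ in range(k):
            q = draft_model(ctx)
            tok = int(rng.choice(len(q), p=q))
            proposals.append(tok)
            q_probs.append(q[tok])
            ctx.append(tok)
        # Verify with the target model (in practice all k drafts are scored in
        # one batched forward pass; done sequentially here for clarity).
        for tok, q_tok in zip(proposals, q_probs):
            p = target_model(tokens)
            if rng.random() < relaxed_accept_prob(p[tok], q_tok, lam):
                tokens.append(tok)
            else:
                # Rejected: resample from the target and discard the remaining draft.
                tokens.append(int(rng.choice(len(p), p=p)))
                break
    return tokens[:total_tokens]

# Tiny smoke test with dummy uniform models over a 16-token codebook.
uniform = lambda ctx: np.full(16, 1.0 / 16)
print(len(speculative_decode(uniform, uniform, prefix=[0], total_tokens=32)))  # 32
```

The speedup comes from accepting several cheaply drafted tokens per verification round; relaxing acceptance early raises the acceptance rate at the cost of a bounded deviation from the target distribution, which is the tradeoff the TV-distance analysis quantifies.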

2. Transition Matching Distillation for Fast Video Generation

Why Novel: Decouples video diffusion into a semantic backbone and a recurrent flow head, enabling few-step generation distilled from multi-step teachers.

Key Innovations:

  • Two-stage distillation: semantic backbone + recurrent flow head
  • Transition matching between teacher and student trajectories
  • 4-8 step generation matching 50-step quality

Evidence:

  • Decoupled architecture design
  • Transition matching loss formulation
  • FVD scores vs step count

Impact: Provides practical path to real-time video generation.
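
As a rough illustration of the decoupled design, the PyTorch sketch below distills one teacher transition into a single student step: the teacher integrates from a higher to a lower noise level over several sub-steps, and the student's flow head, conditioned on backbone features, must reproduce that transition in one shot. The interfaces, the Euler rollout, and the MSE matching loss are placeholder assumptions, not the paper's exact formulation or recurrence.

```python
import torch
import torch.nn.functional as F

def transition_matching_step(teacher, backbone, flow_head, x_t, t_hi, t_lo,
                             teacher_substeps=8):
    # Assumed interfaces (placeholders, not the paper's API):
    #   teacher(x, t)          -> velocity prediction of the frozen multi-step teacher
    #   backbone(x, t)         -> semantic features from the student's backbone
    #   flow_head(feats, x, t) -> the student's one-shot prediction of the t_hi -> t_lo state

    # Teacher rolls the noisy sample from t_hi to t_lo with several small Euler steps.
    with torch.no_grad():
        x_ref = x_t
        ts = torch.linspace(float(t_hi), float(t_lo), teacher_substeps + 1)
        for a, b in zip(ts[:-1], ts[1:]):
            x_ref = x_ref + (b - a) * teacher(x_ref, a)  # Euler update on the teacher trajectory

    # Student predicts the same transition in a single step.
    feats = backbone(x_t, t_hi)
    x_student = flow_head(feats, x_t, t_hi)

    # Match the student's one-step transition to the teacher's multi-step endpoint.
    return F.mse_loss(x_student, x_ref)
```

In a distillation setup like this, chaining a handful of such matched transitions is what lets 4-8 student steps stand in for the teacher's 50-step trajectory.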

3. Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation

Why Novel: Shows video diffusion models can be repurposed for 3D character generation by generating consistent multi-view rotations.

Key Innovations:

  • Video diffusion generates 360° rotation of character
  • Multi-view consistency from temporal coherence
  • Single image to 3D character pipeline

Evidence:

  • Pipeline from single image to 3D mesh
  • View consistency metrics

Impact: Bridges video generation and 3D reconstruction.
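
A concrete ingredient that single-image-to-3D pipelines of this kind rely on is assigning each generated frame a camera pose on a fixed turntable orbit before multi-view reconstruction. The snippet below sketches that camera setup; the orbit parameters and pose convention are illustrative assumptions rather than details from the paper.

```python
import numpy as np

def orbit_cameras(n_views=24, radius=2.5, elevation_deg=10.0):
    # Camera-to-world poses for a fixed-radius turntable orbit around the origin,
    # one per generated frame, using a standard look-at construction
    # (camera looks down its -z axis, +y up). Radius, elevation, and view count
    # are illustrative defaults, not values from the paper.
    poses = []
    elev = np.deg2rad(elevation_deg)
    for i in range(n_views):
        azim = 2.0 * np.pi * i / n_views
        eye = radius * np.array([np.cos(elev) * np.sin(azim),
                                 np.sin(elev),
                                 np.cos(elev) * np.cos(azim)])
        forward = -eye / np.linalg.norm(eye)               # points at the character (origin)
        right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
        right /= np.linalg.norm(right)
        up = np.cross(right, forward)
        c2w = np.eye(4)
        c2w[:3, 0], c2w[:3, 1], c2w[:3, 2], c2w[:3, 3] = right, up, -forward, eye
        poses.append(c2w)
    return np.stack(poses)

print(orbit_cameras().shape)  # (24, 4, 4): one pose per 15 degrees of rotation
```

With poses like these fixed in advance, the temporal coherence of the generated rotation is what supplies the multi-view consistency the reconstruction stage needs.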

Trends

  • AR image generation getting theoretical foundations for acceleration

  • Video diffusion distillation enabling few-step generation

  • 3DGS continuing to specialize (SLAM, face swap, indoor scenes)

  • Reward hacking mitigation extending beyond T2I to hybrid reasoning

Notable Papers (5)

1. Thinking-Based Non-Thinking: Solving Reward Hacking in Hybrid Reasoning

Derives a non-thinking token cap from the thinking-mode solution to prevent reward hacking.

2. GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting

Combines 3DGS with video face swapping for temporally consistent results.

3. TIDI-GS: Floater Suppression in 3D Gaussian Splatting

Addresses common floater artifacts in indoor 3DGS reconstruction.

4. Focal Guidance: Controllability from Semantic-Weak Layers in Video Diffusion

Unlocks controllability from the semantically weak layers of video diffusion models that were previously ignored.

5. FeatureSLAM: Feature-enriched 3D Gaussian Splatting SLAM

Real-time SLAM with semantic features via 3DGS.

Honorable Mentions

  • ProFuse: Efficient Cross-View Context Fusion for Open-Vocabulary 3DGS
  • GS-DMSR: Dynamic Sensitive Multi-scale Enhancement for 3DGS