Where Bits Matter in World Model Planning: A Paired Mixed-Bit Study for Efficient Spatial Reasoning

Suraj Ranganath, Anish Patnaik, Vaishak Menon

TL;DR

This work addresses how to deploy world-model planners under tight memory and latency budgets by asking whether planning effectiveness depends more on total bitwidth or on where bits are allocated between encoder and predictor. It introduces a paired mixed-bit evaluation on DINO-WM for the Wall task, comparing FP16, uniform INT{8,6,4,3}, mixed INT{8,6,4,3}, and asymmetric and layerwise variants across two budgets. The results reveal a structured three-regime landscape: 8/6-bit settings remain close to FP16, 3-bit settings collapse, and 4-bit settings are allocation-sensitive, with encoder-preserving configurations often outperforming uniform INT4; the effect persists across budgets and difficulty slices, though direction can shift in smaller samples. These findings motivate budget-aware, module-aware quantization policies to enable efficient spatial reasoning, suggesting that optimizing bit allocation directly for planning success is a promising direction for deployment under resource constraints.

Abstract

Efficient spatial reasoning requires world models that remain reliable under tight precision budgets. We study whether low-bit planning behavior is determined mostly by total bitwidth or by where bits are allocated across modules. Using DINO-WM on the Wall planning task, we run a paired-goal mixed-bit evaluation across uniform, mixed, asymmetric, and layerwise variants under two planner budgets. We observe a consistent three-regime pattern: 8-bit and 6-bit settings remain close to FP16, 3-bit settings collapse, and 4-bit settings are allocation-sensitive. In that transition region, preserving encoder precision improves planning relative to uniform quantization, and near-size asymmetric variants show the same encoder-side direction. In a later strict 22-cell replication with a smaller per-cell episode count, the sign of the mixed-versus-uniform INT4 difference becomes budget-conditioned, which further highlights the sensitivity of this transition regime. These findings motivate module-aware, budget-aware quantization policies as a broader research direction for efficient spatial reasoning. Code and run artifacts are available at https://github.com/suraj-ranganath/DINO-MBQuant.
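
To make the uniform-versus-mixed distinction concrete, the following is a minimal sketch of per-module bit allocation, not the authors' implementation. It assumes symmetric per-tensor round-to-nearest quantization and assumes that "mixed" means the encoder is kept at full precision while the predictor is quantized; the function names, the module names `encoder` and `predictor`, and the toy weights are all hypothetical.

```python
from typing import Dict, List, Optional


def fake_quantize(weights: List[float], bits: int) -> List[float]:
    """Quantize to a signed `bits`-bit grid, then map back to floats.

    Assumption: symmetric per-tensor scaling with round-to-nearest.
    The paper's actual quantizer may differ.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def apply_allocation(
    modules: Dict[str, List[float]],
    bits: Dict[str, Optional[int]],
) -> Dict[str, List[float]]:
    """Quantize each module at its own bitwidth; None keeps full precision."""
    out = {}
    for name, weights in modules.items():
        b = bits[name]
        out[name] = list(weights) if b is None else fake_quantize(weights, b)
    return out


def max_error(a: List[float], b: List[float]) -> float:
    """Largest absolute elementwise difference between two weight lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical toy weights, for illustration only.
toy = {
    "encoder": [0.9, -0.31, 0.07, 0.52],
    "predictor": [1.2, -0.44, 0.13, -0.8],
}

# Uniform INT4: both modules share the same bitwidth.
uniform_int4 = apply_allocation(toy, {"encoder": 4, "predictor": 4})
# Mixed INT4 (assumed meaning): encoder untouched, predictor at 4 bits.
mixed_int4 = apply_allocation(toy, {"encoder": None, "predictor": 4})

print("encoder error, uniform:", max_error(uniform_int4["encoder"], toy["encoder"]))
print("encoder error, mixed:  ", max_error(mixed_int4["encoder"], toy["encoder"]))
```

In this sketch the two allocations differ only in the encoder: the predictor weights are identical in both, so any downstream difference would come from encoder rounding error alone. That is the kind of comparison a paired mixed-versus-uniform evaluation isolates.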

Paper Structure (27 sections, 8 figures, 5 tables)

Figures (8)

  • Figure 1: Success-size Pareto frontier for the paired mixed-bit study (budgets bA and bB). Each point is one variant, with vertical bars showing run-level 95% confidence intervals over seeds. Stars denote non-dominated Pareto points (higher success, lower model size). The frontier shows a stable 8/6-bit region, an allocation-sensitive 4-bit transition, and a collapsed 3-bit region.
  • Figure 2: Budget robustness view. Uniform INT3 remains collapsed; mixed INT4 remains above uniform INT4 at both budgets.
  • Figure 3: Encoder-retention sweep (INT4 predictor). The highest mean success occurs when encoder precision is fully preserved.
  • Figure 4: Difficulty-conditioned success at bA (paired episodes). Mixed INT4 is higher in most plotted bins.
  • Figure 5: Mechanistic scatter: larger visual-embedding divergence is associated with lower planning success.
  • ...and 3 more figures
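
The non-dominated points described in the Figure 1 caption (higher success, lower model size) can be selected as in the sketch below. This is an illustration under stated assumptions rather than the authors' plotting code: the `Variant` type, the `pareto_front` function, and every number in `toy` are hypothetical placeholders, not results from the paper.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str
    size_mb: float   # model size, lower is better
    success: float   # planning success rate, higher is better


def pareto_front(variants: List[Variant]) -> List[Variant]:
    """Return the variants that no other variant dominates.

    A variant is dominated if another one is at least as small and at
    least as successful, and strictly better on one of the two axes.
    """
    front = []
    for v in variants:
        dominated = any(
            o.size_mb <= v.size_mb
            and o.success >= v.success
            and (o.size_mb < v.size_mb or o.success > v.success)
            for o in variants
        )
        if not dominated:
            front.append(v)
    return sorted(front, key=lambda v: v.size_mb)


# Placeholder values for illustration only (not measurements).
toy = [
    Variant("a", 100.0, 0.90),
    Variant("b", 50.0, 0.88),
    Variant("c", 60.0, 0.80),  # dominated by "b": larger and less successful
    Variant("d", 25.0, 0.60),
    Variant("e", 25.0, 0.40),  # dominated by "d": same size, less successful
]

for v in pareto_front(toy):
    print(v.name, v.size_mb, v.success)
```

The quadratic scan is adequate for the few dozen variants a study like this compares; a sort-based sweep would only matter for much larger candidate sets.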