Self-Aware Markov Models for Discrete Reasoning

Gregor Kornhardt, Jannis Chemseddine, Christian Wald, Gabriele Steidl

Abstract

Standard masked discrete diffusion models face limitations in reasoning tasks because they cannot correct their own mistakes on the masking path. Since they rely on a fixed number of denoising steps, they are also unable to adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes. Furthermore, we do not need a fixed time schedule but use a trained stopping criterion, which adapts the number of function evaluations to the difficulty of the reasoning problem. Our adaptation adds two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset, we clearly outperform other flow-based methods, reaching a validity of 95%. On Countdown-4, we need on average only 10 steps to solve almost 96% of the problems correctly, while many problems can already be solved in 2 steps.
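To make the two ideas in the abstract concrete (remasking uncertain tokens and stopping on a learned signal instead of a fixed schedule), the following is a minimal sketch of such an inference loop. It is not the authors' implementation: the model interface, the names `refine` and `toy_model`, the `MASK` symbol, and both thresholds are assumptions made for illustration, and the toy model is a deterministic stand-in for a trained network.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical mask symbol for this sketch
State = List[Optional[int]]
# Hypothetical model interface (an assumption, not the paper's API):
# state -> (proposed tokens, per-token confidence, predicted progress in [0, 1])
Model = Callable[[State], Tuple[List[int], List[float], float]]


def refine(model: Model, state: State, conf_threshold: float = 0.5,
           stop_threshold: float = 0.99, max_steps: int = 50) -> Tuple[State, int]:
    """Apply transition steps until the stopping signal fires or max_steps is hit."""
    is_clue = [tok is not MASK for tok in state]  # given tokens stay fixed
    for step in range(1, max_steps + 1):
        tokens, conf, progress = model(state)
        # Keep confident proposals; remask everything else, including
        # tokens that were filled in at an earlier step.
        state = [
            old if clue else (tok if c >= conf_threshold else MASK)
            for old, clue, tok, c in zip(state, is_clue, tokens, conf)
        ]
        # Stop adaptively: no fixed number of steps is prescribed.
        if progress >= stop_threshold and MASK not in state:
            return state, step
    return state, max_steps


TARGET = [3, 1, 4, 1, 5]  # toy "solution" used only by the stand-in model


def toy_model(state: State) -> Tuple[List[int], List[float], float]:
    """Deterministic stand-in: confident only near already-known tokens."""
    known = sum(tok is not MASK for tok in state)
    conf = [0.9 if i <= known else 0.2 for i in range(len(state))]
    return list(TARGET), conf, known / len(state)
```

Calling `refine(toy_model, [3, MASK, MASK, MASK, MASK])` fills one further position per step and stops after five steps, once the stand-in progress signal reaches 1. In this sketch the confidence and progress outputs play the role that the abstract assigns to the two prediction heads; how those heads are actually trained and combined is described in the paper, not here.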

Paper Structure (42 sections, 39 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Inference trajectories. DFMGRSK2024: once the sampler leaves the masking path (red, incorrect token), it fails to recover and does not converge to the target solution ($X$). Ours: trained on model-induced off-path states, the model detects the mistake and corrects it, recovering the correct final sequence (green); color intensity indicates the model’s confidence.
  • Figure 2: Inference on a $3{\times}3$ Sudoku subgrid from a complete puzzle. We show the state after $k\!=\!1,2,3$ refinement steps; the predicted progress $\tau_\theta$ is displayed above each panel. Cell shading encodes confidence (mixing score $c_\theta$); clue cells are marked with $\ast$. Correct predictions are blue and incorrect ones red.
  • Figure 3: Sudoku accuracy vs. ensemble size.

Theorems & Definitions (1)

  • Remark 2.1