Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion

Aditya Shankar, Yuandou Wang, Rihan Hai, Lydia Y. Chen

TL;DR

The paper addresses conditional generation of tabular data under constraints unseen during training by reframing diffusion in a manifold setting. It proves that the denoiser acts as an orthogonal projector onto the data manifold $\mathcal{M}_0$ and that gradients of any differentiable inference-time loss lie in the tangent space $T_{\hat{x}_0}\mathcal{M}_0$, enabling tangent-space guidance. Building on this theory, it introduces Harpoon, an algorithm that is trained once and adapted at inference time: it interleaves tangential gradient corrections with unconditional denoising to satisfy imputation and inequality constraints on mixed-type tabular data. Empirical results across multiple datasets show that Harpoon delivers strong imputation quality, effectively enforces diverse constraints, and runs efficiently, highlighting the practical benefits of manifold-aware guidance for tabular diffusion.
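
The interleaving of unconditional denoising and tangential corrections can be illustrated with a toy NumPy sketch. It is not the authors' implementation. It assumes a linear data manifold $\mathcal{M}_0$ (a 3-dimensional subspace of $\mathbb{R}^8$), an idealised denoiser that orthogonally projects onto it, a deterministic DDIM-style update, and a squared-error imputation loss on two "observed" coordinates; `guided_ddim`, `denoise`, `loss_grad` and `tangent_proj` are hypothetical names chosen for illustration, and real tabular data would additionally need handling of categorical columns.

```python
import numpy as np

def guided_ddim(x_T, abar, denoise, loss_grad, tangent_proj, step_size=0.5, n_inner=1):
    """Toy sketch: interleave unconditional DDIM-style denoising with tangential
    corrections of the dirty estimate x0_hat (illustrative, not the paper's code)."""
    x = x_T
    for t in range(len(abar) - 1, -1, -1):
        x0_hat = denoise(x, t)  # unconditional dirty estimate
        eps_hat = (x - np.sqrt(abar[t]) * x0_hat) / np.sqrt(1.0 - abar[t])
        for _ in range(n_inner):
            # Move x0_hat along the tangent space to reduce the inference-time loss.
            x0_hat = x0_hat - step_size * tangent_proj(loss_grad(x0_hat))
        abar_prev = abar[t - 1] if t > 0 else 1.0
        # Deterministic DDIM-style step that uses the corrected estimate.
        x = np.sqrt(abar_prev) * x0_hat + np.sqrt(1.0 - abar_prev) * eps_hat
    return x

# Toy setting (assumption): M_0 is a linear 3-dim subspace of R^8 with orthonormal basis U.
rng = np.random.default_rng(0)
d, n, T = 8, 3, 50
U, _ = np.linalg.qr(rng.standard_normal((d, n)))
P = U @ U.T  # orthogonal projector onto M_0
abar = np.linspace(0.999, 0.01, T)  # abar[t] decreases as t grows

def denoise(x, t):
    # Idealised denoiser: drop the normal component and undo the sqrt(abar) scaling.
    return P @ x / np.sqrt(abar[t])

# Imputation-style constraint: coordinates 0 and 1 are "observed".
obs_idx = np.array([0, 1])
target = (U @ rng.standard_normal(n))[obs_idx]  # values reachable on M_0

def imputation_loss(x0):
    return np.sum((x0[obs_idx] - target) ** 2)

def loss_grad(x0):
    g = np.zeros_like(x0)
    g[obs_idx] = 2.0 * (x0[obs_idx] - target)
    return g

x_T = rng.standard_normal(d)
guided = guided_ddim(x_T, abar, denoise, loss_grad, lambda g: P @ g)
unguided = guided_ddim(x_T, abar, denoise, loss_grad, lambda g: P @ g, step_size=0.0)
print("unguided loss:", imputation_loss(unguided))
print("guided loss:  ", imputation_loss(guided))
```

Because the toy denoiser is a projection, each tangential update keeps $\hat{x}_0$ on $\mathcal{M}_0$, and setting `step_size=0.0` reduces the same loop to plain unconditional sampling, which gives a convenient baseline.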

Abstract

Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not generalise to unseen constraints during inference, and struggle to handle conditional tasks beyond tabular imputation. While manifold theory offers a principled way to guide generation, current formulations are tied to specific inference-time objectives and are limited to continuous domains. We extend manifold theory to tabular data and expand its scope to handle diverse inference-time objectives. On this foundation, we introduce HARPOON, a tabular diffusion method that guides unconstrained samples along the manifold geometry to satisfy diverse tabular conditions at inference. We validate our theoretical contributions empirically on tasks such as imputation and enforcing inequality constraints, demonstrating HARPOON's strong performance across diverse datasets and the practical benefits of manifold-aware guidance for tabular data. Code URL: https://github.com/adis98/Harpoon

Paper Structure (41 sections, 3 theorems, 38 equations, 5 figures, 18 tables, 2 algorithms)

Key Result

Proposition 1

During the forward diffusion process, noisy samples $x_t$ at diffusion step $t$ probabilistically concentrate on a $(d-1)$-dimensional manifold $\mathcal{M}_t$ forming a shell around a scaled copy of the $n$-dimensional data manifold $\mathcal{M}_0$:
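
A quick numerical check of the concentration effect behind Proposition 1, assuming a variance-preserving forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ with $\epsilon \sim \mathcal{N}(0, I_d)$ (this parameterisation is an assumption here). For simplicity the sketch measures the distance to the scaled clean point $\sqrt{\bar\alpha_t}\,x_0$ rather than to the scaled manifold; the distances cluster around $\sqrt{(1-\bar\alpha_t)\,d}$, and their relative spread shrinks as $d$ grows, a simplified version of the thin-shell behaviour the proposition describes. The function name `shell_radius_stats` is hypothetical.

```python
import numpy as np

def shell_radius_stats(d, alpha_bar, n_samples=500, seed=0):
    """Return mean and std of ||x_t - sqrt(alpha_bar) * x0|| over noisy samples x_t."""
    rng = np.random.default_rng(seed)
    x0 = rng.standard_normal(d)  # an arbitrary fixed clean point
    eps = rng.standard_normal((n_samples, d))
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    dist = np.linalg.norm(x_t - np.sqrt(alpha_bar) * x0, axis=1)
    return dist.mean(), dist.std()

alpha_bar = 0.5
for d in (16, 256, 4096):
    mean, std = shell_radius_stats(d, alpha_bar)
    predicted = np.sqrt((1.0 - alpha_bar) * d)
    print(f"d={d:5d}  mean={mean:8.3f}  predicted={predicted:8.3f}  std/mean={std / mean:.4f}")
```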

Figures (5)

  • Figure 1: Geometry of forward ($\uparrow$) and backward ($\uparrow$) diffusion.
  • Figure 2: (a) shows the orthogonal behaviour of dirty estimates projected from $x_{t}$ to $\hat{x}_0$. Figs. (b) and (c) show Harpoon's guidance mechanism, which interleaves unconditional denoising ($\rightarrow$) and tangential updates ($\uparrow$), for (b) imputation constraints of the form $(1-m)\odot x_0$ and (c) inequality constraints. Yellow regions indicate the areas on the manifold $\mathcal{M}_0$ that satisfy the constraint.
  • Figure 3: Average angle, over 100 samples, between loss gradients and dirty estimates $\hat{x}_0$ for various loss functions on the Adult dataset.
  • Figure 4: Disconnected submanifolds under disjunctive constraints, e.g. (colour = red or colour = blue). Point $x_t$ likely favours the region indicated by the red arrow, owing to its proximity and the larger feasible area imposed by the constraint.
  • Figure : Sampling with Harpoon
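
Figures 2 and 4 refer to imputation, inequality and disjunctive constraints. One simple way to express these as differentiable inference-time losses on a dirty estimate $\hat{x}_0$ is sketched below. The function names, the mask convention ($m_i = 1$ marks a missing entry, so $(1-m)\odot x_0$ selects the observed entries) and the one-hot encoding of the categorical column are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def imputation_loss(x0_hat, x_obs, m):
    """Squared error on observed entries; m[i] = 1 marks a missing entry (assumed convention)."""
    return float(np.sum(((1.0 - m) * (x0_hat - x_obs)) ** 2))

def inequality_loss(x0_hat, j, lower):
    """Squared hinge penalty for violating x0_hat[j] >= lower; zero when satisfied."""
    return float(max(lower - x0_hat[j], 0.0) ** 2)

def disjunctive_loss(x0_hat, cat_slice, allowed):
    """Squared distance from a one-hot block to the nearest allowed category."""
    block = x0_hat[cat_slice]
    eye = np.eye(block.size)
    return float(min(np.sum((block - eye[k]) ** 2) for k in allowed))

# Hypothetical columns: [numeric_a, numeric_b, colour=red, colour=green, colour=blue]
x0_hat = np.array([0.5, 2.0, 0.6, 0.3, 0.1])
x_obs = np.array([1.0, 0.0, 0.0, 0.0, 0.0])
m = np.array([0.0, 1.0, 1.0, 1.0, 1.0])  # only numeric_a is observed

print(imputation_loss(x0_hat, x_obs, m))              # 0.25
print(inequality_loss(x0_hat, 1, lower=3.0))          # 1.0 (numeric_b >= 3 is violated)
print(disjunctive_loss(x0_hat, slice(2, 5), [0, 2]))  # ~0.26 (red OR blue; red is nearer)
```

Taking the minimum over the allowed categories means the gradient pulls $\hat{x}_0$ towards the nearest admissible branch, which loosely mirrors the proximity effect described for Figure 4.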

Theorems & Definitions (6)

  • Remark 1
  • Proposition 1: Shell Structure
  • Definition 1: "Dirty" Estimate via Diffusion Model
  • Theorem 3.1: Limiting behaviour of dirty estimates
  • Theorem 3.2: Manifold-constrained inference-time gradients
  • Remark 2
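
Theorems 3.1 and 3.2 can be checked numerically in an idealised linear setting, which is an assumption made here for illustration: $\mathcal{M}_0$ is the span of the orthonormal columns of `U`, the denoiser is the orthogonal projector `P = U U^T`, and the loss is a squared error on two "observed" coordinates. Taking the gradient with respect to $x_t$ through the denoiser (also an assumption about how the theorem is applied), the finite-difference gradient of $\mathcal{L}(\hat{x}_0(x_t))$ should have no component normal to $\mathcal{M}_0$, and the denoising residual $x_t - \hat{x}_0$ should be orthogonal to the tangent basis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 3  # ambient and manifold dimensions (toy values)

# Toy data manifold M_0: the span of n orthonormal columns in R^d.
U, _ = np.linalg.qr(rng.standard_normal((d, n)))
P = U @ U.T  # orthogonal projector onto M_0

def denoise(x_t):
    """Idealised denoiser: orthogonal projection of x_t onto M_0."""
    return P @ x_t

observed = np.array([1.0, -2.0])

def loss(x0_hat):
    """Example inference-time loss: squared error on coordinates 0 and 1."""
    return np.sum((x0_hat[:2] - observed) ** 2)

def grad_fd(f, x, h=1e-6):
    """Central finite-difference gradient of a scalar function f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

x_t = rng.standard_normal(d)
x0_hat = denoise(x_t)
g = grad_fd(lambda x: loss(denoise(x)), x_t)

print("norm of normal gradient component:", np.linalg.norm(g - P @ g))
print("residual projected on tangent basis:", (x_t - x0_hat) @ U)
```

Since `P` is symmetric, the chain rule gives $\nabla_{x_t}\mathcal{L} = P\,\nabla_{\hat{x}_0}\mathcal{L}$, so in this linear toy case the tangency of the gradient follows directly; the finite-difference check simply confirms it without relying on that derivation.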