From Circuits to Dynamics: Understanding and Stabilizing Failure in 3D Diffusion Transformers

Maximilian Plattner, Fabian Paischer, Johannes Brandstetter, Arturs Berzins

TL;DR

The paper reveals Meltdown, a catastrophic failure in 3D diffusion transformers where tiny on-surface perturbations cause output fragmentation during sparse point-cloud surface reconstruction. It localizes the failure to a single early cross-attention activation via activation patching and shows that the spectral entropy $H$ of that activation tracks Meltdown and its rescue; it further connects this metric to a symmetry-breaking bifurcation in the reverse diffusion dynamics. A simple test-time spectral intervention, PowerRemap, is proposed and shown to counter Meltdown across WaLa and Make-a-Shape on GSO and SimJEB, with rescue rates of up to $98.3\%$. The work bridges circuit-level mechanisms and diffusion-dynamics theory to provide a practical robustness tool and a mechanistic understanding for diffusion-based 3D reconstruction.

Abstract

Reliable surface completion from sparse point clouds underpins many applications spanning content creation and robotics. While 3D diffusion transformers attain state-of-the-art results on this task, we uncover that they exhibit a catastrophic mode of failure: arbitrarily small on-surface perturbations to the input point cloud can fracture the output into multiple disconnected pieces -- a phenomenon we call Meltdown. Using activation-patching from mechanistic interpretability, we localize Meltdown to a single early denoising cross-attention activation. We find that the singular-value spectrum of this activation provides a scalar proxy: its spectral entropy rises when fragmentation occurs and returns to baseline when patched. Interpreted through diffusion dynamics, we show that this proxy tracks a symmetry-breaking bifurcation of the reverse process. Guided by this insight, we introduce PowerRemap, a test-time control that stabilizes sparse point-cloud conditioning. We demonstrate that Meltdown persists across state-of-the-art architectures (WaLa, Make-a-Shape), datasets (GSO, SimJEB) and denoising strategies (DDPM, DDIM), and that PowerRemap effectively counters this failure with stabilization rates of up to 98.3%. Overall, this work is a case study on how diffusion model behavior can be understood and guided based on mechanistic analysis, linking a circuit-level cross-attention mechanism to diffusion-dynamics accounts of trajectory bifurcations.

Paper Structure

This paper contains 73 sections, 1 theorem, 32 equations, 21 figures, 3 tables, 2 algorithms.

Key Result

Proposition 3.3

Let $H$ and PowerRemap be defined as above. For any $\gamma>1$, $H(\operatorname{PowerRemap}_\gamma(\mathbf{Y})) \le H(\mathbf{Y})$, with equality iff all $\sigma_i > 0$ are equal.
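
The definitions of $H$ and PowerRemap are not reproduced on this page, so the following is only a minimal sketch under stated assumptions: spectral entropy is taken to be the Shannon entropy of the singular values normalized to sum to one, and PowerRemap is taken to raise each singular value to the power $\gamma$ while keeping the singular vectors. The function names `spectral_entropy` and `power_remap`, and the rescaling that preserves the largest singular value, are illustrative choices and not the authors' implementation. Under these assumptions the sketch shows the behavior stated in the proposition: the entropy does not increase for $\gamma>1$, and it is unchanged when all nonzero singular values are equal.

```python
import numpy as np


def spectral_entropy(sigma, eps=1e-12):
    """Shannon entropy of normalized singular values (assumed definition)."""
    sigma = np.asarray(sigma, dtype=float)
    sigma = sigma[sigma > eps]            # only strictly positive singular values
    p = sigma / sigma.sum()               # normalize to a probability vector
    return float(-(p * np.log(p)).sum())


def power_remap(Y, gamma=2.0):
    """Hypothetical PowerRemap: sigma_i -> sigma_i ** gamma, singular vectors kept."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s_new = s ** gamma
    # Illustrative choice: rescale so the largest singular value is unchanged.
    # The entropy is scale-invariant, so this does not affect H.
    if s_new.max() > 0:
        s_new = s_new * (s.max() / s_new.max())
    return (U * s_new) @ Vt               # equivalent to U @ diag(s_new) @ Vt


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = rng.normal(size=(8, 6))           # stand-in for a cross-attention activation
    H_before = spectral_entropy(np.linalg.svd(Y, compute_uv=False))
    H_after = spectral_entropy(
        np.linalg.svd(power_remap(Y, gamma=2.0), compute_uv=False)
    )
    print(f"H before: {H_before:.4f}, H after: {H_after:.4f}")
```

Because the remap sharpens the normalized spectrum toward its largest entries, a flat spectrum (all nonzero $\sigma_i$ equal) is the only case it leaves unchanged, which matches the equality condition above.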

Figures (21)

  • Figure 1: We investigate diffusion transformers on the task of surface reconstruction from sparse point clouds. We find that arbitrarily small on-surface perturbations to a point cloud can turn a shape into a speckle. We call this failure Meltdown and study it through mechanistic interpretability and diffusion dynamics. Based on this analysis, we propose a test-time intervention, PowerRemap, which stabilizes diffusion-based surface reconstruction under sparse conditions at test-time.
  • Figure 2: Our search in activation space finds that a single cross-attention write $\mathbf{Y}_{4,7}$ controls Meltdown.
  • Figure 3: As we move from a healthy to an unhealthy run, we observe that the baseline case shows a smooth rise in spectral entropy $H$ and a sudden jump in connectivity $C$. Patching our $\mathbf{Y}$ keeps the spectral entropy at healthy levels and preserves connectivity. This behavior is consistent across diffusion seeds.
  • Figure 4: Example results on the Google Scanned Objects dataset. We identify Meltdown behavior in the WaLa diffusion transformer for $89.9\%$ of shapes. Out of these, the PowerRemap intervention rescues $98.3\%$, producing semantically valid outputs.
  • Figure 5: In expectation over the initial noise, both the sphere and speckle shapes are produced at intermediate conditions, relaxing the sharp Meltdown behavior for a fixed initial noise.
  • ...and 16 more figures

Theorems & Definitions (4)

  • Definition 3.1: Spectral entropy
  • Definition 3.2: PowerRemap
  • Proposition 3.3: PowerRemap lowers spectral entropy
  • Proof of Proposition 3.3