Manifold Percolation: From Generative Model to Reinforcement Learning
Rui Tong
TL;DR
This work reframes generative modeling as a topology problem, introducing continuum percolation and the percolation threshold as observer-centric probes of a model’s data-support geometry. It defines the Percolation Shift, proves scaling relations linking percolation thresholds to manifold volume, and introduces a differentiable topological loss to expand and stabilize the generated support. The approach is demonstrated across diffusion, RL, and language-model settings, showing that topology-aware supervision yields synergistic improvements where fidelity and diversity reinforce each other rather than trade off. By linking geometric connectivity to learning dynamics, the paper offers a unified framework for diagnosing and mitigating implicit mode collapse, with practical implications for long-horizon robustness and policy optimization.
Abstract
Generative modeling is typically framed as learning mapping rules, but from the perspective of an observer without access to these rules, the task becomes disentangling the geometric support from the probability distribution. We propose that continuum percolation is uniquely suited to this support analysis, as the sampling process effectively projects high-dimensional density estimation onto a geometric counting problem on the support. In this work, we establish a rigorous correspondence between the topological phase transitions of random geometric graphs and the underlying data manifold in high-dimensional space. By analyzing the relationship between our proposed Percolation Shift metric and the Fréchet Inception Distance (FID), we show that this metric captures structural pathologies, such as implicit mode collapse, that standard statistical metrics miss. Finally, we translate this topological phenomenon into a differentiable loss function that guides training. Experimental results confirm that this approach not only prevents manifold shrinkage but also fosters a form of synergistic improvement, where topological stability becomes a prerequisite for sustained high fidelity in both static generation and sequential decision making.
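To make the "geometric counting problem" concrete, the following is a minimal sketch of how an observer might estimate an empirical percolation radius from samples alone. It is an illustration under stated assumptions, not the paper's definition of the Percolation Shift: the function name `percolation_radius`, the `giant_fraction` parameter, and the choice of "largest component reaches a fixed fraction of the points" as the transition criterion are all hypothetical. The sketch builds a random geometric graph by adding edges in order of increasing distance and reports the radius at which the largest connected component first reaches the target size.

```python
import numpy as np


def percolation_radius(points, giant_fraction=0.5):
    """Smallest connection radius at which the largest connected component
    of the random geometric graph on `points` holds at least
    `giant_fraction` of the points.

    Assumption: this finite-sample criterion stands in for the
    percolation threshold; it is not taken from the paper.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    target = int(np.ceil(giant_fraction * n))
    if target <= 1:
        return 0.0

    # Pairwise Euclidean distances (O(n^2) memory, fine for small n).
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    i_idx, j_idx = np.triu_indices(n, k=1)
    order = np.argsort(dists[i_idx, j_idx])

    # Union-find with path halving and union by size.
    parent = list(range(n))
    size = [1] * n

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Add edges from shortest to longest until a component is large enough.
    for k in order:
        a, b = find(int(i_idx[k])), find(int(j_idx[k]))
        if a == b:
            continue
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a
        size[a] += size[b]
        if size[a] >= target:
            return float(dists[i_idx[k], j_idx[k]])
    return float("inf")
```

Under these assumptions, comparing `percolation_radius(real_samples)` with `percolation_radius(generated_samples)` at a matched sample count gives a rough, observer-side signal of support shrinkage: uniformly contracting a point cloud by a factor of 0.5 halves the returned radius, even when per-sample quality is unchanged.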
