Accelerating Black Hole Image Generation via Latent Space Diffusion Models

Ao Liu, Xudong Zhang, Cuihong Wen, Wentao Liu, Jieci Wang

TL;DR

The paper tackles the computational bottleneck that General Relativistic Ray Tracing (GRRT) imposes on horizon-scale black hole imaging by introducing a latent-space diffusion approach trained on PCA-compressed GRRT data. By projecting 65,536-dimensional images into a 256-dimensional latent space and conditioning a diffusion model on physical parameters, the method achieves near-real-time image generation with a fixed PCA decoder while maintaining physical fidelity. The core innovations are a self-attention mechanism integrated into the parameter-conditioning pathway and diffusion performed entirely in the latent space, which together yield substantial improvements in both image quality and parameter estimation over pixel-space baselines. The approach enables rapid parameter exploration and robust inference for next-generation black hole imaging, with potential extensions to other accretion-flow models and to polarization information.
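The compression step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names (`fit_pca`, `encode`, `decode`), the synthetic data, and the toy sizes (256-pixel images, 8 latent dimensions) are assumptions chosen so the example runs quickly, whereas the paper's setting is 65,536 pixels and 256 latent dimensions. The NRMSE below is normalized by the data range, which is one common convention and may differ from the paper's definition.

```python
import numpy as np

def fit_pca(images, d):
    """Fit a PCA basis on flattened images; returns (mean, components)."""
    mean = images.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    return mean, vt[:d]

def encode(images, mean, components):
    """Project flattened images onto the d retained components."""
    return (images - mean) @ components.T

def decode(latents, mean, components):
    """Map latent vectors back to pixel space with the fixed PCA basis."""
    return latents @ components + mean

rng = np.random.default_rng(0)

# Toy stand-in for GRRT images (assumption): 200 images of 16x16 pixels
# that lie close to an 8-dimensional linear subspace, plus small noise.
n_images, side, d = 200, 16, 8
basis = rng.normal(size=(d, side * side))
images = rng.normal(size=(n_images, d)) @ basis
images += 0.01 * rng.normal(size=images.shape)

mean, components = fit_pca(images, d)
latents = encode(images, mean, components)   # shape (200, 8)
recon = decode(latents, mean, components)    # shape (200, 256)

# Range-normalized RMSE between reconstruction and original.
nrmse = np.sqrt(np.mean((recon - images) ** 2)) / (images.max() - images.min())

# Compression ratio for the sizes quoted in the text: 65,536 -> 256.
compression_ratio = 65_536 // 256

print(latents.shape, recon.shape, compression_ratio)
```

In this picture, a diffusion model would generate rows of `latents` from the physical parameters, and `decode` would map them back to pixels using the same fixed basis.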

Abstract

Interpreting horizon-scale black hole images currently relies on computationally intensive General Relativistic Ray Tracing (GRRT) simulations, which pose a significant bottleneck for rapid parameter exploration and high-precision tests of strong-field gravity. We demonstrate that physically accurate black hole images, synthesized from magnetized accretion flows, inherently reside on a low-dimensional manifold that encodes the essential features of spacetime geometry, plasma distribution, and relativistic emission. Leveraging this structure, we introduce a physics-conditioned diffusion model that operates in a compact latent space to generate high-fidelity black hole imagery directly from physical parameters. The model accurately reproduces critical observational signatures from full GRRT simulations (such as shadow diameter, photon-ring structure, and relativistic brightness asymmetry) while achieving over a fourfold reduction in computational expense. Compared with the previous generation of denoising diffusion models, the proposed approach achieves significant improvements in image quality, reconstruction fidelity, and parameter estimation accuracy, while reducing the average inference time per black hole image from 5.25 seconds to 1.15 seconds. Our work establishes diffusion-based latent models as efficient and scalable substitutes for traditional radiative transfer solvers, offering a practical framework toward real-time modeling and inference for next-generation black hole imaging.
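As a quick check of the timing figures quoted in the abstract, the ratio of the two per-image inference times can be worked out directly. This is only arithmetic on the stated numbers; whether the "over a fourfold" claim refers to exactly this comparison is an assumption.

$$
\text{speedup} = \frac{5.25\ \text{s}}{1.15\ \text{s}} \approx 4.57,
\qquad
\text{relative time saved} = 1 - \frac{1.15}{5.25} \approx 78\%.
$$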

Paper Structure (12 sections, 16 equations, 6 figures, 2 tables)

Figures (6)

  • Figure 1: Overall framework of the proposed Latent Self-Attentive Denoising Diffusion Model. This figure includes the training process and image generation process.
  • Figure 2: Schematic of the self-attention block integrated into the parameter prediction branch. The diagram illustrates the flow from input features through linear projections to Query (Q), Key (K), and Value (V) vectors, the computation of attention weights, and the final weighted aggregation that produces a refined feature representation. This module enables the model to capture long-range dependencies within the feature set, enhancing physical consistency.
  • Figure 3: Reconstruction quality assessment across varying PCA dimensions. As the dimensionality increases, the NRMSE decreases and visual fidelity improves sharply until a saturation point near $d=256$. The minimal improvement beyond 256 components confirms that $d=256$ represents the optimal trade-off between fidelity and efficiency.
  • Figure 4: Cumulative explained variance as a function of the number of retained principal components. The first 151 components capture $99.9\%$ of the total variance. We select $d=256$ (retaining $R\approx0.9993$) to ensure compatibility with the U-Net's power-of-two architecture constraints, achieving a 256-fold compression from the original $65{,}536$ pixel dimensions.
  • Figure 5: Visual comparison of black hole images generated by different model variants under identical physical parameters. Each panel displays, from top to bottom: the normalized ground-truth GRRT image, the image generated by the baseline BCDDM, the image from the PCA-enhanced BCDDM, and the image synthesized by the full proposed LSA-DDM. This side-by-side comparison qualitatively demonstrates the progressive improvement in visual fidelity achieved by our synergistic design.
  • ...and 1 more figures
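
The attention flow described in the Figure 2 caption (linear projections to Q, K, and V, attention weights, then weighted aggregation) can be sketched in a few lines of NumPy. This is a generic single-head scaled dot-product self-attention, offered as an illustration only: the scaling by the square root of the key dimension, the softmax normalization, the random weight matrices, and the toy sizes are all assumptions, since the caption does not specify these details of the paper's block.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(features, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a feature set."""
    q = features @ w_q                        # queries
    k = features @ w_k                        # keys
    v = features @ w_v                        # values
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise similarities
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights               # weighted aggregation of values

rng = np.random.default_rng(0)

# Hypothetical sizes: 6 feature vectors of width 16.
n_tokens, d_model = 6, 16
features = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v = (
    rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(3)
)

refined, weights = self_attention(features, w_q, w_k, w_v)
print(refined.shape, weights.shape)
```

Because every output row is a weighted mix of all value vectors, each refined feature can draw on every other feature in the set, which is the sense in which such a block captures long-range dependencies.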