Active learning-based variance reduction for Monte Carlo simulations: A feasibility study for the nanodosimetry around a gold nanoparticle

Leo Thomas, Miriam Schwarze, Hans Rabus

TL;DR

This paper tackles the high computational cost of nanodosimetric Monte Carlo (MC) simulations around a gold nanoparticle (NP) by introducing a data-driven variance-reduction strategy. It develops an active-learning framework that uses a Gaussian Process Sampler to iteratively optimize an importance distribution $q(b)$ over impact parameters, guided by a loss that combines the Wasserstein-1 distance with a regularization term. The method is coupled to Geant4 via a TCP interface to obtain $F_4$-cluster dose tallies and their shell-mean values, enabling efficient estimation of the $F_4$ cluster dose as a function of radius. Results show substantial efficiency gains and reasonable agreement with reference data near the NP, demonstrating proof-of-principle viability for ill-posed sampling problems in nanodosimetry, with clear paths for generalization and automation in AI-assisted MC workflows.

Abstract

Objective: This work presents a data-driven, importance-sampling-based variance reduction (VR) scheme inspired by active learning. The method is applied to the estimation of an optimal impact-parameter distribution in the calculation of ionization clusters around a gold nanoparticle (NP). Here, such an optimal importance distribution cannot be inferred from first principles. Approach: An iterative optimization procedure is set up that uses a Gaussian Process Sampler to propose optimal sampling distributions based on a loss function. The loss is constructed based on appropriate heuristics. The optimization code obtains estimates of the number of ionization clusters in shells around the NP by interfacing with a Geant4 simulation via a dedicated Transmission Control Protocol (TCP) interface. Main results: It is shown that the so-derived impact-parameter distribution easily outperforms the actual, uniform irradiation case. The results resemble those obtained with other VR schemes but still slightly overestimate background contributions. Significance: While the method presented is a proof of principle, it provides a novel way of estimating importance distributions in ill-posed scenarios. The TCP interface described here is a simple and efficient method to expose compiled Geant4 code to other scripts written, for example, in Python.
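
The importance-sampling idea behind the abstract can be illustrated with a small toy model. The sketch below is not the authors' code and does not call Geant4: the analog probabilities `p`, the importance distribution `q`, and the per-annulus mean scores `toy_mean` are hypothetical numbers chosen only to show how histories drawn from `q` are reweighted by the likelihood ratio `p_j / q_j` so that the estimate still targets the uniform-irradiation mean.

```python
import numpy as np

def importance_estimate(annulus_idx, scores, p, q):
    """Importance-sampling estimate of the analog mean score.

    annulus_idx : annulus index of each simulated history (drawn from q)
    scores      : score tallied for each history
    p, q        : analog and importance probabilities per annulus
    """
    weights = p[annulus_idx] / q[annulus_idx]  # likelihood ratio per history
    return float(np.mean(weights * scores))

rng = np.random.default_rng(0)

# Hypothetical values: most analog probability lies in the outermost annulus,
# which contributes nothing to the score in this toy model.
p = np.array([1e-4, 9.9e-3, 0.99])     # analog (uniform irradiation) probabilities
q = np.array([0.5, 0.3, 0.2])          # importance distribution favouring small b
toy_mean = np.array([1.0, 0.1, 0.0])   # mean score per history in each annulus

n = 200_000
exact = float(p @ toy_mean)            # analog expectation of the toy model

# Histories drawn from q and reweighted by p/q.
idx_q = rng.choice(len(q), size=n, p=q)
est_q = importance_estimate(idx_q, rng.poisson(toy_mean[idx_q]), p, q)

# Analog sampling for comparison: q = p, so every weight equals 1.
idx_p = rng.choice(len(p), size=n, p=p)
est_p = importance_estimate(idx_p, rng.poisson(toy_mean[idx_p]), p, p)

print(f"exact={exact:.3e}  importance={est_q:.3e}  analog={est_p:.3e}")
```

Both estimators are unbiased for the same quantity; in this toy setting the importance-sampled one spends far fewer histories in the annulus that never scores, which is the effect a well-chosen `q` is meant to achieve.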

Paper Structure

This paper contains 21 sections, 36 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Geometrical irradiation setup (not to scale, cf. Fig. \ref{fig:impact} for a more detailed description of the geometry). The world volume (region '$\mathbf{A}$') is a cylinder filled with water. Located at its center is a gold NP (in yellow). The source has the shape of a disk, located in the $x$-$y$-plane at a distance of $100\,\mu\mathrm{m}$ from the NP center. It is divided into annuli that are logarithmically increasing in width (region '$\mathbf{B}$'). The dark blue volume (region '$\mathbf{C}$') is the volume in which secondary electrons are transported. Its size is chosen so that any electron track that could reach any of the scoring shells around the NP is included.
  • Figure 2: Illustration of the coordinate system used with a gold NP (in yellow) placed at the origin. The shells around the NP symbolize the scoring volumes. Collectively they make up the region of interest. The source is a disk located in the $x$-$y$-plane at $z=-d_\mathrm{src} = -100\,\mu\mathrm{m}$ and photons are generated with a momentum direction of $(0,0,1)$.
  • Figure 3: Flowchart of the optimization procedure. The process begins with an initial choice for the importance function $q$ ($\mathbf{A}$). For each annulus, a number of primary particles $N^q_j$ is simulated to obtain estimates for the annulus- and shell-mean of the cluster dose $\langle g_{F_4}\rangle_{r_i,b_j}$ ($\mathbf{B}$). Using this information, importance scores and the loss function are calculated using eqs. \ref{eq:u_j} and \ref{eq:loss-final}, respectively ($\mathbf{C}$). Using the TPE sampler, a new importance function $q$ is chosen that optimizes this loss function ($\mathbf{D}$).
  • Figure 4: \ref{fig:comparison} Comparison of the cluster dose as a function of radial distance from the NP (eq. \ref{eq:exp-mean}) for the optimized importance function (blue curve), the "analog" computation of eq. \ref{eq:exp-mean} (red curve), and data taken from Thomas_2024 in original (green curve) and fluence-adjusted (purple curve) form for comparison. The shaded area corresponds to a sampling uncertainty of one standard deviation $\sigma$. The bottom plot displays the corresponding relative uncertainties as $\sigma/\mu$. \ref{fig:q_comparison} Comparison of the initial (red) and final (blue) distribution functions that were used to generate the cluster doses in \ref{fig:comparison}. The $y$-axis has been truncated for legibility, omitting higher values of the initial importance distribution. The initial distribution is, however, identical to $p_j$, the numerical values of which can be found in Table \ref{tab:p_j}.
  • Figure 5: Final importance function divided by the initial importance distribution ($q_j^{(0)} = p_j$). This is equal to the inverse likelihood ratio and is also $\propto q_j / A_j$, the probability mass per area of the corresponding annulus.
  • ...and 3 more figures
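
The geometry in the captions of Figures 1 and 5 can be made concrete with a short numerical sketch. It assumes annulus edges that are spaced geometrically (one way to obtain widths that grow logarithmically), with an innermost disk from 0 to `r_min`; the radii, the number of annuli, and the uniform `q` used below are hypothetical, not the paper's values. For uniform irradiation of the source disk, the analog probability of annulus $j$ is proportional to its area $A_j$, so $q_j / p_j \propto q_j / A_j$, as stated for Figure 5.

```python
import numpy as np

def log_annulus_edges(r_min, r_max, n_annuli):
    """Radial edges: an inner disk [0, r_min] plus geometrically spaced rings."""
    return np.concatenate(([0.0], np.geomspace(r_min, r_max, n_annuli)))

def annulus_areas(edges):
    """Area of each annulus between consecutive edges."""
    return np.pi * np.diff(np.asarray(edges, dtype=float) ** 2)

def analog_probabilities(edges):
    """p_j for uniform irradiation of the disk: proportional to annulus area."""
    areas = annulus_areas(edges)
    return areas / areas.sum()

# Hypothetical source disk: radii in arbitrary units, four annuli.
edges = log_annulus_edges(r_min=1.0, r_max=1000.0, n_annuli=4)
areas = annulus_areas(edges)
p = analog_probabilities(edges)

# Hypothetical importance distribution: equal probability mass per annulus.
q = np.full(len(p), 1.0 / len(p))

# Quantity plotted in Figure 5 (inverse likelihood ratio), proportional to q_j / A_j.
ratio = q / p

for j, (pj, qj, rj) in enumerate(zip(p, q, ratio)):
    print(f"annulus {j}: p={pj:.3e}  q={qj:.3e}  q/p={rj:.3e}")
```

Because the outer annuli dominate the area, the analog `p_j` places almost all primaries far from the NP; an importance distribution with more mass at small impact parameters shows up as a large `q_j / p_j` for the inner annuli.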