
From $α$ decay to cluster decay: an extreme case of transfer learning

Yinu Zhang, Zhiyi Li, Kele Li, Jiaxuan Zhong, Cenxi Yuan

Abstract

When training data are limited, data-driven models are especially vulnerable to optimization-related fluctuations from random initialization and to sampling-induced bias from insufficient training data. We address both challenges with transfer learning (TL): deep neural networks (DNNs) are first pretrained on $α$ decay half-lives and then fine-tuned on a small cluster decay dataset. The pretraining stage provides a physically informed initialization that stabilizes optimization, while transferred global decay systematics regularize the fit and reduce sensitivity to training set composition. Despite extreme data sparsity, the resulting models accurately predict cluster decay half-lives for parent nuclei from $^{221}$Fr to $^{242}$Cm. We further quantify how initialization and sample selection affect predictive accuracy and robustness, demonstrating that TL enables stable and reliable learning in the small-sample regime.
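
For orientation, the pretraining-plus-fine-tuning pipeline described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the network size, the optimizer settings, the choice of five input features, and the placeholder tensors standing in for the experimental $\alpha$ and cluster decay sets are all hypothetical.

```python
# Minimal sketch of the pretrain-then-fine-tune pipeline (assumed, not the
# authors' exact setup): a small DNN is fitted to alpha-decay half-lives and
# its parameters are reused as the starting point for the cluster-decay fit.
import torch
import torch.nn as nn

torch.manual_seed(0)

def make_dnn(n_in=5, n_hidden=64, n_layers=3):
    """Fully connected network mapping input features to log10(T1/2)."""
    layers, width = [], n_in
    for _ in range(n_layers):
        layers += [nn.Linear(width, n_hidden), nn.ReLU()]
        width = n_hidden
    layers.append(nn.Linear(width, 1))
    return nn.Sequential(*layers)

def fit(model, x, y, epochs=2000, lr=1e-3):
    """Plain full-batch Adam fit on (features, log10 half-life) pairs."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

# Placeholder tensors standing in for the experimental feature/target sets
# (e.g. proton/mass numbers of parent and emitted cluster, decay energy Q).
x_alpha, y_alpha = torch.randn(500, 5), torch.randn(500, 1)
x_cluster, y_cluster = torch.randn(10, 5), torch.randn(10, 1)

# Stage 1: pretrain on the data-rich alpha-decay set.
model = fit(make_dnn(), x_alpha, y_alpha)
theta_pre = {k: v.clone() for k, v in model.state_dict().items()}

# Stage 2: reload the pretrained parameters and fine-tune on the small
# cluster-decay training subset (full fine-tuning: every layer trainable).
model.load_state_dict(theta_pre)
fit(model, x_cluster, y_cluster, epochs=500, lr=1e-4)

# Shallow fine-tuning variant: freeze all but the output layer before fitting.
# for p in list(model.parameters())[:-2]:
#     p.requires_grad = False
```

The shallow variant discussed below (Fig. 1) would instead freeze the earlier layers, as in the commented lines, so that only the last layer(s) adapt to the cluster decay data.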

Figures (4)

  • Figure 1: Schematic of the TL architecture. In general, TL reuses knowledge learned from a data-rich source task to improve performance on a related, data-scarce target task. In this work, a DNN is first pretrained on the abundant $\alpha$ decay dataset and then adapted to cluster decay by either fine-tuning all layers (full fine-tuning) or fine-tuning only the last few layers (shallow fine-tuning), thereby leveraging shared tunneling systematics while limiting overfitting in the data-scarce target domain.
  • Figure 2: Performance evaluation using 10-fold cross-validation. The left panels show the rms deviations, $\bar{\sigma}_{\rm rms}$, of the training (TR), validation (VS), and test (TS) folds for $\alpha$ decay (a) and for the combined $\alpha$+cluster decay dataset (c). The right panels illustrate the domain shift when an $\alpha$-only network is applied directly to cluster decay (b) and the corresponding performance when the network is trained on the combined $\alpha$+cluster decay dataset (d). In the right panels, the x-axis labels the parent nuclei for cluster decay, with right superscripts indicating the different decay channels; green strips mark training data and pink strips mark test data. (A plausible definition of $\bar{\sigma}_{\rm rms}$ is sketched after this list.)
  • Figure 3: Optimization-related fluctuations in the half-life predictions, using the same cluster decay train/test split as in Fig. 2. The x-axis labels the parent nuclei for cluster decay, with right superscripts indicating the different decay channels; the y-axis shows the difference between the experimental value and the corresponding TL prediction. Panel (a) shows a model trained directly on 10 cluster decay samples from random initialization, which overfits and exhibits large variability across 50 random initializations. Panel (b) shows the full TL model initialized with the pretrained $\alpha$ decay parameters $\boldsymbol{\theta}_{\rm pre}$ and then fine-tuned on the cluster decay training subset, yielding more stable predictions for both training and test data.
  • Figure 4: Predictive performance as a function of the number of cluster decay training samples. For each sample size, the mean $\sigma_{\rm rms}$ over the complete cluster decay dataset and its standard deviation were computed from 50 random training set selections, for three approaches: the combined $\alpha$+cluster model, shallow TL fine-tuning, and full TL fine-tuning; the UDL (universal decay law) result is included as a benchmark (a standard form of the UDL is recalled after this list).
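
The captions above quote rms deviations without restating their definition. A plausible form, assuming half-lives are compared on a decimal-log scale and the bar denotes the average over the 10 cross-validation folds, is

$$\sigma_{\rm rms} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\log_{10}T^{\rm exp}_{1/2,i} - \log_{10}T^{\rm pred}_{1/2,i}\right)^{2}},$$

where $N$ is the number of nuclei in the fold (or data set) being evaluated.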
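
The UDL benchmark in Fig. 4 presumably refers to the universal decay law, which expresses the decimal-log half-life linearly in two variables built from the charge numbers $Z_c$, $Z_d$ and mass numbers $A_c$, $A_d$ of the emitted cluster and daughter and the decay energy $Q$. A standard form (the coefficients actually used here are not given in this excerpt) is

$$\log_{10}T_{1/2} = a\,Z_c Z_d\sqrt{\frac{A}{Q}} + b\,\sqrt{A\,Z_c Z_d\left(A_c^{1/3}+A_d^{1/3}\right)} + c, \qquad A = \frac{A_c A_d}{A_c + A_d},$$

with $a$, $b$, and $c$ fitted constants.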