Identifying percolation phase transitions with unsupervised learning based on largest clusters

Dian Xu; Shanshan Wang; Weibing Deng; Feng Gao; Wei Li; Jianmin Shen

TL;DR

This work shows that unsupervised learning on raw percolation configurations struggles to locate $p_c$, but that inputting the largest cluster enables PCA and an autoencoder (AE) to reveal the critical point through density-related signals. By combining Monte Carlo simulations with a Fake Finite Size Scaling (FFSS) approach, the authors obtain $p_c$ estimates that agree with theory for both site and bond percolation, while shuffle experiments indicate that the learned features mainly reflect active-site density rather than spatial order. The results support interpreting the first principal component of PCA and the AE latent variable as density proxies, and the FFSS and shuffled-cluster analyses demonstrate the method's robustness. This approach offers a practical unsupervised pathway for analyzing percolation transitions and could inform similar studies of complex networks and lattice models.
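The key preprocessing step summarized above is replacing each raw configuration with its largest cluster. The sketch below shows this for site percolation. It assumes a square $L \times L$ lattice with open (non-periodic) boundaries and nearest-neighbour (4-)connectivity; the paper's own boundary conditions and code are not given here, and all function names are hypothetical.

```python
import random
from collections import deque

def site_configuration(L, p, rng):
    """Occupy each site of an L x L square lattice independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(L)] for _ in range(L)]

def largest_cluster(config):
    """Return a 0/1 grid keeping only the largest 4-connected cluster of occupied sites.

    Assumes open boundaries; on ties, the first cluster found (row-major order) is kept.
    """
    L = len(config)
    seen = [[False] * L for _ in range(L)]
    best = []
    for i in range(L):
        for j in range(L):
            if config[i][j] and not seen[i][j]:
                # Breadth-first search collects one cluster of occupied nearest neighbours.
                cluster = []
                queue = deque([(i, j)])
                seen[i][j] = True
                while queue:
                    x, y = queue.popleft()
                    cluster.append((x, y))
                    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = x + dx, y + dy
                        if 0 <= u < L and 0 <= v < L and config[u][v] and not seen[u][v]:
                            seen[u][v] = True
                            queue.append((u, v))
                if len(cluster) > len(best):
                    best = cluster
    out = [[0] * L for _ in range(L)]
    for x, y in best:
        out[x][y] = 1
    return out

def density(config):
    """Fraction of lattice sites that are occupied (active)."""
    L = len(config)
    return sum(map(sum, config)) / (L * L)

if __name__ == "__main__":
    rng = random.Random(0)
    raw = site_configuration(32, 0.8, rng)
    lc = largest_cluster(raw)
    print(f"raw density {density(raw):.3f}, largest-cluster density {density(lc):.3f}")
```

Below $p_c$, the largest cluster is a small fraction of the occupied sites. Above $p_c$, it absorbs most of them, so its density carries the transition signal that the raw density lacks.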

Abstract

The application of machine learning to the study of phase transitions has achieved remarkable success in both equilibrium and non-equilibrium systems. It is widely recognized that unsupervised learning can retrieve phase transition information through hidden variables. However, using unsupervised methods to identify the critical point of percolation models has remained an intriguing challenge. This paper suggests that, by inputting the largest cluster rather than the original configuration into the learning model, unsupervised learning can indeed predict the critical point of the percolation model. Furthermore, we observe that when the largest cluster configuration is randomly shuffled (altering the positions of occupied sites or bonds), the output does not differ significantly from that obtained by learning the largest cluster configuration directly. This finding suggests a more general principle: unsupervised learning primarily captures particle density, or more specifically, occupied-site density. However, shuffling does affect the formation of the largest cluster, which is directly related to the phase transition. As the randomness increases, the correlation length tends to decrease, providing direct evidence of this relationship. We also propose a method called Fake Finite Size Scaling (FFSS) to calculate the critical value, which substantially improves the fitting accuracy.
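The shuffle experiment described in the abstract can be illustrated with a short sketch. This section does not spell out what the shuffle ratio $r$ means exactly, so the sketch assumes one plausible reading: a fraction $r$ of the occupied sites is moved to randomly chosen, originally empty sites. Under that assumption the occupied-site density is preserved exactly, while local spatial structure is typically destroyed. Here that structure is measured crudely by the hypothetical helper `occupied_neighbour_pairs`. All names are hypothetical.

```python
import random

def shuffle_fraction(config, r, rng):
    """Hypothetical shuffle: move round(r * N_occ) occupied sites to randomly chosen empty sites.

    The number of occupied sites, and hence the density, is unchanged by construction.
    """
    L = len(config)
    occupied = [(i, j) for i in range(L) for j in range(L) if config[i][j]]
    empty = [(i, j) for i in range(L) for j in range(L) if not config[i][j]]
    n_move = min(round(r * len(occupied)), len(empty))
    out = [row[:] for row in config]
    for i, j in rng.sample(occupied, n_move):
        out[i][j] = 0  # vacate the selected occupied sites
    for i, j in rng.sample(empty, n_move):
        out[i][j] = 1  # refill the same number of originally empty sites
    return out

def occupied_neighbour_pairs(config):
    """Count nearest-neighbour pairs of occupied sites (open boundaries)."""
    L = len(config)
    pairs = 0
    for i in range(L):
        for j in range(L):
            if config[i][j]:
                if i + 1 < L and config[i + 1][j]:
                    pairs += 1
                if j + 1 < L and config[i][j + 1]:
                    pairs += 1
    return pairs

if __name__ == "__main__":
    rng = random.Random(0)
    L = 16
    # A compact 8 x 8 block in one corner stands in for a largest cluster.
    block = [[1 if i < 8 and j < 8 else 0 for j in range(L)] for i in range(L)]
    for r in (0.0, 0.2, 1.0):
        s = shuffle_fraction(block, r, rng)
        print(f"r={r}: occupied={sum(map(sum, s))}, neighbour pairs={occupied_neighbour_pairs(s)}")
```

Under this reading, any model whose output depends only on the number of occupied sites cannot distinguish the shuffled input from the original. The number of adjacent occupied pairs, however, drops as $r$ grows, which is consistent with the abstract's remark that shuffling disrupts the largest cluster.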

Paper Structure (15 sections, 13 equations, 8 figures, 3 tables)

Introduction
Model and Unsupervised Learning
Percolation Model
Unsupervised Learning
PCA
AE
The Unsupervised Learning Results
Results of percolation
The raw configurations
The largest cluster
The shuffled largest cluster
The largest cluster of shuffled largest cluster
Discussions
Conclusion
Acknowledgements

Figures (8)

Figure 1: The six configurations examined in this article. (a) Raw configuration of site percolation with occupation probability $p = 0.8$. (b) Largest cluster of (a). (c) Shuffled configuration of (b) with ratio $r = 0.2$. (d) Largest cluster of (c). (e) Raw configuration of bond percolation with occupation probability $p = 0.8$. (f) Largest cluster of (e).
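Panels (e) and (f) concern bond percolation, where clusters are sets of sites joined by open bonds. The sketch below uses union-find and assumes an $L \times L$ square lattice with open boundaries. Because the density of active bonds is the quantity reported for bond percolation, the sketch assumes that the "largest" cluster is the one with the most open bonds, normalised by the total number of bonds, $2L(L-1)$. All names are hypothetical.

```python
import random

class DisjointSet:
    """Union-find over integer site indices, with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

def bond_configuration(L, p, rng):
    """Open bonds of an L x L square lattice (open boundaries); site (i, j) has index i * L + j."""
    bonds = []
    for i in range(L):
        for j in range(L):
            s = i * L + j
            if j + 1 < L and rng.random() < p:
                bonds.append((s, s + 1))  # horizontal bond
            if i + 1 < L and rng.random() < p:
                bonds.append((s, s + L))  # vertical bond
    return bonds

def largest_bond_cluster(L, bonds):
    """Open bonds belonging to the cluster that contains the most open bonds."""
    ds = DisjointSet(L * L)
    for a, b in bonds:
        ds.union(a, b)
    groups = {}
    for a, b in bonds:
        groups.setdefault(ds.find(a), []).append((a, b))
    return max(groups.values(), key=len, default=[])

def bond_density(L, bonds):
    """Fraction of all 2L(L-1) lattice bonds that are in `bonds`."""
    return len(bonds) / (2 * L * (L - 1))

if __name__ == "__main__":
    rng = random.Random(0)
    L = 32
    bonds = bond_configuration(L, 0.8, rng)
    lc = largest_bond_cluster(L, bonds)
    print(f"bond density {bond_density(L, bonds):.3f}, largest-cluster bond density {bond_density(L, lc):.3f}")
```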
Figure 2: Neural network schematic structure of autoencoder.
Figure 3: Monte Carlo (MC) simulation results for site and bond percolation on a two-dimensional $L \times L$ lattice. (a) Probability that a site belongs to a percolating cluster. (b) Density of active sites in the largest cluster. (c) Probability that a bond belongs to a percolating cluster. (d) Density of active bonds in the largest cluster.
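Panels (a) and (c) rely on identifying a percolating cluster. The Monte Carlo sketch below covers the site case. It assumes that "percolating" means a cluster connecting the top and bottom rows of an open-boundary lattice; other spanning criteria, such as wrapping on a torus, are also common. The helper names are hypothetical.

```python
import random
from collections import deque

def spanning_fraction(L, p, rng):
    """Fraction of sites in one random configuration that belong to a top-to-bottom spanning cluster."""
    occ = [[rng.random() < p for _ in range(L)] for _ in range(L)]
    seen = [[False] * L for _ in range(L)]
    in_spanning = 0
    for i in range(L):
        for j in range(L):
            if not occ[i][j] or seen[i][j]:
                continue
            size, touches_top, touches_bottom = 0, False, False
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:  # breadth-first search over 4-neighbours
                x, y = queue.popleft()
                size += 1
                touches_top = touches_top or x == 0
                touches_bottom = touches_bottom or x == L - 1
                for u, v in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= u < L and 0 <= v < L and occ[u][v] and not seen[u][v]:
                        seen[u][v] = True
                        queue.append((u, v))
            if touches_top and touches_bottom:
                in_spanning += size
    return in_spanning / (L * L)

def estimate_P(L, p, samples, seed=0):
    """Monte Carlo average of spanning_fraction over independent configurations."""
    rng = random.Random(seed)
    return sum(spanning_fraction(L, p, rng) for _ in range(samples)) / samples

if __name__ == "__main__":
    for L in (8, 16, 32):
        print(L, [round(estimate_P(L, p, 20), 3) for p in (0.5, 0.55, 0.6, 0.65, 0.7)])
```

As $L$ grows, the curves sharpen around the square-lattice site threshold ($p_c \approx 0.5927$), which is the behaviour shown in the MC reference curves.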
Figure 4: Results for two-dimensional site and bond percolation using the raw configurations. The vertical axes show the density of active sites/bonds (panels a & d), the first principal component of PCA (b & e), and the single latent variable of the AE (c & f). All horizontal axes show the occupation probability. All three quantities increase essentially linearly and the curves overlap, so the critical point of the system cannot be identified. Nevertheless, the three quantities behave very similarly.
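For raw site configurations, the similarity between the density and the first principal component has a simple explanation. Given $p$, the sites are independent, so the covariance across samples is $\mathrm{Var}(p)\,\mathbf{1}\mathbf{1}^\top + \mathbb{E}[p(1-p)]\,I$. Its leading eigenvector is the uniform vector, and projecting onto it amounts to summing the occupied sites. The stdlib-only sketch below checks this. It assumes that power iteration is an acceptable stand-in for a library PCA and that the lattice is flattened to a vector of length $L^2$; the function names are hypothetical.

```python
import random

def raw_site_configs(L, ps, per_p, rng):
    """Flattened raw site-percolation configurations (length L*L) and their densities."""
    X, rho = [], []
    for p in ps:
        for _ in range(per_p):
            x = [1.0 if rng.random() < p else 0.0 for _ in range(L * L)]
            X.append(x)
            rho.append(sum(x) / (L * L))
    return X, rho

def first_pc_scores(X, iters=50, seed=1):
    """Scores on the leading principal component, via power iteration on Xc^T Xc.

    Xc is the mean-centred data; Xc^T Xc equals the covariance up to a factor 1/(n-1),
    which does not change its eigenvectors. The sign of the component is arbitrary.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[k] for row in X) / n for k in range(d)]
    Xc = [[row[k] - mean[k] for k in range(d)] for row in X]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start vector
    for _ in range(iters):
        s = [sum(a * b for a, b in zip(row, v)) for row in Xc]          # Xc v
        w = [sum(Xc[i][k] * s[i] for i in range(n)) for k in range(d)]  # Xc^T (Xc v)
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in Xc]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

if __name__ == "__main__":
    rng = random.Random(0)
    ps = [0.05 + 0.1 * k for k in range(10)]
    X, rho = raw_site_configs(8, ps, 20, rng)
    print("|corr(pca_1, density)| =", round(abs(pearson(first_pc_scores(X), rho)), 4))
```

Because the raw density grows linearly with $p$ and has no singularity at $p_c$, a component that merely tracks it cannot reveal the transition. This matches the behaviour described in the caption.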
Figure 5: Results for the two-dimensional percolation model using the largest cluster selected from the raw configuration. Site percolation: the vertical axes of panels a-c show the density of active sites, the first principal component of PCA, and the single latent variable of the AE, respectively, and all horizontal axes show the occupation probability. Panels d-f show the corresponding FFSS, FFSS, and FSS analyses. The mean correlation coefficients of $h$ and $pca_1$ with the density are $0.878 \pm 0.003$ and $0.999 \pm 0.001$, respectively. Bond percolation: the vertical axes of panels g-i show the density of active bonds, the first principal component of PCA, and the single latent variable of the AE, respectively, again against the occupation probability. Panels j-l show the corresponding FFSS, FFSS, and FSS analyses. The mean correlation coefficients of $h$ and $pca_1$ with the density are $0.848 \pm 0.002$ and $0.998 \pm 0.001$, respectively. Both the FFSS and FSS methods locate the critical points accurately.
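The caption compares FFSS with conventional FSS. This section does not specify FFSS, so the sketch below illustrates only conventional FSS data collapse. It uses the standard form $P(p,L) \approx L^{-\beta/\nu} F\big((p-p_c)L^{1/\nu}\big)$ with the known 2D percolation exponents $\nu = 4/3$ and $\beta = 5/36$. The sketch scans candidate thresholds and keeps the one that best collapses the curves for different $L$. The demo uses synthetic curves built from a hypothetical logistic scaling function with $p_c = 0.5$ (the exact square-lattice bond threshold), not data from the paper. All names are hypothetical.

```python
import bisect
import math

NU = 4 / 3     # correlation-length exponent of 2D percolation
BETA = 5 / 36  # order-parameter exponent of 2D percolation

def interp(xs, ys, x):
    """Linear interpolation on ascending xs; None if x lies outside [xs[0], xs[-1]]."""
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect.bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

def collapse_cost(curves, pc, nu=NU, beta=BETA):
    """Mean squared mismatch between curves after x = (p - pc) L^(1/nu), y = P L^(beta/nu).

    curves maps L -> (ps, Ps) with ps ascending.
    """
    scaled = {L: ([(p - pc) * L ** (1 / nu) for p in ps], [P * L ** (beta / nu) for P in Ps])
              for L, (ps, Ps) in curves.items()}
    total, count = 0.0, 0
    for La, (xa, ya) in scaled.items():
        for Lb, (xb, yb) in scaled.items():
            if La == Lb:
                continue
            for x, y in zip(xa, ya):
                yi = interp(xb, yb, x)  # compare curve La with curve Lb at the same x
                if yi is not None:
                    total += (y - yi) ** 2
                    count += 1
    return total / count if count else math.inf

def best_pc(curves, candidates, nu=NU, beta=BETA):
    """Candidate threshold that gives the best data collapse."""
    return min(candidates, key=lambda pc: collapse_cost(curves, pc, nu, beta))

if __name__ == "__main__":
    # Synthetic curves: hypothetical logistic scaling function, true pc = 0.5.
    ps = [0.40 + 0.005 * k for k in range(41)]
    curves = {L: (ps, [L ** (-BETA / NU) / (1 + math.exp(-(p - 0.5) * L ** (1 / NU))) for p in ps])
              for L in (16, 32, 64)}
    print(best_pc(curves, [0.45 + 0.001 * k for k in range(101)]))
```

With real MC data, the exponents could also be treated as fit parameters. Fixing them to the known values keeps this sketch to a one-dimensional scan over $p_c$.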