Identifying percolation phase transitions with unsupervised learning based on largest clusters
Dian Xu, Shanshan Wang, Weibing Deng, Feng Gao, Wei Li, Jianmin Shen
TL;DR
This work shows that unsupervised learning on raw percolation configurations struggles to locate $p_c$, but that feeding in the largest cluster instead lets principal component analysis (PCA) and an autoencoder (AE) reveal the critical point through density-related signals. By combining Monte Carlo simulations with a Fake Finite Size Scaling (FFSS) approach, the authors obtain $p_c$ estimates that align with theory for both site and bond percolation. Shuffle experiments indicate that the learned features predominantly reflect occupied-site density rather than spatial order, which supports interpreting the first principal component and the AE latent variable as density proxies. The FFSS and shuffled-cluster analyses also serve as robustness checks on the method. This approach offers a practical unsupervised pathway for analyzing percolation transitions and could inform similar studies of complex networks and lattice models.
Abstract
The application of machine learning in the study of phase transitions has achieved remarkable success in both equilibrium and non-equilibrium systems. It is widely recognized that unsupervised learning can retrieve phase transition information through hidden variables. However, using unsupervised methods to identify the critical point of percolation models has remained an intriguing challenge. This paper suggests that, by inputting the largest cluster rather than the original configuration into the learning model, unsupervised learning can indeed predict the critical point of the percolation model. Furthermore, we observe that when the largest cluster configuration is randomly shuffled (altering the positions of occupied sites or bonds), the output does not differ significantly from that obtained by learning the largest cluster configuration directly. This finding suggests a more general principle: unsupervised learning primarily captures particle density, or more specifically, occupied site density. However, shuffling does impact the formation of the largest cluster, which is directly related to phase transitions. As randomness increases, we observe that the correlation length tends to decrease, providing direct evidence of this relationship. We also propose a method called Fake Finite Size Scaling (FFSS) to calculate the critical value, which substantially improves the accuracy of the fit.
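The preprocessing described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes site percolation on an L x L square lattice with open boundaries and nearest-neighbour connectivity, and the function names (site_configuration, largest_cluster, shuffled, density) are hypothetical. It extracts the largest cluster from a random configuration, shuffles its occupied sites, and checks that the occupied-site density is unchanged by the shuffle.

```python
import random
from collections import deque

def site_configuration(L, p, rng):
    """Occupy each site of an L x L lattice independently with probability p."""
    return [[1 if rng.random() < p else 0 for _ in range(L)] for _ in range(L)]

def largest_cluster(config):
    """Return a 0/1 lattice that keeps only the largest nearest-neighbour cluster.

    Assumption: open (non-periodic) boundaries; ties go to the first cluster found.
    """
    L = len(config)
    seen = [[False] * L for _ in range(L)]
    best = []
    for i in range(L):
        for j in range(L):
            if config[i][j] == 0 or seen[i][j]:
                continue
            # Breadth-first search over occupied nearest neighbours.
            cluster, queue = [], deque([(i, j)])
            seen[i][j] = True
            while queue:
                x, y = queue.popleft()
                cluster.append((x, y))
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if 0 <= nx < L and 0 <= ny < L and config[nx][ny] and not seen[nx][ny]:
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            if len(cluster) > len(best):
                best = cluster
    out = [[0] * L for _ in range(L)]
    for x, y in best:
        out[x][y] = 1
    return out

def shuffled(config, rng):
    """Randomly permute site positions; the number of occupied sites is preserved."""
    L = len(config)
    flat = [v for row in config for v in row]
    rng.shuffle(flat)
    return [flat[k * L:(k + 1) * L] for k in range(L)]

def density(config):
    """Fraction of occupied sites in a square 0/1 lattice."""
    L = len(config)
    return sum(map(sum, config)) / (L * L)

rng = random.Random(0)  # fixed seed so the run is reproducible
L = 32
for p in (0.3, 0.5, 0.7, 0.9):
    lc = largest_cluster(site_configuration(L, p, rng))
    # Largest-cluster density before and after shuffling: identical by construction.
    print(p, density(lc), density(shuffled(lc, rng)))
```

Because shuffling only permutes site positions, the two printed densities agree for every p, while the largest-cluster density itself rises sharply between small and large p. A learner that responds mainly to density would therefore see the same signal in both inputs, which is consistent with the shuffle result reported in the abstract.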
