Record Acceleration of the Two-Dimensional Ising Model Using High-Performance Wafer Scale Engine
Dirk Van Essendelft, Hayl Almolyki, Wei Shi, Terry Jordan, Mei-Yu Wang, Wissam A. Saidi
TL;DR
This work accelerates the two-dimensional Ising model on the Cerebras Wafer-Scale Engine (WSE) by tailoring a checkerboard Monte Carlo update to the WSE's 2D grid of processing elements. It combines a domain-folding strategy with a compressed representation that packs 16 spins into each int16 word, spread across eight arrays, to minimize memory traffic and achieve near-ideal weak scaling with only nearest-neighbor communication. The approach reaches a peak of 61.8 trillion flip attempts per second for lattices of up to 200 million spins, with up to a 148x speedup over a highly optimized NVIDIA V100 implementation and up to 88x higher productivity than an NVIDIA H100 for multi-simulation workloads. The results highlight the WSE's potential for large-scale scientific computing and materials modeling, where spin-based Monte Carlo simulations can exploit massive parallelism.
Abstract
The versatility and wide-ranging applicability of the Ising model, originally introduced to study phase transitions in magnetic materials, have made it a cornerstone of statistical physics and a valuable tool for evaluating the performance of emerging computer hardware. Here, we present a novel implementation of the two-dimensional Ising model on a Cerebras Wafer-Scale Engine (WSE), a revolutionary processor that is opening new frontiers in computing. In our deployment of the checkerboard algorithm, we optimized the Ising model to take advantage of the unique WSE architecture. Specifically, we employed a compressed bit representation that stores 16 spins in each int16 word, and we distributed the spins efficiently over the processing units, enabling seamless weak scaling and limiting communication to immediately neighboring units. Our implementation can handle up to 754 simulations in parallel, achieving an aggregate of over 61.8 trillion flip attempts per second for Ising models with up to 200 million spins. This represents a gain of up to 148 times over previously reported single-device results obtained with a highly optimized implementation on an NVIDIA V100, and of up to 88 times in productivity compared to an NVIDIA H100. Our findings highlight the significant potential of the WSE in scientific computing, particularly in the field of materials modeling.
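To make the two ingredients named above concrete, the following is a minimal single-process sketch in plain Python, not the WSE implementation. It assumes a ferromagnetic coupling J = 1, no external field, periodic boundaries on a lattice with even side lengths, and a Metropolis acceptance rule. The function names (`pack_spins`, `unpack_spins`, `checkerboard_sweep`) and the bit convention (bit i set means spin i is +1) are hypothetical choices made for illustration; the layout across eight arrays and the domain folding are not reproduced here.

```python
import math
import random


def pack_spins(spins):
    """Pack 16 spins (+1 or -1) into one 16-bit word.

    Assumed convention: bit i is 1 when spin i is +1, and 0 when it is -1.
    """
    if len(spins) != 16:
        raise ValueError("expected exactly 16 spins")
    word = 0
    for i, s in enumerate(spins):
        if s == 1:
            word |= 1 << i
    return word


def unpack_spins(word):
    """Inverse of pack_spins: recover the 16 spins from a 16-bit word."""
    return [1 if (word >> i) & 1 else -1 for i in range(16)]


def checkerboard_sweep(lattice, beta, rng):
    """One Metropolis sweep using the checkerboard decomposition.

    Sites with (i + j) even are updated first, then sites with (i + j) odd.
    Within one color no two sites are nearest neighbors, so all updates of
    that color are independent and could be performed in parallel.
    """
    n, m = len(lattice), len(lattice[0])
    for color in (0, 1):
        for i in range(n):
            for j in range(m):
                if (i + j) % 2 != color:
                    continue
                # Sum of the four nearest neighbors (periodic boundaries).
                nn = (lattice[(i - 1) % n][j] + lattice[(i + 1) % n][j]
                      + lattice[i][(j - 1) % m] + lattice[i][(j + 1) % m])
                # Energy change if this spin is flipped, with J = 1.
                d_e = 2 * lattice[i][j] * nn
                if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
                    lattice[i][j] = -lattice[i][j]
```

Each call to `checkerboard_sweep` performs one flip attempt per site, which is the unit counted in the flip-attempts-per-second figures. The sketch keeps one spin per list entry for readability; a packed variant would apply the same update to 16-bit words produced by `pack_spins`.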
