WgPy: GPU-accelerated NumPy-like array library for web browsers
Masatoshi Hidaka, Tatsuya Harada
TL;DR
WgPy addresses a limitation of NumPy code in web browsers, namely that it runs only on the CPU, by providing a GPU-accelerated, NumPy-like array library: Python runs in the browser via Pyodide, and array computations are delegated to WebGL or WebGPU. It introduces a synchronization mechanism based on SharedArrayBuffer and Atomics that preserves Python's synchronous semantics while GPU operations execute asynchronously, and it supports custom kernels to accelerate diverse workloads. Key findings include up to a $95\times$ speedup in ResNet-18 training and a $340\times$ speedup in $1024 \times 1024$ matrix multiplication, plus a demonstration of distributed CNN hyperparameter optimization across mobile devices. These results suggest that installation-free, browser-based scientific computing and large-scale, volunteer-style distributed computing are practical with WgPy. Future work includes automatic kernel fusion via JIT compilation and broader NumPy coverage.
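The synchronization idea can be illustrated with a small Python model. This is a sketch under stated assumptions, not WgPy's implementation: a `bytearray` stands in for the SharedArrayBuffer, a `threading.Condition` stands in for `Atomics.wait`/`Atomics.notify` on a status slot, and an asyncio loop on a background thread stands in for the thread that drives the asynchronous GPU API. The Python caller writes a "pending" status, hands the work to the asynchronous side, and blocks until that side writes the result and a "done" status into the shared bytes. In a real browser the blocking side has to run in a worker, since browsers do not permit `Atomics.wait` on the main thread. The names `SyncBridge`, `sum_sync`, and `start_gpu_thread` are hypothetical.

```python
import asyncio
import struct
import threading

# Hypothetical stand-ins for the browser primitives (illustration only):
#   shared_buf             ~ SharedArrayBuffer (raw bytes visible to both threads)
#   cond.wait/notify_all   ~ Atomics.wait / Atomics.notify on the status slot
#   background asyncio loop ~ the thread that drives the asynchronous GPU API
STATUS_OFFSET, RESULT_OFFSET = 0, 8
PENDING, DONE = 0, 1


class SyncBridge:
    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
        self.loop = loop
        # Bytes 0-3: int32 status, bytes 4-7: unused, bytes 8-15: float64 result.
        self.shared_buf = bytearray(16)
        self.cond = threading.Condition()

    async def _fake_gpu_readback(self, values: list[float]) -> float:
        # Stand-in for an asynchronous GPU call (e.g. a buffer read-back).
        await asyncio.sleep(0.01)
        return float(sum(values))

    async def _serve(self, values: list[float]) -> None:
        result = await self._fake_gpu_readback(values)
        with self.cond:
            struct.pack_into("<d", self.shared_buf, RESULT_OFFSET, result)
            struct.pack_into("<i", self.shared_buf, STATUS_OFFSET, DONE)
            self.cond.notify_all()  # wake the blocked Python caller

    def sum_sync(self, values: list[float]) -> float:
        """Synchronous call seen by Python code; blocks until the async work finishes."""
        with self.cond:
            struct.pack_into("<i", self.shared_buf, STATUS_OFFSET, PENDING)
        asyncio.run_coroutine_threadsafe(self._serve(values), self.loop)
        with self.cond:
            # Like Atomics.wait: sleep while the status slot has not changed to DONE.
            while struct.unpack_from("<i", self.shared_buf, STATUS_OFFSET)[0] != DONE:
                self.cond.wait()
            return struct.unpack_from("<d", self.shared_buf, RESULT_OFFSET)[0]


def start_gpu_thread() -> asyncio.AbstractEventLoop:
    """Run an event loop on a daemon thread, standing in for the async GPU side."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


if __name__ == "__main__":
    loop = start_gpu_thread()
    bridge = SyncBridge(loop)
    print(bridge.sum_sync([1.0, 2.0, 3.0]))  # 6.0
    loop.call_soon_threadsafe(loop.stop)
```

The caller never sees a coroutine or a promise: `sum_sync` returns a plain `float`, which is the property that lets ordinary synchronous Python (and, by extension, CuPy-style code) run unchanged on top of an asynchronous backend.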
Abstract
To execute scientific computing programs such as deep learning at high speed, GPU acceleration is a powerful option. With recent advances in web technologies, interfaces such as WebGL and WebGPU, which give web applications access to the GPU on the client side, have become available. Meanwhile, Pyodide, a Python runtime that runs in web browsers, allows web applications to be written in Python, but it can utilize only the CPU, leaving room for acceleration. We propose WgPy, a new library that provides GPU array computation with a NumPy-compatible interface in the web browser. WgPy not only implements array operations such as matrix multiplication on WebGL and WebGPU, but also lets users write custom kernels that run on the GPU with minimal syntax knowledge, so that a variety of algorithms can run with minimal overhead. WgPy also implements a special thread synchronization mechanism that bridges the asynchronous semantics of JavaScript and the synchronous semantics of Python, allowing code written for CuPy, the NumPy-compatible array library for CUDA, to run directly in a web browser. In experiments training a CNN model, WgPy ran 95 times faster than CPU execution.
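To make "custom kernels with minimal syntax knowledge" concrete, here is a CPU-only Python sketch modeled on the style of CuPy's `ElementwiseKernel`, which takes an input-parameter string, an output-parameter string, and a one-line per-element operation. `ElementwiseKernelSketch` is a hypothetical name and is not WgPy's API; whether WgPy's kernel interface uses the same signature is an assumption. The sketch only shows the division of labor: the user writes a single statement such as `z = x * y + 1.0`, and the library generates the surrounding function and the loop over elements, which a WebGL or WebGPU backend would instead emit as shader code run in parallel.

```python
from typing import Callable, Sequence


class ElementwiseKernelSketch:
    """Hypothetical CPU-only model of a CuPy-style elementwise kernel.

    The user supplies only the per-element statement; this class generates the
    enclosing function and the loop (the part a GPU backend would turn into a shader).
    """

    def __init__(self, in_params: str, out_params: str, operation: str, name: str) -> None:
        # "float32 x, float32 y" -> ["x", "y"]; type names are parsed but ignored here.
        self.in_names = [p.split()[-1] for p in in_params.split(",")]
        self.out_name = out_params.split()[-1]
        self.source = (
            f"def {name}({', '.join(self.in_names)}):\n"
            f"    {operation}\n"
            f"    return {self.out_name}\n"
        )
        namespace: dict = {}
        exec(self.source, {}, namespace)  # compile the generated per-element function
        self._per_element: Callable[..., float] = namespace[name]

    def __call__(self, *arrays: Sequence[float]) -> list[float]:
        if len(arrays) != len(self.in_names):
            raise TypeError(f"expected {len(self.in_names)} inputs, got {len(arrays)}")
        n = len(arrays[0])
        if any(len(a) != n for a in arrays):
            raise ValueError("all inputs must have the same length")
        # On a GPU, each index i would be one shader invocation, executed in parallel.
        return [self._per_element(*(a[i] for a in arrays)) for i in range(n)]


# Usage: the only kernel code the user writes is "z = x * y + 1.0".
mul_add_one = ElementwiseKernelSketch(
    "float32 x, float32 y", "float32 z", "z = x * y + 1.0", "mul_add_one"
)
print(mul_add_one([1.0, 2.0], [3.0, 4.0]))  # [4.0, 9.0]
```

Keeping the user-facing part to a single expression is what lowers the barrier: the user does not need to know shader languages, thread indexing, or buffer bindings, only the arithmetic for one element.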
