WgPy: GPU-accelerated NumPy-like array library for web browsers

Masatoshi Hidaka, Tatsuya Harada

TL;DR

WgPy addresses the CPU-only limitation of NumPy-style array computing in web browsers by providing a GPU-accelerated, NumPy-like array library that runs Python in the browser via Pyodide and delegates computations to WebGL/WebGPU. It introduces a synchronization mechanism based on SharedArrayBuffer and Atomics that preserves synchronous Python semantics while GPU operations execute asynchronously, and it supports custom kernels to accelerate diverse workloads. Key findings include up to $95\times$ speedup in ResNet-18 training and $340\times$ speedup in $1024 \times 1024$ matrix multiplication, plus a demonstration of distributed CNN hyperparameter optimization across mobile devices. These results suggest that installation-free, browser-based scientific computing and large-scale, volunteer-style distributed computing are practical with WgPy. Future work targets automatic kernel fusion via JIT and broader NumPy coverage.

Abstract

GPU acceleration is a powerful option for running scientific computing programs, such as deep learning, at high speed. Recent advances in web technologies have made interfaces such as WebGL and WebGPU available, which give web applications access to the GPU on the client side. Meanwhile, Pyodide, a Python runtime that runs in web browsers, allows web applications to be written in Python, but it can only use the CPU, leaving room for acceleration. Our proposed library, WgPy, provides GPU array computation with a NumPy-compatible interface in the web browser. The library not only implements array operations such as matrix multiplication on WebGL and WebGPU, but also lets users write custom GPU kernels with minimal syntax knowledge, so that a variety of algorithms can run with minimal overhead. WgPy also implements a special thread synchronization mechanism that bridges the asynchronous semantics of JavaScript and the synchronous semantics of Python, which allows code written for CuPy, the NumPy-compatible array library for CUDA, to run directly in a web browser. In experiments on training a CNN model, WgPy ran 95 times faster than CPU execution.
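The synchronization idea in the abstract can be illustrated with a small analogy in plain Python. This is a sketch only and not WgPy's implementation: `read_back_blocking` is a hypothetical name, a background thread stands in for the asynchronous JavaScript/GPU side, and `threading.Event` plays the role that a shared flag with `Atomics.wait` and `Atomics.notify` would presumably play in the browser. The point it shows is that the caller sees an ordinary synchronous function even though the work completes asynchronously elsewhere.

```python
import threading
import time


def read_back_blocking(device_data, timeout=5.0):
    """Return a host-side copy of device_data, blocking until it is ready.

    Hypothetical helper for illustration; it does not use any WgPy API.
    """
    done = threading.Event()  # stand-in for a flag in shared memory
    box = {}                  # stand-in for the shared result buffer

    def async_readback():
        # Stand-in for the asynchronous GPU readback on the other thread.
        time.sleep(0.01)
        box["value"] = list(device_data)
        done.set()            # wake the waiting caller (the "notify" step)

    threading.Thread(target=async_readback, daemon=True).start()

    # The caller blocks here (the "wait" step), so the code that follows
    # can be written as if the transfer were synchronous.
    if not done.wait(timeout):
        raise TimeoutError("readback did not complete in time")
    return box["value"]


host_copy = read_back_blocking([1.0, 2.0, 3.0])
print(host_copy)
```

The calling code contains no callbacks or `await`, which is the property that would let existing synchronous array code run unmodified.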

Paper Structure

This paper contains 17 sections, 1 equation, 8 figures, 1 table.

Figures (8)

  • Figure 1: The structure of WgPy. WgPy exposes a NumPy-like array interface in the Python interpreter and mediates between it and the array processing routines implemented in WebGL and WebGPU, the GPU interfaces for web browsers.
  • Figure 2: Pseudocode for transferring data from the GPU to the CPU, which does not work. Left: Python side. Right: JavaScript side.
  • Figure 3: Sequence diagram showing the process of transferring array data on the GPU to a NumPy array on the CPU, using the Atomics API to block worker threads, so that Python code does not need to be aware of asynchronous processing and can run without modifying existing code that uses CuPy.
  • Figure 4: Visualization of the Mandelbrot set. The real axis spans [-2.0, 0.5] and the imaginary axis spans [-1.2, 1.2]. White pixels indicate that the sequence does not diverge, and black pixels indicate that it diverges.
  • Figure 5: The speed of computing the Mandelbrot set. Normal indicates the case where the kernel is implemented using a combination of basic operations, and custom indicates the case where a custom kernel is used.
  • ...and 3 more figures
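As a reference for what the Mandelbrot benchmark in Figures 4 and 5 computes, the following is a minimal CPU-only sketch in plain Python. It is not the paper's kernel: the function name `mandelbrot_mask`, the iteration limit of 50, and the grid size are assumptions chosen for illustration. Only the axis ranges come from the Figure 4 caption. Each point $c$ is iterated with $z \leftarrow z^2 + c$ from $z = 0$, and it is treated as diverging once $|z| > 2$.

```python
def mandelbrot_mask(width, height, max_iter=50,
                    re_range=(-2.0, 0.5), im_range=(-1.2, 1.2)):
    """Return a height x width grid of booleans.

    True means the sequence stayed bounded for max_iter steps
    (a white pixel in Figure 4); False means it diverged.
    """
    rows = []
    for y in range(height):
        im = im_range[0] + (im_range[1] - im_range[0]) * y / (height - 1)
        row = []
        for x in range(width):
            re = re_range[0] + (re_range[1] - re_range[0]) * x / (width - 1)
            c = complex(re, im)
            z = 0j
            bounded = True
            for _ in range(max_iter):
                z = z * z + c
                if abs(z) > 2.0:  # escape radius: the sequence diverges
                    bounded = False
                    break
            row.append(bounded)
        rows.append(row)
    return rows


mask = mandelbrot_mask(26, 25)
for row in mask:
    # '#' marks points that did not diverge, '.' marks points that did.
    print("".join("#" if bounded else "." for bounded in row))
```

Every pixel is computed independently of the others, which is why this workload maps naturally onto a single per-element GPU kernel.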