
An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator

Tong Dong Qiu, Andreas Thune, Vinicius Oliveira Martins, Markus Blatt, Alf Birger Rustad, Razvan Nane

TL;DR

This work tackles the high computational cost of reservoir simulation by evaluating GPU-based acceleration for the OPM Flow solver. It implements and benchmarks custom OpenCL kernels and several GPU sparse linear algebra libraries (cuSPARSE, rocSPARSE, amgcl, rocALUTION) against the CPU DUNE-based baseline, using NORNE and larger models to assess scalability. The results show that a GPU can achieve speedups of up to about 5.6x over a single dual-threaded MPI process, with performance depending on library choice, hardware, and whether well contributions are coupled or separated; memory transfer and ILU0 parallelization are key bottlenecks. The study provides a practical, open-source bridge for integrating GPU solvers into OPM Flow and offers concrete directions for further optimization and profiling-driven improvements in GPU-based reservoir simulation workflows.

Abstract

Realistic reservoir simulation is known to be prohibitively expensive in terms of computation time when the accuracy of the simulation or the size of the model grid is increased. One method to address this issue is to parallelize the computation by dividing the model into several partitions and using multiple CPUs to compute the result with techniques such as MPI and multi-threading. Alternatively, GPUs are also a good candidate for accelerating the computation because their massively parallel architecture allows many floating-point operations per second. The numerical iterative solver takes the most computational time and is challenging to execute efficiently due to the dependencies that exist in the model between cells. In this work, we evaluate the OPM Flow simulator and compare several state-of-the-art GPU solver libraries as well as custom-developed solutions for a BiCGStab solver using an ILU0 preconditioner, and we benchmark their performance against the default DUNE library implementation running on multiple CPU processors using MPI. The evaluated GPU software includes a hand-written linear solver in OpenCL and the integration of several third-party sparse linear algebra libraries, such as cuSPARSE, rocSPARSE, and amgcl. To perform our benchmarking, we use small, medium, and large use cases, starting with the public test case NORNE, which includes approximately 50k active cells, and ending with a large model that includes approximately 1 million active cells. We find that a GPU can accelerate a single dual-threaded MPI process by up to 5.6 times, and that its performance is comparable to that of around 8 dual-threaded MPI processes.
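
To make the solver named in the abstract concrete, the following is a minimal CPU sketch of BiCGStab with an ILU0 preconditioner. It is not the authors' implementation: it assumes a small dense scalar matrix stored in NumPy, whereas the paper targets sparse matrices on GPUs. The function names (`ilu0`, `apply_ilu0`, `bicgstab`, `convection_diffusion_2d`) and the test matrix are hypothetical and chosen only for illustration.

```python
import numpy as np

def ilu0(A):
    """ILU(0): LU factorization restricted to the nonzero pattern of A.
    Returns one matrix holding unit-lower L (below diagonal) and U (on/above)."""
    n = A.shape[0]
    LU = A.astype(float).copy()
    pattern = A != 0
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            LU[i, k] /= LU[k, k]
            for j in range(k + 1, n):
                if pattern[i, j]:          # no fill-in outside the pattern
                    LU[i, j] -= LU[i, k] * LU[k, j]
    return LU

def apply_ilu0(LU, r):
    """Solve (L U) z = r by forward and backward substitution.
    Each row depends on earlier rows, which is what limits parallelism."""
    n = len(r)
    y = np.zeros(n)
    for i in range(n):
        y[i] = r[i] - LU[i, :i] @ y[:i]
    z = np.zeros(n)
    for i in range(n - 1, -1, -1):
        z[i] = (y[i] - LU[i, i + 1:] @ z[i + 1:]) / LU[i, i]
    return z

def bicgstab(A, b, LU, tol=1e-10, max_iter=200):
    """Right-preconditioned BiCGStab; returns (solution, iterations used)."""
    x = np.zeros(len(b))
    r = b - A @ x
    r_hat = r.copy()
    rho = alpha = omega = 1.0
    v = np.zeros(len(b))
    p = np.zeros(len(b))
    b_norm = np.linalg.norm(b)
    for it in range(1, max_iter + 1):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        y = apply_ilu0(LU, p)
        v = A @ y
        alpha = rho_new / (r_hat @ v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * b_norm:
            return x + alpha * y, it
        z = apply_ilu0(LU, s)
        t = A @ z
        omega = (t @ s) / (t @ t)
        x = x + alpha * y + omega * z
        r = s - omega * t
        rho = rho_new
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
    return x, max_iter

def convection_diffusion_2d(m, c=0.3):
    """Hypothetical nonsymmetric 5-point stencil on an m x m grid."""
    n = m * m
    A = np.zeros((n, n))
    for i in range(m):
        for j in range(m):
            k = i * m + j
            A[k, k] = 4.0
            if j > 0:
                A[k, k - 1] = -1.0 - c
            if j < m - 1:
                A[k, k + 1] = -1.0 + c
            if i > 0:
                A[k, k - m] = -1.0
            if i < m - 1:
                A[k, k + m] = -1.0
    return A

if __name__ == "__main__":
    A = convection_diffusion_2d(6)
    b = np.ones(A.shape[0])
    x, iters = bicgstab(A, b, ilu0(A))
    print("iterations:", iters, "residual:", np.linalg.norm(b - A @ x))
```

The two substitution loops in `apply_ilu0` are sequential by construction: each row needs the results of earlier rows. This row-to-row dependency is one way to read the abstract's remark that dependencies between cells make the solver hard to run efficiently on massively parallel hardware.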

Paper Structure

This paper contains 24 sections, 2 equations, 3 figures, 16 tables, 5 algorithms.

Figures (3)

  • Figure 1: The General Structure of OPM Flow.
  • Figure 2: Memory layout of blocks in OPM (left) and amgcl (right).
  • Figure 3: A flowchart depicting different implementations.