Palace: A Library for Interactive GPU-Accelerated Large Tensor Processing and Visualization

Dominik Drees, Benjamin Risse

TL;DR

Palace tackles the challenge of interactively processing and visualizing tensors that exceed local memory by introducing a GPU-accelerated, out-of-core framework for workstations. It combines chunked tensors, a pull-based compute graph, asynchronous task scheduling, a GPU-oriented page-table hierarchy, and level-of-detail pyramids to efficiently manage data across RAM, VRAM, and disk, including multi-GPU execution. The paper demonstrates Palace’s effectiveness through volume raycasting and hierarchical random walker segmentation, achieving superior or competitive performance against state-of-the-art systems and showcasing versatile use cases from 2D slide viewers to 4D time-series processing and virtually unlimited procedurally generated data. The results indicate Palace enables rapid prototyping and scalable visualization pipelines on commodity hardware, with practical impact for researchers handling very large, multi-dimensional tensors.

Abstract

Tensor datasets (two-, three-, or higher-dimensional) are fundamental to many scientific fields utilizing imaging or simulation technologies. Advances in these methods have led to ever-increasing data sizes and, consequently, to interest in and development of out-of-core processing and visualization techniques, although mostly as specialized solutions. Here we present Palace, an open-source, cross-platform, general-purpose library for interactive and accelerated out-of-core tensor processing and visualization. Through a high-performance asynchronous concurrent architecture and a simple compute-graph interface, Palace enables the interactive development of out-of-core pipelines on workstation hardware. We demonstrate on benchmarks that Palace outperforms or matches state-of-the-art systems for volume rendering and hierarchical random-walker segmentation, and show its applicability in use cases involving tensors from 2D images up to 4D time series datasets.

Paper Structure

This paper contains 20 sections, 1 equation, 6 figures, 3 tables.

Figures (6)

  • Figure 2: Overview of the operator network and the resulting dynamic request and task network. The user's view is a compute graph (the operator network) that is static for each resolve call to the library; while a request is being fulfilled, tasks and transitive requests are tracked in the task graph structure.
  • Figure 3: Illustration of concurrent execution in Palace.
  • Figure 4: Benchmark views rendered as part of the evaluation reported in Table \ref{tab:bench_raycast}. Notably, the kidney appears squashed because the system of [sarton2020gpuooc] does not support anisotropic voxel spacing; Palace mimics this limitation in this experiment for a fair comparison.
  • Figure 5: Four zoom levels rendered with Palace's image viewer in a whole slide image from a bone marrow smear dataset [kockwelp2022cvpr, kockwelp2024deep].
  • Figure 6: Screenshot of example application segmenting the heart in a volumetric time series dataset from seeds in time points 0 and 199, showing segmentation in time step 100.
  • ...and 1 more figure
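
The pull-based, chunked evaluation described in the TL;DR and in the Figure 2 caption can be illustrated with a small sketch. This is not Palace's API: the names Operator, procedural_source, scale, resolve, and CHUNK are hypothetical, the tensor is one-dimensional, and execution is synchronous and CPU-only, whereas the paper describes asynchronous scheduling across RAM, VRAM, and disk. The sketch only shows the core idea that a request for an output region pulls just the chunks it needs through the operator graph, so a procedurally generated tensor of effectively unbounded size never has to be materialized.

```python
from typing import Callable, Dict, List

CHUNK = 4  # elements per chunk (illustrative value, not taken from the paper)


class Operator:
    """Hypothetical graph node that produces one chunk of a 1D tensor on request."""

    def __init__(self, compute_chunk: Callable[[int], List[float]]):
        self._compute_chunk = compute_chunk
        self._cache: Dict[int, List[float]] = {}  # stand-in for a chunk cache
        self.chunks_computed = 0

    def request(self, chunk_id: int) -> List[float]:
        # Compute a chunk only the first time it is requested.
        if chunk_id not in self._cache:
            self._cache[chunk_id] = self._compute_chunk(chunk_id)
            self.chunks_computed += 1
        return self._cache[chunk_id]


def procedural_source() -> Operator:
    # The value at position p is simply p; nothing is stored up front.
    return Operator(lambda c: [float(c * CHUNK + i) for i in range(CHUNK)])


def scale(inp: Operator, factor: float) -> Operator:
    # Producing an output chunk issues a transitive request to the input.
    return Operator(lambda c: [factor * v for v in inp.request(c)])


def resolve(op: Operator, start: int, stop: int) -> List[float]:
    """Pull the half-open range [start, stop) by requesting overlapping chunks."""
    out: List[float] = []
    for c in range(start // CHUNK, (stop - 1) // CHUNK + 1):
        chunk = op.request(c)
        lo = max(start, c * CHUNK) - c * CHUNK
        hi = min(stop, (c + 1) * CHUNK) - c * CHUNK
        out.extend(chunk[lo:hi])
    return out


source = procedural_source()
scaled = scale(source, 2.0)
window = resolve(scaled, 1_000_002, 1_000_007)
print(window, source.chunks_computed)
```

Resolving five elements deep inside the tensor touches only the two chunks that overlap the requested range, and a repeated resolve of the same range is served from the cache without recomputation.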