Otus Supercomputer

Sadaf Ehtesabi, Manoar Hossain, Tobias Kenter, Andreas Krawinkel, Holger Nitsche, Lukas Ostermann, Christian Plessl, Heinrich Riebler, Stefan Rohde, Robert Schade, Michael Schwarz, Jens Simon, Nils Winnwa, Alex Wiens, Xin Wu

TL;DR

Otus is PC2's flagship HPC system designed to double the compute capacity of its predecessor Noctua 2 while preserving familiar node types (CPU, GPU, FPGA). The paper details Otus's hardware (Lenovo/AMD/NVIDIA/Alveo V80), upgraded InfiniBand networking (NDR fat-tree), and a diverse storage subsystem (IBM Storage Scale) integrated with a highly energy-aware data center. It also documents the software stack (Rocky Linux, Slurm, Lmod, ClusterCockpit) and the PERSEUS compute-project management system, emphasizing modularity and automation. The practical impact lies in delivering significantly higher FP/memory throughput and energy-efficient operation within a national HPC framework, with lessons applicable to other centers seeking scalable HPC deployments and efficient data-center integration.

Abstract

Otus is a high-performance computing cluster that was launched in 2025 and is operated by the Paderborn Center for Parallel Computing (PC2) at Paderborn University in Germany. The system is part of the National High Performance Computing (NHR) initiative. Otus complements the previous supercomputer Noctua 2, offering approximately twice the computing power while retaining the three node types that were characteristic of Noctua 2: 1) CPU compute nodes with different memory capacities, 2) high-end GPU nodes, and 3) HPC-grade FPGA nodes. On the Top500 list, which ranks the 500 most powerful supercomputers in the world, Otus is in position 164 with the CPU partition and in position 255 with the GPU partition (June 2025). On the Green500 list, ranking the 500 most energy-efficient supercomputers in the world, Otus is in position 5 with the GPU partition (June 2025). This article provides a comprehensive overview of the system in terms of its hardware, software, system integration, and its overall integration into the data center building to ensure energy-efficient operation. The article aims to provide unique insights for scientists using the system and for other centers operating HPC clusters. The article will be continuously updated to reflect the latest system setup and measurements.

Paper Structure

This paper contains 41 sections, 5 equations, 20 figures, 11 tables.

Figures (20)

  • Figure 1: Otus supercomputer operated at the Paderborn Center for Parallel Computing.
  • Figure 2: Schematic floor plan of Otus, showing the racks containing nodes, network switches, and accelerators, along with the power and water cooling infrastructure. Three racks are grouped into a pod, with the middle rack hosting management and networking devices. Single CPU nodes of the normal and largemem variants are represented by a square. Individual GPU and FPGA accelerators are each represented by their own symbol. The InfiniBand network connects all devices, including the storage. The cooling distribution unit pumps warm water to the racks and returns heated water. A heat exchanger transfers heat between the cluster water loop and the facility water loop.
  • Figure 3: InfiniBand network topology used in Otus. Every level 1 leaf switch is connected to all five level 2 spine switches. The (a) GPU, (b) FPGA and (c) other compute blades are connected to the leaf switches.
  • Figure 4: I/O side of a CPU compute blade. (a) SharedIO PCIe connector connects the InfiniBand adapter on the right to the PCIe slot of the left compute node. (b) InfiniBand card in the right compute node's PCIe slot.
  • Figure 5: Illustration showing the building blocks of the GPFS.
  • ...and 15 more figures