Security Testbed for Preempting Attacks against Supercomputing Infrastructure

Phuong Cao; Zbigniew Kalbarczyk; Ravishankar Iyer

TL;DR

A security testbed embedded in live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA) is described to demonstrate attack preemption, i.e., stopping system compromise and data breaches at petascale supercomputers.

Abstract

Securing HPC involves a unique threat model. Untrusted, malicious code exploiting the concentrated computing power may exert an outsized impact on the shared, open-networked environment in HPC, unlike well-isolated VM tenants in public clouds. Therefore, preempting attacks that target supercomputing systems before they cause damage remains the top security priority. The main challenge is that noisy attack attempts and unreliable alerts often mask real attacks, which cause permanent damage such as system integrity violations and data breaches. This paper describes a security testbed embedded in live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA). The objective is to demonstrate attack preemption, i.e., stopping system compromise and data breaches at petascale supercomputers. Deployment of our testbed at NCSA enables the following key contributions: 1) Insights from characterizing unique attack patterns found in real security logs of more than 200 security incidents curated over the past two decades at NCSA. 2) Deployment of an attack visualization tool to illustrate the challenges of identifying real attacks in HPC environments and to support security operators in interactive attack analyses. 3) Demonstration of the testbed's utility by running novel models, such as Factor-Graph-based models, to preempt a real-world ransomware family.

Paper Structure (19 sections, 5 figures, 1 table)

Introduction
Data Set, Graph Visualization, and Insights
Dataset
Graph visualization of mass scanners and attackers
Key Insights
Threat model and Related Work
Key alert concepts: successful attacks, attack attempts, and significant alerts.
Attacker and defender capabilities
Scope of this paper
Testbed architecture and deployment
Reproducing vulnerable services in a honeypot
Attracting attackers
Deploying vulnerable services in the live network
Case Study: Successful Ransomware Detection in Live Traffic
Initial entry of the ransomware
...and 4 more sections

Figures (5)

Figure 1: The challenge of finding real attacks hidden in a dense graph of noisy attack attempts intermingled with legitimate connections. A) Most connections are generated by a mass scanner at the circle's center, attempting to scan open network ports across NCSA's entire /16 IP address space (65,536 hosts). B) Real attacks are difficult to find due to the large scale of the data, noisy attack attempts from other scanners (C), and legitimate connections (D), which often do not exhibit any clear pattern. Data source: NCSA's black hole router recorded 26.85 million scans on 2024/08/01 from 00:00 to 01:00. We sampled the 10,000 most frequent scans from a mass scanner to include in part A, in addition to legitimate network connections recorded by Zeek and a real attack. Graph drawing method: The graph contains 29,075 nodes and 27,336 edges and was rendered using Gephi [hu2005efficient]. Attacker nodes were annotated manually by cross-examining the ground truth of the attacker's IP addresses provided by the Factor-Graph-based attack detector [cao2015preemptive, cao2019preempting].
Figure 2: NCSA's monitors observe an average of 94,238 alerts per day (standard deviation = 23,547) in a sample month.
Figure 3: (a) The fractions of similar alerts between pairs of attacks in our dataset. (b) The count of LCS in our dataset.
Figure 4: Testbed workflow and architecture.
Figure 5: Recursive lateral movement of the ransomware by enumerating known hosts in the compromised instance.
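
The quantities mentioned in the captions of Figures 1 and 3 can be illustrated with a minimal sketch. It rests on several assumptions that do not come from the paper: "LCS" is read as longest common subsequence, the "fraction of similar alerts" is approximated by a Jaccard ratio over distinct alert types, the alert names are hypothetical, and 10.0.0.0/16 is a placeholder network rather than NCSA's address space.

```python
import ipaddress


def lcs_length(a, b):
    """Length of the longest common subsequence of two alert sequences.

    Assumption: "LCS" in Figure 3(b) means longest common subsequence.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similar_fraction(a, b):
    """Shared distinct alert types divided by all distinct alert types.

    Assumption: a Jaccard ratio stands in for the paper's similarity measure.
    """
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


# Hypothetical alert sequences for two incidents (names are illustrative only).
attack_1 = ["port_scan", "ssh_login", "download_binary", "new_kernel_module"]
attack_2 = ["ssh_login", "download_binary", "erase_logs"]

# A /16 prefix leaves 16 host bits, i.e. 2**16 addresses (placeholder network).
print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536
print(lcs_length(attack_1, attack_2))                     # 2
print(similar_fraction(attack_1, attack_2))               # 0.4
```

Ordered comparison (LCS) and unordered comparison (set overlap) answer different questions: the former captures shared attack stages in sequence, while the latter only captures which alert types two incidents have in common.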
