Security Testbed for Preempting Attacks against Supercomputing Infrastructure

Phuong Cao, Zbigniew Kalbarczyk, Ravishankar Iyer

TL;DR

This paper describes a security testbed embedded in the live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA). The testbed demonstrates attack preemption, i.e., stopping system compromise and data breaches at petascale supercomputers.

Abstract

HPC security has a unique threat model. Untrusted, malicious code exploiting the concentrated computing power may exert an outsized impact on the shared, open-networked environment in HPC, unlike well-isolated VM tenants in public clouds. Therefore, preempting attacks that target supercomputing systems before they cause damage remains the top security priority. The main challenge is that noisy attack attempts and unreliable alerts often mask real attacks, which cause permanent damage such as system integrity violations and data breaches. This paper describes a security testbed embedded in the live traffic of a supercomputer at the National Center for Supercomputing Applications (NCSA). The objective is to demonstrate attack preemption, i.e., stopping system compromise and data breaches at petascale supercomputers. Deployment of our testbed at NCSA enables the following key contributions: 1) Insights from characterizing unique attack patterns found in real security logs of more than 200 security incidents curated over the past two decades at NCSA. 2) Deployment of an attack visualization tool to illustrate the challenges of identifying real attacks in HPC environments and to support security operators in interactive attack analyses. 3) A demonstration of the testbed's utility by running novel models, such as Factor-Graph-based models, to preempt a real-world ransomware family.

Paper Structure (19 sections, 5 figures, 1 table)

Figures (5)

  • Figure 1: The challenge of finding real attacks hidden in a dense graph of noisy attack attempts intermingled with legitimate connections. A) Most connections are generated by a mass scanner at the circle's center, which attempts to scan open network ports across NCSA's entire /16 IP address space (65,536 hosts). B) Real attacks are difficult to find because of the large scale of the data, noisy attack attempts from other scanners (C), and legitimate connections (D), which often do not exhibit any clear pattern. Data source: NCSA's black hole router recorded 26.85 million scans on 2024/08/01 from 00:00 to 01:00. We sampled the 10,000 most frequent scans from a mass scanner for part A, and added legitimate network connections recorded by Zeek and a real attack. Graph drawing method: The graph contains 29,075 nodes and 27,336 edges and was rendered using Gephi [hu2005efficient]. Attacker nodes were annotated manually by cross-examining the ground truth of the attacker's IP addresses provided by the Factor-Graph-based attack detector [cao2015preemptive, cao2019preempting].
  • Figure 2: NCSA's monitors observe an average of 94,238 alerts per day (standard deviation = 23,547) in a sample month.
  • Figure 3: (a) The fractions of similar alerts between pairs of attacks in our dataset. (b) The count of LCS in our dataset.
  • Figure 4: Testbed workflow and architecture.
  • Figure 5: Recursive lateral movement of the ransomware by enumerating known hosts in the compromised instance.
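
As a rough sanity check on the scale described in the Figure 1 caption, the short sketch below recomputes the size of a /16 address space and the implied scan rates. The prefix 10.0.0.0/16 is a hypothetical placeholder (the caption does not give NCSA's actual prefix), and the per-second and per-address figures are simple averages derived from the caption's numbers, not values reported by the authors.

```python
import ipaddress

# Hypothetical placeholder prefix; the caption does not state NCSA's actual /16.
network = ipaddress.ip_network("10.0.0.0/16")

# A /16 leaves 32 - 16 = 16 host bits, i.e. 2**16 addresses.
address_count = network.num_addresses

# Figures taken from the Figure 1 caption.
scans_in_window = 26_850_000   # scans recorded by the black hole router
window_seconds = 60 * 60       # 00:00 to 01:00 is a one-hour window

# Derived averages (assumption: scans spread uniformly over time and addresses).
scans_per_second = scans_in_window / window_seconds
scans_per_address = scans_in_window / address_count

print(f"addresses in a /16: {address_count}")
print(f"average scans per second: {scans_per_second:.0f}")
print(f"average scans per address: {scans_per_address:.0f}")
```

Under these assumptions, the one-hour window averages several thousand scans per second, which gives a sense of the noise level a real attack has to be separated from.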