Using Malware Detection Techniques for HPC Application Classification

Thomas Jakobsche, Florina M. Ciorba

TL;DR

This research proposes an approach that uses similarity-preserving fuzzy hashes to classify HPC application executables. By comparing the similarity of SSDeep fuzzy hashes, a Random Forest Classifier can accurately label applications executing on HPC systems, including unknown samples.

Abstract

HPC systems face security and compliance challenges, particularly in preventing waste and misuse of computational resources by unauthorized or malicious software that deviates from the purpose of its allocation. Existing methods that classify applications based on job names or resource usage are often unreliable, or fail to capture applications whose behavior varies with different inputs or system noise. This research proposes an approach that uses similarity-preserving fuzzy hashes to classify HPC application executables. By comparing the similarity of SSDeep fuzzy hashes, a Random Forest Classifier can accurately label applications executing on HPC systems, including unknown samples. We evaluate the Fuzzy Hash Classifier on a dataset of 92 application classes and 5333 distinct application samples. The proposed method achieved a macro f1-score of 90% (micro f1-score: 89%, weighted f1-score: 90%). Our approach addresses the critical need for more effective application classification in HPC environments, minimizing resource waste and enhancing security and compliance.
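The idea of turning hash similarities into classifier input can be illustrated with a minimal sketch. It is not the authors' implementation: `difflib.SequenceMatcher` stands in for an SSDeep comparison so the example runs with the standard library only, the class names and hash strings are made up, and the helper names (`similarity`, `feature_vector`, `label_with_threshold`) are hypothetical. The assumption is that each sample is described by its best similarity to the known hashes of every class, and that a prediction below a confidence threshold is reported as unknown.

```python
from difflib import SequenceMatcher


def similarity(hash_a: str, hash_b: str) -> int:
    """Return a 0-100 similarity score between two hash strings.

    Assumption: SequenceMatcher is only a stand-in for a real
    fuzzy-hash comparison such as SSDeep's.
    """
    return round(100 * SequenceMatcher(None, hash_a, hash_b).ratio())


def feature_vector(sample_hash: str, reference_hashes: dict) -> list:
    """Build one feature per class: the best similarity between the
    sample and any known hash of that class (classes in sorted order)."""
    return [
        max(similarity(sample_hash, known) for known in hashes)
        for _, hashes in sorted(reference_hashes.items())
    ]


def label_with_threshold(class_probs: dict, threshold: float) -> str:
    """Return the most probable class, or "unknown" if the classifier's
    confidence in it falls below the threshold."""
    best = max(class_probs, key=class_probs.get)
    return best if class_probs[best] >= threshold else "unknown"


# Hypothetical reference hashes for two made-up application classes.
references = {"app_a": ["AAAA"], "app_b": ["BBBB"]}
features = feature_vector("AAAA", references)
```

In a full pipeline, vectors like `features` would be the training input of a classifier such as a Random Forest, and its per-class probabilities would be passed to `label_with_threshold`.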

Paper Structure

This paper contains 6 sections, 2 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Envisioned workflow for classifying applications and supporting decision-making about jobs.
  • Figure 2: Number of samples for 92 application classes on a logarithmic scale.
  • Figure 3: The f1-score as a function of the confidence threshold, from the grid search within the training set used to handle unknown classes.
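
The abstract reports macro, micro, and weighted f1-scores, which can differ when classes have very different sample counts. The sketch below shows how the three averages are commonly defined, using only the standard library. The function name `f1_averages` and the toy labels are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter


def f1_averages(y_true: list, y_pred: list) -> tuple:
    """Return (macro, micro, weighted) f1-scores for single-label data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(t: int, p: int, n: int) -> float:
        denom = 2 * t + p + n
        return 2 * t / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)

    # Macro: unweighted mean over classes, so rare classes count equally.
    macro = sum(per_class.values()) / len(labels)
    # Micro: pool all counts first, so frequent classes dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Weighted: mean over classes, weighted by true sample count.
    weighted = sum(per_class[c] * support[c] for c in labels) / len(y_true)
    return macro, micro, weighted


# Toy example with an imbalanced pair of made-up classes.
macro, micro, weighted = f1_averages(
    ["a", "a", "a", "b"],
    ["a", "a", "b", "b"],
)
```

With imbalanced classes, a macro score close to the micro and weighted scores suggests that rare classes are classified about as well as frequent ones.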