Command-line Risk Classification using Transformer-based Neural Architectures

Paolo Notaro, Soroush Haeri, Jorge Cardoso, Michael Gerndt

TL;DR

The paper tackles risk classification of command-line interface (CLI) commands in large cloud environments, replacing brittle rule-based interception with a transformer-based model pretrained on command-language data. It introduces a Bash-focused language model that combines Byte-Pair Encoding (BPE) tokenization with a BERT backbone and is built through a three-phase pipeline (dataset collection, BPE learning, and BERT pretraining/finetuning) to classify commands as SAFE, RISKY, or BLOCKED. On a realistic dataset of production commands, the method outperforms baselines, notably in detecting rare dangerous commands, and supports online interception and auditing. The work demonstrates the benefits of transfer learning for security tasks and suggests applicability to related CLI security tasks, including auditing and context extraction.
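
To make the BPE learning phase concrete, here is a minimal sketch of how BPE merge rules could be learned from a corpus of shell commands and then applied to a new command. It illustrates the general technique only; the function names (`learn_bpe`, `merge_pair`, `tokenize`), the whitespace pre-splitting, and the toy corpus are assumptions made for the example, not the paper's tokenizer or settings.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of command strings."""
    # Assumption: commands are pre-split on whitespace; each word starts as characters.
    words = Counter()
    for command in corpus:
        for word in command.split():
            words[tuple(word)] += 1

    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken lexicographically for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged_words = Counter()
        for symbols, freq in words.items():
            merged_words[tuple(merge_pair(symbols, best))] += freq
        words = merged_words
    return merges


def tokenize(command, merges):
    """Apply the learned merges, in order, to each whitespace-separated word."""
    tokens = []
    for word in command.split():
        symbols = list(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


if __name__ == "__main__":
    # Hypothetical toy corpus, not taken from the paper's dataset.
    corpus = ["ls -la /tmp", "ls -la /var/log", "rm -rf /tmp/cache"]
    merges = learn_bpe(corpus, num_merges=10)
    print(tokenize("rm -rf /var/log", merges))
```

Because merges only ever join adjacent symbols, concatenating the tokens always reproduces the original command minus its whitespace, so unseen commands fall back to smaller subword units rather than to an out-of-vocabulary token.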

Abstract

To protect the large-scale computing environments necessary to meet increasing computing demand, cloud providers have implemented security measures to monitor Operations and Maintenance (O&M) activities and thereby prevent data loss and service interruption. Command interception systems are used to intercept, assess, and block dangerous Command-line Interface (CLI) commands before they can cause damage. Traditional solutions for command risk assessment include rule-based systems, which require expert knowledge and constant human revision to account for unseen commands. To overcome these limitations, several end-to-end learning systems have been proposed to classify CLI commands. These systems, however, have limitations of their own, including the adoption of general-purpose text classifiers, which may not adapt to the language characteristics of scripting languages such as Bash or PowerShell, and may not recognize dangerous commands in the presence of an unbalanced class distribution. In this paper, we propose a transformer-based command risk classification system, which leverages the generalization power of Large Language Models (LLMs) and transfer learning to provide accurate classification and to identify rare dangerous commands effectively. We verify the effectiveness of our approach on a realistic dataset of production commands and show how to apply our model to other security-related tasks, such as dangerous command interception and auditing of existing rule-based systems.
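
To illustrate why rule-based risk assessment needs constant revision, the sketch below implements a tiny first-match rule classifier. The rules, the `classify` function, and the default SAFE label are hypothetical and chosen only for the example; they are not the rules of the system described in the paper.

```python
import re

# Hypothetical rule set: (compiled pattern, risk label), evaluated in order.
RULES = [
    (re.compile(r"^\s*rm\s+-rf\s+/"), "BLOCKED"),
    (re.compile(r"^\s*(shutdown|reboot)\b"), "BLOCKED"),
    (re.compile(r"^\s*(systemctl\s+(stop|restart)|kill)\b"), "RISKY"),
]


def classify(command):
    """Return the label of the first matching rule, or SAFE if no rule matches."""
    for pattern, label in RULES:
        if pattern.search(command):
            return label
    return "SAFE"  # assumption: commands not covered by any rule are allowed


if __name__ == "__main__":
    for cmd in ["rm -rf /data", "rm -r -f /data", "systemctl stop nginx", "ls -la"]:
        print(f"{cmd!r}: {classify(cmd)}")
```

With these rules, `rm -rf /data` is labeled BLOCKED, but the equivalent `rm -r -f /data` falls through to SAFE because no pattern anticipates the split flags. Covering each such variant means adding another hand-written rule, which is the maintenance burden the abstract describes.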

Paper Structure

This paper contains 15 sections, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Architecture of a rule-based risk assessment system. Commands executed by operators are intercepted by a bastion host (Bastion SSH) and evaluated against a set of rules stored in a configuration DB (Rule Management). If a command is evaluated as safe, it is forwarded to the Target Host; otherwise, an error is reported. All risk evaluations are logged and periodically reviewed by security experts, who may update the rules.
  • Figure 2: AI Classifier architecture. The input command is preprocessed via Byte-Pair Encoding to construct an input sequence of tokens. The sequence is processed by the BERT backbone to produce a latent representation of the command, which encodes important language-related information learned during pretraining. This latent representation is given to the risk classification layer to estimate the final command risk.
  • Figure 3: System architecture during the three phases of construction of the AI classifier. During pretraining, a command corpus is used to learn the language tokens and their context relationships. During finetuning, a dataset of labeled commands is used to specialize the AI model for the risk classification task. Both commands and labels originate from the interception system, which is built on a rule-based classifier. During inference, the AI classifier replaces the rule-based classifier, providing online risk classification for all executed commands.
  • Figure 4: F1-score of RISKY + BLOCKED commands on the test set, as a function of the dataset size used for training. The BERT approach classifies dangerous commands more accurately when training data is limited. Missing points indicate that the F1-score could not be computed because the model produced no true-positive (TP) predictions.
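
The metric in Figure 4 can be sketched as follows. This is one plausible reading, not the paper's evaluation code: it assumes RISKY and BLOCKED are pooled into a single "dangerous" positive class (the caption does not say whether the two classes are pooled or averaged), and the function name `dangerous_f1` is hypothetical. It returns `None` when there are no true positives, mirroring the missing points in the figure.

```python
DANGEROUS = {"RISKY", "BLOCKED"}  # assumption: both labels pooled as the positive class


def dangerous_f1(y_true, y_pred):
    """F1-score of the pooled dangerous class, or None if there are no true positives."""
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        truth_pos = truth in DANGEROUS
        pred_pos = pred in DANGEROUS
        if truth_pos and pred_pos:
            tp += 1  # dangerous command flagged as dangerous (RISKY or BLOCKED)
        elif pred_pos:
            fp += 1  # safe command flagged as dangerous
        elif truth_pos:
            fn += 1  # dangerous command predicted SAFE
    if tp == 0:
        return None  # precision and recall are zero or undefined; no point is plotted
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    y_true = ["SAFE", "RISKY", "BLOCKED", "SAFE"]
    y_pred = ["SAFE", "RISKY", "SAFE", "RISKY"]
    print(dangerous_f1(y_true, y_pred))        # 1 TP, 1 FP, 1 FN -> 0.5
    print(dangerous_f1(y_true, ["SAFE"] * 4))  # no TP -> None
```

Under this pooling, predicting BLOCKED for a RISKY command still counts as a true positive. A per-class or macro-averaged F1 would penalize that confusion and could give different numbers.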