A Classification-by-Retrieval Framework for Few-Shot Anomaly Detection to Detect API Injection Attacks

Udi Aharon; Ran Dubin; Amit Dvir; Chen Hajaj

TL;DR

The paper addresses the detection of API Injection attacks, including zero-day variants, with an unsupervised few-shot framework called FT-ANN. The approach combines a FastText language model dedicated to API traffic and trained on normal requests with a Classification-by-Retrieval engine that uses Approximate Nearest Neighbor (ANN) search over an HNSW index to compare incoming requests against per-endpoint representations of normal traffic, with per-endpoint thresholds in the range $[0,1]$ for anomaly scoring. Key contributions include an API-focused tokenizer and preprocessing pipeline, a lightweight in-memory retrieval model capable of incremental updates, and an evaluation on CSIC 2010 and ATRDF 2023 that reports higher accuracy than state-of-the-art (SOTA) unsupervised baselines. The work indicates that rapid, data-efficient anomaly detection is feasible for real-time API security, with practical implications for deployment in dynamic API ecosystems.

Abstract

Application Programming Interface (API) Injection attacks refer to the unauthorized or malicious use of APIs, which are often exploited to gain access to sensitive data or manipulate online systems for illicit purposes. Identifying actors that deceitfully use an API is a demanding problem. Despite notable advances in API security, a significant challenge remains when dealing with attackers who use novel approaches that do not match the well-known payloads commonly seen in attacks. Attackers may also exploit standard functionality in unconventional ways, with objectives that go beyond its intended boundaries. API security therefore needs to be more sophisticated and dynamic than ever, using advanced computational intelligence methods such as machine learning models that can quickly identify and respond to abnormal behavior. In response to these challenges, we propose a novel unsupervised few-shot anomaly detection framework composed of two main parts. First, we train a dedicated generic language model for APIs based on FastText embedding. Next, we use Approximate Nearest Neighbor search in a classification-by-retrieval approach. Our framework allows for training a fast, lightweight classification model using only a few examples of normal API requests. We evaluated the performance of our framework using the CSIC 2010 and ATRDF 2023 datasets. The results demonstrate that our framework improves API attack detection accuracy compared to the state-of-the-art (SOTA) unsupervised anomaly detection baselines.
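
The two parts named in the abstract (an embedding model, then nearest-neighbor retrieval against normal examples) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: hashed character trigrams stand in for a trained FastText model, an exact linear scan stands in for an approximate HNSW index, and the names `tokenize`, `embed`, `RetrievalDetector`, the per-endpoint grouping, and the threshold value 0.15 are hypothetical.

```python
import math
import re
import zlib
from collections import defaultdict
from typing import Dict, List

DIM = 2048  # embedding width; an arbitrary choice for this sketch


def tokenize(request: str) -> List[str]:
    """Hypothetical tokenizer: lowercase, split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", request.lower()) if t]


def embed(request: str) -> List[float]:
    """Stand-in for a FastText vector: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    for token in tokenize(request):
        padded = "<" + token + ">"  # word-boundary markers, as in subword models
        for i in range(len(padded) - 2):
            gram = padded[i:i + 3]
            vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine_distance(a: List[float], b: List[float]) -> float:
    # Inputs are unit length and non-negative, so the result lies in [0, 1].
    return min(1.0, max(0.0, 1.0 - sum(x * y for x, y in zip(a, b))))


class RetrievalDetector:
    """Classification-by-retrieval sketch: score = distance to nearest normal example."""

    def __init__(self, default_threshold: float = 0.15):
        self.default_threshold = default_threshold
        self.index: Dict[str, List[List[float]]] = defaultdict(list)
        self.thresholds: Dict[str, float] = {}  # optional per-endpoint overrides

    def add_normal(self, endpoint: str, request: str) -> None:
        # Incremental update: a new normal example is simply appended.
        self.index[endpoint].append(embed(request))

    def score(self, endpoint: str, request: str) -> float:
        vectors = self.index.get(endpoint)
        if not vectors:
            return 1.0  # sketch choice: unknown endpoints get the maximum score
        query = embed(request)
        # Exact scan; a real deployment would query an ANN index instead.
        return min(cosine_distance(query, v) for v in vectors)

    def is_anomalous(self, endpoint: str, request: str) -> bool:
        limit = self.thresholds.get(endpoint, self.default_threshold)
        return self.score(endpoint, request) > limit


detector = RetrievalDetector()
detector.add_normal("/api/v1/orders", "GET /api/v1/orders?page=1&limit=20")
detector.add_normal("/api/v1/orders", "GET /api/v1/orders?page=2&limit=20")
for req in (
    "GET /api/v1/orders?page=3&limit=20",
    "GET /api/v1/orders?page=1' UNION SELECT password FROM users--&limit=20",
):
    print(round(detector.score("/api/v1/orders", req), 3),
          detector.is_anomalous("/api/v1/orders", req))
```

In this toy setup only two normal requests are indexed, which mirrors the few-shot setting, and the request carrying the SQL payload should score noticeably higher than the benign one. The threshold would need tuning on real traffic, and the trigram embedding is far cruder than a trained language model.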

Paper Structure (10 sections, 4 equations, 10 figures, 5 tables)

Introduction
Related Work
Framework
Data Pre-Processing
Unsupervised ML Language Model
ANN
Experimental Design
Evaluation Metrics
Experimental Results
Discussion and Conclusions

Figures (10)

Figure 1: The FT-ANN Framework: Training and Inference Methodology.
Figure 2: Data pre-processing steps.
Figure 3: Feature extraction and classification-by-retrieval.
Figure 4: An example of anomalous requests from the CSIC 2010 and ATRDF 2023 datasets; the endpoint definition is marked in green and the malicious payload in red.
Figure 5: Visualization of FastText embedding using t-SNE shows the planar representation of the internal high-dimensional organization of the two classes. Each class is labeled with a unique color for clarity. Dimensionality reduction techniques, such as t-SNE, arrange data points so that those close together represent requests that FastText perceives as having similar patterns.
...and 5 more figures
