MINES: Explainable Anomaly Detection through Web API Invariant Inference

Wenjie Zhang, Yun Lin, Chun Fung Amos Kwok, Xiwen Teoh, Xiaofei Xie, Frank Liauw, Hongyu Zhang, Jin Song Dong

TL;DR

MINES reframes web API anomaly detection as schema-level invariant inference by converting API signatures and database schemas into an augmented ER diagram and using LLMs to infer API-DB/API-API/API-Env constraints. The approach generates executable Python invariants refined against normal logs and supports offline runtime verification, including binary log history replay to maintain state consistency. Across Train-Ticket, NiceFish, and three real-world apps, MINES achieves near-perfect precision and significantly higher recall than state-of-the-art baselines, while maintaining efficiency and generalization. This schema-driven, explainable method reduces reliance on noisy raw logs, enables cross-domain observability, and offers practical deployment paths for web application security.

Abstract

Detecting anomalies in web applications, which are important infrastructure for running modern companies and governments, is crucial for providing reliable web services. Many modern web applications operate on web APIs (e.g., RESTful, SOAP, and WebSockets), and their exposure invites intended attacks or unintended illegal visits, causing abnormal system behaviors. However, such anomalies can produce logs very similar to normal logs, and those logs can miss crucial information (which may reside in the database) needed for log discrimination. Further, log instances can also be noisy, which can mislead state-of-the-art log-learning solutions into learning spurious correlations, resulting in superficial models and rules for anomaly detection. In this work, we propose MINES, which infers explainable API invariants for anomaly detection at the schema level instead of from detailed raw log instances, which can (1) significantly discriminate noise in logs to identify precise normalities and (2) detect abnormal behaviors beyond the instrumented logs. Technically, MINES (1) converts API signatures into table schemas to enhance the original database schema; and (2) infers potential database constraints on the enhanced database schema to capture the potential relationships between APIs and database tables. MINES uses an LLM to extract potential relationships based on two given table structures, and uses normal log instances to accept or reject the LLM-generated invariants. Finally, MINES translates the inferred constraints into invariants and generates Python code for verifying the runtime logs. We extensively evaluate MINES on web-tamper attacks on the benchmarks of TrainTicket, NiceFish, Gitea, Mastodon, and NextCloud against baselines such as LogRobust, LogFormer, and WebNorm. The results show that MINES achieves high recall for the anomalies while introducing almost zero false positives, indicating a new state of the art.
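
To make the last two steps of the abstract concrete, the following is a minimal sketch of what an executable invariant and its refinement against normal logs might look like. It is not the code MINES generates: the log fields (`api`, `order_id`, `user`), the in-memory `orders` table, and all function names are hypothetical, chosen to mirror the double-refund scenario. One candidate invariant is deliberately spurious, so that replaying normal logs rejects it.

```python
import copy
from typing import Callable, Dict, List

# Hypothetical shapes: one log entry per API call, and an in-memory "orders" table.
LogEntry = Dict[str, str]
OrderTable = Dict[str, Dict[str, str]]
Invariant = Callable[[LogEntry, OrderTable], bool]


def refund_targets_paid_order(entry: LogEntry, orders: OrderTable) -> bool:
    """Candidate API-DB invariant: a refund must reference an order that is still PAID."""
    if entry["api"] != "refund":
        return True
    order = orders.get(entry["order_id"])
    return order is not None and order["status"] == "PAID"


def refund_only_by_admin(entry: LogEntry, orders: OrderTable) -> bool:
    """Deliberately spurious candidate, standing in for a hallucinated constraint."""
    if entry["api"] != "refund":
        return True
    return entry["user"] == "admin"


def apply_entry(entry: LogEntry, orders: OrderTable) -> None:
    """Replay the state change of one log entry so later checks see a consistent state."""
    if entry["api"] == "refund" and entry["order_id"] in orders:
        orders[entry["order_id"]]["status"] = "REFUNDED"


def violations(invariant: Invariant, logs: List[LogEntry],
               initial_orders: OrderTable) -> List[int]:
    """Return the indices of log entries that violate the invariant during replay."""
    orders = copy.deepcopy(initial_orders)  # never mutate the caller's table
    bad = []
    for index, entry in enumerate(logs):
        if not invariant(entry, orders):
            bad.append(index)
        apply_entry(entry, orders)
    return bad


def refine(candidates: List[Invariant], normal_logs: List[LogEntry],
           initial_orders: OrderTable) -> List[Invariant]:
    """Keep only the candidates that hold on every normal log entry."""
    return [inv for inv in candidates
            if not violations(inv, normal_logs, initial_orders)]


INITIAL_ORDERS = {"o1": {"status": "PAID"}, "o2": {"status": "PAID"}}
NORMAL_LOGS = [
    {"api": "refund", "order_id": "o1", "user": "alice"},
    {"api": "query", "order_id": "o2", "user": "bob"},
]
ATTACK_LOGS = [
    {"api": "refund", "order_id": "o2", "user": "bob"},
    {"api": "refund", "order_id": "o2", "user": "bob"},  # second refund of the same order
]

kept = refine([refund_targets_paid_order, refund_only_by_admin],
              NORMAL_LOGS, INITIAL_ORDERS)
flagged = violations(kept[0], ATTACK_LOGS, INITIAL_ORDERS)
```

In this toy setup, the spurious candidate is rejected because a normal refund by a non-admin user violates it, while the surviving invariant flags only the second refund in the attack trace. The explicit state replay in `violations` is a simplified stand-in for keeping database state consistent during offline verification.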

Paper Structure

This paper contains 24 sections, 9 figures, 12 tables.

Figures (9)

  • Figure 1: A log anomaly example caused by a real web attack on the Train-Ticket system [trainticketsystem], in which an order can be successfully refunded twice. The attack logs are similar to the normal logs, which makes log classification/regression models such as LogRobust [zhang2019robust] and LogFormer [guo2024logformer] ineffective. In addition, rule-learning-based solutions such as WebNorm [liao2024detecting] summarize a superficial rule from normal logs, which fails to detect such an anomaly.
  • Figure 2: Approach Overview: Given a web application, MINES parses API signatures and the database schema into an augmented ER (entity-relation) diagram. Using an LLM to infer reference constraints and customized constraints over the generated diagram, we derive invariants as Python code for runtime verification. In addition, to mitigate LLM hallucination, we use normal logs to refine the generated constraints.
  • Figure 3: Example of Converting an API Signature to an API Entity Type.
  • Figure 4: Example of Environmental Information.
  • Figure 5: Prompt used for relationship inference. The green boxes represent the input information, and the pink box represents the output of the LLM.
  • ...and 4 more figures
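
The conversion named in Figure 3 (an API signature becoming an API entity type alongside the database tables) can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: `EntityType`, `api_to_entity`, and `candidate_references` are hypothetical names, the `refund` signature and `orders` table are invented, and the name-and-type matching used to propose reference candidates is a naive stand-in for the LLM-based relationship inference that MINES actually uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class EntityType:
    """One node of the augmented ER diagram: a database table or an API treated as a table."""
    name: str
    attributes: Dict[str, str]  # attribute name -> type name


def api_to_entity(api_name: str, params: Dict[str, str]) -> EntityType:
    """Treat an API signature as a table schema: each parameter becomes an attribute."""
    return EntityType(name=f"api_{api_name}", attributes=dict(params))


def candidate_references(api_entity: EntityType,
                         table: EntityType) -> List[Tuple[str, str]]:
    """Propose reference-constraint candidates where attribute name and type both match.

    Naive stand-in for illustration only; candidates would still need
    validation against normal logs before being trusted.
    """
    return [
        (f"{api_entity.name}.{attr}", f"{table.name}.{attr}")
        for attr, type_name in api_entity.attributes.items()
        if table.attributes.get(attr) == type_name
    ]


# Hypothetical database table and API signature.
orders_table = EntityType("orders", {"order_id": "str", "status": "str", "price": "float"})
refund_api = api_to_entity("refund", {"order_id": "str", "user_id": "str"})

candidates = candidate_references(refund_api, orders_table)
```

Here `candidates` contains a single pair linking `api_refund.order_id` to `orders.order_id`, which is the kind of API-DB reference that a downstream invariant could then check at runtime.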