PySAD: A Streaming Anomaly Detection Framework in Python

Selim F. Yilmaz, Suleyman S. Kozat

TL;DR

This work addresses streaming anomaly detection under strict constraints: single-pass processing, bounded memory, and constant time per instance, which together enable real-time analytics on evolving data. It introduces PySAD, a modular Python framework that unifies 17+ algorithms, including LODA, Half-Space Trees, and xStream, within a BaseModel-driven pipeline that also includes preprocessors, projectors, ensemblers, and calibrators. A core contribution is unsupervised probability calibration, including conformal prediction, which converts anomaly scores into interpretable probabilities; the framework also provides stream simulators and evaluation tools. It supports univariate and multivariate streams across supervised, semi-supervised, and unsupervised settings, and is designed for production deployment with memory safety, thread safety, and sub-millisecond throughput. The model operates on a potentially infinite stream $\mathcal{D} = \{ (\mathbf{x}_t, y_t) \mid t=1,2,\ldots \}$ with $\mathbf{x}_t \in \mathbb{R}^m$, generating per-instance scores.
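The three constraints (single pass, bounded memory, constant time per instance) can be illustrated with a minimal sketch. It does not use PySAD and is not one of its algorithms: the class `RunningZScoreDetector`, the helper `score_stream`, and the toy stream are hypothetical, and the example assumes the univariate case ($m = 1$). The method names `fit_partial` and `score_partial` are chosen here only to suggest an incremental interface.

```python
import math


class RunningZScoreDetector:
    """Hypothetical single-pass detector: O(1) memory and O(1) time per instance."""

    def __init__(self):
        self.n = 0        # number of instances seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations (Welford's algorithm)

    def fit_partial(self, x):
        # Welford's update: absorbs one instance without storing the stream.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return self

    def score_partial(self, x):
        # Absolute z-score of x against the statistics seen so far.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0
        return abs(x - self.mean) / std


def score_stream(stream, model):
    # Score each instance before learning from it (test-then-train order).
    for x in stream:
        score = model.score_partial(x)
        model.fit_partial(x)
        yield score


# Toy stream (made up for illustration) with one obvious outlier at index 6.
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 8.0, 1.0]
scores = list(score_stream(stream, RunningZScoreDetector()))
```

The detector keeps only three numbers regardless of stream length, so memory stays bounded, and each instance is visited exactly once.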

Abstract

Streaming anomaly detection requires algorithms that operate under strict constraints: bounded memory, single-pass processing, and constant-time complexity. We present PySAD, a comprehensive Python framework addressing these challenges through a unified architecture. The framework implements 17+ streaming algorithms (LODA, Half-Space Trees, xStream) with specialized components including projectors, probability calibrators, and postprocessors. Unlike existing batch-focused frameworks, PySAD enables efficient real-time processing with bounded memory while maintaining compatibility with PyOD and scikit-learn. Supporting all learning paradigms for univariate and multivariate streams, PySAD provides the most comprehensive streaming anomaly detection toolkit in Python. The source code is publicly available at github.com/selimfirat/pysad.
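To make the role of a probability calibrator concrete, the following sketch maps raw anomaly scores to values in [0, 1] by ranking each new score against a bounded window of recent scores. This is only loosely in the spirit of rank-based calibration and is not PySAD's calibrator: the class `WindowedScoreCalibrator`, its method names, and the window size are assumptions made for illustration.

```python
from collections import deque


class WindowedScoreCalibrator:
    """Hypothetical calibrator: maps raw scores to [0, 1] using a bounded window."""

    def __init__(self, window_size=100):
        # A deque with maxlen evicts the oldest score automatically,
        # so memory use never exceeds window_size entries.
        self.window = deque(maxlen=window_size)

    def fit_partial(self, score):
        self.window.append(score)
        return self

    def transform_partial(self, score):
        # Fraction of recent scores that are less than or equal to the new score.
        if not self.window:
            return 0.5  # no reference scores yet; an arbitrary neutral value
        below = sum(1 for s in self.window if s <= score)
        return below / len(self.window)


calibrator = WindowedScoreCalibrator(window_size=4)
for s in [0.1, 0.2, 0.3, 0.4, 0.5]:
    calibrator.fit_partial(s)

p_high = calibrator.transform_partial(10.0)  # larger than every score in the window
p_low = calibrator.transform_partial(0.0)    # smaller than every score in the window
p_mid = calibrator.transform_partial(0.3)    # in the middle of the window
```

Because the window has a fixed size, the per-instance cost is bounded by that size, which keeps the calibrator compatible with the constant-time requirement for a fixed configuration.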

Paper Structure

This paper contains 7 sections, 1 equation, 1 figure, 1 table.

Figures (1)

  • Figure 1: The usage of components in PySAD as a pipeline.