CapyMOA: Efficient Machine Learning for Data Streams in Python

Heitor Murilo Gomes, Anton Lee, Nuwan Gunasekara, Yibin Sun, Guilherme Weigert Cassales, Justin Liu, Marco Heyden, Vitor Cerqueira, Maroua Bahri, Yun Sing Koh, Bernhard Pfahringer, Albert Bifet

TL;DR

This work presents CapyMOA, an open-source framework for efficient machine learning on data streams that bridges Python usability with Java-backed efficiency. It introduces a structured data representation, flexible pipelines, drift-handling mechanisms, and robust evaluation methods to address the challenges of concept drift and real-time processing. The paper highlights interoperability with MOA and PyTorch for hybrid online and deep learning approaches, and provides benchmarking against existing frameworks to demonstrate competitive performance. Overall, CapyMOA aims to accelerate research and practice in dynamic, streaming environments by delivering efficiency, interoperability, and accessibility at scale.

Abstract

CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.

Paper Structure

This paper contains 9 sections and 2 tables.