
ALPACA -- Adaptive Learning Pipeline for Comprehensive AI

Simon Torka, Sahin Albayrak

TL;DR

ALPACA tackles the challenge of building an accessible, domain-agnostic AI pipeline that spans data collection, preprocessing, and model-driven analytics while serving diverse user groups. It proposes a modular web-based platform in which Celery orchestrates tasks on Kubernetes, MongoDB stores the data, Pandas and GPU-capable routines handle preprocessing, and a Django/Plotly Dash frontend provides interactive visualization. The paper details a three-stage pipeline (data collection, data preprocessing, and an AI model pipeline) and demonstrates an Android APK similarity-detection use case, highlighting data provenance, reproducibility, and crowdsourcing capabilities. By combining reflection-based extensibility, LLM-enabled workflows, and scalable cloud deployment, ALPACA aims to democratize advanced AI analytics and lays the groundwork for federated learning, continuous learning, and explainable AI in future work.
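The TL;DR mentions reflection-based extensibility without showing the mechanism. As a minimal sketch, assuming pluggable preprocessing steps are Python classes discovered at runtime, the code below uses the built-in `__subclasses__()` reflection hook to find steps by class name and apply them in a user-chosen order. The base class `PreprocessingStep`, the steps `Lowercase` and `DropDuplicates`, and the `apply` method are hypothetical illustrations, not ALPACA's actual API.

```python
class PreprocessingStep:
    """Hypothetical base class; every pluggable step subclasses it."""

    def apply(self, records):
        raise NotImplementedError


class Lowercase(PreprocessingStep):
    """Hypothetical step: normalise every record to lower case."""

    def apply(self, records):
        return [r.lower() for r in records]


class DropDuplicates(PreprocessingStep):
    """Hypothetical step: remove repeated records, keeping the first occurrence."""

    def apply(self, records):
        # dict.fromkeys preserves insertion order (Python 3.7+).
        return list(dict.fromkeys(records))


def discover_steps(base):
    """Reflectively collect all subclasses of `base` (recursively), keyed by class name."""
    found = {}
    for cls in base.__subclasses__():
        found[cls.__name__] = cls
        found.update(discover_steps(cls))
    return found


def run_steps(records, step_names, registry):
    """Instantiate each named step from the registry and apply it in order."""
    for name in step_names:
        records = registry[name]().apply(records)
    return records


registry = discover_steps(PreprocessingStep)
print(sorted(registry))  # ['DropDuplicates', 'Lowercase']
print(run_steps(["A", "b", "a", "B"], ["Lowercase", "DropDuplicates"], registry))  # ['a', 'b']
```

With this design, a new step becomes available simply by defining another subclass; nothing has to be registered by hand, which is one way a platform can stay extensible without changes to its core.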

Abstract

The advancement of AI technologies has greatly increased the complexity of AI pipelines, which now span many stages such as data collection, pre-processing, training, evaluation and visualisation. To provide effective and accessible AI solutions, pipelines must be designed for different user groups, including experts, professionals from other fields and laypeople. Ease of use and trust play a central role in the acceptance of AI systems. The system presented here, ALPACA (Adaptive Learning Pipeline for Advanced Comprehensive AI Analysis), offers a comprehensive AI pipeline that addresses the needs of these diverse user groups. ALPACA integrates visual and code-based development and supports all key phases of the AI pipeline. Its architecture is based on Celery (with a Redis backend) for efficient task management, MongoDB for seamless data storage and Kubernetes for cloud-based scalability and resource utilisation. Future versions of ALPACA will support modern techniques such as federated and continuous learning as well as explainable AI methods to further improve security, usability and trustworthiness. ALPACA is demonstrated on a use case that recognises similarities between Android apps, which emphasises its potential for use in everyday life.
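The abstract states that Celery (with a Redis backend) manages ALPACA's tasks. Celery needs a running broker, so the minimal sketch below substitutes Python's standard `concurrent.futures.ThreadPoolExecutor` as a stand-in worker pool to illustrate the pattern: within one job the three stages (collection, preprocessing, model) run in order, while independent jobs run concurrently on workers. In Celery itself, such an ordered sequence of tasks would typically be expressed with a `chain`. The stage functions `collect`, `preprocess` and `analyse` and their toy logic are hypothetical placeholders, not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def collect(source):
    """Hypothetical stage 1 (data collection): split a raw comma-separated string."""
    return [item.strip() for item in source.split(",") if item.strip()]


def preprocess(records):
    """Hypothetical stage 2 (preprocessing): lower-case, deduplicate and sort."""
    return sorted({r.lower() for r in records})


def analyse(features):
    """Hypothetical stage 3 (AI model placeholder): map each feature to its length."""
    return {f: len(f) for f in features}


def run_pipeline(source):
    """Run the three stages in order for one dataset, like a chain of tasks."""
    return analyse(preprocess(collect(source)))


# Independent datasets are processed concurrently by a pool of workers,
# standing in for Celery workers that could be scaled out on Kubernetes.
sources = ["APK_a, apk_b, apk_A", "lib_x,lib_y"]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_pipeline, sources))

print(results)
```

Keeping each stage a plain function with serialisable inputs and outputs is what makes it straightforward to hand the stages to a real task queue later.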

Paper Structure

This paper contains 25 sections and 11 figures.

Figures (11)

  • Figure 1: Alpaca's Dataflow.
  • Figure 2: Alpaca's Container Architecture.
  • Figure 3: Voting System.
  • Figure 4: User Interface to Manage the Dataset Selection Process.
  • Figure 5: User Interface to Manage the Dataset Merger Process.
  • ...and 6 more figures