Evolution of A4L: A Data Architecture for AI-Augmented Learning

Ploy Thajchayapong, Suzanne Carbonaro, Tim Couper, Blaine Helmick, Spencer Rugaber, Ashok Goel

TL;DR

The paper tackles the fragmentation of learner data across SIS, LMS, and AI tools, which hinders personalized learning at scale. It presents A4L2.0, a modular, open-standards data pipeline (Edu-API, Caliper Analytics, LTI) with Data Engine 2.0, analytics and visualization layers, and a human-AI teaming framework. The contributions include design guidelines, asynchronous ingestion, privacy-preserving preprocessing, an HCS-based analytics engine driven by JSON payloads, and architecturally integrated dashboards with LLM-assisted insights. The work demonstrates near real-time meso- and micro-learning analytics to support equitable, data-driven decisions for teachers, learners, and researchers, and outlines future directions toward distributed computing and federated learning.

Abstract

As artificial intelligence (AI) becomes more deeply integrated into educational ecosystems, the demand for scalable solutions that enable personalized learning continues to grow. The underlying data architectures must support continuous data flows that power personalized learning and provide access to meaningful insights that advance learner success at scale. At the National AI Institute for Adult Learning and Online Education (AI-ALOE), we have developed an Architecture for AI-Augmented Learning (A4L) to support analysis and personalization of online education for adult learners. A4L1.0, an early implementation by Georgia Tech's Design Intelligence Laboratory, demonstrated how the architecture supports analysis of meso- and micro-learning by integrating data from Learning Management Systems (LMS) and AI tools. Pilot studies with A4L1.0 informed the design of A4L2.0. In this chapter, we describe A4L2.0, which leverages 1EdTech Consortium's open standards such as Edu-API, Caliper Analytics, and Learning Tools Interoperability (LTI) to enable secure, interoperable data integration across data systems such as Student Information Systems (SIS), LMS, and AI tools. The A4L2.0 data pipeline includes modules for data ingestion, preprocessing, organization, analytics, and visualization.

Paper Structure

This paper contains 29 sections, 8 figures.

Figures (8)

  • Figure 1: Conceptual architecture of the A4L (Architecture for AI-Augmented Learning) data infrastructure, illustrating the full end-to-end pipeline from data ingestion to visualization. The diagram consists of three main components. (1) Data Engine 2.0 (left section of the figure): This section outlines the architecture’s data integration and processing backend. Educational data from Student Information Systems (SIS), Learning Management Systems (LMS), and AI tools (via LTI®) are routed through a centralized API Gateway and validated against JSON schemas. The validated events are sent to system-specific endpoints, where time-based or event-driven triggers enqueue them for processing. The pipeline then performs preprocessing tasks. (2) Analytics Pipeline (top-right section): This component executes scheduled meso- and micro-learning analyses. A time-based scheduler triggers an AWS Step Functions workflow that orchestrates containerized Lambda functions to extract data from the warehouse, transform it, and apply statistical procedures. The configuration is controlled via a declarative JSON payload, enabling modular and reproducible workflows for learning analytics and research. (3) Visualization Pipeline (bottom-right section): The visualization layer delivers insights to end users (teachers, researchers, and learners) through interactive dashboards built with JavaScript (React). Analysis outputs are routed to specific applications such as the Jill Watson, SAMI, or VERA dashboards, each tailored to user roles and needs. This layer supports near real-time monitoring, AI-generated insights, and secure, role-specific data access.
  • Figure 2: A high-level diagram of the data analytics pipeline. This diagram illustrates the core workflow of A4L's modular and configurable data analytics pipeline. The process begins with a time-based Scheduled Job, which triggers the pipeline by passing a JSON-based payload, a structured configuration file that defines parameters for the analysis. The Data Fetch Module uses these parameters to generate SQL queries against the Data Warehouse and initiates a data extraction process. The extracted data is unloaded into Temporary File Storage in a columnar format (e.g., Parquet) for efficient downstream processing. Once the files are available, the Analysis Module retrieves them and performs cleaning, transformation, and statistical analysis operations as dictated by the same payload. The analysis results, typically in tabular form, are stored in Results Storage, ready to be used by visualization dashboards or exported for reporting and further research. This modular pipeline supports reproducible, scalable, and configurable analytics within A4L's event-driven infrastructure.
  • Figure 3: A sequence diagram of the analysis module. This figure illustrates the internal execution flow of the A4L analysis module, which is triggered by a configuration payload passed from the main application. The module begins with the MesoAnalysis script, which initiates the data wrangling phase by reading raw datasets from the data warehouse, cleaning them, and applying initial transformations needed for analysis. The cleaning stage invokes preprocessing functions, while transformation logic is handled by transform functions, both of which are modularized for reuse and adaptability. After data preparation, the pipeline proceeds to the analysis phase. The system calls specified statistical methods based on the payload parameters, which in turn reference external statistical functions defined in the Stats module. These functions perform operations such as aggregations, group comparisons, or regression modeling. Results from the statistical analysis may involve merging multiple datasets before finalization. The pipeline concludes by uploading processed results to a designated output location for downstream consumption in dashboards or reporting tools.
  • Figure 4: The payload’s data model. This figure illustrates the hierarchical structure of the payload configuration used to drive customizable analyses in the A4L analytics pipeline. The payload is composed of two primary inputs: the Analysis Config and the Datasets Config. The Analysis Config defines high-level analysis parameters, such as the analysis label and a list of statistical procedures to be executed. Each Stat Procedures block specifies a statistical method (e.g., regression, ANOVA) along with the variables of interest, including independent variables, dependent variables, and optional grouping logic defined in merge_groups. It also contains metadata on how datasets should be merged using shared keys or full-merge logic. The Datasets Config outlines the data preparation workflow. The Dataset block specifies the data source (e.g., file location, SQL selection), the columns to include, and instructions for sanitize_actions and transform_actions. The Sanitize module includes row and column-level operations such as renaming, filtering, removing missing values, text cleaning, and data type conversions, some of which are further defined in row_actions and filter blocks. The Transform module supports additional post-sanitization transformations with parameterized actions.
  • Figure 5: Overview of the visualization pipelines. This figure presents the high-level architecture designed to visualize insights derived from student interactions with AI tools. All dashboards support LLM-powered responses through integration with GPT-4o and follow a similar structure: a client initiates data queries, which are processed and returned for visualization. The Jill Watson pipeline (see Figure 6) features a fully automated data flow from source to dashboard, enabling near real-time updates. In contrast, dashboards for the other AI tools (e.g., SAMI and VERA) currently rely on manually inserted data for querying and display. Despite differences in data ingestion workflows, all three dashboards use a shared architecture for insight generation and visualization, supporting personalized feedback and instructional decision-making across distinct AI learning tools.
  • ...and 3 more figures
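
The payload-driven flow described in the captions for Figures 2 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the A4L implementation: only `sanitize_actions`, `transform_actions`, and the split into an Analysis Config and a Datasets Config come from the Figure 4 caption. Every other key name, the action names (`drop_missing`, `to_float`), the `group_mean` procedure, and the sample rows are hypothetical, and in-memory rows stand in for data fetched from the warehouse.

```python
import json
from statistics import mean

# Hypothetical payload. Key names are assumptions, except sanitize_actions
# and transform_actions, which the Figure 4 caption mentions.
PAYLOAD = {
    "analysis_config": {
        "label": "demo_meso_analysis",
        "stat_procedures": [
            {
                "method": "group_mean",  # assumed procedure name
                "dependent_variable": "score",
                "group_by": "used_ai_tool",
            }
        ],
    },
    "datasets_config": [
        {
            "name": "grades",
            "columns": ["learner_id", "used_ai_tool", "score"],
            "sanitize_actions": [
                {"action": "drop_missing", "column": "score"},
                {"action": "to_float", "column": "score"},
            ],
            "transform_actions": [],
        }
    ],
}

# Made-up rows standing in for data extracted from the warehouse.
ROWS = [
    {"learner_id": "a1", "used_ai_tool": True, "score": "88"},
    {"learner_id": "a2", "used_ai_tool": False, "score": "71"},
    {"learner_id": "a3", "used_ai_tool": True, "score": None},
    {"learner_id": "a4", "used_ai_tool": False, "score": "79"},
    {"learner_id": "a5", "used_ai_tool": True, "score": "92"},
]


def sanitize(rows, dataset_cfg):
    """Keep the configured columns, then apply each sanitize action in order."""
    rows = [{col: row[col] for col in dataset_cfg["columns"]} for row in rows]
    for step in dataset_cfg["sanitize_actions"]:
        col = step["column"]
        if step["action"] == "drop_missing":
            rows = [row for row in rows if row[col] is not None]
        elif step["action"] == "to_float":
            rows = [{**row, col: float(row[col])} for row in rows]
        else:
            raise ValueError(f"unknown sanitize action: {step['action']}")
    return rows


def run_procedure(rows, proc):
    """Run one statistical procedure; only a per-group mean is sketched here."""
    if proc["method"] != "group_mean":
        raise ValueError(f"unknown method: {proc['method']}")
    groups = {}
    for row in rows:
        groups.setdefault(row[proc["group_by"]], []).append(
            row[proc["dependent_variable"]]
        )
    return {group: mean(values) for group, values in groups.items()}


def run_analysis(payload, rows):
    """Sanitize the first configured dataset, then run every listed procedure."""
    analysis_cfg = payload["analysis_config"]
    clean = sanitize(rows, payload["datasets_config"][0])
    return {
        "label": analysis_cfg["label"],
        "results": [run_procedure(clean, p) for p in analysis_cfg["stat_procedures"]],
    }


# Round-trip through JSON text to mimic a scheduler handing over the payload.
RESULT = run_analysis(json.loads(json.dumps(PAYLOAD)), ROWS)
print(RESULT)
```

Keeping the cleaning steps and the statistical procedures in the payload rather than in code means a new analysis could be defined by editing configuration alone, which is the reproducibility property the captions emphasize.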