Ilargi: a GPU Compatible Factorized ML Model Training Framework

Wenbo Sun, Rihan Hai

TL;DR

Ilargi tackles the data materialization bottleneck in ML training over dispersed sources by enabling GPU-compatible factorized learning through a matrix-based DI metadata representation. It unifies data integration and ML tasks as linear algebra operations and automatically rewrites training workflows into factorized forms. A tree-boosting cost estimator selects between factorization and materialization using data characteristics, algorithmic complexity, and hardware features, with reported speedups of up to 8.9x on GPUs and over 20% acceleration in batch ML training workloads. The work demonstrates practical improvements in multi-source ML training and positions GPU-accelerated factorized learning as a viable approach in heterogeneous hardware environments.

Abstract

Machine learning (ML) training over disparate data sources traditionally involves materialization, which can impose substantial time and space overhead due to data movement and replication. Factorized learning, which computes directly on the disparate sources through linear algebra (LA) rewriting, has emerged as a viable alternative to improve computational efficiency. However, the adaptation of factorized learning to the full capabilities of modern LA-friendly hardware like GPUs has been limited, often requiring manual intervention for algorithm compatibility. This paper introduces Ilargi, a novel factorized learning framework that uses matrix-represented data integration (DI) metadata to enable automatic factorization across CPU and GPU environments without the need for costly relational joins. Ilargi incorporates an ML-based cost estimator to select between factorization and materialization based on data properties, algorithm complexity, hardware environments, and their interactions. This strategy yields up to 8.9x speedups on GPUs and over 20% acceleration in batch ML training workloads, thereby enhancing the practicability of ML training across diverse data integration scenarios and hardware platforms. To our knowledge, this work is the first effort in GPU-compatible factorized learning.
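
To make the idea of LA rewriting concrete, the following is a minimal pure-Python sketch of one factorized operation, a matrix-vector product over a key/foreign-key join. It is not Ilargi's implementation: the tables `S` and `R`, the foreign-key list `fk`, the weight vector `w`, and the helper functions are all hypothetical, and the rewrite shown is the simple indicator-matrix form T = [S | K R], used here only for illustration. The sketch checks that multiplying the materialized table by `w` gives the same result as pushing the multiplication down to the source tables.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def hconcat(a, b):
    """Concatenate two matrices column-wise (same number of rows)."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


def add(a, b):
    """Element-wise sum of two matrices with identical shapes."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


# Hypothetical source tables: S has 4 rows, R has 2 rows.
S = [[1, 2], [3, 4], [5, 6], [7, 8]]
R = [[10, 20, 30], [40, 50, 60]]
fk = [0, 1, 1, 0]  # row i of S joins with row fk[i] of R (assumed key column)

# Indicator matrix K: K[i][j] = 1 iff row i of S joins with row j of R.
K = [[1 if fk[i] == j else 0 for j in range(len(R))] for i in range(len(S))]

# Materialization: build the joined target table T = [S | K R] explicitly.
T = hconcat(S, matmul(K, R))

# Hypothetical weight vector, split by the column blocks of S and R.
w = [[1], [-1], [2], [0], [1]]
w_S, w_R = w[:2], w[2:]

# Materialized computation: T w.
materialized = matmul(T, w)

# Factorized computation: S w_S + K (R w_R), without ever building T.
factorized = add(matmul(S, w_S), matmul(K, matmul(R, w_R)))

print(materialized == factorized)
```

In this form the product with `R` is computed once per row of `R` rather than once per joined row, which is where factorization can save work when rows of `R` are repeated in the join. Whether that saving outweighs the extra operator overhead depends on the data and hardware, which is the decision the abstract assigns to the cost estimator.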

Paper Structure

This paper contains 18 sections, 6 equations, 4 figures, 9 tables, 1 algorithm.

Figures (4)

  • Figure 1: Ilargi's workflow.
  • Figure 2: Workflow of the estimator.
  • Figure 3: Speedups ($\frac{\text{Time}_{materialization}}{\text{Time}_{factorization}}$) of LA operators and model training w.r.t. varying input parameters. Here we focus on the cases where factorization is faster than materialization.
  • Figure 4: Speedups ($\frac{\text{Time}_{materialization}}{\text{Time}_{factorization}}$) w.r.t. target table sparsity and complexity ratio on CPUs and GPUs. On CPUs, speedups increase as the complexity ratio grows. There is no observable trend on GPUs.

Theorems & Definitions (2)

  • Definition: Mapping matrix
  • Definition: Indicator matrix [chen2017towards]