SparseTransX: Efficient Training of Translation-Based Knowledge Graph Embeddings Using Sparse Matrix Operations

Md Saidul Hoque Anik, Ariful Azad

TL;DR

This work tackles the slow training of translation-based knowledge graph embeddings (KGEs) by identifying embedding gradient computation as a major bottleneck. It introduces SparseTransX, a sparse-matrix framework that replaces dense embedding gathering with high-performance SpMM (sparse-dense matrix multiplication) operations, unifying the forward and backward passes through a sparse incidence matrix $A$ and enabling efficient training of models such as TransE, TransR, TransH, and TorusE. The approach yields speedups of up to 5.3× on CPU and up to 4.2× on GPU while reducing GPU memory usage, with accuracy remaining on par with established frameworks across seven datasets. The system includes a PyTorch-based library, scalable data loading, streaming embeddings, and configurable sparse backends, offering a path to large-batch training and broader applicability to other KGEs. Overall, SparseTransX demonstrates that sparse linear algebra can significantly improve the performance and scalability of KG embedding training without sacrificing predictive quality.

Abstract

Knowledge graph (KG) learning offers a powerful framework for generating new knowledge and making inferences. Training KG embeddings can take a long time, especially for larger datasets. Our analysis shows that the gradient computation of embeddings is one of the dominant functions in the translation-based KG embedding training loop. We address this issue by replacing the core embedding computation with SpMM (Sparse-Dense Matrix Multiplication) kernels. This allows us to unify multiple scatter (and gather) operations into a single operation, reducing training time and memory usage. We create a general framework for training KG models using sparse kernels and implement four models, namely TransE, TransR, TransH, and TorusE. Our sparse implementations exhibit up to 5.3x speedup on the CPU and up to 4.2x speedup on the GPU with a significantly lower GPU memory footprint. The speedups are consistent across large and small datasets for a given model. Our proposed sparse approach can also be extended to accelerate other translation-based models (such as TransC and TransM) and non-translational models (such as DistMult, ComplEx, and RotatE). An implementation of the SpTransX framework is publicly available as a Python package at https://github.com/HipGraph/SpTransX.
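
To make the idea in the abstract concrete, here is a minimal sketch of how the gather and scatter steps of a TransE-style score could each collapse into a single SpMM. It is written in plain Python so it runs without dependencies and is not taken from the SpTransX code. The helper names (`build_incidence`, `spmm`, `transpose`), the toy embeddings, and the choice to stack entity and relation embeddings into one matrix `E` are all assumptions made for illustration. A real implementation would call an optimized sparse kernel instead of the Python loops.

```python
# Hypothetical toy example: these helper names are illustrative only and are
# not part of the SpTransX API.

def build_incidence(triples, num_entities):
    """Return the COO entries (row, col, value) of a sparse incidence matrix A.

    Row i encodes triple i = (head, relation, tail) over a stacked embedding
    matrix E = [entity embeddings; relation embeddings]:
    +1 at the head column, +1 at the relation column, -1 at the tail column.
    """
    entries = []
    for i, (h, r, t) in enumerate(triples):
        entries.append((i, h, 1.0))
        entries.append((i, num_entities + r, 1.0))
        entries.append((i, t, -1.0))
    return entries


def spmm(entries, dense, num_rows):
    """Multiply a COO sparse matrix by a dense matrix given as a list of rows."""
    dim = len(dense[0])
    out = [[0.0] * dim for _ in range(num_rows)]
    for row, col, val in entries:
        for k in range(dim):
            out[row][k] += val * dense[col][k]
    return out


def transpose(entries):
    """Transpose a COO matrix by swapping row and column indices."""
    return [(col, row, val) for row, col, val in entries]


# Assumed toy data: 3 entities and 2 relations with embedding size 2.
entity_emb = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
relation_emb = [[0.5, 0.5], [-1.0, 1.0]]
E = entity_emb + relation_emb  # stacked embedding matrix

triples = [(0, 0, 1), (2, 1, 0)]  # (head, relation, tail) index triples
A = build_incidence(triples, num_entities=len(entity_emb))

# Forward pass: one SpMM yields h + r - t for every triple in the batch,
# replacing three separate embedding gathers.
residual = spmm(A, E, num_rows=len(triples))

# Backward pass: for an upstream gradient G = dLoss/d(residual), the gradient
# with respect to E is A^T @ G, again a single SpMM instead of scatter-adds.
G = [[1.0, 1.0], [1.0, 1.0]]
grad_E = spmm(transpose(A), G, num_rows=len(E))
```

In this sketch the same matrix `A` drives both passes, which is the sense in which a sparse incidence matrix can unify the forward and backward computation. An entity that appears in several triples of a batch has its gradient contributions accumulated by the multiplication itself.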

Paper Structure

This paper contains 52 sections, 13 equations, 9 figures, 9 tables.

Figures (9)

  • Figure 1: Scatter and Gather operation in translational KG training
  • Figure 2: Top three CPU-intensive functions for various translation-based KGE models and datasets (datasets indicated in brackets). The intensity of red represents how widely a function is used across models: a dark red box indicates a function used in several different models, while blue/purple indicates a function that is typically exclusive to the current model. The dark gray box indicates the dataset loading time, and the light gray box indicates the rest of the training time.
  • Figure 3: Computing common expressions using SpMM. Only the highlighted row is populated for demonstration.
  • Figure 4: SparseTransX Framework
  • Figure 5: Hits@10 accuracy w.r.t. embedding size for the FB15K dataset, after 100 epochs of training with a batch size of 32768 and a relation entity dimension of 8 (for the TransH model). The TransH model encounters out-of-memory issues when the embedding size exceeds 256. Other models converge at an embedding size of approximately 2048 and show no improvement in Hits@10 accuracy for larger embeddings.
  • ...and 4 more figures