Scorch: A Library for Sparse Deep Learning
Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad
TL;DR
Scorch targets the inefficiency of dense computation in large-scale models by providing a unified sparse tensor framework within PyTorch, initially focused on CPU inference. It pairs a compiler stack, which auto-schedules loops, tiles computations, and infers output formats, with a runtime that dispatches optimized kernels. Together they achieve end-to-end speedups over PyTorch Sparse across diverse domains. The key contributions are seamless PyTorch integration, a general sparse tensor abstraction with format inference, and an auto-scheduler that generates efficient sparse kernels without extensive manual tuning. This work lowers the barrier to exploring sparsity in deep learning and lays the groundwork for broader sparse computing in mainstream frameworks, with future directions including automatic differentiation (AD) and GPU backends.
Abstract
The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks offer limited support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficient sparse tensor computation into the PyTorch ecosystem, with an initial focus on inference workloads on CPUs. Scorch provides a flexible and intuitive interface for sparse tensors and supports diverse sparse data structures. It also introduces a compiler stack that automates key optimizations, including loop ordering, tiling, and format inference. Combined with a runtime that adapts its execution to both dense and sparse data, Scorch delivers substantial speedups over hand-written PyTorch Sparse (torch.sparse) operations without sacrificing usability. More importantly, Scorch enables efficient computation of complex sparse operations that lack hand-optimized PyTorch implementations, a flexibility that is crucial for exploring novel sparse architectures. We demonstrate Scorch's ease of use and performance gains on diverse deep learning models across multiple domains: with only minimal code changes, Scorch achieves 1.05-5.78x speedups over PyTorch Sparse on end-to-end tasks. Scorch's seamless integration and performance gains make it a valuable addition to the PyTorch ecosystem. We believe Scorch will enable wider exploration of sparsity as a tool for scaling deep learning and inform the development of other sparse libraries.
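To make "format inference" more concrete, the following is a minimal sketch under a deliberately simplified model: each tensor is either wholly dense or wholly sparse, and only elementwise addition and multiplication are considered. The names `Fmt`, `infer_elementwise`, `sparse_mul`, and `sparse_add` are hypothetical and are not Scorch's API. The rules follow the standard zero-annihilation argument: a product is nonzero only where both operands are nonzero (an intersection), so it stays sparse if either input is sparse, while a sum is nonzero wherever either operand is (a union), so it stays sparse only if both inputs are.

```python
from enum import Enum


class Fmt(Enum):
    """Hypothetical tensor-wide storage format (illustration only)."""
    DENSE = "dense"
    SPARSE = "sparse"


def infer_elementwise(op: str, a: Fmt, b: Fmt) -> Fmt:
    """Hypothetical output-format rule for elementwise ops (not Scorch's API)."""
    if op == "mul":
        # Intersection: a zero in either operand yields a zero in the output.
        return Fmt.SPARSE if Fmt.SPARSE in (a, b) else Fmt.DENSE
    if op == "add":
        # Union: the output is nonzero wherever either operand is nonzero.
        return Fmt.SPARSE if a == b == Fmt.SPARSE else Fmt.DENSE
    raise ValueError(f"unsupported op: {op}")


def sparse_mul(a: dict[int, float], b: dict[int, float]) -> dict[int, float]:
    # Visit only coordinates stored in both operands (intersection).
    return {i: v * b[i] for i, v in a.items() if i in b}


def sparse_add(a: dict[int, float], b: dict[int, float]) -> dict[int, float]:
    # Visit coordinates stored in either operand (union).
    out = dict(a)
    for i, v in b.items():
        out[i] = out.get(i, 0.0) + v
    return out


if __name__ == "__main__":
    x = {0: 2.0, 3: 1.0}   # sparse vector: coordinate -> value
    y = {3: 4.0, 5: 1.0}
    print(infer_elementwise("mul", Fmt.SPARSE, Fmt.DENSE))  # Fmt.SPARSE
    print(sparse_mul(x, y))  # {3: 4.0}
    print(sparse_add(x, y))  # {0: 2.0, 3: 5.0, 5: 1.0}
```

A real sparse compiler would typically track formats per dimension (for example, dense versus compressed levels) and handle contractions such as matrix multiplication, so this two-state model is only meant to convey the intuition behind inferring output formats.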
