Threadle: A Memory-Efficient Network Storage and Query Engine for Large, Multilayer, and Mixed-mode Networks

Carl Nordlund, Yukun Jiao

TL;DR

The authors demonstrate that a network with 20 million nodes, containing layers equivalent to 8 trillion projected edges, can be stored in approximately 20 GB of RAM, a compression ratio exceeding 2000:1 compared to a materialized projection.

Abstract

We present Threadle, an open-source, high-performance, and memory-efficient network storage and query engine written in C#. Threadle is designed for full-population networks derived from administrative register data, i.e., very large, multilayer, mixed-mode networks with millions of nodes and billions of edges. It addresses a fundamental limitation of existing network libraries: the inability to efficiently handle two-mode (bipartite) data at scale. Threadle's core innovation is a pseudo-projection approach that allows two-mode layers to be queried as if they were projected into one-mode form, without ever materializing the memory-prohibitive projection. We demonstrate that a network with 20 million nodes containing layers equivalent to 8 trillion projected edges can be stored in approximately 20 GB of RAM, a compression ratio exceeding 2000:1 compared to a materialized projection. Additionally, Threadle provides native support for multilayer mixed-mode networks, an integrated node attribute manager, and a CLI frontend with 50+ commands for the construction, processing, file handling, and management of very large heterogeneous networks. Threadle is freely available at https://www.threadle.dev, either as precompiled binaries for Windows, macOS, and Linux or as source code to compile. It is supplemented by threadleR, an R frontend that enables advanced sampling- and traversal-based analyses on very large, heterogeneous, multilayer, mixed-mode population-scale networks.
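
To illustrate the pseudo-projection idea, the following is a minimal Python sketch and not Threadle's actual implementation (Threadle is written in C#, and its internal data structures are not described here). It assumes a two-mode layer can be represented by two hypothetical lookup tables, node-to-groups and group-to-members, and answers one-mode queries ("are u and v tied?", "who are u's neighbors?") directly from the memberships. All class, method, and variable names are illustrative assumptions.

```python
from collections import defaultdict


class TwoModeLayer:
    """Toy two-mode layer: stores memberships only, never projected edges."""

    def __init__(self):
        # Hypothetical storage layout (an assumption, not Threadle's design).
        self.groups_of = defaultdict(set)   # node id -> groups it belongs to
        self.members_of = defaultdict(set)  # group id -> member node ids

    def add_membership(self, node, group):
        self.groups_of[node].add(group)
        self.members_of[group].add(node)

    def has_edge(self, u, v):
        """True if u and v share at least one group, i.e. would be tied
        in the one-mode projection."""
        if u == v:
            return False
        groups_u = self.groups_of.get(u, set())
        groups_v = self.groups_of.get(v, set())
        return not groups_u.isdisjoint(groups_v)

    def neighbors(self, u):
        """Projected neighbors of u, computed on demand from memberships."""
        result = set()
        for group in self.groups_of.get(u, ()):
            result |= self.members_of[group]
        result.discard(u)
        return result

    def stored_memberships(self):
        """Number of (node, group) pairs actually held in memory."""
        return sum(len(members) for members in self.members_of.values())


def projected_edges_in_group(n):
    """Edges a single group of n members would create if materialized."""
    return n * (n - 1) // 2


# One workplace with 1000 employees, plus one person employed elsewhere.
layer = TwoModeLayer()
for person in range(1000):
    layer.add_membership(person, "workplace_A")
layer.add_membership(1000, "workplace_B")

print(layer.has_edge(0, 999))          # True: same workplace
print(layer.has_edge(0, 1000))         # False: no shared group
print(len(layer.neighbors(0)))         # 999
print(layer.stored_memberships())      # 1001
print(projected_edges_in_group(1000))  # 499500
```

In this toy case, 1001 stored memberships stand in for 499,500 projected edges within the larger group. Because the projected edge count grows quadratically with group size while memberships grow linearly, this is the kind of gap that makes a materialized projection memory-prohibitive at population scale.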

Paper Structure (14 sections, 1 equation, 1 figure, 1 table)

Figures (1)

  • Figure 1: Threadle system architecture. Threadle.Core implements all data structures and methods as a .NET 8.0 library. Threadle.CLIconsole exposes this functionality through a scripting language with text and JSON modes. The threadleR package provides seamless R integration via JSON mode, enabling researchers to combine Threadle's efficient storage with R's statistical capabilities. Threadle.Core can also be embedded directly as a project reference in existing C# solutions.