
Sig-Networks Toolkit: Signature Networks for Longitudinal Language Modelling

Talia Tseriotou, Ryan Sze-Yin Chan, Adam Tsakalidis, Iman Munire Bilal, Elena Kochkina, Terry Lyons, Maria Liakata

TL;DR

Sig-Networks is the first toolkit for longitudinal NLP built around path signatures and log-signatures, which compress and aggregate sequential text data. It provides a complete, pip-installable pipeline (nlpsig for preprocessing plus sig-networks for PyTorch models) with flexible time-feature integration and hyperparameter tuning, and achieves state-of-the-art performance on three tasks of differing temporal granularity. The approach combines Signature Window Network Units (SWNU), attention-based variants, and Seq-Sig-Net to capture both short- and long-range temporal dependencies, using log-signatures truncated at depth $N=3$, and remains robust across tasks whose temporal granularity ranges from seconds to hours. The toolkit offers a practical, extensible framework for researchers and developers to plug in their own data and extend longitudinal NLP to real-world applications.
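To make "signatures truncated at depth $N$" concrete, here is a minimal NumPy sketch of a truncated path signature and log-signature for a piecewise-linear path. It is an independent illustration, not the toolkit's implementation: the function names are hypothetical, the signature is built segment by segment with Chen's identity, and the log-signature is returned in expanded tensor coordinates rather than the compact Lyndon/Hall basis that dedicated signature libraries typically use.

```python
import numpy as np


def segment_signature(delta, depth):
    # Signature of one straight segment with increment delta: level k is delta^{(x)k} / k!.
    levels = [delta]
    for k in range(2, depth + 1):
        levels.append(np.multiply.outer(levels[-1], delta) / k)
    return levels


def tensor_product(a, b, depth, scalar_one=True):
    # Truncated product in the tensor algebra; a and b hold levels 1..depth.
    # scalar_one=True: both factors carry an implicit scalar term 1 (e.g. signatures).
    # scalar_one=False: both factors have scalar term 0 (e.g. S - 1 in the log series).
    out = []
    for k in range(1, depth + 1):
        term = a[k - 1] + b[k - 1] if scalar_one else np.zeros_like(a[k - 1])
        for i in range(1, k):
            term = term + np.multiply.outer(a[i - 1], b[k - i - 1])
        out.append(term)
    return out


def signature(path, depth=3):
    # Truncated signature of a path of shape (length, channels); needs >= 2 points.
    increments = np.diff(np.asarray(path, dtype=float), axis=0)
    sig = segment_signature(increments[0], depth)
    for delta in increments[1:]:
        # Chen's identity: the signature of a concatenation is the tensor product.
        sig = tensor_product(sig, segment_signature(delta, depth), depth)
    return sig


def log_signature(path, depth=3):
    # log(1 + x) = x - x^2/2 + x^3/3 - ..., truncated at depth, with x = S - 1.
    x = signature(path, depth)
    power = x
    logsig = [level.copy() for level in x]
    for n in range(2, depth + 1):
        power = tensor_product(power, x, depth, scalar_one=False)
        logsig = [lvl + ((-1) ** (n + 1) / n) * p for lvl, p in zip(logsig, power)]
    return logsig


# Usage: an L-shaped 2-D path, truncated at depth N = 3.
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
sig = signature(path, depth=3)
logsig = log_signature(path, depth=3)
print(sig[0])     # level 1 is the total increment, values [1, 1]
print(logsig[1])  # level-2 values [[0, 0.5], [-0.5, 0]]: the Levy area of the path
```

With $d$ channels the depth-$N$ signature has $d + d^2 + \dots + d^N$ coordinates (1,110 for $d = 10$, $N = 3$), so keeping the input dimension small, as the dimensionality-reduction step in the pipeline does, matters in practice.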

Abstract

We present an open-source, pip-installable toolkit, Sig-Networks, the first of its kind for longitudinal language modelling. A central focus is the incorporation of Signature-based Neural Network models, which have recently shown success in temporal tasks. We apply and extend published research to provide a full suite of signature-based models, whose components can be used as PyTorch building blocks in future architectures. Sig-Networks enables task-agnostic dataset plug-in, seamless pre-processing for sequential data, parameter flexibility, and automated tuning across a range of models. We examine signature networks on three NLP tasks of varying temporal granularity (counselling conversations, rumour stance switch, and mood changes in social media threads), showing state-of-the-art performance in all three, and provide guidance for future tasks. We release the toolkit as a PyTorch package with an introductory video, together with Git repositories for preprocessing and modelling that include sample notebooks on the modelled NLP tasks.
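The sequential pre-processing step can be pictured as turning flat records (stream id, timestamp, feature vector) into one fixed-length history stream per data point, as the Figure 1 caption describes. The sketch below is a hypothetical NumPy helper, not the nlpsig API: keeping the k most recent points (the point itself included), front zero-padding, and distinct timestamps within a stream are all assumptions made for illustration.

```python
import numpy as np


def build_history_streams(stream_ids, timestamps, features, k):
    """Hypothetical helper: for each data point, stack its k most recent points
    (itself included) from the same stream, oldest first, zero-padding at the
    front when the history is shorter than k. Assumes distinct timestamps per stream."""
    stream_ids = np.asarray(stream_ids)
    timestamps = np.asarray(timestamps, dtype=float)
    features = np.asarray(features, dtype=float)
    n, d = features.shape
    out = np.zeros((n, k, d))
    for i in range(n):
        # Indices of points in the same stream that occur no later than point i.
        same = np.flatnonzero((stream_ids == stream_ids[i]) & (timestamps <= timestamps[i]))
        # Order them chronologically and keep the k most recent.
        history = same[np.argsort(timestamps[same], kind="stable")][-k:]
        out[i, k - len(history):] = features[history]
    return out


# Usage: two streams ("a" and "b") with 1-D features, history length k = 2.
streams = build_history_streams(
    ["a", "a", "b", "a"], [1, 2, 1, 3], [[1.0], [2.0], [10.0], [3.0]], k=2
)
print(streams.shape)  # (4, 2, 1)
```

Each row of the result is a small path that a signature-based model can consume; in the real toolkit, time features and non-linguistic features can also be concatenated to each point.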

Paper Structure

This paper contains 25 sections, 3 figures and 5 tables.

Figures (3)

  • Figure 1: Sig-Networks Toolkit overview. The nlpsig library (left) takes the input text, label and stream id of each data point. It supports embedding extraction (e.g. SBERT) and dimensionality reduction of the embeddings, with optional processing and concatenation of non-linguistic features. For each data point, a stream/window (padded if necessary) is formed from its ordered history. These streams are shifted and stacked for unit-based models. Data splitting, with an optional k-fold scheme, is also performed. The sig-networks library (right) provides PyTorch implementations of the whole Sig-Networks family and of baseline models, with user-specified training and hyperparameter inputs.
  • Figure 2: Signature Window Unit and its variations.
  • Figure 3: Seq-Sig-Net and its variations, using the SWNU (yellow; see Figure 2) on a sample stream of 11 points.
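The "shifted and stacked" step for unit-based models (Figure 1), and the way Seq-Sig-Net applies a unit to successive parts of an 11-point stream (Figure 3), can be sketched as cutting a stream into overlapping windows. The helper name `shift_and_stack` and the window length 5 and shift 3 below are illustrative assumptions, not the paper's settings; in this sketch, trailing points that do not fill a whole window are dropped.

```python
import numpy as np


def shift_and_stack(stream, window, shift):
    """Hypothetical helper: cut a (length, channels) stream into windows of
    `window` consecutive points, starting every `shift` points, and stack them
    into an array of shape (n_windows, window, channels)."""
    stream = np.asarray(stream, dtype=float)
    length = stream.shape[0]
    if window > length:
        raise ValueError("window is longer than the stream")
    starts = range(0, length - window + 1, shift)
    return np.stack([stream[s:s + window] for s in starts])


# Usage: an 11-point, 2-channel stream, as in the Figure 3 example length.
stream = np.arange(22, dtype=float).reshape(11, 2)
windows = shift_and_stack(stream, window=5, shift=3)
print(windows.shape)  # (3, 5, 2)
```

Each stacked window would then be processed by a unit such as the SWNU, and the per-window outputs combined into a representation of the whole stream.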