GG-SSMs: Graph-Generating State Space Models

Nikola Zubić, Davide Scaramuzza

TL;DR

GG-SSMs address the fixed one-dimensional scan order of traditional SSMs by dynamically constructing data-driven graphs and propagating state along a sparse minimum spanning tree (MST) backbone. The approach uses Chazelle's MST algorithm to build that tree in near-linear time, enabling long-range feature propagation across high-dimensional data in vision and time series tasks. Results on 11 datasets, ranging from ImageNet classification to optical flow and event-based eye tracking, show state-of-the-art accuracy with fewer parameters and better computational efficiency than prior SSM and transformer-based methods. The work shows that integrating dynamic graph structures into sequential models yields a scalable, versatile framework for modeling complex multi-dimensional dependencies in real-world applications.

Abstract

State Space Models (SSMs) are powerful tools for modeling sequential data in computer vision and time series analysis. However, traditional SSMs are limited by fixed, one-dimensional sequential processing, which restricts their ability to model non-local interactions in high-dimensional data. While methods like Mamba and VMamba introduce selective and flexible scanning strategies, they rely on predetermined paths, which fail to capture complex dependencies efficiently. We introduce Graph-Generating State Space Models (GG-SSMs), a novel framework that overcomes these limitations by dynamically constructing graphs based on feature relationships. Using Chazelle's Minimum Spanning Tree algorithm, GG-SSMs adapt to the inherent data structure, enabling robust feature propagation across dynamically generated graphs and efficient modeling of complex dependencies. We validate GG-SSMs on 11 diverse datasets, including event-based eye-tracking, ImageNet classification, optical flow estimation, and six time series datasets. GG-SSMs achieve state-of-the-art performance across all tasks, surpassing existing methods by significant margins. Specifically, GG-SSM attains a top-1 accuracy of 84.9% on ImageNet (outperforming prior SSMs by 1%), reduces the KITTI-15 error rate to 2.77%, and improves eye-tracking detection rates by up to 0.33% with fewer parameters. These results demonstrate that dynamic scanning based on feature relationships significantly improves SSMs' representational power and efficiency, offering a versatile tool for various applications in computer vision and beyond.

Paper Structure

This paper contains 31 sections, 5 equations, 2 figures, 8 tables.

Figures (2)

  • Figure 1: Illustration of the Graph-Generating State Space Model (GG-SSM). Given an input feature set $\{\mathbf{x}_i\}_{i=1}^L$, we construct a graph based on feature dissimilarities and apply an efficient algorithm to generate a minimum spanning tree $\mathcal{T}$. SSM state propagation is then performed along this tree to obtain improved feature representations.
  • Figure 2: Chazelle’s MST Overview. Soft heaps allow near-linear sorting of edges. MST edges (in blue) form a spanning structure with no cycles, connecting all vertices using the smallest weights $w$.
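
The pipeline described in the two captions (build a graph from feature dissimilarities, extract a minimum spanning tree, propagate state along it) can be illustrated with a small sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the edge weight is taken to be cosine dissimilarity, candidate edges come from a 4-neighbor grid, Kruskal's algorithm with union-find stands in for Chazelle's MST algorithm, and a scalar `decay` recurrence stands in for the learned SSM parameters. The function names `cosine_dissimilarity`, `kruskal_mst`, and `propagate_on_tree` are hypothetical.

```python
import math
from collections import deque


def cosine_dissimilarity(a, b):
    """Edge weight 1 - cos(a, b); an assumed choice, not taken from the paper."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def kruskal_mst(num_nodes, edges):
    """Kruskal's algorithm with union-find (a simple stand-in for Chazelle's MST).

    edges is a list of (weight, u, v); returns the tree as a list of (u, v, weight).
    """
    parent = list(range(num_nodes))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]  # path halving
            u = parent[u]
        return u

    tree = []
    for w, u, v in sorted(edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # adding this edge creates no cycle
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree


def propagate_on_tree(features, tree, root=0, decay=0.5):
    """Toy root-to-leaf recurrence: h_child = decay * h_parent + x_child."""
    adjacency = {i: [] for i in range(len(features))}
    for u, v, _ in tree:
        adjacency[u].append(v)
        adjacency[v].append(u)

    states = {root: list(features[root])}
    queue = deque([root])
    while queue:  # breadth-first traversal away from the root
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in states:
                states[v] = [decay * h + x for h, x in zip(states[u], features[v])]
                queue.append(v)
    return [states[i] for i in range(len(features))]


# Four tokens on a 2x2 grid; nodes 0/1 and 2/3 have similar features.
features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
grid_edges = [(0, 1), (2, 3), (0, 2), (1, 3)]  # 4-neighbor candidates
weighted = [(cosine_dissimilarity(features[u], features[v]), u, v)
            for u, v in grid_edges]

tree = kruskal_mst(len(features), weighted)
states = propagate_on_tree(features, tree, root=0, decay=0.5)
print(tree)
print(states)
```

In this toy example the tree keeps the two low-dissimilarity edges and drops the most dissimilar one, so state reaches every node along a data-dependent path rather than a fixed raster scan. A sort-based Kruskal runs in O(E log E) time, so it does not reproduce the near-linear bound attributed to Chazelle's algorithm.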