Views: a hardware-friendly graph database model for storing semantic information

Yanjun Yang, Adrian Wheeldon, Yihan Pan, Themis Prodromakis, Alex Serb

TL;DR

The paper addresses the bottlenecks of traditional graph databases when deployed on hardware accelerators, proposing Views as a hardware-friendly graph database model. Views encodes directed, labelled graphs with infinitely recursive labellability using a uniform, linked-list–based data structure (Views triplets, linknodes, headnodes) that maps well to near-memory computation. It introduces two hardware mappings (CNSM and Normalised) and the ASOCA accelerator, along with an ISA supporting parallel CAR/CAR2/AAR traversal, and demonstrates storage efficiency advantages over RDF/LPG implementations, plus practical operation examples in semantic reasoning and Copycat-like cognition. The work shows that co-design of data structure and hardware can yield substantial storage and traversal efficiencies while preserving compatibility with existing graph representations and enabling advanced cognitive tasks. This suggests a path toward practical hardware-accelerated graph reasoning for symbolic AI and retrieval-augmented generation pipelines.

Abstract

The graph database (GDB) is an increasingly common storage model for data involving relationships between entries. Beyond its widespread usage in database industries, the advantages of GDBs indicate a strong potential in constructing symbolic artificial intelligences (AIs) and retrieval-augmented generation (RAG), where knowledge of data inter-relationships plays a critical role in implementation. However, current GDB models are not optimised for hardware acceleration, leading to bottlenecks in storage capacity and computational efficiency. In this paper, we propose a hardware-friendly GDB model, called Views. We show its data structure and organisation tailored for efficient storage and retrieval of graph data and demonstrate its functional equivalence and storage performance advantage compared to traditional graph representations. We further demonstrate its symbolic processing abilities in semantic reasoning and cognitive modelling with practical examples and provide a short perspective on future developments.

Paper Structure

This paper contains 14 sections, 1 equation, 11 figures, 4 tables, 1 algorithm.

Figures (11)

  • Figure 1: Forms of basic data structure: (a) A "half-$K_2$" graph. Rectangles represent abstract graph vertices and the arrow represents an abstract graph edge. (b) The triplet in the Views GDB model. Here, the bevelled rectangles represent data that can be stored in physical memory entries as numbers or pointers.
  • Figure 2: Views GDB model (a) proto-linknode, and (b) proto-headnode. Ellipses in this figure refer to the physical addresses corresponding to each portrayed link/headnode, but are not explicitly stored in each linknode's allocated memory space. Bevelled rectangles are data explicitly stored in the linknode's allocated memory. Therefore, link address diagrammatically represents the address of the current linknode, next is a physically stored pointer to the next linknode's address, and head ID (in red) is another physically stored pointer to the source vertex. Note that in the case of a headnode, head ID stores the same value as link address, i.e. it points to itself. This also relates back to Figure 1.
  • Figure 3: A semantic sentence "Object 0x00a is a naughty black cat" equivalently stored in: (a) a graph where a vertex has degree 3, (b) a Views chain with length 4. Note that the headnode link address oval has been coloured red to highlight that it is a headnode. eoc is a special value that marks the end of a chain in place of a valid linknode address.
  • Figure 4: Views GDB model (a) linknode, and (b) headnode, taking the same format as in Figure 2. prop1 and prop2 are supplemented into our data structure. Note that the properties of the source vertex itself are stored in the location of prop1 in its headnode, to which its head ID points.
  • Figure 5: Examples of secondary labelling in (a) a traditional directed graph, (b) a Views-based GDB, where the "family" chain has been set up to refer to the taxonomic rank specifically, and (c) another Views-based GDB, where the shown "family" chain has been set up to represent a more generic concept of the term, only specifying the part of speech. Note that the white "family" is nothing but a pointer to the head ID of a linked list, within which linknodes define it further.
  • ...and 6 more figures
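
The linknode and headnode layout described in the captions of Figures 2 to 4 can be illustrated with a small software model. The following is a minimal sketch, not the authors' implementation: the `Node` and `Memory` classes, the `EOC = -1` sentinel, the linknode addresses `0x101` to `0x103`, and the choice to place each descriptive word in `prop1` while leaving `prop2` empty are all assumptions made for illustration. Only the field names (link address, next, head ID, prop1, prop2), the self-pointing head ID of a headnode, and the eoc marker come from the captions.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional

EOC = -1  # hypothetical sentinel standing in for the "eoc" end-of-chain value


@dataclass
class Node:
    head_id: int                 # stored pointer to the chain's headnode
    next: int                    # stored pointer to the next linknode, or EOC
    prop1: Optional[str] = None  # payload fields; their exact use is assumed here
    prop2: Optional[str] = None


class Memory:
    """Toy address space: the dict key plays the role of the implicit link address."""

    def __init__(self) -> None:
        self.cells: Dict[int, Node] = {}

    def new_head(self, addr: int, prop1: Optional[str] = None) -> int:
        # A headnode's head ID stores its own address, i.e. it points to itself.
        self.cells[addr] = Node(head_id=addr, next=EOC, prop1=prop1)
        return addr

    def append(self, head: int, addr: int,
               prop1: Optional[str] = None, prop2: Optional[str] = None) -> int:
        # Walk to the current tail of the chain, then link the new node in.
        tail = head
        while self.cells[tail].next != EOC:
            tail = self.cells[tail].next
        self.cells[addr] = Node(head_id=head, next=EOC, prop1=prop1, prop2=prop2)
        self.cells[tail].next = addr
        return addr

    def chain(self, head: int) -> Iterator[int]:
        # Yield every address in the chain, headnode first, stopping at EOC.
        addr = head
        while addr != EOC:
            yield addr
            addr = self.cells[addr].next

    def is_head(self, addr: int) -> bool:
        return self.cells[addr].head_id == addr


# "Object 0x00a is a naughty black cat" as one headnode plus three linknodes.
mem = Memory()
head = mem.new_head(0x00A, prop1="object 0x00a")
for offset, word in enumerate(["naughty", "black", "cat"], start=1):
    mem.append(head, 0x100 + offset, prop1=word)

addresses = list(mem.chain(head))
words = [mem.cells[a].prop1 for a in addresses[1:]]
```

Under these assumptions, `addresses` holds four entries, matching the chain length of 4 in Figure 3, and every node's `head_id` leads back to `0x00a`. The uniform node format is the point of the sketch: a traversal only ever follows `next` until it meets the end-of-chain value, regardless of what the chain describes.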