Neural RELAGGS

Lukas Pensel, Stefan Kramer

TL;DR

This work addresses the challenge of applying neural models to relational data by proposing Neural RELAGGS (N-RELAGGS), which embeds propositionalization inside a neural topology using trainable composite aggregate functions. The approach enables end-to-end learning by jointly optimizing the data transformation and the prediction model, and it leverages segment operations to implement scalable aggregation across multi-relational structures. Empirical results across multiple datasets, including a large DBLP-derived benchmark, show that N-RELAGGS improves predictive performance over RELAGGS and compares favorably to state-of-the-art propositionalization baselines, particularly on larger data. The method also provides fixed-size relational embeddings that can power downstream propositional learners, highlighting its practical impact for scalable relational learning and retrieval tasks.

Abstract

Multi-relational databases are the basis of most consolidated data collections in science and industry today. Most learning and mining algorithms, however, require data to be represented in a propositional form. While there is a variety of specialized machine learning algorithms that can operate directly on multi-relational data sets, propositionalization algorithms transform multi-relational databases into propositional data sets, thereby allowing the application of traditional machine learning and data mining algorithms without modification. One prominent propositionalization algorithm is RELAGGS by Krogel and Wrobel, which transforms the data by nested aggregations. We propose a new neural-network-based algorithm in the spirit of RELAGGS that employs trainable composite aggregate functions instead of the static aggregate functions used in the original approach. In this way, we can jointly train the propositionalization with the prediction model, or, alternatively, use the learned aggregations as embeddings in other algorithms. We demonstrate the increased predictive performance by comparing N-RELAGGS with RELAGGS and multiple other state-of-the-art algorithms.
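
To make the idea of propositionalization by static aggregation concrete, the following is a minimal sketch, not the RELAGGS implementation. It flattens a one-to-many relation into one fixed-size feature row per target instance. The function name `propositionalize`, the toy `users`/`ratings` data, and the particular set of aggregate functions (count, sum, min, max, mean) are assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import mean


def propositionalize(parent_ids, child_rows, key, column):
    """Aggregate one numeric child column per parent id with fixed functions."""
    # Group the child values by the foreign key that points to the parent.
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[key]].append(row[column])

    table = {}
    for pid in parent_ids:
        values = groups.get(pid, [])
        # Every parent gets the same columns, even with no child rows.
        table[pid] = {
            f"{column}_count": len(values),
            f"{column}_sum": sum(values),
            f"{column}_min": min(values) if values else None,
            f"{column}_max": max(values) if values else None,
            f"{column}_mean": mean(values) if values else None,
        }
    return table


# Hypothetical toy data: users (target table) and their movie ratings.
users = [1, 2, 3]
ratings = [
    {"user_id": 1, "rating": 4.0},
    {"user_id": 1, "rating": 2.0},
    {"user_id": 2, "rating": 5.0},
]
features = propositionalize(users, ratings, key="user_id", column="rating")
```

Each entry of `features` is a propositional row that a standard learner could consume. The aggregate functions here are fixed in advance, which is the part the abstract proposes to replace with trainable ones.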

Paper Structure (30 sections, 6 equations, 3 figures, 12 tables, 4 algorithms)

Figures (3)

  • Figure 1: Schematic of the neural network based composite aggregate function. We take $n$ instances with $m_1, \dots, m_n$ entries as input, and each entry is passed through a feature generation network layer. The tensors are then spanned in order to aggregate the entries of each individual instance. The resulting tensor of aggregates is passed through a feature selection network layer and subsequently forwarded to the predictor model.
  • Figure 2: Relation of the aggregation plan $P$ for the MovieLens database.
  • Figure 3: Comparison of the N-RELAGGS algorithm and the Fix N-RELAGGS algorithm with all other tested algorithms. Green fields show superiority, red fields show inferiority, and the saturation of a field represents the significance of the difference. Additionally, each field shows the p-value determined by a corrected repeated 10-fold cross-validation test with two repetitions, with the significance level set to $\alpha = 0.05$.
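
The pipeline described in the Figure 1 caption (feature generation layer, per-instance aggregation, feature selection layer) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the ReLU dense layers, the choice of sum and max as segment aggregators, the hand-set weights, and all function names are hypothetical, and nothing is trained here.

```python
def linear_relu(rows, weights, bias):
    """Dense layer with ReLU; weights has shape in_dim x out_dim."""
    return [
        [
            max(0.0, sum(xi * w[j] for xi, w in zip(x, weights)) + bias[j])
            for j in range(len(bias))
        ]
        for x in rows
    ]


def segment_sum(rows, segment_ids, num_segments):
    """Sum the rows that share a segment id (one segment per instance)."""
    out = [[0.0] * len(rows[0]) for _ in range(num_segments)]
    for row, seg in zip(rows, segment_ids):
        for j, v in enumerate(row):
            out[seg][j] += v
    return out


def segment_max(rows, segment_ids, num_segments):
    """Element-wise max over the rows that share a segment id."""
    neg_inf = float("-inf")
    out = [[neg_inf] * len(rows[0]) for _ in range(num_segments)]
    for row, seg in zip(rows, segment_ids):
        out[seg] = [max(a, b) for a, b in zip(out[seg], row)]
    # Instances without entries get zeros instead of -inf.
    return [[0.0 if v == neg_inf else v for v in seg] for seg in out]


def composite_aggregate(entries, segment_ids, num_instances, gen, sel):
    """Feature generation -> segment aggregation -> feature selection."""
    hidden = linear_relu(entries, *gen)
    sums = segment_sum(hidden, segment_ids, num_instances)
    maxes = segment_max(hidden, segment_ids, num_instances)
    # Concatenate the aggregates into one fixed-size vector per instance.
    aggregates = [s + m for s, m in zip(sums, maxes)]
    return linear_relu(aggregates, *sel)


# Hypothetical toy input: instance 0 has two entries, instance 1 has one.
entries = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
segment_ids = [0, 0, 1]
gen = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # hand-set, not trained
sel = ([[1.0], [0.0], [0.0], [1.0]], [0.0])   # hand-set, not trained
embeddings = composite_aggregate(entries, segment_ids, 2, gen, sel)
```

The output has one fixed-size row per instance regardless of how many entries each instance has. In a trainable setting, the weights in `gen` and `sel` would be learned jointly with the predictor rather than set by hand.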

Theorems & Definitions (3)

  • Definition 1: Aggregate function
  • Definition 2: Trainable aggregate function
  • Definition 3: Composite aggregate function