
Rel-MOSS: Towards Imbalanced Relational Deep Learning on Relational Databases

Jun Yin, Peng Huo, Bangguo Zhu, Hao Yan, Senzhang Wang, Shirui Pan, Chengqi Zhang

TL;DR

This work investigates, for the first time, the class imbalance problem in RDB entity classification and designs the relation-centric minority synthetic over-sampling GNN (Rel-MOSS) to fill a critical gap in the current literature.

Abstract

To enable a fully data-driven learning paradigm on relational databases (RDBs), relational deep learning (RDL) was recently proposed: it structures the RDB as a heterogeneous entity graph and adopts a graph neural network (GNN) as the predictive model. However, existing RDL methods neglect the imbalance of relational data in RDBs and risk under-representing minority entities, which can render the model unusable in practice. In this work, we investigate, for the first time, the class imbalance problem in RDB entity classification and design the relation-centric minority synthetic over-sampling GNN (Rel-MOSS) to fill a critical gap in the current literature. Specifically, to keep minority-related information from being submerged by its majority counterpart, we design a relation-wise gating controller that modulates the neighborhood messages from each individual relation type. Based on the relation-gated representations, we further propose a relation-guided minority synthesizer for over-sampling, which integrates entity relational signatures to maintain relational consistency. Extensive experiments on 12 entity classification datasets show that Rel-MOSS outperforms SOTA RDL methods and classic class-imbalance methods, with average improvements of up to 2.46% in Balanced Accuracy and 4.00% in G-Mean.
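
The two metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard definitions (Balanced Accuracy as the arithmetic mean of per-class recalls, G-Mean as their geometric mean); the paper's exact formulas are not given in this excerpt, and the helper names and toy labels below are illustrative only.

```python
import math

def per_class_recall(y_true, y_pred):
    """Return {label: recall} for every label that occurs in y_true."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        if t == p:
            hits[t] = hits.get(t, 0) + 1
    return {c: hits.get(c, 0) / n for c, n in totals.items()}

def balanced_accuracy(y_true, y_pred):
    """Arithmetic mean of per-class recalls (assumed standard definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed standard definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1.0 / len(recalls))

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy data: 8 majority entities (label 0) and 2 minority entities (label 1).
y_true = [0] * 8 + [1] * 2
y_all_majority = [0] * 10                      # ignores the minority class
y_better = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]      # 6/8 majority, 1/2 minority correct

for name, y_pred in [("all-majority", y_all_majority), ("better", y_better)]:
    print(name,
          round(accuracy(y_true, y_pred), 3),
          round(balanced_accuracy(y_true, y_pred), 3),
          round(g_mean(y_true, y_pred), 3))
```

On this toy data the all-majority predictor has the higher plain accuracy, yet its G-Mean is zero because it never recovers a minority entity. That is why imbalance-aware metrics are reported rather than accuracy alone.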

Paper Structure

This paper contains 24 sections, 2 theorems, 18 equations, 8 figures, and 8 tables.

Key Result

Proposition 4.1

Without loss of generality, consider a relational entity graph where entity classification is performed on a single entity type with binary labels $y \in \{0,1\}$, and assume that minority entities are far fewer than majority ones. According to the message-passing formula (eq:message_pass), for any minority entity $e$, the magnitu

Figures (8)

  • Figure 1: Illustration of (a) aggregation with equal importance leading to indistinguishable representations, and (b) decisive information residing in local relational structures.
  • Figure 2: Illustrative example of (a) a relational database schema, and (b) tables within the relational database.
  • Figure 3: Overview of Rel-MOSS. Beginning with entity feature encoding, the original entity features with diverse modalities are converted into unified representations by modality-specific feature encoders. For entity $e$, the relation-wise gating controller modulates the neighborhood message from each relation according to the gating factors $\{\Psi_{e,r}\}$. Subsequently, based on the gated representations, the relation-guided minority synthesizer integrates entity relational signatures into the over-sampling process. Finally, the optimization of Rel-MOSS consists of entity classification and relational signature reconstruction.
  • Figure 4: Comparison between Rel-MOSS and w/o Rel-Gate in terms of (a) Euclidean distance, and (b) Manhattan distance.
  • Figure 5: Visualization of entity representations learned by SMOTE, GraphSMOTE, and Rel-MOSS.
  • ...and 3 more figures
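
For context on the SMOTE baseline named in Figure 5, the following is a minimal sketch of classic SMOTE-style interpolation between minority samples. It is not the relation-guided synthesizer of Rel-MOSS, whose details are not included in this excerpt. The function name `smote_like`, its parameters, and the toy points are hypothetical.

```python
import random

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (classic SMOTE idea)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate partners: the k closest minority samples other than base.
        others = [m for j, m in enumerate(minority) if j != i]
        nearest = sorted(others, key=lambda m: euclidean(base, m))[:k]
        partner = rng.choice(nearest)
        lam = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + lam * (p - b) for b, p in zip(base, partner)])
    return synthetic

# Toy minority representations (2-D for readability).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like(minority, n_new=4)
print(new_points)
```

Every synthetic point lies on a segment between two existing minority samples. Plain interpolation of this kind uses only feature vectors, whereas the abstract states that Rel-MOSS additionally integrates entity relational signatures to keep synthesized entities relationally consistent.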

Theorems & Definitions (2)

  • Proposition 4.1: Minority Information Collapse
  • Proposition 4.2: Relational Consistency
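
The intuition behind minority information collapse and relation-wise gating can be sketched with a toy example. This is a hedged illustration, not the paper's formulation: the relation names, the hand-set gate logits, and the sigmoid gate are all assumptions, and in Rel-MOSS the gating factors $\{\Psi_{e,r}\}$ are produced by the model.

```python
import math

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def uniform_aggregate(neighbors_by_relation):
    """Pool all neighbours and average them, ignoring relation type."""
    pooled = [v for vecs in neighbors_by_relation.values() for v in vecs]
    return mean_vec(pooled)

def gated_aggregate(neighbors_by_relation, gate_logits):
    """Average within each relation, then sum the per-relation messages,
    each scaled by a gate in (0, 1)."""
    dim = len(next(iter(neighbors_by_relation.values()))[0])
    out = [0.0] * dim
    for rel, vecs in neighbors_by_relation.items():
        gate = sigmoid(gate_logits[rel])
        msg = mean_vec(vecs)
        out = [o + gate * m for o, m in zip(out, msg)]
    return out

# Hypothetical entity with two relation types.
neighbors = {
    "orders": [[1.0, 0.0]] * 98,   # many neighbours carrying a common signal
    "refunds": [[0.0, 1.0]] * 2,   # few neighbours carrying a rare signal
}
# Hand-set logits for illustration only; a real model would learn them.
gate_logits = {"orders": -2.0, "refunds": 2.0}

uniform = uniform_aggregate(neighbors)
gated = gated_aggregate(neighbors, gate_logits)
print("uniform:", uniform)
print("gated:  ", gated)
```

With uniform pooling, the rare relation contributes only 2% of the aggregated vector. Aggregating per relation and gating each message keeps that signal visible regardless of how many neighbours the other relation has.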