The Deep Latent Position Topic Model for Clustering and Representation of Networks with Textual Edges
Rémi Boutin, Pierre Latouche, Charles Bouveyron
TL;DR
Deep-LPTM addresses clustering in heterogeneous networks whose edges carry text. It learns a joint representation of nodes and edges in two embedding spaces via a variational graph auto-encoder, while modeling the topics of the edge texts with a probabilistic topic model. This yields a partition of the nodes together with an interpretation of the edge content. The paper also introduces IC2L, a model selection criterion designed to favor models with relevant clustering and visualization properties. On synthetic benchmarks, Deep-LPTM recovers node partitions more accurately than competing methods such as ETSBM and STBM. An analysis of the Enron email network yields meaningful graph visualizations and highlights of the graph structure, illustrating the practical applicability of the approach to real-world, text-rich networks.
Abstract
Digital interactions that lead users to share textual content published by others are naturally represented by a network, where the individuals are associated with the nodes and the exchanged texts with the edges. To understand these heterogeneous and complex data structures, it is essential both to cluster the nodes into homogeneous groups and to render a comprehensible visualisation of the data. To address both issues, we introduce Deep-LPTM, a model-based clustering strategy that relies on a variational graph auto-encoder approach together with a probabilistic model characterising the topics of discussion. Deep-LPTM builds a joint representation of the nodes and of the edges in two embedding spaces. The parameters are inferred using a variational inference algorithm. We also introduce IC2L, a model selection criterion specifically designed to choose models with relevant clustering and visualisation properties. An extensive benchmark study on synthetic data is provided. In particular, we find that Deep-LPTM recovers the partitions of the nodes better than the state-of-the-art ETSBM and STBM. Finally, the emails of the Enron company are analysed, and visualisations of the results are presented, with meaningful highlights of the graph structure.
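To make the "variational graph auto-encoder" ingredient concrete, the following is a minimal sketch of a generic VGAE encoder in the style of Kipf and Welling, written in Python with NumPy. It is an illustration under stated assumptions, not the Deep-LPTM architecture: the two-layer GCN encoder, the inner-product decoder, the function names (`normalize_adjacency`, `vgae_encode`, `inner_product_decoder`) and the random, untrained weights are all hypothetical choices made for exposition, and the topic model on the edge texts is not shown.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric GCN normalisation D^{-1/2} (A + I) D^{-1/2} (hypothetical helper)."""
    A_tilde = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Element (i, j) becomes A_tilde[i, j] / sqrt(d_i * d_j)
    return A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def vgae_encode(A, X, W0, W_mu, W_logsigma, rng):
    """Two-layer GCN encoder returning sampled embeddings and Gaussian parameters."""
    A_hat = normalize_adjacency(A)
    H = np.maximum(A_hat @ X @ W0, 0.0)       # first GCN layer with ReLU
    mu = A_hat @ H @ W_mu                     # variational posterior means
    log_sigma = A_hat @ H @ W_logsigma        # variational posterior log std devs
    eps = rng.standard_normal(mu.shape)
    Z = mu + np.exp(log_sigma) * eps          # reparameterisation trick
    return Z, mu, log_sigma

def inner_product_decoder(Z):
    """Edge probabilities sigmoid(z_i^T z_j), as in the original VGAE (assumption)."""
    return 1.0 / (1.0 + np.exp(-Z @ Z.T))

rng = np.random.default_rng(0)
# Toy undirected graph with 4 nodes; identity features X = I (no node attributes).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
# Random, untrained weights: only the forward pass is illustrated here.
W0 = rng.standard_normal((4, 8)) * 0.1
W_mu = rng.standard_normal((8, 2)) * 0.1
W_logsigma = rng.standard_normal((8, 2)) * 0.1

Z, mu, log_sigma = vgae_encode(A, X, W0, W_mu, W_logsigma, rng)
P = inner_product_decoder(Z)
print(Z.shape, P.shape)
```

In a full model the weights would be fitted by maximising an evidence lower bound with variational inference; the two-dimensional embeddings `Z` are what make a direct graph visualisation possible.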
