Simbanex: Similarity-based Exploration of IEEE VIS Publications

Daniel Witschard, Ilir Jusufi, Andreas Kerren

TL;DR

This paper introduces Simbanex, a visual analytics tool for interactive similarity exploration in bibliometric networks. It decomposes a multivariate publication network (MVN) into separately embeddable aspects such as topology, text, authors, and numerical counts. It proposes an aspect-driven, all-embedding strategy that uses multiple embeddings to produce a homogeneous similarity framework and enables a novel similarity-based clustering method. The authors demonstrate two use cases, citation link analysis and topic similarity, to show how similarity patterns can reveal missing citations and sub-topic structures, respectively. They discuss limitations, including scalability and the need for a human in the loop to interpret similarity in complex data, and argue that the approach is broadly applicable beyond MVNs.

Abstract

Embeddings are powerful tools for transforming complex and unstructured data into numeric formats suitable for computational analysis tasks. In this work, we use multiple embeddings for similarity calculations to be applied in bibliometrics and scientometrics. We build a multivariate network (MVN) from a large set of scientific publications and explore an aspect-driven analysis approach to reveal similarity patterns in the given publication data. By dividing our MVN into separately embeddable aspects, we are able to obtain a flexible vector representation which we use as input to a novel method of similarity-based clustering. Based on these preprocessing steps, we developed a visual analytics application, called Simbanex, that has been designed for the interactive visual exploration of similarity patterns within the underlying publications.
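The aspect-driven idea described above can be illustrated with a minimal sketch. It assumes each publication already has one embedding vector per aspect, and it labels every pair of publications per aspect as similar, dissimilar, or uncertain. The cosine measure, the threshold values, and all names (`cosine_similarity`, `classify`, `pairwise_classifications`, the toy vectors) are assumptions made for illustration; the source does not specify the similarity measure or cut-offs that Simbanex uses.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(sim, low=0.3, high=0.7):
    """Map a similarity score to one of three labels.

    The thresholds are hypothetical defaults, not values from the paper.
    """
    if sim >= high:
        return "similar"
    if sim <= low:
        return "dissimilar"
    return "uncertain"


def pairwise_classifications(embeddings_by_aspect, low=0.3, high=0.7):
    """Classify every pair of items separately for each aspect.

    embeddings_by_aspect: {aspect_name: {item_id: vector}}
    Returns {(id_a, id_b): {aspect_name: label}} with id_a < id_b.
    """
    result = {}
    for aspect, vectors in embeddings_by_aspect.items():
        for a, b in combinations(sorted(vectors), 2):
            sim = cosine_similarity(vectors[a], vectors[b])
            result.setdefault((a, b), {})[aspect] = classify(sim, low, high)
    return result


# Toy embeddings for two of the aspects (made-up 2-D vectors).
embeddings = {
    "text": {"P1": [1.0, 0.0], "P2": [0.9, 0.1], "P3": [0.0, 1.0]},
    "authors": {"P1": [1.0, 0.5], "P2": [1.0, 0.0], "P3": [0.0, 1.0]},
}

results = pairwise_classifications(embeddings)
for pair, labels in sorted(results.items()):
    print(pair, labels)
```

Keeping the aspects separate until after classification means each aspect can use a differently sized or differently trained embedding, while the per-pair labels remain directly comparable across aspects.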

Paper Structure (13 sections, 9 figures)

This paper contains 13 sections, 9 figures.

Figures (9)

  • Figure 1: A schematic view of how the aspect-driven approach has been applied to the data set. The underlying MVN is partitioned into four different, node-based aspects which are embedded separately. For each aspect, the pairwise similarity classifications (i.e., similar, dissimilar, or uncertain) are calculated.
  • Figure 2: The user interface of Simbanex. In the Clustering View [A], the result of clustering with the current similarity criteria is displayed. In the Intra Cluster View [B], the similarity network and the adjacency matrix of a selected cluster can be assessed. The Target-to-All View [C] shows an overview of the matches and near misses for a selected article. Finally, the detailed pairwise comparisons can be assessed in the Similarity Assessment View [D].
  • Figure 3: The Similarity Network View. Clicking a cluster circle displays the similarity network of the cluster. In this example, the user is hovering the mouse cursor over an article icon to highlight similarity matches and near misses as well as the node's position in the adjacency matrix.
  • Figure 4: The Target-to-All View facilitates at-a-glance assessments of matches for a selected target. The colored charts indicate the current setting of activated sliders (the small white frames within the colored areas), and marks indicate whether a full match or a near miss was achieved. For non-activated sliders (no white frame present in the colored area), an indication is given for the setting that would result in a match for the corresponding aspect. In this example, there are 5 matches and 3 near misses. The user is hovering over the node of the comparison with article A1644 to display co-occurring authors and co-occurring words.
  • Figure 5: In the first three steps of Use Case 1 (in order from background to foreground), the user explores intra-set citations and self-citations.
  • ...and 4 more figures