
Fine-grained classification of journal articles by relying on multiple layers of information through similarity network fusion: the case of the Cambridge Journal of Economics

Alberto Baccini, Federica Baccini, Lucio Barabesi, Martina Cioni, Eugenio Petrovich, Daria Pignalosa

TL;DR

This paper tackles fine-grained classification of scientific papers by integrating content-based and citation-based information via Similarity Network Fusion (SNF), applied to the Cambridge Journal of Economics (1985–2013). It constructs two content layers (bag-of-words and LDA topics) and a citation layer (bibliographic coupling measured with Jaccard similarity), then uses SNF to combine them into several fused networks, whose structure and clustering are compared with those of single-layer approaches and alternative hybrids. The results show that SNF yields stable, interpretable and more fine-grained communities, with the citation layer contributing most to the fused structure; simple weighted-mean hybrids are less effective. The study demonstrates the value of SNF for complex, heterogeneous journals and outlines future work to add layers and scale up the analysis.
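The citation layer rests on bibliographic coupling measured with the Jaccard index: two articles are similar in proportion to the share of cited references they have in common, |R_a ∩ R_b| / |R_a ∪ R_b|. What follows is a minimal Python sketch under stated assumptions, not the authors' code. It assumes each article's reference list has already been disambiguated into a set of identifiers, and the article and reference ids are invented. The second function shows one possible form of a simple weighted-mean hybrid: a pairwise convex combination with a hypothetical weight `alpha`. The exact weighting used for the comparison hybrids is not specified here.

```python
from itertools import combinations


def jaccard_coupling(references):
    """Bibliographic coupling as the Jaccard index of two reference sets.

    references: dict mapping article id -> set of cited-reference ids (assumed format).
    Returns {(a, b): |R_a & R_b| / |R_a | R_b|} for pairs sharing at least one reference.
    """
    sims = {}
    for a, b in combinations(sorted(references), 2):
        ra, rb = references[a], references[b]
        shared = len(ra & rb)
        if shared:  # a non-empty intersection guarantees a non-empty union
            sims[(a, b)] = shared / len(ra | rb)
    return sims


def weighted_mean_hybrid(content, citation, alpha=0.5):
    """Hypothetical linear hybrid: alpha * content + (1 - alpha) * citation, pair by pair.

    Pairs missing from either dict are treated as having similarity 0.
    """
    pairs = set(content) | set(citation)
    return {p: alpha * content.get(p, 0.0) + (1 - alpha) * citation.get(p, 0.0)
            for p in pairs}


# Invented toy data: three articles and their cited references.
refs = {
    "A": {"r1", "r2", "r3"},
    "B": {"r2", "r3", "r4"},
    "C": {"r5"},
}
coupling = jaccard_coupling(refs)
print(coupling)  # {('A', 'B'): 0.5}; C shares no reference with A or B

# Invented content-layer similarities for the same articles.
content = {("A", "B"): 0.2, ("A", "C"): 0.4}
hybrid = weighted_mean_hybrid(content, coupling, alpha=0.5)
```

Storing only pairs with a non-zero score keeps the citation layer sparse. Because a weighted mean simply averages the two scores pair by pair, an article pair that is close in only one layer keeps half of that similarity regardless of the network structure around it.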

Abstract

To explore the suitability of a fine-grained classification of journal articles that exploits multiple sources of information, articles are organized in a two-layer multiplex. The first layer conveys similarities based on the full text of the articles, and the second conveys similarities based on their cited references. The information conveyed by the two layers is only weakly associated. The Similarity Network Fusion process is adopted to combine the two layers into a new single-layer network. A clustering algorithm is applied to the fused network to obtain a classification of the articles. To evaluate its coherence, this classification is compared with those obtained by applying the same algorithm to each of the two layers separately. The classification obtained for the fused network is also compared with the classifications obtained when the layers of information are integrated using other methods available in the literature. In the case of the Cambridge Journal of Economics, Similarity Network Fusion appears to be the best option. Moreover, the resulting classification appears fine-grained enough to represent the extreme heterogeneity of the contributions published in the journal.
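Similarity Network Fusion, as generally described, turns each layer into a full normalized matrix P and a sparse k-nearest-neighbour matrix S. It then iteratively updates each layer's P through the other layer's P, diffusing only along its own strongest local links, and finally averages the results. For two layers the core update is P1 ← S1 · P2 · S1ᵀ and P2 ← S2 · P1 · S2ᵀ. The sketch below is a dependency-free Python illustration of that update, not the implementation used in the paper. The following are all assumptions made for illustration: the neighbourhood size `k`, the number of iterations, the renormalization and symmetrization after each step, excluding an article from its own neighbourhood, and every function name. The toy matrices are invented.

```python
def normalize_full(w):
    """SNF-style normalization: diagonal 1/2, off-diagonal row mass 1/2."""
    n = len(w)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        off = sum(w[i][m] for m in range(n) if m != i)
        for j in range(n):
            if i == j:
                p[i][j] = 0.5
            elif off > 0:
                p[i][j] = w[i][j] / (2 * off)
    return p


def sparse_kernel(w, k):
    """Keep each node's k most similar *other* nodes, rescaled to sum to 1."""
    n = len(w)
    s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: w[i][j], reverse=True)[:k]
        total = sum(w[i][j] for j in others)
        if total > 0:
            for j in others:
                s[i][j] = w[i][j] / total
    return s


def matmul(a, b):
    return [[sum(a[i][m] * b[m][j] for m in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def symmetrize(a):
    n = len(a)
    return [[(a[i][j] + a[j][i]) / 2 for j in range(n)] for i in range(n)]


def snf_two_layers(w1, w2, k=2, iterations=20):
    """Fuse two n x n similarity matrices defined over the same articles."""
    p1, p2 = normalize_full(w1), normalize_full(w2)
    s1, s2 = sparse_kernel(w1, k), sparse_kernel(w2, k)
    for _ in range(iterations):
        # Cross-diffusion: each layer is updated through the other layer's P,
        # using its own kNN kernel; both updates read the previous iteration.
        new1 = matmul(matmul(s1, p2), transpose(s1))
        new2 = matmul(matmul(s2, p1), transpose(s2))
        # Renormalize and symmetrize to keep values bounded (assumed practical choice).
        p1, p2 = symmetrize(normalize_full(new1)), symmetrize(normalize_full(new2))
    n = len(w1)
    return [[(p1[i][j] + p2[i][j]) / 2 for j in range(n)] for i in range(n)]


def toy_layer(within, between, groups=(0, 0, 0, 1, 1, 1)):
    """Invented layer: two groups of three articles, self-similarity 1."""
    n = len(groups)
    return [[1.0 if i == j else (within if groups[i] == groups[j] else between)
             for j in range(n)] for i in range(n)]


text_layer = toy_layer(0.6, 0.2)      # hypothetical full-text similarities
citation_layer = toy_layer(0.8, 0.1)  # hypothetical bibliographic-coupling similarities
fused = snf_two_layers(text_layer, citation_layer, k=2)
print(round(fused[0][1], 3), round(fused[0][3], 3))  # within-group vs between-group
```

The fused matrix is an ordinary weighted network over the same articles, so a clustering algorithm can then be run on it. Unlike a pairwise weighted mean, each update propagates similarity only through an article's strongest neighbours in one layer, so links supported by both layers are reinforced while isolated, weakly supported links fade.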

Paper Structure (16 sections, 6 equations, 6 tables)