Explainable Graph Spectral Clustering For GloVe-like Text Embeddings
Mieczysław A. Kłopotek, Sławomir T. Wierzchoń, Bartłomiej Starosta, Piotr Borkowski, Dariusz Czerski, Eryk Laskowski
TL;DR
This work extends explainable graph spectral clustering from term vector space (TVS) to GloVe-like embeddings, addressing how to interpret cluster membership when documents are embedded via word vectors. It develops several embedding-based formulations (L-, N-, R-, K-, and B-embeddings) and proves approximate equivalences among them, so that clustering results can be explained in terms of the words of the documents. Through experiments on WikiGloVe, TweetGloVe, and TVS representations, the paper assesses clustering quality and explanation clarity across these variants, highlighting the contexts in which GloVe aids or hinders performance and in which the explanations remain intuitive. The results provide practical guidance on when to prefer GloVe-based explainability over traditional term-vector approaches, and lay the groundwork for integrating non-linear embeddings in future work.
Abstract
In a previous paper, we proposed an approach to explaining the results of Graph Spectral Clustering for textual documents in the case where document similarity is computed as cosine similarity in term vector space. In this paper, we generalize this idea to other embeddings of documents, in particular those based on the GloVe embedding idea.
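To make the two settings concrete, the following minimal sketch contrasts cosine similarity in term vector space with cosine similarity between word-vector-based document embeddings, and feeds each similarity matrix to a two-way spectral split. It is an illustration under stated assumptions, not the method of the paper: the toy corpus, the 2-dimensional word vectors standing in for GloVe, the averaging of word vectors into a document vector, and the helper names `cosine_similarity` and `two_way_spectral_split` are all hypothetical, and the split uses only the sign of the second eigenvector of the normalized Laplacian rather than a full k-means step.

```python
import numpy as np

def cosine_similarity(X):
    """Pairwise cosine similarity between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T

def two_way_spectral_split(S):
    """Split a similarity graph in two by the sign of the Fiedler vector
    of the normalized Laplacian L = I - D^{-1/2} S D^{-1/2}."""
    S = S.copy()
    np.fill_diagonal(S, 0.0)              # no self-loops in the graph
    inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    labels = (vecs[:, 1] > 0).astype(int)
    # The sign of an eigenvector is arbitrary; fix document 0 to cluster 0.
    return labels if labels[0] == 0 else 1 - labels

# Hypothetical toy corpus (not from the paper).
docs = ["cat dog pet", "dog pet food", "market trade food", "market trade price"]

# Term vector space: one dimension per vocabulary word, raw counts.
vocab = sorted({w for d in docs for w in d.split()})
col = {w: i for i, w in enumerate(vocab)}
tvs = np.zeros((len(docs), len(vocab)))
for row, d in enumerate(docs):
    for w in d.split():
        tvs[row, col[w]] += 1.0

# Hypothetical 2-d word vectors standing in for GloVe vectors.
word_vecs = {
    "cat": (1.0, 0.1), "dog": (0.9, 0.2), "pet": (0.8, 0.1),
    "food": (0.5, 0.5),
    "market": (0.1, 0.8), "trade": (0.2, 0.9), "price": (0.1, 1.0),
}
# Assumption: a document is embedded as the mean of its word vectors.
emb = np.array([np.mean([word_vecs[w] for w in d.split()], axis=0) for d in docs])

tvs_labels = two_way_spectral_split(cosine_similarity(tvs))
emb_labels = two_way_spectral_split(cosine_similarity(emb))
print("term vector space:", tvs_labels)
print("word-vector mean: ", emb_labels)
```

On this toy corpus both representations separate the first two documents from the last two. In term vector space the similarity of two documents is carried entirely by the words they share, which is what makes a word-level explanation of cluster membership direct; with word-vector embeddings the documents are compared in a dense space in which individual words are no longer coordinates.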
