evclust: Python library for evidential clustering

Armel Soubeiga, Violaine Antoine

TL;DR

The paper tackles clustering under uncertainty through evidential clustering, which uses belief functions to form credal partitions. It introduces evclust, a Python library that implements a broad suite of algorithms (e.g., ECM, RECM, k-EVCLUS, CatECM, EGMM, BPEC, ECMdd, MECM, WMVEC, MECMdd, CCM) for attribute, proximity, and multi-view data, together with visualization and evaluation tools. Each object's cluster membership is represented by a mass function on subsets of the cluster set, which captures imprecision and conflict in assignments and can be analyzed with metrics such as nonspecificity and the credal Rand index. The paper demonstrates the library on Iris and a multi-view dataset and outlines a roadmap for future extensions and community-driven development, highlighting the practical value of belief-function-based clustering in real data analysis.

Abstract

A recent trend in clustering is the development of algorithms that not only identify clusters within data but also capture and express the uncertainty of cluster membership. Evidential clustering addresses this by using the Dempster-Shafer theory of belief functions, a framework designed to manage and represent uncertainty. This approach results in a credal partition, a structured set of mass functions that quantify the uncertain assignment of each object to potential groups. The Python framework evclust, presented in this paper, offers a suite of efficient evidential clustering algorithms as well as tools for visualizing, evaluating, and analyzing credal partitions.
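
To make the notion of a credal partition concrete, here is a minimal sketch in plain NumPy. It does not use the evclust API: the names focal_sets, masses, nonspecificity, and plausibility are illustrative assumptions, and the mass values are made up for the example. Each row is one object's mass function over subsets of three clusters. The nonspecificity used here sums m(A) log2|A| over non-empty focal sets only; some formulations add a term for the mass on the empty set, which is commonly read as an outlier indicator.

```python
import numpy as np

# Focal sets over c = 3 clusters, encoded as tuples of cluster indices.
# The empty tuple stands for the empty set.
focal_sets = [(), (0,), (1,), (0, 1), (2,), (0, 1, 2)]

# Hypothetical credal partition: one mass function per object.
# Columns follow the order of focal_sets; each row sums to 1.
masses = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # certainly in cluster 0
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # cluster 0 or 1, cannot tell which
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],  # total ignorance
    [0.2, 0.3, 0.3, 0.2, 0.0, 0.0],  # mixed, with some mass on the empty set
])


def nonspecificity(masses, focal_sets):
    """Per-object nonspecificity: sum of m(A) * log2|A| over non-empty A."""
    sizes = np.array([len(A) for A in focal_sets], dtype=float)
    log_sizes = np.zeros_like(sizes)
    nonempty = sizes > 0
    log_sizes[nonempty] = np.log2(sizes[nonempty])
    return masses @ log_sizes


def plausibility(masses, focal_sets, n_clusters):
    """Per-object plausibility of each single cluster k:
    the total mass of focal sets that contain k."""
    contains = np.array(
        [[k in A for k in range(n_clusters)] for A in focal_sets],
        dtype=float,
    )
    return masses @ contains


ns = nonspecificity(masses, focal_sets)
pl = plausibility(masses, focal_sets, n_clusters=3)
print(ns)
print(pl)
```

A nonspecificity of 0 means the object is assigned to singletons only, while larger values indicate mass on larger subsets, that is, a more imprecise assignment. Taking the cluster with the highest plausibility in each row is one simple way to reduce a credal partition to a hard partition.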

Paper Structure

This paper contains 23 sections, 12 equations, 1 figure, 4 tables.

Figures (1)

  • Figure 1: Visualization of a credal partition using the ev_plot and ev_pcaplot functions.