Approaches For Multi-View Redescription Mining

Matej Mihelčić, Tomislav Šmuc

TL;DR

This work presents a memory-efficient, extensible multi-view redescription mining framework that can relate multiple (more than two) views, i.e. disjoint sets of attributes describing one set of entities.

Abstract

The task of redescription mining explores ways to re-describe different subsets of entities contained in a dataset and to reveal non-trivial associations between different subsets of attributes, called views. This interesting and challenging task is encountered in different scientific fields and is addressed by a number of approaches that obtain redescriptions and allow for the exploration and analysis of attribute associations. The main limitation of existing approaches to this task is their inability to use more than two views. Our work alleviates this drawback. We present a memory-efficient, extensible multi-view redescription mining framework that can relate multiple (more than two) views, i.e. disjoint sets of attributes describing one set of entities. To generate redescriptions, the framework can use any multi-target regression or multi-label classification algorithm whose models can be represented as sets of rules. Multi-view redescriptions are built from initially created two-view redescriptions using an incremental view-extending heuristic. In this work, we use different types of Predictive Clustering Tree (PCT) algorithms (regular, extra, with random output selection) and Random Forests thereof to improve the quality of the final redescription sets and/or the execution time needed to generate them. We provide multiple performance analyses of the proposed framework and compare it against the naive approach to multi-view redescription mining. We demonstrate the usefulness of the proposed multi-view extension on several datasets, including a use case on understanding machine learning models, a topic of growing importance in machine learning and artificial intelligence in general.
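
To make the idea of extending a two-view redescription to a further view more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the framework's implementation: queries are plain predicates rather than rules extracted from PCTs, the Jaccard index is used as the agreement measure (it is the usual accuracy measure in the redescription mining literature, but the abstract does not name it), and the toy attributes `gdp`, `life` and `net` as well as all function names are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Entity = Dict[str, float]
Query = Callable[[Entity], bool]


def support(query: Query, entities: List[Entity]) -> Set[int]:
    """Indices of the entities that satisfy the query."""
    return {i for i, e in enumerate(entities) if query(e)}


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard index of two support sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def extend_with_view(current: Set[int],
                     candidates: Dict[str, Query],
                     entities: List[Entity]) -> Tuple[str, float]:
    """Pick the candidate query on a new view whose support best matches
    the entities already described (assumed greedy selection)."""
    scored = {name: jaccard(current, support(q, entities))
              for name, q in candidates.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


# Toy data: one attribute per view (view 1: gdp, view 2: life, view 3: net).
entities = [
    {"gdp": 50, "life": 82, "net": 90},
    {"gdp": 45, "life": 80, "net": 85},
    {"gdp": 5, "life": 60, "net": 20},
    {"gdp": 8, "life": 79, "net": 30},
    {"gdp": 40, "life": 65, "net": 88},
]

s1 = support(lambda e: e["gdp"] > 30, entities)   # query on view 1
s2 = support(lambda e: e["life"] > 75, entities)  # query on view 2
two_view_score = jaccard(s1, s2)

# Entities described by both queries form the target for the third view.
described = s1 & s2
view3_candidates: Dict[str, Query] = {
    "net > 80": lambda e: e["net"] > 80,
    "net > 86": lambda e: e["net"] > 86,
    "net < 50": lambda e: e["net"] < 50,
}
best_name, best_score = extend_with_view(described, view3_candidates, entities)
```

In the framework itself, the candidate queries on the additional view would come from a rule-transformable multi-target or multi-label model trained with the existing redescriptions as targets, rather than from a hand-written dictionary.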

Paper Structure

This paper contains 28 sections, 15 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The framework uses a generalized version of the CLUS-RM algorithm (Section \ref{rw}, Mihelcic15LNAI) to create two-view redescriptions on all pairs of views. Views are combined in the order denoted by the numbers: ($W_1$,$W_2$) first, ($W_1$,$W_3$) second, ($W_{n-1}$,$W_n$) last. The produced redescriptions form targets used to construct an arbitrary rule-transformable multi-target (multi-label) prediction model utilized to obtain corresponding rules on other views. Rule-producing models can be enhanced by using a Random Forest of arbitrary rule-transformable models as a supplementing model MihelcicRF2017 (we use PCTs with BreskvarROS and without KocevSO random output selection, and the Extra multi-target PCTs KocevET). The final redescription set $T_n$ is used to create a set of redescription sets $\mathcal{R}$ using the generalized redescription set construction procedure (GRSC) mihelcic2017framework.
  • Figure 2: The memory model used in the framework for multi-view redescription mining. The available memory is divided into two initially empty parts: the work set and the diversity set. The example shows memory management during iterations on data containing three views. After the number of redescriptions (complete and incomplete) in the redescription set exceeds the work set size, incomplete $2$-view redescriptions are discarded (after iteration $2$). Discarding of incomplete redescriptions continues until the number of redescriptions in the set is smaller than or equal to $t = (\mathcal{C}.MaxExpansionSize + \mathcal{C}.WorkSetSize)/2$ (the red mark). If the number of complete redescriptions exceeds $t$, the generalized redescription set construction procedure is called, selecting $r=2$ redescriptions (iteration $5$).
  • Figure 3: Comparison results on the Country dataset.
  • Figure 4: Comparison results on the Slovenian Water dataset.
  • Figure 5: Comparison results on the Phenotype dataset.
  • ...and 4 more figures
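
The discarding rule from the Figure 2 caption can be sketched in a few lines of Python. This is a rough illustration only: the threshold formula and the two triggering conditions are taken from the caption, while the class names, the field names, and the order in which incomplete redescriptions are dropped (first encountered, first discarded) are assumptions made for the example. The GRSC procedure itself is not implemented; the sketch only reports when it would be called.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    # Hypothetical stand-ins for C.WorkSetSize and C.MaxExpansionSize.
    work_set_size: int
    max_expansion_size: int


@dataclass
class Redescription:
    name: str
    views_covered: int  # number of views this redescription has a query on


def threshold(cfg: Config) -> float:
    """t = (MaxExpansionSize + WorkSetSize) / 2, as in the caption."""
    return (cfg.max_expansion_size + cfg.work_set_size) / 2


def discard_incomplete(reds: List[Redescription], cfg: Config,
                       n_views: int) -> List[Redescription]:
    """Once the set exceeds the work set size, drop incomplete
    redescriptions until at most t remain (or none are left to drop)."""
    if len(reds) <= cfg.work_set_size:
        return list(reds)
    to_drop = len(reds) - math.floor(threshold(cfg))
    kept = []
    for r in reds:
        if to_drop > 0 and r.views_covered < n_views:
            to_drop -= 1  # assumed order: first incomplete found is dropped
            continue
        kept.append(r)
    return kept


def needs_grsc(reds: List[Redescription], cfg: Config, n_views: int) -> bool:
    """True when the number of complete redescriptions exceeds t."""
    complete = sum(1 for r in reds if r.views_covered == n_views)
    return complete > threshold(cfg)


cfg = Config(work_set_size=6, max_expansion_size=2)  # t = 4.0
reds = ([Redescription(f"c{i}", 3) for i in range(3)] +
        [Redescription(f"i{i}", 2) for i in range(4)])
kept = discard_incomplete(reds, cfg, n_views=3)
call_grsc = needs_grsc(kept, cfg, n_views=3)
```

With these toy settings the set of seven redescriptions exceeds the work set size of six, so three incomplete two-view redescriptions are dropped to reach the threshold of four, and the complete ones are left untouched.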

Theorems & Definitions (1)

  • Definition 1