FDive: Learning Relevance Models using Pattern-based Similarity Measures

Frederik L. Dennig, Tom Polk, Zudi Lin, Tobias Schreck, Hanspeter Pfister, Michael Behrisch

TL;DR

FDive tackles the challenge of extracting relevant patterns from high-dimensional data by automatically ranking pattern-based similarity measures and learning a Self-Organizing Map (SOM)-based relevance model that users can visually explore and refine through context-aware feedback. The core idea is to treat relevance as a binary classification task. First, the Similarity Advisor selects a combination of a feature descriptor (FD) and a distance function, which together form a pattern-based similarity measure. FDive then builds a hierarchical SOM classifier that highlights uncertain regions near the decision boundaries for user refinement. The paper introduces Inter-Group-Distance and Intra-Group-Distance as lightweight quality metrics for similarity measures, and demonstrates that the Similarity Advisor can perform comparably to, and sometimes better than, traditional feature-selection baselines when only few labels are available. A real-world case study on electron microscopy images of brain cells shows FDive guiding experts toward convergent similarity measures and progressively refined decision boundaries, indicating its practical value for brain research and other domains that require interpretable, interactive pattern discovery.
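FDive highlights uncertain regions of the SOM classifier and prompts the user for further relevance feedback there. One plausible way to choose such queries is to rank unlabeled items by how mixed the labels in their SOM node are. The Python sketch below does this. The name `select_queries`, the node assignments taken as a given input, and the purity-based uncertainty 1 - |2p - 1| are assumptions for illustration, not FDive's documented query strategy.

```python
import numpy as np

def select_queries(node_of_item, labels, k=5):
    """Pick up to k unlabeled items from the most uncertain SOM nodes (hypothetical helper).

    node_of_item: SOM node index for every item (assumed already computed).
    labels: 1 = relevant, 0 = irrelevant, -1 = not yet labeled.
    A node's uncertainty is 1 - |2p - 1|, where p is the share of relevant
    labels among its labeled items; nodes without any labels count as fully uncertain.
    """
    node_of_item = np.asarray(node_of_item)
    labels = np.asarray(labels)
    uncertainty = {}
    for node in np.unique(node_of_item):
        ys = labels[(node_of_item == node) & (labels >= 0)]
        uncertainty[node] = 1.0 if len(ys) == 0 else 1.0 - abs(2 * ys.mean() - 1)
    unlabeled = np.flatnonzero(labels < 0)
    # Most uncertain nodes first; sorted() is stable, so ties keep item order.
    order = sorted(unlabeled, key=lambda i: -uncertainty[node_of_item[i]])
    return [int(i) for i in order[:k]]

# Node 0 is pure (both labels relevant), node 1 is mixed, node 2 has no labels yet.
nodes = [0, 0, 0, 1, 1, 1, 2, 2]
labels = [1, 1, -1, 1, 0, -1, -1, -1]
print(select_queries(nodes, labels, k=2))
```

Items in pure nodes are queried last, which concentrates the user's labeling effort where the node-level classification is least settled.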

Abstract

The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity measure. We use a small set of user-provided labels to rank similarity measures, each consisting of a feature descriptor and distance function combination, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system computes an interactive Self-Organizing Map-based relevance model, which classifies data according to their cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison with state-of-the-art feature selection techniques and demonstrate its usefulness in a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.
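The abstract describes a SOM-based relevance model that classifies data by cluster affiliation. The Python sketch below shows a flat, single-level version of that idea under stated assumptions. It uses a basic online SOM with a linearly decaying learning rate and a Gaussian neighborhood, which is our choice and not necessarily FDive's training schedule. Each node is labeled by majority vote of the labeled items it attracts, and every item inherits the label of its best-matching node. `X` is assumed to already hold feature-descriptor vectors. FDive's hierarchical classifier tree is not reproduced, and all function names are hypothetical.

```python
import numpy as np

def train_som(X, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a basic online SOM; returns one weight vector per grid node."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)  # init from random samples
    n_steps, t = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = t / n_steps
            lr = lr0 * (1.0 - frac)                # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3   # neighborhood radius shrinks
            bmu = np.argmin(((W - X[i]) ** 2).sum(axis=1))
            h = np.exp(-((grid - grid[bmu]) ** 2).sum(axis=1) / (2 * sigma ** 2))
            W += lr * h[:, None] * (X[i] - W)      # pull BMU and its neighbors toward X[i]
            t += 1
    return W

def best_matching_units(X, W):
    """Index of the closest node (squared Euclidean distance) for every row of X."""
    d = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def label_nodes(W, X_lab, y_lab):
    """Majority vote per node: 1 relevant, 0 irrelevant, -1 no labeled items (ties -> 1)."""
    bmu = best_matching_units(X_lab, W)
    node_label = np.full(len(W), -1)
    for k in range(len(W)):
        ys = y_lab[bmu == k]
        if len(ys) > 0:
            node_label[k] = int(ys.mean() >= 0.5)
    return node_label

def classify(X, W, node_label):
    """Each item inherits the relevance label of its best-matching node."""
    return node_label[best_matching_units(X, W)]
```

Nodes that attract no labeled items stay at -1 ("unknown"); in an active learning loop these are natural places to ask for more relevance feedback.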

Paper Structure

This paper contains 23 sections, 4 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: (1) Users label query items as relevant or irrelevant and therein express their notion of relevance. (2) This selection is used to automatically determine the best-fitting similarity measure, which distinguishes relevant from irrelevant data. (3) The system adapts the model using the relevance labels and similarity measure. The model is explorable and refinable by the users, to improve its accuracy.
  • Figure 2: Context-aware Relevance Feedback: (1) Status display showing the current analysis state. (2) Scatter plot highlighting newly labeled data. (3) Scatter plot of the current classification result. Together, (2) and (3) allow users to judge the impact of new labels. (4) Data items labeled as relevant. (5) Queried neutral data items. (6) Data items labeled as irrelevant.
  • Figure 3: Left -- The Similarity Advisor uses a set of FDs and distance functions. FDs model the data based on perceptible patterns in the data or image space. Distance functions describe the relationship between two points in the FD space. In FDive, we consider all pair-wise combinations as potentially useful measures. We call a combination of an FD and a distance function a pattern-based similarity measure. Right -- The Similarity Advisor ranks all pair-wise combinations of FDs and distance functions according to their ability to distinguish relevant from irrelevant data. A bar indicates the score, and a scatter plot shows the topology of the implied data distribution, allowing users to judge its usefulness.
  • Figure 4: We propose two quality metrics to evaluate similarity measures. Inter-Group-Distance describes the distance between the centroids of the relevant and irrelevant data, measuring how well a similarity measure separates both groups. Intra-Group-Distance is defined as the maximum distance within the relevant or irrelevant data, measuring whether a similarity measure describes elements of the same group as dissimilar.
  • Figure 5: Visual Exploration of SOM Model: (1) Classifier tree. (1a) Parent of the currently observed SOM. (1b) Children of the current SOM. (2) Detailed SOM Display. (3) Scatter plot highlighting data of the SOM node.
  • ...and 1 more figure
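Figures 3 and 4 describe how the Similarity Advisor scores every combination of FD and distance function. The Python sketch below computes both quality metrics as Figure 4 defines them: the distance between group centroids, and the largest pairwise distance inside either group. It then ranks all pair-wise combinations. How FDive merges the two metrics into the single score shown as a bar is not stated here, so the ratio inter / intra is a hypothetical choice. The descriptor and distance functions are also illustrative placeholders rather than the ones FDive ships.

```python
import itertools
import numpy as np

# Illustrative distance functions on feature vectors.
def euclidean(a, b):
    return float(np.linalg.norm(a - b))

def manhattan(a, b):
    return float(np.abs(a - b).sum())

def inter_group_distance(X, labels, dist):
    """Distance between the centroids of relevant (1) and irrelevant (0) items."""
    return dist(X[labels == 1].mean(axis=0), X[labels == 0].mean(axis=0))

def intra_group_distance(X, labels, dist):
    """Largest pairwise distance found inside either group."""
    worst = 0.0
    for group in (0, 1):
        for a, b in itertools.combinations(X[labels == group], 2):
            worst = max(worst, dist(a, b))
    return worst

def rank_measures(items, labels, descriptors, distances):
    """Score every (FD, distance) combination, best first.

    Assumes at least one relevant and one irrelevant labeled item.
    The score inter / intra is a hypothetical way to combine both metrics.
    """
    labels = np.asarray(labels)
    ranking = []
    for (fd_name, fd), (d_name, dist) in itertools.product(
            descriptors.items(), distances.items()):
        X = np.array([np.atleast_1d(fd(x)) for x in items], dtype=float)
        inter = inter_group_distance(X, labels, dist)
        intra = intra_group_distance(X, labels, dist)
        ranking.append((inter / (intra + 1e-12), fd_name, d_name))
    return sorted(ranking, reverse=True)

# Toy data: the first half of each item separates the groups; the second half does not.
items = [np.array([9, 9, 1, 5]), np.array([8, 9, 5, 1]),
         np.array([1, 0, 1, 5]), np.array([0, 1, 5, 1])]
labels = [1, 1, 0, 0]
descriptors = {"first_half": lambda x: x[:2].mean(),
               "second_half": lambda x: x[2:].mean()}
distances = {"euclidean": euclidean, "manhattan": manhattan}
for score, fd_name, d_name in rank_measures(items, labels, descriptors, distances):
    print(f"{fd_name:12s} {d_name:10s} {score:.2f}")
```

A ratio rewards measures that push the two centroids apart while keeping each group compact, which matches the intent Figure 4 gives for the two metrics. Other combinations, such as a weighted difference, would fit the same loop.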