FDive: Learning Relevance Models using Pattern-based Similarity Measures
Frederik L. Dennig, Tom Polk, Zudi Lin, Tobias Schreck, Hanspeter Pfister, Michael Behrisch
TL;DR
FDive addresses the extraction of relevant patterns from high-dimensional data by automatically ranking pattern-based similarity measures and learning a Self-Organizing Map (SOM)-based relevance model that users can explore visually and refine through context-aware feedback. Relevance is treated as a binary classification task: the Similarity Advisor selects a pattern-based similarity measure, that is, a pair of feature descriptor (FD) and distance function, and the system then builds a hierarchical SOM classifier that highlights uncertain regions near decision boundaries for user refinement. The paper introduces Inter-Group-Distance and Intra-Group-Distance as lightweight quality metrics for similarity measures and shows that the Similarity Advisor performs comparably to, and sometimes better than, traditional feature-selection baselines when few labels are available. In a real-world case study on electron microscopy images of brain cells, FDive guides experts toward convergent similarity measures and progressively refined decision boundaries, which illustrates its practical value for brain research and other domains that require interpretable, interactive pattern discovery.
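To make the ranking idea concrete, the following is a minimal sketch of how feature descriptor and distance function pairs could be scored from a handful of relevance labels. It is not the paper's implementation: the score used here (mean distance between the relevant and irrelevant groups divided by mean distance within the groups) is an assumed stand-in for the Inter-Group-Distance and Intra-Group-Distance metrics, whose exact definitions may differ, and the names `separation_score`, `rank_measures`, and the toy descriptors `signal` and `noise` are hypothetical.

```python
import math
from itertools import combinations, product


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def separation_score(relevant, irrelevant, distance):
    """Assumed quality score: inter-group over intra-group mean distance."""
    inter = mean(distance(a, b) for a, b in product(relevant, irrelevant))
    intra = mean(
        [distance(a, b) for a, b in combinations(relevant, 2)]
        + [distance(a, b) for a, b in combinations(irrelevant, 2)]
    )
    return inter / intra if intra > 0 else float("inf")


def rank_measures(items, labels, descriptors, distances):
    """Score every (descriptor, distance) pair on the labeled items, best first."""
    scores = []
    for fd_name, fd in descriptors.items():
        rel = [fd(x) for x, y in zip(items, labels) if y]
        irr = [fd(x) for x, y in zip(items, labels) if not y]
        for d_name, dist in distances.items():
            scores.append((separation_score(rel, irr, dist), fd_name, d_name))
    return sorted(scores, reverse=True)


# Toy data: the first two values separate the groups, the last two do not.
items = [
    [5.0, 5.0, 1.0, 0.0],
    [6.0, 5.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
]
labels = [True, True, False, False]  # True = relevant

# Hypothetical feature descriptors: each maps an item to a feature vector.
descriptors = {"signal": lambda x: x[:2], "noise": lambda x: x[2:]}
distances = {"euclidean": euclidean, "manhattan": manhattan}

ranking = rank_measures(items, labels, descriptors, distances)
for score, fd_name, d_name in ranking:
    print(f"{fd_name:>6} + {d_name:<9} score={score:.3f}")
```

On this toy data both `signal` pairs rank above both `noise` pairs, which is the behavior a label-driven ranking is meant to produce: descriptors that pull the labeled groups apart rise to the top.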
Abstract
The detection of interesting patterns in large high-dimensional datasets is difficult because of their dimensionality and pattern complexity. Therefore, analysts require automated support for the extraction of relevant patterns. In this paper, we present FDive, a visual active learning system that helps to create visually explorable relevance models, assisted by learning a pattern-based similarity. We use a small set of user-provided labels to rank similarity measures, consisting of feature descriptor and distance function combinations, by their ability to distinguish relevant from irrelevant data. Based on the best-ranked similarity measure, the system calculates an interactive Self-Organizing Map-based relevance model, which classifies data according to the cluster affiliation. It also automatically prompts further relevance feedback to improve its accuracy. Uncertain areas, especially near the decision boundaries, are highlighted and can be refined by the user. We evaluate our approach by comparison to state-of-the-art feature selection techniques and demonstrate the usefulness of our approach by a case study classifying electron microscopy images of brain cells. The results show that FDive enhances both the quality and understanding of relevance models and can thus lead to new insights for brain research.
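Classification by cluster affiliation, together with the highlighting of uncertain areas, can be illustrated with a small sketch. It assumes that SOM training has already produced one prototype vector per cell (training itself is omitted), uses a flat rather than hierarchical map, and flags a cell as uncertain when it has no labeled items or when its labeled items disagree. The function names, the `min_agreement` threshold, and the toy prototypes are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict


def best_matching_unit(vector, prototypes):
    """Index of the prototype closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((v - p) ** 2 for v, p in zip(vector, prototypes[i])),
    )


def label_cells(vectors, labels, prototypes, min_agreement=0.75):
    """Give each cell a relevance label from the labeled items mapped to it.

    `labels[i]` is True/False for labeled items and None for unlabeled ones.
    Returns {cell: (label_or_None, agreement)}; a None label marks a cell
    that is uncertain and would be a candidate for further feedback.
    """
    votes = defaultdict(Counter)
    for vec, lab in zip(vectors, labels):
        if lab is not None:
            votes[best_matching_unit(vec, prototypes)][lab] += 1

    cells = {}
    for cell in range(len(prototypes)):
        if not votes[cell]:
            cells[cell] = (None, 0.0)  # no feedback in this cell yet
            continue
        winner, count = votes[cell].most_common(1)[0]
        agreement = count / sum(votes[cell].values())
        cells[cell] = (winner if agreement >= min_agreement else None, agreement)
    return cells


def classify(vector, prototypes, cells):
    """An item inherits the label of the cell it falls into (None if uncertain)."""
    return cells[best_matching_unit(vector, prototypes)][0]


# Assumed output of SOM training: four prototypes along a line.
prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]

vectors = [
    (0.1, 0.0), (0.0, 0.2),  # cell 0, both relevant
    (0.9, 0.1), (1.1, 0.0),  # cell 1, conflicting labels
    (3.0, 0.1), (2.9, 0.0),  # cell 3, both irrelevant
    (0.2, 0.1), (2.1, 0.0),  # unlabeled items
]
labels = [True, True, True, False, False, False, None, None]

cells = label_cells(vectors, labels, prototypes)
uncertain = [cell for cell, (label, _) in cells.items() if label is None]

print(cells)
print("cells needing feedback:", uncertain)
print(classify((0.2, 0.1), prototypes, cells))  # falls into a relevant cell
print(classify((2.1, 0.0), prototypes, cells))  # falls into an uncertain cell
```

In this sketch the cell with conflicting labels and the cell without any labels are the ones reported for feedback, which mirrors the idea of directing the user to regions near the decision boundary.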
