Hierarchy Representation of Data in Machine Learnings
Han Yegang, Park Minjun, Byun Duwon, Park Inkyu
TL;DR
This work addresses the problem of uncovering data-driven hierarchies among evaluation targets in order to improve machine learning models. It adapts knowledge space theory by formalizing a knowledge structure $(Q,\mathcal{K})$, the surmise relation $p\rightarrow q$, and a probabilistic relaxation $p\hookrightarrow q$ with threshold $\alpha$, together with a discriminative reduction and a visualization framework that reveal dependencies among targets. The key contributions are a formal hierarchy representation for targets, a practical method for visualizing and analyzing dependencies (via constructs such as $\text{Ord}$ and $\text{Has}$), and a demonstration on a setting with 10 targets and 12 models. The approach supports model monitoring and data curation by pinpointing hierarchical data challenges that influence learning outcomes.
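The following minimal sketch shows how these relations could be computed from evaluation results. It assumes a boolean correctness matrix with one row per model and one column per target; this layout is chosen for illustration. It reads $p\rightarrow q$ as "every model that judges $p$ correctly also judges $q$ correctly". As an assumption about the relaxation, it reads $p\hookrightarrow q$ as "at least a fraction $\alpha$ of the models that judge $p$ correctly also judge $q$ correctly". The function names `surmise_relation` and `relaxed_surmise` and the example matrix `C` are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def surmise_relation(correct: np.ndarray) -> np.ndarray:
    """Hypothetical helper: R[p, q] is True when every model that is
    correct on target p is also correct on target q."""
    c = np.asarray(correct, dtype=bool)
    # violations[m, p, q]: model m is correct on p but wrong on q.
    violations = c[:, :, None] & ~c[:, None, :]
    return ~violations.any(axis=0)


def relaxed_surmise(correct: np.ndarray, alpha: float) -> np.ndarray:
    """Hypothetical helper for an ASSUMED relaxation: S[p, q] is True when at
    least a fraction alpha of the models correct on p are also correct on q."""
    c = np.asarray(correct, dtype=int)
    n_correct = c.sum(axis=0)   # number of models correct on each target p
    both = c.T @ c              # both[p, q]: models correct on both p and q
    with np.errstate(divide="ignore", invalid="ignore"):
        # A target that no model gets right implies every target vacuously.
        ratio = np.where(n_correct[:, None] > 0, both / n_correct[:, None], 1.0)
    return ratio >= alpha


# Hypothetical correctness matrix: rows are models, columns are targets 0..2.
C = np.array([
    [1, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
    [1, 0, 1],  # this model breaks 0 -> 1
    [0, 1, 1],
    [0, 0, 1],
    [0, 1, 1],
], dtype=bool)

print(surmise_relation(C).astype(int))
print(relaxed_surmise(C, alpha=0.75).astype(int))
```

With $\alpha = 1$ the relaxed relation coincides with the deterministic one. Lowering $\alpha$ tolerates a few models that break an otherwise consistent dependency. In this example the deterministic relation lacks $0\rightarrow 1$ because of the fourth model (row index 3). The relaxed relation at $\alpha = 0.75$ includes it, since 3 of the 4 models that are correct on target 0 are also correct on target 1.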
Abstract
Consider several models whose judgments on a set of data points are clear-cut, meaning each judgment is either correct or incorrect. Most of these models may then exhibit a relationship in which correctly judging one target goes together with correctly judging another target. Conversely, if most models incorrectly judge one target, they may also incorrectly judge another target. We propose a method for visualizing this hierarchy among targets. We expect this information to be useful for improving models.
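One standard way to draw such a hierarchy is a Hasse-style diagram, though it is not necessarily the authors' procedure. Targets that imply each other are merged into one node. An edge from $p$ to $q$ is kept only when it is not already implied by a chain through some intermediate target $r$. The sketch below assumes the implication relation is given as a reflexive, transitive boolean matrix, where `R[p, q]` means "correct on $p$ implies correct on $q$". The function `hasse_edges` and the matrix `R_example` are hypothetical.

```python
import numpy as np


def hasse_edges(R):
    """Hypothetical helper: for a reflexive, transitive boolean relation R,
    merge mutually implying targets and return (classes, cover_edges)."""
    R = np.asarray(R, dtype=bool)
    n = R.shape[0]
    classes, assigned = [], [False] * n
    for p in range(n):
        if assigned[p]:
            continue
        # Targets equivalent to p: p implies q and q implies p.
        members = [q for q in range(n) if q == p or (R[p, q] and R[q, p])]
        for q in members:
            assigned[q] = True
        classes.append(tuple(members))

    reps = [cls[0] for cls in classes]
    k = len(classes)
    # strict[i][j]: class i implies class j but not the reverse.
    strict = [[bool(R[reps[i], reps[j]] and not R[reps[j], reps[i]])
               for j in range(k)] for i in range(k)]

    # Keep i -> j only if no intermediate class m gives i -> m -> j.
    edges = [
        (classes[i], classes[j])
        for i in range(k)
        for j in range(k)
        if strict[i][j] and not any(strict[i][m] and strict[m][j] for m in range(k))
    ]
    return classes, edges


# Hypothetical relation over 4 targets: models correct on 0 are correct on all
# others, targets 1 and 2 imply each other, and target 3 implies only itself.
R_example = np.array([
    [1, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
], dtype=bool)

classes, edges = hasse_edges(R_example)
print(classes)
print(edges)
```

Here targets 1 and 2 collapse into a single node that sits between targets 0 and 3. The edge from 0 to 3 is omitted because it follows from the chain through that node.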
