Unveiling Interesting Insights: Monte Carlo Tree Search for Knowledge Discovery
Pietro Totis, Alberto Pozanco, Daniel Borrajo
TL;DR
This paper tackles automated knowledge discovery by framing the search for interesting data insights as a single-player Monte Carlo Tree Search over a space of data transformations and pattern-mining models. It introduces AIDE, an extensible framework where data actions (select, derive, where, groupby) and model actions are structured as labelled trees, and where the search balances exploration and exploitation through UCT-based policies and progressive widening. Interestingness is defined without supervision through a measure, intr, whose calculation is specific to each pattern type (e.g., trees, outliers, clustering, trends, association rules) and rewards peculiarity and meaningful relations. Experimental results on synthetic and real datasets show that MCTS configurations generally outperform random baselines in discovering patterns, highlighting AIDE’s potential to automate discovery while leaving room for incorporating domain knowledge and user feedback to further align the results with users' goals.
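To make the two search ingredients named above concrete, here is a minimal sketch of a textbook UCT score and a common progressive-widening test. It is not taken from AIDE: the function names, the exploration constant c, and the widening parameters k and alpha are all assumptions chosen for illustration, and the reward stands in for whatever interestingness value a rollout returns.

```python
import math


def uct_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCT value of a child node (hypothetical helper, not AIDE's code).

    total_reward: sum of rollout rewards (e.g. interestingness) seen through the child
    visits: number of times the child was visited
    parent_visits: number of times the parent was visited
    c: exploration constant (assumed default, not from the paper)
    """
    if visits == 0:
        # Unvisited children are tried first.
        return math.inf
    exploitation = total_reward / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


def may_expand(num_children, visits, k=1.0, alpha=0.5):
    """Progressive widening: allow a new child only while the number of
    children is below k * visits**alpha (k and alpha are assumed values)."""
    allowed = max(1, math.ceil(k * visits ** alpha))
    return num_children < allowed
```

With these assumed defaults, a node visited 4 times may hold at most 2 children, so the tree widens slowly as a node accumulates visits, which keeps a large action space (many possible data and model actions) from being expanded all at once.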
Abstract
Organizations are increasingly focused on leveraging data from their processes to gain insights and drive decision-making. However, converting this data into actionable knowledge remains a difficult and time-consuming task. There is often a gap between the volume of data collected and the ability to process and understand it, which automated knowledge discovery aims to fill. Automated knowledge discovery involves complex open problems, including effectively navigating data, building models to extract implicit relationships, and considering subjective goals and knowledge. In this paper, we introduce a novel method for Automated Insights and Data Exploration (AIDE) that serves as a robust foundation for tackling these challenges through the use of Monte Carlo Tree Search (MCTS). We evaluate AIDE using both real-world and synthetic data, demonstrating its effectiveness in identifying data transformations and models that uncover interesting data patterns. Among its strengths, AIDE's MCTS-based framework offers significant extensibility, allowing for future integration of additional pattern extraction strategies and domain knowledge. This makes AIDE a valuable step towards developing a comprehensive solution for automated knowledge discovery.
