A Methodology and System for Big-Thick Data Collection
Ivan Kayongo, Haonan Zhao, Leonardo Malcotti, Fausto Giunchiglia
TL;DR
The paper tackles the gap between objective sensor data and subjective human context by proposing a three-component system for Big-Thick data collection: a data collection tool, an experiment planning and execution monitoring module, and a learning-enabled scheduler. It combines a knowledge-graph-style context representation with a dashboard for real-time monitoring to enable adaptive, human-in-the-loop data collection and quality control. The approach aims to minimize disruption to participants while maintaining high data quality, and it reports empirical evidence from WeNet-derived data on adaptive timing and participant engagement. The work also discusses practical implications for privacy and ethics and sets the stage for real-world evaluation and comparison with existing platforms.
Abstract
Pervasive sensors have become essential in research for gathering real-world data. However, current studies often focus solely on objective data, neglecting subjective human contributions. We introduce an approach and system for collecting big-thick data, combining extensive sensor data (big data) with qualitative human feedback (thick data). This fusion enables effective collaboration between humans and machines, allowing machine learning to benefit from human behavior and interpretations. Emphasizing data quality, our system incorporates continuous monitoring and adaptive learning mechanisms to optimize data collection timing and context, ensuring relevance, accuracy, and reliability. The system comprises three key components: a) a tool for collecting sensor data and user feedback, b) components for experiment planning and execution monitoring, and c) a machine-learning component that enhances human-machine interaction.
