SensorQA: A Question Answering Benchmark for Daily-Life Monitoring
Benjamin Reichman, Xiaofan Yu, Lanxiang Hu, Jack Truxal, Atishay Jain, Rushil Chandrupatla, Tajana Šimunić Rosing, Larry Heck
TL;DR
SensorQA tackles the challenge of making long-term wearable sensor data accessible via natural language QA. By building a human-created dataset from real-world ExtraSensory data and visualized multi-time-scale activity graphs, it captures diverse user interests and realistic QA scenarios. Benchmark results reveal substantial gaps between state-of-the-art models and practical QA performance, especially for long-duration sensor data and edge-device efficiency, underscoring the need for new sensor-text fusion and deployment-friendly approaches. The dataset and code are openly available to spur advances in real-world, user-centric QA over sensor streams for daily-life monitoring.
Abstract
With the rapid growth in sensor data, effectively interpreting and interfacing with these data in a human-understandable way has become crucial. While existing research primarily focuses on learning classification models, fewer studies have explored how end users can actively extract useful insights from sensor data, an effort often hindered by the lack of a suitable dataset. To address this gap, we introduce SensorQA, the first human-created question-answering (QA) dataset for long-term time-series sensor data for daily-life monitoring. SensorQA is created by human workers and includes 5.6K diverse and practical queries that reflect genuine human interests, paired with accurate answers derived from sensor data. We further establish benchmarks for state-of-the-art AI models on this dataset and evaluate their performance on typical edge devices. Our results reveal a gap between current models and optimal QA performance and efficiency, highlighting the need for new contributions. The dataset and code are available at: https://github.com/benjamin-reichman/SensorQA.
