DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling
Matteo Busso, Andrea Bontempelli, Leonardo Javier Malcotti, Lakmal Meegahapola, Peter Kun, Shyam Diwakar, Chaitanya Nutakki, Marcelo Dario Rodas Britez, Hao Xu, Donglei Song, Salvador Ruiz Correa, Andrea-Rebeca Mendoza-Lara, George Gaskell, Sally Stares, Miriam Bidoglia, Amarsanaa Ganbold, Altangerel Chagnaa, Luca Cernuzzi, Alethia Hume, Ronald Chenu-Abente, Roy Alia Asiku, Ivan Kayongo, Daniel Gatica-Perez, Amalia de Götzen, Ivano Bison, Fausto Giunchiglia
TL;DR
DiversityOne tackles the lack of cross-country smartphone sensing data by introducing a large-scale, multi-country dataset collected from 782 college students in eight countries over four weeks. The project combines 26 raw smartphone sensor modalities with intensive in-situ self-reports (time diaries and psychosocial surveys) via the iLog app, enabling domain adaptation and cross-cultural generalization studies while adhering to GDPR and ethical standards. The paper details a comprehensive methodology including cross-country questionnaire translation, intensive longitudinal data collection, privacy-preserving data management, and a flexible data catalog with secure access, as well as validation results on engagement and data quality. These resources support advanced multimodal modeling, personalized and region-specific analyses, and methodological guidance for future cross-country digital-health and ubiquitous computing research.
Abstract
Understanding the everyday life behavior of young adults through personal devices, e.g., smartphones and smartwatches, is key for various applications, from enhancing the user experience in mobile apps to enabling appropriate interventions in digital health apps. Towards this goal, previous studies have relied on datasets combining passive sensor data with human-provided annotations or self-reports. However, many existing datasets are limited in scope: they often focus on specific countries, primarily in the Global North, involve a small number of participants, or use a limited range of pre-processed sensors. These limitations restrict the ability to capture cross-country variations in human behavior and to study model generalization and robustness. To address this gap, we introduce DiversityOne, a dataset that spans eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, and the United Kingdom) and includes data from 782 college students over four weeks. DiversityOne contains data from 26 smartphone sensor modalities and 350K+ self-reports. As of today, it is one of the largest and most diverse publicly available datasets, and it also features extensive demographic and psychosocial survey data. DiversityOne opens the possibility of studying important research problems in ubiquitous computing, particularly domain adaptation and generalization across countries, research areas that have so far been largely underexplored because of the lack of adequate datasets.
