DiversityOne: A Multi-Country Smartphone Sensor Dataset for Everyday Life Behavior Modeling

Matteo Busso, Andrea Bontempelli, Leonardo Javier Malcotti, Lakmal Meegahapola, Peter Kun, Shyam Diwakar, Chaitanya Nutakki, Marcelo Dario Rodas Britez, Hao Xu, Donglei Song, Salvador Ruiz Correa, Andrea-Rebeca Mendoza-Lara, George Gaskell, Sally Stares, Miriam Bidoglia, Amarsanaa Ganbold, Altangerel Chagnaa, Luca Cernuzzi, Alethia Hume, Ronald Chenu-Abente, Roy Alia Asiku, Ivan Kayongo, Daniel Gatica-Perez, Amalia de Götzen, Ivano Bison, Fausto Giunchiglia

TL;DR

DiversityOne tackles the lack of cross-country smartphone sensing data by introducing a large-scale, multi-country dataset collected from 782 college students in eight countries over four weeks. The project combines 26 raw smartphone sensor modalities with intensive in-situ self-reports (time diaries and psychosocial surveys) via the iLog app, enabling domain adaptation and cross-cultural generalization studies while adhering to GDPR and ethical standards. The paper details a comprehensive methodology including cross-country questionnaire translation, intensive longitudinal data collection, privacy-preserving data management, and a flexible data catalog with secure access, as well as validation results on engagement and data quality. These resources support advanced multimodal modeling, personalized and region-specific analyses, and methodological guidance for future cross-country digital-health and ubiquitous computing research.

Abstract

Understanding the everyday life behavior of young adults through personal devices, e.g., smartphones and smartwatches, is key for various applications, from enhancing the user experience in mobile apps to enabling appropriate interventions in digital health apps. Towards this goal, previous studies have relied on datasets combining passive sensor data with human-provided annotations or self-reports. However, many existing datasets are limited in scope, often focusing on specific countries primarily in the Global North, involving a small number of participants, or using a limited range of pre-processed sensors. These limitations restrict the ability to capture cross-country variations of human behavior, including the possibility of studying model generalization and robustness. To address this gap, we introduce DiversityOne, a dataset that spans eight countries (China, Denmark, India, Italy, Mexico, Mongolia, Paraguay, and the United Kingdom) and includes data from 782 college students over four weeks. DiversityOne contains data from 26 smartphone sensor modalities and 350K+ self-reports. As of today, it is one of the largest and most diverse publicly available datasets, and it features extensive demographic and psychosocial survey data. DiversityOne opens the possibility of studying important research problems in ubiquitous computing, particularly domain adaptation and generalization across countries, research areas that have so far been largely underexplored because of the lack of adequate datasets.

Paper Structure

This paper contains 51 sections, 8 figures, 14 tables.

Figures (8)

  • Figure 1: Study set-up and data collection process. Invitation, Selection, and Closing procedures were done using the LimeSurvey platform. The 1st Phase and 2nd Phase were done using the iLog mobile app.
  • Figure 2: The iLog app adopted for intensive longitudinal survey and sensor data collection.
  • Figure 3: Distribution of participants based on the number of daily diaries completed. Each dot represents a participant.
  • Figure 4: Percentage of participants at each pilot site providing sensor data for each day.
  • Figure 5: Average percentage of participants that provided sensor data for each site.
  • ...and 3 more figures