A methodology and a platform for high-quality rich personal data

Ivan Kayongo, Leonardo Malcotti, Haonan Zhao, Fausto Giunchiglia

TL;DR

This work tackles the challenge of generating rich big-thick data by fusing sensor streams with qualitative input, using a methodology built on the iLog platform. It introduces a situational-context model, a temporal scheduling language (iLogCal), a governance dashboard, and a machine learning component that optimizes when to ask questions, and it reports improved data quality in a study of university students. Key findings include the feasibility of collecting data from 34 sensors alongside time diaries, and the ability to predict answer quality from contextual signals with reasonable accuracy, which enables targeted, less intrusive data collection. The platform emphasizes GDPR-compliant, participant-centric data control and has practical implications for domains that require nuanced, context-aware data collection in real-world settings.

Abstract

In recent years, the pervasive use of sensors in smart devices (e.g., phones, watches, and medical devices) has dramatically increased the availability of personal data. However, existing research on data collection focuses primarily on the objective view of reality, as provided by sensors, and often neglects the integration of subjective human input, such as user answers to questionnaires. This substantially limits the exploitability of the collected data. In this paper we present a methodology and a platform specifically designed for collecting a combination of large-scale sensor data and qualitative human feedback. The methodology is designed to be deployed on top of, and to enrich the functionalities of, an existing data collection app called iLog, which has been used in large-scale, worldwide data collection experiments. The main goal is to put the key actors involved in an experiment, i.e., the researcher in charge, the participant, and iLog, in better control of the experiment itself, thus enabling much improved quality and richness of the collected data. The novel functionalities of the resulting platform are: (i) a time-wise representation of the situational context within which the data collection is performed, (ii) an explicit representation of the temporal context within which the data collection is performed, (iii) a calendar-based dashboard for the real-time monitoring of the data collection context(s), and (iv) a mechanism for the run-time revision of the data collection plan. The practicality and utility of the proposed functionalities are demonstrated by showing how they apply to a case study involving 350 university students.

Paper Structure

This paper contains 11 sections, 2 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Sample questions captured in the WeNet project
  • Figure 2: An example of everyday life sequence.
  • Figure 3: The knowledge graph of the third situational context in Fig. 2.
  • Figure 4: General experiment schedule.
  • Figure 5: Question collection.
  • ...and 8 more figures