Data Collection of Real-Life Knowledge Work in Context: The RLKWiC Dataset

Mahta Bakhshizadeh, Christian Jilek, Markus Schröder, Heiko Maus, Andreas Dengel

TL;DR

RLKWiC addresses the scarcity of public, multi-dimensional datasets for real-life knowledge work by recording explicit contexts, textual content, and semantics from eight volunteers over two months using a cSpaces/UAT pipeline built on PIMO-inspired personal knowledge graphs. It introduces a structured dataset comprising contexts, documents, terms/semantics, events, and sessions, augmented with privacy-oriented post-filtering and session annotation to produce a rich, analyzable resource. The dataset captures over 61,000 desktop events, 56 contexts, 211 concepts, 393 DBpedia resources, 6.4k KG nodes, 3.1k KG SPOs, and 650+ sessions with 12 knowledge-work actions, enabling offline benchmarking for PIM, PID, and context-aware information retrieval. By providing explicated contexts together with content and semantics, RLKWiC supports advances in user-behavior modeling and the development of personal knowledge assistants for knowledge workers.

Abstract

Over the years, various approaches have been employed to enhance the productivity of knowledge workers, from addressing psychological well-being to the development of personal knowledge assistants. A significant challenge in this research area has been the absence of a comprehensive, publicly accessible dataset that mirrors real-world knowledge work. Although a handful of datasets exist, many are restricted in access or lack vital information dimensions, complicating meaningful comparison and benchmarking in the domain. This paper presents RLKWiC, a novel dataset of Real-Life Knowledge Work in Context, derived from monitoring the computer interactions of eight participants over a span of two months. As the first publicly available dataset offering a wealth of essential information dimensions (such as explicated contexts, textual contents, and semantics), RLKWiC seeks to address the research gap in the personal information management domain, providing valuable insights for modeling user behavior.

Paper Structure

This paper contains 17 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: An overview of the post-annotation app, which consists of five sections. A: context information; B: adding DBpedia resources to the context, with a suggestion feature; C: selecting the general activity (set of KWA) for the context; D: an overview of a session in this sample context, including various information; E: session relevance (in/out of context) and the session's activity.