Lifelogging As An Extreme Form of Personal Information Management -- What Lessons To Learn
Ly-Duyen Tran, Cathal Gurrin, Alan F. Smeaton
TL;DR
The paper analyzes lifelogging as an extreme form of personal information management and surveys how lifelog data are generated, stored, and made accessible. It reviews historical systems such as MyLifeBits and modern benchmarks such as the Lifelog Search Challenge and TimelineQA to understand processing, indexing, and retrieval workflows. It discusses a multi-modal processing pipeline covering visual processing (low-level features, concepts, captions, embeddings), metadata handling, and organizational strategies, with emphasis on cross-modal retrieval and temporal search. It also highlights the potential of augmenting large language models via Retrieval Augmented Generation to unify access across data sources and enable conversational querying, while identifying privacy and data protection as a critical challenge for widespread adoption.
Abstract
Personal data includes the digital footprints that we leave behind as part of our everyday activities, both online and offline in the real world. It includes data we collect ourselves, such as from wearables, as well as the data collected by others about our online behaviour and activities. Sometimes we are able to use the personal data we ourselves collect to examine some parts of our lives, but for the most part our personal data is leveraged by third parties, including internet companies, for services like targeted advertising and recommendations. Lifelogging is a form of extreme personal data gathering, and in this article we present an overview of the tools used to manage access to lifelogs, as demonstrated at the most recent of the annual Lifelog Search Challenge benchmarking workshops. Here, experimental systems are showcased in live, real-time information-seeking tasks by real users. This overview of these systems' capabilities shows the range of possibilities for accessing our own personal data, which may, in time, become more easily available as consumer-level services.
