Searching Personal Collections
Michael Bendersky, Donald Metzler, Marc Najork, Xuanhui Wang
TL;DR
Searching Personal Collections surveys the evolution of information retrieval for personal document collections (emails, files, and lifelog-like assets), contrasting them with public collections and emphasizing the refinding of known items. It covers organization, labeling, automatic classification, email threading, known-item and refinding tasks, episodic memory, test collections, ranking, desktop search, file recommendation, cloud infrastructure, and human digital memory, illustrating a shift toward recall-driven design and memory augmentation. The chapter also addresses privacy and security considerations in cloud-based personal collections and discusses infrastructure trade-offs such as indexing strategies and document-sharing mechanisms. Looking forward, it envisions federated search, task-based assistance, and deeper integration with virtual assistants and large language models to support sophisticated, privacy-preserving personal information retrieval.
Abstract
This article describes the history of information retrieval on personal document collections.
