Searching Personal Collections

Michael Bendersky, Donald Metzler, Marc Najork, Xuanhui Wang

TL;DR

Searching Personal Collections surveys the evolution of information retrieval for personal document collections (emails, files, and lifelog-like assets), contrasting them with public collections and emphasizing the refinding of known items. It covers organization, labeling, automatic classification, email threading, known-item and refinding tasks, episodic memory, test collections, ranking, desktop search, file recommendation, cloud infrastructure, and human digital memory, illustrating a shift toward recall-driven design and memory augmentation. The chapter also addresses privacy and security considerations in cloud-based personal collections and discusses infrastructure trade-offs such as indexing strategies and document-sharing mechanisms. Looking forward, it envisions federated search, task-based assistance, and deeper integration with virtual assistants and large language models to support sophisticated, privacy-preserving personal information retrieval.

Abstract

This article describes the history of information retrieval on personal document collections.

Paper Structure

This paper contains 18 sections.