FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Shuai Liu, Shulin Tian, Kairui Hu, Yuhao Dong, Zhe Yang, Bo Li, Jingkang Yang, Chen Change Loy, Ziwei Liu

Abstract

Coworking AI agents that operate within local file systems are rapidly emerging as a paradigm in human-AI interaction. However, effective personalization remains limited by severe data constraints: strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training and evaluation. Existing methods also remain interaction-centric and overlook the dense behavioral traces in file-system operations. To address this gap, we propose FileGram, a comprehensive framework that grounds agent memory and personalization in file-system behavioral traces. It comprises three core components: (1) FileGramEngine, a scalable persona-driven data engine that simulates realistic workflows and generates fine-grained multimodal action sequences at scale; (2) FileGramBench, a diagnostic benchmark grounded in file-system behavioral traces for evaluating memory systems on profile reconstruction, trace disentanglement, persona drift detection, and multimodal grounding; and (3) FileGramOS, a bottom-up memory architecture that builds user profiles directly from atomic actions and content deltas rather than dialogue summaries, encoding these traces into procedural, semantic, and episodic channels with query-time abstraction. Extensive experiments show that FileGramBench remains challenging for state-of-the-art memory systems and that FileGramEngine and FileGramOS are effective. By open-sourcing the framework, we hope to support future research on personalized memory-centric file-system agents.

Paper Structure

This paper contains 35 sections, 2 equations, 7 figures, 14 tables.

Figures (7)

  • Figure 1: Overview of the FileGram Project. FileGram introduces a personalized AI coworker natively integrated into the user file system. By consolidating cross-session activities and file outputs into long-term behavioral memory, the agent infers intent and proactively synchronizes workspaces, establishing a new paradigm for real-world interactive coworking.
  • Figure 2: Data generation pipeline. FileGramEngine generates one trajectory per profile-task pair. Agents execute in profile-isolated workspaces for each task; raw tool traces are filtered and canonicalized to retain real action signals while removing simulation artifacts, and outputs are materialized as standardized behavioral traces with aligned text/document/visual views for cross-modal evaluation.
  • Figure 3: Data distribution. 20 profiles $\times$ 32 tasks yield 640 trajectories comprising ${\sim}$10K output files and 20,028 atomic actions.
  • Figure 4: FileGramQA distribution. 4.6K questions by track (inner) and sub-task (outer).
  • Figure 5: QA examples from FileGramBench. Representative questions from the four tracks, including both MCQ and open-ended formats.
  • ...and 2 more figures
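
The dataset counts in the Figure 2 and Figure 3 captions can be checked with a few lines of arithmetic. The sketch below assumes only what those captions state (one trajectory per profile-task pair, 20 profiles, 32 tasks, 20,028 atomic actions); the identifiers `profile_00`, `task_00`, and so on are hypothetical placeholders, not names from the FileGram release, and the per-trajectory average is a derived figure rather than one reported in the paper.

```python
from itertools import product

# Counts taken from the Figure 3 caption.
NUM_PROFILES = 20
NUM_TASKS = 32
TOTAL_ATOMIC_ACTIONS = 20_028

# Hypothetical placeholder identifiers (not from the FileGram release).
profiles = [f"profile_{i:02d}" for i in range(NUM_PROFILES)]
tasks = [f"task_{j:02d}" for j in range(NUM_TASKS)]

# One trajectory per profile-task pair, as described in the Figure 2 caption.
pairs = list(product(profiles, tasks))
num_trajectories = len(pairs)

# Derived average; the caption reports only the totals.
mean_actions = TOTAL_ATOMIC_ACTIONS / num_trajectories

print(num_trajectories)        # 640
print(round(mean_actions, 1))  # 31.3
```

Under these assumptions, each trajectory contains roughly 31 atomic actions on average.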