Multiversion Hindsight Logging for Continuous Training

Rolando Garcia, Anusha Dandamudi, Gabriel Matute, Lehan Wan, Joseph Gonzalez, Joseph M. Hellerstein, Koushik Sen

TL;DR

FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data, and presents a unified relational model for efficient handling of historical queries.

Abstract

Production Machine Learning involves continuous training: hosting multiple versions of models over time, often with many model versions running at once. When model performance does not meet expectations, Machine Learning Engineers (MLEs) debug issues by exploring and analyzing numerous prior versions of code and training data to identify root causes and mitigate problems. Traditional debugging and logging tools often fall short in managing this experimental, multi-version context. FlorDB introduces Multiversion Hindsight Logging, which allows engineers to use the most recent version's logging statements to query past versions, even when older versions logged different data. Log statement propagation enables consistent injection of logging statements into past code versions, regardless of changes to the codebase. Once log statements are propagated across code versions, the remaining challenge in Multiversion Hindsight Logging is to efficiently replay the new log statements based on checkpoints from previous runs. Finally, a coherent user experience is required to help MLEs debug across all versions of code and data. To this end, FlorDB presents a unified relational model for efficient handling of historical queries, offering a comprehensive view of the log history to simplify the exploration of past code iterations. We present a performance evaluation on diverse benchmarks confirming FlorDB's scalability and its ability to deliver real-time query responses, leveraging query-based filtering and checkpoint-based parallelism for efficient replay.
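
The replay idea in the abstract can be illustrated with a small, self-contained Python sketch. This is not FlorDB's API or implementation: the `CHECKPOINTS` dictionary, the `weight_norm` log statement, and the `replay` and `pivot` helpers are all hypothetical names invented for illustration, and the sketch simplifies replay to evaluating a new log function on stored checkpoint state rather than re-executing past code. It shows, under those assumptions, how a log statement written today could be evaluated against checkpoints from earlier versions, with filtering by version and independent (parallel) processing of each checkpoint, and how the results could be arranged into one relational view.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for checkpoints saved by past runs:
# (version, epoch) -> model state captured at the end of that epoch.
CHECKPOINTS = {
    ("v1", 0): {"weights": [0.5, -1.0]},
    ("v1", 1): {"weights": [0.4, -0.8]},
    ("v2", 0): {"weights": [0.3, -0.6]},
    ("v2", 1): {"weights": [0.2, -0.2]},
}


def weight_norm(state):
    """A 'new' log statement, written today, that no past version recorded."""
    return sum(w * w for w in state["weights"]) ** 0.5


def replay(log_fn, name, versions=None):
    """Evaluate log_fn on stored checkpoints and return long-format rows."""
    # Filtering: only touch checkpoints that belong to the requested versions.
    keys = [k for k in sorted(CHECKPOINTS) if versions is None or k[0] in versions]
    # Parallelism: each checkpoint is independent, so the calls can run concurrently.
    with ThreadPoolExecutor() as pool:
        values = list(pool.map(lambda k: log_fn(CHECKPOINTS[k]), keys))
    return [
        {"version": v, "epoch": e, "name": name, "value": val}
        for (v, e), val in zip(keys, values)
    ]


def pivot(rows):
    """Pivot long-format rows into one record per (version, epoch)."""
    table = {}
    for r in rows:
        table.setdefault((r["version"], r["epoch"]), {})[r["name"]] = r["value"]
    return [
        {"version": v, "epoch": e, **cols} for (v, e), cols in sorted(table.items())
    ]


# Query only version "v2"; checkpoints from "v1" are never loaded.
rows = replay(weight_norm, "weight_norm", versions={"v2"})
for record in pivot(rows):
    print(record)
```

The long-format rows (version, epoch, name, value) are one plausible way to store values from versions that logged different things; pivoting them yields a single table in which each logged name becomes a column.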

Paper Structure (33 sections, 11 figures, 1 table)

Figures (11)

  • Figure 1: Average training losses and F1-scores for Alice's object detection model over the last 6 months. The model undergoes continuous training, with batches of labeled data added approximately twice a month. These batch dumps result in temporary fluctuations in the loss. F1-round is the F1-score for roundabouts; F1-score is the global F1-score.
  • Figure 2: BDD100K dashcam images with bounding boxes. Top row contains sample images used by Alice for fine-tuning on roundabouts; bottom row corresponds to images for which the model fails to detect pedestrians.
  • Figure 3: Alice's PyTorch training with the Flor API.
  • Figure 4: Alice's code before and after hindsight logging (black and green text, respectively).
  • Figure 5: FlorDB architecture diagram with subsection headers in parentheses.
  • ...and 6 more figures