Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments

Angie Boggust; Venkatesh Sivaraman; Yannick Assogba; Donghao Ren; Dominik Moritz; Fred Hohman

TL;DR

Compress and Compare tackles the lack of integrated tools for comparing multiple ML compression experiments. It introduces an interactive visualization system combining a Model Map (provenance), a Model Scatterplot, a Selection Details view, and a Performance Comparison view to surface accuracy-efficiency trade-offs and compression-induced behavior changes. Through case studies on generative language models and image classifiers and an expert user study, it demonstrates how unified views help debug failed compressions, identify artifacts, and build intuition about compression strategies. The work highlights compression-specific design considerations and suggests how visualization tools can improve real-world compression workflows and collaboration.

Abstract

To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A critical aspect of compression in practice is model comparison, including tracking many compression experiments, identifying subtle changes in model behavior, and negotiating complex accuracy-efficiency trade-offs. However, existing compression tools poorly support comparison, leading to tedious and sometimes incomplete analyses spread across disjoint tools. To support real-world comparative workflows, we develop an interactive visual system called Compress and Compare. Within a single interface, Compress and Compare surfaces promising compression strategies by visualizing provenance relationships between compressed models and reveals compression-induced behavior changes by comparing models' predictions, weights, and activations. We demonstrate how Compress and Compare supports common compression analysis tasks through two case studies: debugging failed compression on generative language models and identifying compression artifacts in image classification models. We further evaluate Compress and Compare in a user study with eight compression experts, illustrating its potential to provide structure to compression workflows, help practitioners build intuition about compression, and encourage thorough analysis of compression's effect on model behavior. Through these evaluations, we identify compression-specific challenges that future visual analytics tools should consider, as well as Compress and Compare visualizations that may generalize to broader model comparison tasks.

Paper Structure (23 sections, 4 figures, 1 table)

Introduction
Background and Related Work
Techniques for Model Compression
Pitfalls in Evaluating Compressed Models
Visualization for Model Understanding and Comparison
Design Challenges for Compression
Design of Compress and Compare
Compression Overview
Performance Comparison
Setup and Implementation Details
Case Studies of Common Compression Tasks
Repairing Models Broken By Compression
Discovering Compression Artifacts
User Study with ML Practitioners
Study Methods
...and 8 more sections

Figures (4)

Figure 1: Hovering over a model in the Model Map displays a tooltip containing the model's top-level metrics, including latency, size, sparsity, accuracy, and compression operation. Here, the selected model has been $50\%$ pruned, improving its latency and size but reducing its accuracy.
Figure 4: The Performance Comparison view provides an in-depth comparison of two or more models. The Behaviors tab (left) displays differences between models' predictions, distributions of comparison metrics, and a breakdown of the selected comparison metric at the class or instance level. Meanwhile, the Layers tab (right) compares the sparsity, weights, and activations across layers in the models using a file tree structure.
Figure 5: Compress and Compare helps debug compression experiments. On a generative question-answering task (A), the Behaviors view reveals that global magnitude pruning severely deteriorates generation quality (B), whereas layer-specific pruning matches the original model's behavior (C).
Figure 6: Compress and Compare can help identify compression-induced bias and perform data auditing. The Behaviors view reveals that compressing image classification models (A) disproportionately impacts rare classes (B) by forgetting hard-to-classify images (C).
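
The class-level and instance-level breakdowns described in the Figure 4 and Figure 6 captions can be illustrated with a minimal sketch. This is not Compress and Compare's implementation. The function names (`per_class_accuracy`, `compare_models`) and the toy labels are hypothetical, chosen only to show one plausible way to compute a per-class accuracy change and to list instances that the original model classified correctly but the compressed model did not.

```python
from collections import defaultdict


def per_class_accuracy(labels, preds):
    """Return {class: accuracy} for one model's predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {c: correct[c] / total[c] for c in total}


def compare_models(labels, base_preds, compressed_preds):
    """Compare a compressed model against its base model.

    Returns:
      delta: {class: compressed accuracy minus base accuracy}
      forgotten: indices the base model got right and the compressed model got wrong
    """
    base_acc = per_class_accuracy(labels, base_preds)
    comp_acc = per_class_accuracy(labels, compressed_preds)
    delta = {c: comp_acc[c] - base_acc[c] for c in base_acc}
    forgotten = [
        i
        for i, (y, b, c) in enumerate(zip(labels, base_preds, compressed_preds))
        if b == y and c != y
    ]
    return delta, forgotten


# Toy data (hypothetical): "owl" stands in for a rare class.
labels = ["cat", "cat", "cat", "cat", "owl", "owl"]
base_preds = ["cat", "cat", "cat", "cat", "owl", "owl"]
compressed_preds = ["cat", "cat", "cat", "cat", "owl", "cat"]

delta, forgotten = compare_models(labels, base_preds, compressed_preds)
for cls, change in sorted(delta.items(), key=lambda kv: kv[1]):
    print(f"{cls}: accuracy change {change:+.2f}")
print("forgotten instance indices:", forgotten)
```

In this toy data, overall accuracy drops by only one instance in six, yet the entire drop falls on the rare class. An aggregate metric alone would hide this kind of effect, which is why a per-class breakdown is useful.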
