Compressing VAE-Based Out-of-Distribution Detectors for Embedded Deployment

Aditya Bansal, Michael Yuhas, Arvind Easwaran

TL;DR

This work considers variational autoencoder (VAE) based out-of-distribution (OOD) detectors that perform OOD detection in latent space. It proposes a design methodology that combines three compression techniques (quantization, pruning, and knowledge distillation) and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector.

Abstract

Out-of-distribution (OOD) detectors can act as safety monitors in embedded cyber-physical systems by identifying samples outside a machine learning model's training distribution to prevent potentially unsafe actions. However, OOD detectors are often implemented using deep neural networks, which makes it difficult to meet real-time deadlines on embedded systems with memory and power constraints. We consider the class of variational autoencoder (VAE) based OOD detectors where OOD detection is performed in latent space, and apply quantization, pruning, and knowledge distillation. These techniques have been explored for other deep models, but no work has considered their combined effect on latent-space OOD detection. While these techniques increase the VAE's test loss, this does not correspond to a proportional decrease in OOD detection performance, and we leverage this to develop lean OOD detectors capable of real-time inference on embedded CPUs and GPUs. We propose a design methodology that combines all three compression techniques and yields a significant decrease in memory and execution time while maintaining AUROC for a given OOD detector. We demonstrate this methodology with two existing OOD detectors on a Jetson Nano and reduce GPU and CPU inference time by 20% and 28%, respectively, while keeping AUROC within 5% of the baseline.
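To make the terms in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes unstructured magnitude pruning and symmetric 8-bit uniform quantization applied to a flat list of weights, and computes AUROC as the fraction of (in-distribution, OOD) score pairs ranked correctly. The function names, the toy weights, and the toy scores are all hypothetical; knowledge distillation is omitted because it requires a training loop.

```python
def magnitude_prune(weights, sparsity):
    """Zero the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)  # number of weights to zero out
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])  # indices of the k smallest-magnitude weights
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric uniform quantization; returns integer codes and the scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]  # integers in [-127, 127]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def auroc(id_scores, ood_scores):
    """Probability that a random OOD sample receives a higher OOD score
    than a random in-distribution sample (ties count as one half)."""
    wins = 0.0
    for o in ood_scores:
        for i in id_scores:
            if o > i:
                wins += 1.0
            elif o == i:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Hypothetical toy data: four weights, then OOD scores from a detector.
weights = [0.5, -0.05, 1.27, 0.01]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
score = auroc(id_scores=[0.1, 0.2, 0.3], ood_scores=[0.25, 0.4, 0.5])
```

In a methodology like the one described, a function such as `auroc` would be evaluated on the detector before and after compression, and the compressed detector kept only if the drop stays within the chosen tolerance.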

Paper Structure

This paper contains 8 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: Our design methodology for compressing a VAE-based OOD detector using pruning-informed knowledge distillation and quantization.
  • Figure 2: Block diagrams of the OOD detectors considered in our case studies.
  • Figure 3: Sample images reconstructed using the $\beta$-VAE model.
  • Figure 4: AUROC and total reconstruction loss for the $\beta$-VAE detector at different sparsity levels using pruning.
  • Figure 5: Reconstruction loss and AUROC of the $\beta$-VAE detector across different compression techniques.
  • ...and 2 more figures