Understanding Practitioners Perspectives on Monitoring Machine Learning Systems

Hira Naveed, John Grundy, Chetan Arora, Hourieh Khalajzadeh, Omar Haggag

TL;DR

The paper tackles the problem of monitoring production ML systems amid non-determinism and data distribution shifts by conducting a global survey of 91 practitioners. It combines qualitative and quantitative analyses to map runtime issues, monitoring practices, and improvement needs, revealing prevalent concerns around model performance, latency, and security/fairness. Practitioners prefer automated monitoring but face setup complexity and alert fatigue, and they call for automated monitor generation, improved performance/fairness monitoring, and domain-specific tooling. The study offers practical guidance for practitioners and tool builders, highlighting design-for-monitoring, low-code solutions, and responsible ML monitoring as key directions for future ML monitoring tools. These insights aim to close the gap between research and industry needs and drive more usable, scalable monitoring in real-world ML systems.

Abstract

Given the inherent non-deterministic nature of machine learning (ML) systems, their behavior in production environments can lead to unforeseen and potentially dangerous outcomes. For timely detection of unwanted behavior and to prevent organizations from financial and reputational damage, monitoring these systems is essential. This paper explores the strategies, challenges, and improvement opportunities for monitoring ML systems from the practitioners' perspective. We conducted a global survey of 91 ML practitioners to collect diverse insights into current monitoring practices for ML systems. We aim to complement existing research through our qualitative and quantitative analyses, focusing on prevalent runtime issues, industrial monitoring and mitigation practices, key challenges, and desired enhancements in future monitoring tools. Our findings reveal that practitioners frequently struggle with runtime issues related to declining model performance, excessive latency, and security violations. While most prefer automated monitoring for its increased efficiency, many still rely on manual approaches due to the complexity or lack of appropriate automation solutions. Practitioners report that the initial setup and configuration of monitoring tools are often complicated and challenging, particularly when integrating with ML systems and setting alert thresholds. Moreover, practitioners find that monitoring adds extra workload, strains resources, and causes alert fatigue. The desired improvements from the practitioners' perspective are: automated generation and deployment of monitors, improved support for performance and fairness monitoring, and recommendations for resolving runtime issues. These insights offer valuable guidance for the future development of ML monitoring tools that are better aligned with practitioners' needs.

Paper Structure

This paper contains 29 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Geographical Location
  • Figure 2: Domain
  • Figure 3: Monitoring Technique with respect to Experience Level
  • Figure 4: Time Taken To Identify and Mitigate Runtime Issues
  • Figure 5: Monitoring Priorities