Datasheets for Machine Learning Sensors

Matthew Stewart, Yuke Zhang, Pete Warden, Yasmine Omri, Shvetank Prakash, Jacob Huckelberry, Joao Henrique Santos, Shawn Hymel, Benjamin Yeager Brown, Jim MacArthur, Nat Jeffries, Emanuel Moss, Mona Sloane, Brian Plancher, Vijay Janapa Reddi

TL;DR

ML sensors fuse sensing hardware with embedded ML, creating transparency and auditability requirements. To address this, the authors propose a dedicated datasheet framework for ML sensors organized into Standard Sensor, IoT, AI, and ML Sensor components, developed through stakeholder engagement and open-sourcing. They demonstrate the framework on two person-detection sensors (one open-source, one commercial) to show how end-to-end performance, environmental impact, privacy, and model–data details can be documented. Aligned with FAIR principles, the framework aims to support reproducibility, regulatory compliance, and responsible deployment of edge AI in diverse real-world contexts.

Abstract

Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making across diverse applications ranging from anomaly detection in industrial settings to wildlife tracking for conservation efforts. As such, there is a need to provide transparency in the operation of such ML-enabled sensing systems through comprehensive documentation. This is needed to enable their reproducibility, to address new compliance and auditing regimes mandated in regulation and industry-specific policy, and to verify and validate the responsible nature of their operation. To address this gap, we introduce the datasheet for ML sensors framework. We provide a comprehensive template, collaboratively developed in academia-industry partnerships, that captures the distinct attributes of ML sensors, including hardware specifications, ML model and dataset characteristics, end-to-end performance metrics, and environmental impacts. Our framework addresses the continuous streaming nature of sensor data and its real-time processing requirements, and it embeds benchmarking methodologies that reflect real-world deployment conditions to support practical viability. Aligned with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability), our approach enhances the transparency and reusability of ML sensor documentation across academic, industrial, and regulatory domains. To show the application of our approach, we present two datasheets: the first for an open-source ML sensor designed in-house and the second for a commercial ML sensor developed by industry collaborators, both performing computer vision-based person detection.
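
To make the template idea concrete, here is a minimal Python sketch of how a datasheet organized into the four component groups named in the TL;DR (Standard Sensor, IoT, AI, ML Sensor) might be held as a structured record and checked for completeness. This is not the authors' template or schema: the class name `MLSensorDatasheet`, the field keys, the assignment of attributes to groups, and the sensor name are all hypothetical, chosen only for illustration.

```python
import json
from dataclasses import asdict, dataclass, field

# The four component groups named in the summary. The snake_case keys are
# an assumption made for this sketch, not identifiers from the paper.
COMPONENT_GROUPS = ("standard_sensor", "iot", "ai", "ml_sensor")


@dataclass
class MLSensorDatasheet:
    """Hypothetical container for one ML sensor datasheet."""

    name: str
    standard_sensor: dict = field(default_factory=dict)
    iot: dict = field(default_factory=dict)
    ai: dict = field(default_factory=dict)
    ml_sensor: dict = field(default_factory=dict)

    def missing_groups(self):
        # A group counts as missing when no entry has been filled in.
        return [g for g in COMPONENT_GROUPS if not getattr(self, g)]

    def to_json(self):
        # Serialize to JSON so the record can be shared in machine-readable form.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder values only; which attribute belongs to which group is assumed.
sheet = MLSensorDatasheet(
    name="example-person-detector",
    standard_sensor={"hardware_specifications": "TODO"},
    ai={"model_characteristics": "TODO", "dataset_characteristics": "TODO"},
    ml_sensor={"end_to_end_performance": "TODO", "environmental_impact": "TODO"},
)

print(sheet.missing_groups())  # ['iot']
```

A machine-readable record of this kind is one possible way to support the findability and reusability goals the abstract mentions; the paper's own template may be structured differently.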

Paper Structure (19 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: The ML sensor paradigm: deploying machine learning models directly on the sensor for privacy-promoting and energy-efficient edge intelligence.
  • Figure 2: Examples of existing ML sensors: (top) Seeed Studio's SenseCAP LoRaWAN sensor [seeed_panda_detector] for long-range data collection in IoT scenarios such as smart farming, (bottom left) our own person detection sensor, whose design is open-source (link redacted for double blind), and (bottom right) Useful Sensors' person sensor [useful_sensors1]. We use these person detection sensors in the case study section, where we apply our ML sensor datasheet to real ML sensors.
  • Figure 3: Schematic of the proposed datasheet template for ML sensors.
  • Figure 4: (left) Device diagram of person detection ML sensor, (middle) standard for data communication, and (right) schema for communication of data off-sensor.
  • Figure 5: Primary IoT security and privacy label for the open-source person detection ML sensor (left), as well as its data nutrition label summary statistics (center), and the ROC curve of the person detection model evaluated on a test set (right).
  • ...and 2 more figures