WhACC: Whisker Automatic Contact Classifier with Expert Human-Level Performance

Phillip Maire; Samson G. King; Jonathan Andrew Cheung; Stefanie Walker; Samuel Andrew Hires

TL;DR

This work introduces the Whisker Automatic Contact Classifier (WhACC), a Python package that identifies touch periods in high-speed videos of head-fixed behaving rodents with human-level performance. It also offers an easy way to select and curate a subset of data to adaptively retrain WhACC.

Abstract

The rodent vibrissal system is pivotal in advancing neuroscience research, particularly for studies of cortical plasticity, learning, decision-making, sensory encoding, and sensorimotor integration. Despite these advantages, curating touch events is labor intensive and often requires >3 hours per million video frames, even after leveraging automated tools like the Janelia Whisker Tracker. We address this limitation by introducing the Whisker Automatic Contact Classifier (WhACC), a Python package designed to identify touch periods from high-speed videos of head-fixed behaving rodents with human-level performance. WhACC uses ResNet50V2 for feature extraction, combined with LightGBM for classification. Performance is assessed against three expert human curators on over one million frames. Pairwise touch classification agreement between WhACC and the human curators is 99.5% of video frames, equal to between-human agreement. Finally, we offer a custom retraining interface that allows model customization on a small subset of data, which was validated on four million frames across 16 single-unit electrophysiology recordings. Including this retraining step, we reduce the human hours required to curate a 100 million frame dataset from ~333 hours to ~6 hours.
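
The time savings quoted in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the figures stated above (100 million frames, ~333 hours, ~6 hours, 99.5% agreement); the variable names are illustrative, and the one-million-frame count is a lower bound because the abstract says "over one million frames".

```python
# Figures taken from the abstract; variable names are illustrative only.
DATASET_MILLION_FRAMES = 100   # size of the example dataset, in millions of frames
MANUAL_HOURS = 333             # approximate human hours without WhACC
WHACC_HOURS = 6                # approximate human hours with WhACC plus retraining

# Implied manual curation rate, consistent with ">3 hours per million frames".
manual_rate = MANUAL_HOURS / DATASET_MILLION_FRAMES   # hours per million frames

# How many times fewer human hours are needed, and how many hours are saved.
reduction_factor = MANUAL_HOURS / WHACC_HOURS
hours_saved = MANUAL_HOURS - WHACC_HOURS

# 99.5% agreement leaves 0.5% (5 per thousand) of frames in disagreement.
# Using one million frames as a lower bound on the evaluation set size:
EVAL_FRAMES = 1_000_000
disagreeing_frames = EVAL_FRAMES * 5 // 1000

print(f"manual rate: {manual_rate:.2f} h per million frames")
print(f"reduction:   {reduction_factor:.1f}x ({hours_saved} hours saved)")
print(f"frames in disagreement per million: {disagreeing_frames}")
```

In other words, the stated numbers imply roughly a 55-fold reduction in human curation time, while about 5,000 frames per million remain points of disagreement, the same level seen between human curators.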

Paper Structure

This paper contains 20 sections, 6 figures, and 2 tables.

Introduction
Design and Implementation
Establishing touch ground truth and error metrics
Selection of training, validation, and test set
Model selection and evaluation
Discussion
Summary of WhACC
Potential limitations
Videos with different frame rates
Multi-whisker video and alternative contact objects
Why is retraining required?
Final considerations and related work
Materials and Methods
Data selection and preprocessing
Training CNNs
...and 5 more sections

Figures (6)

Figure 1: Flow diagram of WhACC video pre-processing and design implementation. A) Sample touch frame from high-speed (1,000 fps) video and extracted object-centered window for CNN input (red box) and corresponding spike train from a touch-responsive neuron (inset). B) Three consecutive extracted frames combined into three color channels (left) and example augmented images (right). C) Representation of the ResNet50V2 model used to extract features. D) Demonstrative sample of features extracted from ResNet50V2 (left), representation of feature engineering (center) and feature selection (right) for the final WhACC model. E) The final WhACC model was trained using LightGBM with Optuna to achieve the best performance.
Figure 2: Touch frame scoring and variation in human curation. A) Example of disparity between three human curators. Majority touch (dark blue) and majority non-touch (light blue) frames were used for training models. Consensus frames (green) were used when evaluating curator versus paired consensus in C. B) Example of scored touch array (human majority) and the corresponding edge errors (deduct and append) and touch count errors (split, ghost, miss, and join). C) Individual and mean error rate for each human curator compared against the consensus of the other two curators for touch count errors (left) and edge errors (right).
Figure 3: Data selection and model performance. A) Data selected for un-augmented (green) and augmented (blue) images using all frames within 80 or 3 frames, respectively, of majority-scored touch frames. B) Composition of training, validation, and test datasets used to train and evaluate each model. C) Performance of four CNN models across three different image augmentation approaches, ResNet50V2 model features used as input into LSTMs, and ResNet50V2 model features used as input into LightGBM models before and after feature engineering and selection.
Figure 4: Feature engineering and selection. A) The original 2,048 features extracted from the penultimate layer of ResNet50V2, with a region (white box) enlarged for detail. Additional features generated by (B) shifting, (C) smoothing, (D) taking the rolling standard deviation, and (E) taking the discrete difference for each of the original 2,048 features. F) Standard deviation of the original and 40 additional engineered feature sets across feature space (columns). G) Model performance across feature engineering and reduction.
Figure 5: WhACC shows expert human-level performance. A) Human vs WhACC touch count error rate for each error type (top) and in total (bottom). Error bars indicate 95% CI. B) Same as A for edge errors. C) Difference in error rate for human versus WhACC. Negative values indicate WhACC outperforming human curators on average. D) Percent correct for individual and mean performance of human curators versus WhACC.
...and 1 more figure
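
Two of the steps described in the captions lend themselves to a short illustration: combining three consecutive frames into three color channels (Figure 1B) and deriving shifted, smoothed, rolling-standard-deviation, and discrete-difference features from each extracted feature (Figure 4B-E). The following is a minimal NumPy sketch of those ideas, not the WhACC implementation. All function names are hypothetical, and the window sizes, shift amounts, frame alignment, and NaN edge padding are assumptions chosen for illustration; the paper's 40 engineered feature sets use their own parameters.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def stack_consecutive_frames(frames):
    """Turn a (T, H, W) grayscale clip into (T - 2, H, W, 3) images.

    Channel k of output image i holds frame i + k. How WhACC aligns and
    pads the clip edges is not stated here, so this alignment is assumed.
    """
    return np.stack([frames[:-2], frames[1:-1], frames[2:]], axis=-1)


def shift(x, k):
    """Shift rows of a (T, F) array by k time steps, padding with NaN."""
    out = np.full(x.shape, np.nan)
    if k > 0:
        out[k:] = x[:-k]
    elif k < 0:
        out[:k] = x[-k:]
    else:
        out[:] = x
    return out


def rolling(x, window, reducer):
    """Centered rolling reduction over time for an odd window; edges are NaN."""
    half = window // 2
    out = np.full(x.shape, np.nan)
    views = sliding_window_view(x, window, axis=0)  # shape (T - window + 1, F, window)
    out[half:half + views.shape[0]] = reducer(views, axis=-1)
    return out


def discrete_difference(x):
    """First difference over time; the first row has no predecessor (NaN)."""
    out = np.full(x.shape, np.nan)
    out[1:] = np.diff(x, axis=0)
    return out


def engineer_features(x, window=3, shifts=(1, -1)):
    """Concatenate original features with a few engineered sets (assumed parameters)."""
    sets = [x.astype(float)]
    sets += [shift(x, k) for k in shifts]           # Figure 4B: shifting
    sets.append(rolling(x, window, np.mean))        # Figure 4C: smoothing
    sets.append(rolling(x, window, np.std))         # Figure 4D: rolling standard deviation
    sets.append(discrete_difference(x))             # Figure 4E: discrete difference
    return np.concatenate(sets, axis=1)


# Usage with arbitrary stand-in data (sizes are illustrative, not WhACC's).
rng = np.random.default_rng(0)
clip = rng.random((10, 32, 32))            # 10 grayscale frames
images = stack_consecutive_frames(clip)    # three-channel images for a CNN
features = rng.random((10, 2048))          # stand-in for ResNet50V2 features
engineered = engineer_features(features)   # original plus 5 engineered sets
```

Each engineered set keeps the time axis aligned with the original features, so a gradient-boosted classifier such as LightGBM can see temporal context for every frame without being a sequence model itself. The NaN padding at the clip edges is one possible choice; tree-based learners like LightGBM can handle missing values directly.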
