Evaluating ROCKET and Catch22 features for calf behaviour classification from accelerometer data using Machine Learning models
Oshana Dissanayake, Sarah E. McPherson, Joseph Allyndree, Emer Kennedy, Padraig Cunningham, Lucile Riaboff
TL;DR
This study addresses the automatic classification of pre-weaned calf behaviours from neck-collar accelerometer data. It compares two time-series feature representations, ROCKET and Catch22, with Hand-Crafted features across Random Forest (RF), eXtreme Gradient Boosting (XGB), and RidgeClassifierCV (RCV) models, using 3-second windows and a calf-independent train-test split. ROCKET and Catch22 outperform Hand-Crafted features on balanced accuracy (BA), with ROCKET+RCV achieving the best overall performance (~0.77 BA) and Catch22+RF also performing strongly (~0.73 BA). The findings suggest that time-series-specific representations are valuable for livestock welfare monitoring and support the future development of robust, non-invasive behavioural assessment tools across farms. However, performance remains behaviour-specific and dataset-limited, underscoring the need for more data and potentially ensemble or binary classification strategies to optimize per-behaviour predictions.
Abstract
Monitoring calf behaviour continuously would help to identify routine practices (e.g., weaning, dehorning) that impact calf welfare on dairy farms. To that end, accelerometer data collected from neck collars can be used with Machine Learning models to classify calf behaviour automatically. Hand-Crafted features are commonly used in Machine Learning models, whereas ROCKET and Catch22 features are specifically designed for time-series classification problems in related fields. This study compares the performance of ROCKET and Catch22 features with that of Hand-Crafted features. Thirty Irish Holstein Friesian and Jersey pre-weaned calves were monitored using accelerometer sensors, yielding 27.4 hours of annotated behaviours. Additional time-series were computed from the raw X-, Y- and Z-axis data and split into 3-second time windows. ROCKET, Catch22 and Hand-Crafted features were calculated for each time window, and the dataset was then split into train, validation and test sets. Each set of features was used to train three Machine Learning models (Random Forest, eXtreme Gradient Boosting, and RidgeClassifierCV) to classify six behaviours indicative of pre-weaned calf welfare (drinking milk, grooming, lying, running, walking and other). Models were tuned with the validation set, and the performance of each feature-model combination was evaluated with the test set. The best average balanced accuracy (+/- standard deviation) across the three models was obtained with ROCKET (0.70 +/- 0.07), followed by Catch22 (0.69 +/- 0.05), both surpassing Hand-Crafted features (0.65 +/- 0.034). The best single balanced accuracy (0.77) was obtained with ROCKET and RidgeClassifierCV, followed by Catch22 and Random Forest (0.73). Tailoring these approaches to specific behaviours and contexts will therefore be crucial in advancing precision livestock farming and enhancing animal welfare on a larger scale.
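As an illustration of three steps named above (3-second windowing, a calf-independent split, and balanced accuracy), the following is a minimal Python sketch and not the authors' implementation. The 25 Hz sampling rate, the use of non-overlapping windows, the 20% test fraction, and all function names are assumptions made for the example; none of them is stated in the text above. The validation split and the ROCKET and Catch22 feature extraction are omitted.

```python
import random
from collections import defaultdict

SAMPLING_HZ = 25      # assumption: the sampling rate is not given in the abstract
WINDOW_SECONDS = 3    # window length stated in the abstract


def split_into_windows(samples, sampling_hz=SAMPLING_HZ, window_seconds=WINDOW_SECONDS):
    """Cut a sequence of (x, y, z) samples into non-overlapping fixed-length windows.

    A trailing partial window is dropped. Non-overlap is an assumption.
    """
    size = sampling_hz * window_seconds
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def calf_independent_split(calf_ids, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so that no calf appears in both sets.

    calf_ids holds one calf identifier per window. Whole calves, not
    individual windows, are assigned to the test set.
    """
    calves = sorted(set(calf_ids))
    random.Random(seed).shuffle(calves)
    n_test = max(1, round(len(calves) * test_fraction))
    test_calves = set(calves[:n_test])
    train_idx = [i for i, c in enumerate(calf_ids) if c not in test_calves]
    test_idx = [i for i, c in enumerate(calf_ids) if c in test_calves]
    return train_idx, test_idx


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)
```

Splitting by calf rather than by window keeps windows from the same animal out of both train and test sets, so the test score reflects performance on unseen calves. Balanced accuracy averages recall over classes, so a model that always predicts a frequent behaviour such as lying is not rewarded for ignoring rare ones such as running.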
