Pig aggression classification using CNN, Transformers and Recurrent Networks

Junior Silva Souza, Eduardo Bedin, Gabriel Toshio Hirokawa Higa, Newton Loebens, Hemerson Pistori

TL;DR

This study targets automatic detection of aggressive behavior in pigs from video collected in a local breeding setting. It compares transformer-based video models (ViViT, STAM, TimeSformer) with convolution-based models (ResNet3D, ResNet(2+1)D, and the recurrent CNN-LSTM) under 5-fold cross-validation, after preprocessing frames and patches for computational feasibility. TimeSformer yields the strongest performance overall (median accuracy of 0.729), particularly in precision, while STAM lags behind and the convolution-based approaches show mixed results; statistical analysis confirms significant differences among methods. The work contributes a locally sourced pig aggression dataset and demonstrates the practicality of transformer-based video analysis for real-time animal welfare monitoring, with potential to reduce labor and improve decision-making on farms.

Abstract

Techniques for analyzing and detecting animal behavior are crucial for the livestock sector because they make it possible to monitor stress and animal welfare and they support decision making on the farm. Applications built on such techniques can help breeders improve production performance and reduce costs, since animal behavior is currently analyzed by humans, which is error-prone and time-consuming. Aggressiveness in pigs is one behavior that is studied, through animal classification and identification, in order to reduce its impact. However, this process is laborious and susceptible to errors, which can be reduced through automation by visually classifying videos captured in a controlled environment. The captured videos can be used to train neural networks and then to classify behavior through computer vision and artificial intelligence. The main techniques used in this study are transformer variants (STAM, TimeSformer, and ViViT) and convolution-based techniques (ResNet3D, ResNet(2+1)D, and CnnLstm). They were applied to pig video classification with the objective of distinguishing aggressive from non-aggressive behavior. The techniques were compared to analyze the contribution of transformers and the effectiveness of convolution in video classification. Performance was evaluated using accuracy, precision, and recall. TimeSformer showed the best results, with a median accuracy of 0.729.
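
As an illustration of the evaluation described above, the following minimal sketch computes accuracy, precision, and recall for a binary aggressive/non-aggressive task and takes the median of each metric across folds. It is not the authors' code: the function names, the label encoding (1 = aggressive, 0 = non-aggressive), and the toy labels are assumptions made for this example only.

```python
from statistics import median


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary task (positive=1 is an assumed encoding)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def fold_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one fold; empty denominators give 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def median_over_folds(folds):
    """Median of each metric over a list of (y_true, y_pred) pairs, one pair per fold."""
    per_fold = [fold_metrics(y_true, y_pred) for y_true, y_pred in folds]
    return {name: median(m[name] for m in per_fold) for name in per_fold[0]}


if __name__ == "__main__":
    # Toy labels, invented for illustration; they are not data from the paper.
    toy_folds = [
        ([1, 0, 1, 0], [1, 0, 0, 0]),
        ([1, 1, 0, 0], [1, 1, 0, 1]),
        ([1, 0], [0, 0]),
    ]
    print(median_over_folds(toy_folds))
```

Reporting the median rather than the mean makes the summary less sensitive to a single unusually good or bad fold, which matters when there are only five folds.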

Paper Structure

This paper contains 10 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The classification process, starting from an input video from which clips are obtained.
  • Figure 2: Three sequential frames depicting two instances of aggression. The details are clearest in the third frame (right), where the animals are attempting to bite.
  • Figure 3: Three frames illustrating non-aggressive animal behavior.
  • Figure 4: Confusion matrix obtained on the test set using the TimeSformer technique.