LostPaw: Finding Lost Pets using a Contrastive Learning-based Transformer with Visual Input

Andrei Voinea, Robin Kock, Maruf A. Dhali

TL;DR

This paper addresses the challenge of locating lost pets with a contrastive learning model trained on images. It uses a Vision Transformer (ViT) backbone, together with DETR-based cropping and AutoAugment, to learn discriminative image embeddings. The model is evaluated with three-fold cross-validation and a held-out test set, and reaches about 90% test accuracy after 350 epochs. The results indicate good generalization and robust latent representations, although the model produces some false positives, which could still aid broad-area searches. The work lays the groundwork for a web-based lost-pet search tool that notifies owners of potential matches, and it suggests extending the approach to other animal types and integrating the DETR and ViT components more closely for improved robustness.

Abstract

Losing pets can be highly distressing for pet owners, and finding a lost pet is often challenging and time-consuming. An artificial intelligence-based application can significantly improve the speed and accuracy of finding lost pets. To facilitate such an application, this study introduces a contrastive neural network model capable of accurately distinguishing between images of pets. The model was trained on a large dataset of dog images and evaluated through 3-fold cross-validation. Following 350 epochs of training, the model achieved a test accuracy of 90%. Furthermore, overfitting was avoided, as the test accuracy closely matched the training accuracy. Our findings suggest that contrastive neural network models hold promise as a tool for locating lost pets. This paper presents the foundational framework for a potential web application designed to assist users in locating their missing pets. The application will allow users to upload images of their lost pets and provide notifications when matching images are identified within its image database. This functionality aims to enhance the efficiency and accuracy with which pet owners can search for and reunite with their beloved animals.
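
To make the idea of a contrastive model concrete, the following is a minimal sketch of a pairwise contrastive objective over image embeddings. It assumes the classic margin-based formulation with Euclidean distance, which may differ from the loss the paper actually defines. The names `contrastive_loss` and `is_match`, the `margin` value, and the `threshold` value are hypothetical and chosen only for illustration; the embeddings stand in for the output of the ViT backbone.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_pet, margin=1.0):
    """Margin-based pairwise contrastive loss (an assumed formulation).

    Pairs showing the same pet are pulled together: the loss is the squared distance.
    Pairs showing different pets are pushed apart until they are at least
    `margin` away from each other, after which they contribute no loss.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_pet:
        return d ** 2
    return max(0.0, margin - d) ** 2


def is_match(emb_a, emb_b, threshold=0.5):
    """Declare two images a match when their embeddings are close enough.

    The threshold is a hypothetical value, not one reported in the paper.
    """
    return euclidean_distance(emb_a, emb_b) < threshold
```

In a search application of the kind the abstract describes, a decision rule such as `is_match` would be applied between the embedding of an uploaded photo and the embeddings already stored in the image database. Lowering the threshold trades false positives for missed matches.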

Paper Structure (15 sections, 2 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Data collection process. The top nodes represent the individual steps that are taken for each image. The diagrams at the bottom show possible configurations of each step.
  • Figure 2: Example data pairs with labels underneath. Some of the images have been augmented.
  • Figure 3: Architecture of the Contrastive Vision Transformer model.
  • Figure 4: Mean train accuracy and loss of the contrastive ViT model, averaged over three model runs. The data for accuracy was smoothed by averaging the values every five epochs.
  • Figure 5: Type I and II errors of the model on the test set at every epoch. The data for the errors were smoothed by averaging the values every five epochs.
  • ...and 2 more figures
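
The captions for Figures 4 and 5 mention Type I and Type II errors and smoothing by averaging every five epochs. The sketch below shows one way such quantities could be computed. It assumes that "same pet" is the positive class, so a Type I error is a false match and a Type II error is a missed match, and that the smoothing uses non-overlapping five-epoch blocks rather than a sliding window. The function names are hypothetical.

```python
def error_rates(labels, predictions):
    """Return (type_i, type_ii) error rates for pairwise match decisions.

    labels and predictions are sequences of booleans, where True means
    "both images show the same pet" (assumed to be the positive class).
    """
    false_pos = sum(1 for y, p in zip(labels, predictions) if p and not y)
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)
    negatives = sum(1 for y in labels if not y)
    positives = sum(1 for y in labels if y)
    type_i = false_pos / negatives if negatives else 0.0
    type_ii = false_neg / positives if positives else 0.0
    return type_i, type_ii


def block_average(values, window=5):
    """Average a per-epoch series over consecutive, non-overlapping blocks."""
    blocks = [values[i:i + window] for i in range(0, len(values), window)]
    return [sum(block) / len(block) for block in blocks]
```

For example, `block_average` applied to ten per-epoch values returns two smoothed points, one for each block of five epochs.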