Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety

Lucas Choi; Ross Greer

Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety

Lucas Choi, Ross Greer

TL;DR

This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety, highlighting the strengths and weaknesses of current foundation models in safety perception domains.

Abstract

This paper evaluates the use of vision-language models (VLMs) for zero-shot detection and association of hardhats to enhance construction safety. Given the significant risk of head injuries in construction, proper enforcement of hardhat use is critical. We investigate the applicability of foundation models, specifically OWLv2, for detecting hardhats in real-world construction site images. Our contributions include the creation of a new benchmark dataset, Hardhat Safety Detection Dataset, by filtering and combining existing datasets and the development of a cascaded detection approach. Experimental results on 5,210 images demonstrate that the OWLv2 model achieves an average precision of 0.6493 for hardhat detection. We further analyze the limitations and potential improvements for real-world applications, highlighting the strengths and weaknesses of current foundation models in safety perception domains.

Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety

TL;DR

Abstract

Evaluating Cascaded Methods of Vision-Language Models for Zero-Shot Detection and Association of Hardhats for Increased Construction Safety

TL;DR

Abstract

Paper Structure

Table of Contents

Figures (8)