A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images

David Torpey, Lawrence Pratt, Richard Klein

TL;DR

This work tackles defect detection in electroluminescence (EL) solar-cell images under limited labelled data by benchmarking a broad spectrum of pretraining paradigms: supervised (COCO, ImageNet), self-supervised (SimCLR and MoCov2, on both ImageNet and EL images), and semi-supervised (CCT, U2PL). It systematically compares out-of-distribution (OOD) and in-distribution pretraining, and finds that supervised COCO, self-supervised ImageNet, and semi-supervised CCT pretraining achieve statistically equivalent $mIoU$ on Solar Cell Defect Detection (SCDD), with U2PL underperforming. Segmentation performance is quantified with $mIoU$ and $wIoU$. The study releases a large unlabelled EL dataset of 22,000 images and a 642-image EL dataset with ground-truth semantic segmentation labels, achieves a new state-of-the-art for SCDD, and shows that some pretraining schemes handle underrepresented defect classes better than others. The findings suggest that domain-tailored self-supervision remains challenging for EL data, while large-scale OOD pretraining gives robust gains. The work thus informs practical model selection for SCDD and motivates further development of domain-specific self- and semi-supervised techniques.

Abstract

Pretraining has been shown to improve performance in many domains, including semantic segmentation, especially in domains with limited labelled data. In this work, we perform a large-scale evaluation and benchmarking of various pretraining methods for Solar Cell Defect Detection (SCDD) in electroluminescence images, a field with limited labelled datasets. We cover supervised training with semantic segmentation, semi-supervised learning, and two self-supervised techniques. We also experiment with both in-distribution and out-of-distribution (OOD) pretraining and observe how this affects downstream performance. The results suggest that supervised training on a large OOD dataset (COCO), self-supervised pretraining on a large OOD dataset (ImageNet), and semi-supervised pretraining (CCT) all yield statistically equivalent performance for mean Intersection over Union (mIoU). We achieve a new state-of-the-art for SCDD and demonstrate that certain pretraining schemes result in superior performance on underrepresented classes. Additionally, we provide a large-scale unlabelled EL image dataset of $22000$ images, and a $642$-image labelled semantic segmentation EL dataset, for further research in developing self- and semi-supervised training techniques in this domain.
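The two metrics named above can be illustrated with a minimal, pure-Python sketch. It assumes flattened integer label maps and computes per-class IoU, its unweighted mean (mIoU), and a weighted variant. The weighting used here (each class weighted by its ground-truth pixel count) is an assumption made for illustration, since the exact definition of wIoU is not given in this summary. The function names `per_class_iou`, `mean_iou`, and `weighted_iou` are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def per_class_iou(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> Tuple[Dict[int, float], List[int]]:
    """Return ({class_id: IoU}, ground-truth pixel counts per class).

    Classes absent from both prediction and ground truth are skipped,
    because their IoU (0 / 0) is undefined.
    """
    inter = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1

    ious: Dict[int, float] = {}
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        if union > 0:
            ious[c] = inter[c] / union
    return ious, target_count


def mean_iou(ious: Dict[int, float]) -> float:
    """Unweighted mean over classes: every class counts equally."""
    return sum(ious.values()) / len(ious)


def weighted_iou(ious: Dict[int, float], target_count: List[int]) -> float:
    """Mean weighted by ground-truth pixel frequency (assumed weighting)."""
    total = sum(target_count[c] for c in ious)
    return sum(ious[c] * target_count[c] for c in ious) / total


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 0]
target = [0, 0, 1, 2, 2, 0]
ious, counts = per_class_iou(pred, target, num_classes=3)
miou = mean_iou(ious)
wiou = weighted_iou(ious, counts)
print(f"mIoU={miou:.4f}  wIoU={wiou:.4f}")
```

Because mIoU gives every class the same weight, rare defect classes influence it as much as common ones, whereas a frequency-weighted score is dominated by the most common classes. Under that reading, reporting both helps show how a pretraining scheme treats underrepresented classes.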

Paper Structure (13 sections, 7 figures, 3 tables)

Figures (7)

  • Figure 1: EL images of multi-crystalline (top) and mono-crystalline (bottom) silicon solar cells
  • Figure 2: Number of images containing each defect class (top) and feature class (bottom)
  • Figure 3: EL images and ground truth from the UCF dataset (Fioresi et al., 2022) (top) and the CSIR dataset (Pratt et al., 2023) (bottom)
  • Figure 4: mIoU by class type (extrinsic defect and intrinsic feature).
  • Figure 5: wIoU by class type (extrinsic defect and intrinsic feature).
  • ...and 2 more figures