Self-Supervised Learning Strategies for a Platform to Test the Toxicity of New Chemicals and Materials

Thomas Lautenschlager, Nils Friederich, Angelo Jovin Yamachui Sitcheu, Katja Nau, Gaëlle Hayot, Thomas Dickmeis, Ralf Mikut

TL;DR

This work addresses the need for scalable, automated toxicity testing by using self-supervised learning (SSL) to learn representations from EmbryoNet zebrafish embryo images. Using SimCLR with a ResNet50 backbone, the authors demonstrate label-efficient downstream analysis via a linear probe, which achieves 79.9% test accuracy, and show that the latent space clusters by toxicant mode of action (MOA). The results indicate that continuous, concentration-aware phenotypic representations can improve MOA identification while reducing labeled data requirements. The study discusses integrating SSL-based approaches into the TOXBOX high-throughput toxicity platform and outlines avenues for explainability, concept drift management, and future downstream tasks such as segmentation and transfer learning.

Abstract

High-throughput toxicity testing offers a fast and cost-effective way to test large numbers of compounds. A key component of such systems is automated evaluation via machine learning models. In this paper, we address critical challenges in this domain and demonstrate how representations learned via self-supervised learning can effectively identify toxicant-induced changes. We provide a proof of concept that uses the publicly available EmbryoNet dataset, which contains ten zebrafish embryo phenotypes elicited by various chemical compounds targeting different processes in early embryonic development. Our analysis shows that the representations learned via self-supervised learning are suitable for distinguishing between the modes of action of different compounds. Finally, we discuss the integration of machine learning models into a physical toxicity testing device in the context of the TOXBOX project.
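
As a rough illustration of the contrastive objective behind SimCLR, which the TL;DR names as the self-supervised method, the following is a minimal pure-Python sketch of the standard NT-Xent loss. It is not the authors' training code: the function names, the toy 2-D embeddings, and the temperature of 0.5 are assumptions made for illustration, and a real setup would compute this on ResNet50 projections of augmented embryo images in a deep learning framework.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss for paired embeddings.

    z_a[i] and z_b[i] are assumed to be embeddings of two augmented views
    of the same image i. The temperature value is an illustrative assumption.
    """
    n = len(z_a)
    z = [l2_normalize(v) for v in z_a + z_b]
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Denominator runs over every sample except the anchor itself.
        others = [sims[k] for k in range(2 * n) if k != i]
        peak = max(others)
        log_denominator = peak + math.log(sum(math.exp(s - peak) for s in others))
        total += log_denominator - sims[positive]
    return total / (2 * n)


# Toy example: matching views give a lower loss than mismatched views.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = nt_xent_loss(views_a, [[1.0, 0.0], [0.0, 1.0]])
mismatched = nt_xent_loss(views_a, [[0.0, 1.0], [1.0, 0.0]])
```

The loss falls when the two views of the same image sit close together and all other images in the batch sit far away, which is what lets the encoder learn phenotype-sensitive features without labels.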

Paper Structure

This paper contains 17 sections, 5 equations, 7 figures.

Figures (7)

  • Figure 1: Diagram of different computational models for toxicity predictions. The green box denotes the approaches more closely discussed in this paper. Models that incorporate are marked with an asterisk and highlighted in yellow.
  • Figure 2:
  • Figure 3: Confusion matrix of the linear classifier trained using SimCLR representations
  • Figure 4: UMAP visualization of SimCLR representations
  • Figure 5: Mean cosine similarities between class centers and representations of the class
  • ...and 2 more figures
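
The quantity named in the Figure 5 caption, the mean cosine similarity between class centers and the representations of a class, can be sketched as follows. This is a hedged illustration only: it assumes a class center is the mean embedding of that class, and the function names, labels, and toy vectors are hypothetical rather than taken from the paper.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_centers(embeddings, labels):
    """Assumed definition: the center of a class is the mean of its embeddings."""
    groups = defaultdict(list)
    for vec, label in zip(embeddings, labels):
        groups[label].append(vec)
    return {
        label: [sum(column) / len(vecs) for column in zip(*vecs)]
        for label, vecs in groups.items()
    }


def mean_similarity_to_centers(embeddings, labels):
    """Map (sample class, center class) to the mean cosine similarity."""
    centers = class_centers(embeddings, labels)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vec, label in zip(embeddings, labels):
        counts[label] += 1
        for center_label, center in centers.items():
            sums[(label, center_label)] += cosine(vec, center)
    return {pair: total / counts[pair[0]] for pair, total in sums.items()}


# Toy data with two hypothetical phenotype classes "a" and "b".
toy_embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
toy_labels = ["a", "a", "b", "b"]
similarity = mean_similarity_to_centers(toy_embeddings, toy_labels)
```

Read as a matrix indexed by sample class and center class, high diagonal values and low off-diagonal values would indicate well-separated classes in the latent space.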