Learning image representations for anomaly detection: application to discovery of histological alterations in drug development
Igor Zingman, Birgit Stierstorfer, Charlotte Lempp, Fabian Heinemann
TL;DR
This paper tackles anomaly detection in histopathology by learning domain-adapted image representations through an auxiliary tissue-classification task across species, organs, and stains, paired with a center-loss term that produces compact representations of normal tissue. A tile-based system combines a CNN encoder with a one-class SVM to detect anomalies in histology tiles and aggregates tile decisions into a whole-slide score. The approach outperforms state-of-the-art AD methods and SSL baselines on NAFLD-related liver anomalies and demonstrates potential for early toxicity screening in drug development, with ablation analyses highlighting the value of the auxiliary task, the center-loss term, and color-mix augmentation. The work also shows that the learned representations perform comparably to specialized NAFLD quantification methods, suggesting broad applicability to preclinical safety assessment and reduction of late-stage attrition. It further provides a public dataset of healthy-tissue tiles to support reproducibility and benchmarking in histopathology anomaly detection.
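The tile-based pipeline summarized above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for CNN encoder features, the `nu` and `gamma` settings are arbitrary, and the slide score is assumed to be the fraction of tiles flagged as anomalous, since the aggregation rule is not specified here. The helper name `slide_anomaly_score` is hypothetical.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Stand-in for CNN encoder outputs: 500 healthy training tiles, 16-dim features.
healthy_train = rng.normal(0.0, 1.0, size=(500, 16))

# One-class SVM fitted on healthy tiles only (hyperparameters are illustrative).
detector = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(healthy_train)

def slide_anomaly_score(tile_features):
    """Assumed aggregation: fraction of a slide's tiles predicted as outliers."""
    labels = detector.predict(tile_features)  # +1 = inlier, -1 = outlier
    return float(np.mean(labels == -1))

# Two synthetic slides: one healthy-like, one with strongly shifted features.
healthy_slide = rng.normal(0.0, 1.0, size=(200, 16))
altered_slide = rng.normal(4.0, 1.0, size=(200, 16))

healthy_score = slide_anomaly_score(healthy_slide)
altered_score = slide_anomaly_score(altered_slide)
```

In this toy setting the slide with shifted features should receive a clearly higher score than the healthy-like slide. In practice the features would come from the trained encoder, and the decision threshold would be tuned on held-out healthy data.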
Abstract
We present a system for anomaly detection in histopathological images. In histology, normal samples are usually abundant, whereas anomalous (pathological) cases are scarce or not available. Under such settings, one-class classifiers trained on healthy data can detect out-of-distribution anomalous samples. Such approaches, combined with pre-trained Convolutional Neural Network (CNN) representations of images, were previously employed for anomaly detection (AD). However, pre-trained off-the-shelf CNN representations may not be sensitive to abnormal conditions in tissues, while natural variations of healthy tissue may result in distant representations. To adapt representations to relevant details in healthy tissue, we propose training a CNN on an auxiliary task that discriminates healthy tissue of different species, organs, and staining reagents. Almost no additional labeling workload is required, since healthy samples come automatically with the aforementioned labels. During training, we enforce compact image representations with a center-loss term, which further improves representations for AD. The proposed system outperforms established AD methods on a published dataset of liver anomalies. Moreover, it provided results comparable to those of conventional methods specifically tailored for quantification of liver anomalies. We show that our approach can be used for toxicity assessment of candidate drugs at early development stages and thereby may reduce expensive late-stage drug attrition.
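The compactness objective can be illustrated with a small numerical sketch. It assumes a commonly used form of the center loss, in which the auxiliary cross-entropy loss is combined with a weighted term equal to half the mean squared distance between each feature vector and the center of its auxiliary class. The exact variant, the weight `lam`, the fixed centers, and all function names are assumptions made for illustration rather than details taken from the paper.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def center_loss(features, labels, centers):
    """Half the mean squared distance of each feature to its class center."""
    diffs = features - centers[labels]
    return float(0.5 * (diffs ** 2).sum(axis=1).mean())

def total_loss(logits, features, labels, centers, lam=0.1):
    """Auxiliary classification loss plus a weighted compactness term."""
    return cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)

# Toy batch: 4 tiles, 2 auxiliary classes (e.g. two organ/stain combinations).
labels = np.array([0, 0, 1, 1])
centers = np.array([[0.0, 0.0], [2.0, 2.0]])
compact = np.array([[0.1, 0.0], [0.0, -0.1], [2.0, 2.1], [1.9, 2.0]])
spread = np.array([[1.0, 0.0], [0.0, -1.0], [2.0, 3.0], [1.0, 2.0]])
logits = np.array([[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]])

loss_compact = total_loss(logits, compact, labels, centers)
loss_spread = total_loss(logits, spread, labels, centers)
```

With identical logits, the batch whose features lie close to their class centers yields the lower total loss, which is the pressure toward compact representations described above. In an actual training loop the centers would typically be learned or updated alongside the network weights rather than fixed.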
