PLUTO-4: Frontier Pathology Foundation Models

Harshith Padigela, Shima Nofallah, Atchuth Naveen Chilaparasetti, Ryun Han, Andrew Walker, Judy Shen, Chintan Shah, Blake Martin, Aashish Sood, Elliot Miller, Ben Glass, Andy Beck, Harsha Pokkalla, Syed Ashar Javed

TL;DR

PLUTO-4 introduces two pathology foundation models, PLUTO-4S and PLUTO-4G, pretrained with a DINOv2-derived self-supervised objective on a multi-institutional corpus of 551,164 WSIs, and reports state-of-the-art performance across tile-level, segmentation, and slide-level tasks. PLUTO-4S is the compact, high-throughput model and supports multi-scale inputs via FlexiViT and 2D-RoPE, while PLUTO-4G scales to frontier capacity using a single patch size to maximize representation capacity. Across diverse benchmarks including HEST spatial transcriptomics and dermatopathology, PLUTO-4G establishes new performance frontiers, including an 11% improvement in dermatopathology diagnosis, with PLUTO-4S delivering strong, deployable performance. The work positions large-scale pathology FMs as backbone representations for translational research and clinical workflows, balancing accuracy and deployability, while acknowledging the need for task-specific adapters and deployment considerations.

Abstract

Foundation models trained on large-scale pathology image corpora have demonstrated strong transfer capabilities across diverse histopathology tasks. Building on this progress, we introduce PLUTO-4, our next generation of pathology foundation models that extend the Pathology-Universal Transformer (PLUTO) to frontier scale. We share two complementary Vision Transformer architectures in the PLUTO-4 family: a compact and efficient PLUTO-4S model optimized for multi-scale deployment using a FlexiViT setup with 2D-RoPE embeddings, and a frontier-scale PLUTO-4G model trained with a single patch size to maximize representation capacity and stability. Both models are pretrained using a self-supervised objective derived from DINOv2 on a large multi-institutional corpus containing 551,164 WSIs from 137,144 patients across over 50 institutions, spanning over 60 disease types and over 100 stains. Comprehensive evaluation across public and internal benchmarks demonstrates that PLUTO-4 achieves state-of-the-art performance on tasks requiring varying spatial and biological context, including tile classification, segmentation, and slide-level diagnosis. The compact PLUTO-4S provides high-throughput and robust performance for practical deployment, while PLUTO-4G establishes new performance frontiers across multiple pathology benchmarks, including an 11% improvement in dermatopathology diagnosis. These diverse improvements underscore PLUTO-4's potential to transform real-world applications as a backbone for translational research and diagnostic use cases.

Paper Structure

This paper contains 36 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Distribution of the PLUTO-4 dataset across organs, diseases, stain groups and scanners.
  • Figure 2: Training throughput scaling across architectures and hardware. ViT-B shows near-linear scaling across both A40 and H200 clusters, while ViT-G throughput degrades beyond two nodes due to communication bottlenecks in DDP. Additionally, ViT-G with patch-token size 8 is approximately 3.5$\times$ slower than ViT-G with patch-token size 14.
  • Figure 3: Optimizing ViT-G training throughput. Enabling GPUDirect RDMA and tuning DDP parameters (bucket_cap_mb, gradient_as_bucket_view) restores near-linear throughput scaling and saturates InfiniBand bandwidth.
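
The roughly 3.5$\times$ slowdown noted in the Figure 2 caption for patch-token size 8 versus 14 is plausible given how token counts grow as patches shrink. The sketch below is a back-of-the-envelope check, not the authors' measurement: it assumes a hypothetical 224 x 224 pixel input tile (the tile size is not stated here) and uses illustrative helper names. It compares the token count ratio, which governs the per-token (linear) layers, with its square, which governs self-attention.

```python
def tokens_per_tile(tile_px: int, patch_px: int) -> int:
    """Number of patch tokens when a square tile is cut into square patches."""
    if tile_px % patch_px != 0:
        raise ValueError("tile size must be divisible by patch size")
    side = tile_px // patch_px
    return side * side


def cost_ratios(tile_px: int, small_patch: int, large_patch: int) -> tuple[float, float]:
    """Return (linear, quadratic) cost ratios of small-patch over large-patch.

    linear    ~ cost of per-token layers (scales with token count)
    quadratic ~ cost of self-attention (scales with token count squared)
    """
    n_small = tokens_per_tile(tile_px, small_patch)
    n_large = tokens_per_tile(tile_px, large_patch)
    linear = n_small / n_large
    return linear, linear ** 2


# ASSUMPTION: 224 px tiles; chosen only because both 8 and 14 divide it evenly.
TILE_PX = 224

linear, quadratic = cost_ratios(TILE_PX, small_patch=8, large_patch=14)
print(f"tokens at patch 14: {tokens_per_tile(TILE_PX, 14)}")
print(f"tokens at patch 8:  {tokens_per_tile(TILE_PX, 8)}")
print(f"linear-cost ratio:    {linear:.2f}x")
print(f"attention-cost ratio: {quadratic:.2f}x")
```

Under this assumption the reported 3.5$\times$ figure falls between the linear and quadratic ratios, which is what one would expect when both per-token layers and attention contribute to step time.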