Vision Foundry: A System for Training Foundational Vision AI Models
Authors
Mahmut S. Gokmen, Mitchell A. Klusty, Evan W. Damron, W. Vaiden Logan, Aaron D. Mullen, Caroline N. Leach, Emily B. Collier, Samuel E. Armstrong, V. K. Cody Bumgardner
Abstract
Self-supervised learning (SSL) leverages vast unannotated medical datasets, yet steep technical barriers limit adoption by clinical researchers. We introduce Vision Foundry, a code-free, HIPAA-compliant platform that democratizes pre-training, adaptation, and deployment of foundational vision models. The system integrates the DINO-MX framework, abstracting distributed infrastructure complexities while implementing specialized strategies like Magnification-Aware Distillation (MAD) and Parameter-Efficient Fine-Tuning (PEFT). We validate the platform across domains, including neuropathology segmentation, lung cellularity estimation, and coronary calcium scoring. Our experiments demonstrate that models trained via Vision Foundry significantly outperform generic baselines in segmentation fidelity and regression accuracy, while exhibiting robust zero-shot generalization across imaging protocols. By bridging the gap between advanced representation learning and practical application, Vision Foundry enables domain experts to develop state-of-the-art clinical AI tools with minimal annotation overhead, shifting focus from engineering optimization to clinical discovery.