Model Lake: a New Alternative for Machine Learning Models Management and Governance
Moncef Garouani, Franck Ravat, Nathalie Valles-Parlangeau
TL;DR
Organizations face fragmentation and governance gaps in ML model management as AI adoption grows. The paper proposes Model Lake, an integrated ecosystem that unifies data, code, and models with a centralized registry and a collaborative workspace, organized into Data, Analysis, and Governance zones. It details a three-zone architecture and a metadata framework based on the 5W1H principle to ensure provenance, reproducibility, and auditability, along with ingestion, versioning, and lineage capabilities to prevent a model swamp. The approach aims to improve lifecycle management, discovery, reusability, and compliance across enterprise AI pipelines, with future work including broader artifact support and a recommender system for better search and reuse.
Abstract
The rise of artificial intelligence and data science across industries underscores the pressing need for effective management and governance of machine learning (ML) models. Traditional approaches to ML model management often involve disparate storage systems and lack standardized methodologies for versioning, audit, and reuse. Inspired by data lake concepts, this paper develops the concept of the ML Model Lake as a centralized management framework for datasets, code, and models within organizational environments. We provide an in-depth exploration of the Model Lake concept, delineating its architectural foundations, key components, operational benefits, and practical challenges. We discuss the transformative potential of adopting a Model Lake approach, such as enhanced model lifecycle management, discovery, audit, and reusability. Furthermore, we illustrate a real-world application of Model Lake and its impact on data, code, and model management practices.
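To make the 5W1H metadata idea from the TL;DR concrete, the following is a minimal sketch of how a centralized registry might attach who/what/when/where/why/how metadata, a version number, and lineage links to each dataset, code, or model artifact. It is an illustration under stated assumptions, not the paper's implementation: the names ArtifactRecord, Registry, register, and lineage, the field layout, the identifier format, and the example paths are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class ArtifactRecord:
    """Hypothetical 5W1H metadata record for one registered artifact."""
    what: str        # artifact name
    kind: str        # "dataset", "code", or "model"
    who: str         # person or team that registered it
    when: datetime   # registration time (UTC)
    where: str       # storage location inside the lake
    why: str         # purpose of the artifact
    how: str         # how it was produced
    version: int     # assigned by the registry
    checksum: str    # SHA-256 of the payload, for audit
    parents: tuple = ()  # ids of the artifacts it was derived from

    @property
    def artifact_id(self) -> str:
        return f"{self.kind}:{self.what}@v{self.version}"


class Registry:
    """In-memory stand-in for a centralized registry (assumed design)."""

    def __init__(self):
        self._records = {}

    def register(self, *, what, kind, who, where, why, how, payload, parents=()):
        # Refuse lineage links to artifacts the registry has never seen.
        for parent in parents:
            if parent not in self._records:
                raise KeyError(f"unknown parent artifact: {parent}")
        # Next version = 1 + number of earlier versions of the same artifact.
        version = 1 + sum(
            1 for r in self._records.values()
            if r.kind == kind and r.what == what
        )
        record = ArtifactRecord(
            what=what, kind=kind, who=who,
            when=datetime.now(timezone.utc),
            where=where, why=why, how=how,
            version=version,
            checksum=hashlib.sha256(payload).hexdigest(),
            parents=tuple(parents),
        )
        self._records[record.artifact_id] = record
        return record

    def lineage(self, artifact_id):
        """Return the ids of all upstream artifacts of artifact_id."""
        seen = []
        stack = list(self._records[artifact_id].parents)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self._records[current].parents)
        return seen


# Usage: register a dataset and a script, then a model derived from both.
registry = Registry()
data = registry.register(
    what="churn", kind="dataset", who="alice", where="lake/data/churn.csv",
    why="training data", how="exported from CRM", payload=b"id,label\n1,0\n",
)
code = registry.register(
    what="train.py", kind="code", who="bob", where="lake/code/train.py",
    why="training script", how="written by hand", payload=b"print('train')",
)
model = registry.register(
    what="churn-clf", kind="model", who="bob", where="lake/models/churn-clf.bin",
    why="predict churn", how="fit by train.py on churn", payload=b"\x00\x01",
    parents=(data.artifact_id, code.artifact_id),
)
upstream = registry.lineage(model.artifact_id)
```

Storing parent identifiers on each record is one simple way to keep provenance queryable: an audit of the model can walk back to the exact dataset and code versions it was built from, and re-registering an artifact under the same name yields a new version rather than overwriting the old one.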
