PrismSSL: One Interface, Many Modalities; A Single-Interface Library for Multimodal Self-Supervised Learning

Melika Shirian, Kianoosh Vadaei, Kian Majlessi, Audrina Ebrahimi, Arshia Hemmat, Peyman Adibi, Hossein Karshenas

TL;DR

Self-supervised learning (SSL) research across audio, vision, graphs, and cross-modal domains is fragmented across domain-specific repositories, which hinders fair comparison and reproducibility. PrismSSL is a unified, modular Python library that provides a single interface to configure, train, and evaluate SSL methods across modalities. It combines a method registry, a generic trainer, and an optional UI, and supports distributed training, LoRA, hyperparameter optimization (HPO), and Weights & Biases (W&B) logging. Key contributions include a unified trainer and registry that decouple method, data, and runtime; a reproducible artifact with compact benchmarks; plug-ins for HuggingFace backbones and Optuna HPO; and an extensibility recipe for adding new SSL objectives. The framework enables reproducible, scalable cross-domain SSL experiments and accelerates method synthesis and benchmark development.

Abstract

We present PrismSSL, a Python library that unifies state-of-the-art self-supervised learning (SSL) methods across audio, vision, graphs, and cross-modal settings in a single, modular codebase. The goal of the demo is to show how researchers and practitioners can: (i) install, configure, and run pretext training with a few lines of code; (ii) reproduce compact benchmarks; and (iii) extend the framework with new modalities or methods through clean trainer and dataset abstractions. PrismSSL is packaged on PyPI, released under the MIT license, and integrates tightly with HuggingFace Transformers. It provides quality-of-life features such as distributed training in PyTorch, Optuna-based hyperparameter search, LoRA fine-tuning for Transformer backbones, animated embedding visualizations for sanity checks, Weights & Biases logging, and colorful, structured terminal logs. In addition, PrismSSL offers a graphical dashboard, built with Flask and standard web technologies, that lets users configure and launch training pipelines with minimal coding. The artifact (code and data recipes) will be publicly available and reproducible.

Paper Structure

This paper contains 19 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: PrismSSL pipeline overview.
  • Figure 2: PrismSSL layered architecture overview.
  • Figure 3: Proportional overview of supported PrismSSL modalities and methods.
  • Figure 4: PrismSSL dashboard for low-/no-code experimentation. The UI lets users (1) paste a PyTorch Dataset for train/val (test support coming in future versions), (2) select modality and SSL method in the Trainer specification and set core hyperparameters (batch size, optimizer, LR/WD, HPO, paths), (3) optionally define an evaluation template (classes, epochs, freeze-backbone), (4) review the structured run sheet and export/import JSON, and (5) launch runs via ‘Generate preview’ or ‘Start pretext training.’ All choices compile into a reproducible configuration.
  • Figure 5: Zero-shot Wav2CLIP probabilities on the cat–dog set.
  • ...and 1 more figure