A Partial Replication of MaskFormer in TensorFlow on TPUs for the TensorFlow Model Garden

Vishal Purohit, Wenxin Jiang, Akshath R. Ravikiran, James C. Davis

TL;DR

The paper tackles reproducibility by attempting to replicate MaskFormer from PyTorch to TensorFlow on TPUs, highlighting cross-framework challenges. It implements MaskFormer in TensorFlow using TensorFlow Model Garden components, adapting data pipelines, backbones, transformer decoders, and a multi-task loss with Dice, Focal, and classification terms. Verification is performed via shape testing, unit tests, and differential testing, but the replication remains partial due to the lack of open TensorFlow checkpoints and TPU-specific hurdles, including weight transfer. Despite these limitations, the work provides a practical blueprint, including a custom PyTorch-to-TensorFlow weight converter and a public repository, to advance reproducibility and cross-framework reuse in segmentation research.

Abstract

This paper replicates MaskFormer, a universal image segmentation model originally developed in PyTorch, within the TensorFlow ecosystem, optimized for execution on Tensor Processing Units (TPUs). Our implementation builds on the modular components of the TensorFlow Model Garden (TFMG), such as the data loader, training orchestrator, and various architectural components, adapted to meet the specifications of the MaskFormer model. We address key challenges encountered during the replication, including non-convergence, slow training, adaptation of loss functions, and the integration of TPU-specific functionality. We verify our reproduced implementation and present qualitative results on the COCO dataset. Although our implementation meets some of the objectives for end-to-end reproducibility, we encountered challenges in replicating the PyTorch version of MaskFormer in TensorFlow. The replication process is not straightforward and requires substantial engineering effort: it necessitates customizing various components within the TFMG, alongside thorough verification and hyper-parameter tuning. The replication is available at: https://github.com/PurdueDualityLab/tf-maskformer/tree/main/official/projects/maskformer
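
To make the "adaptation of loss functions" concrete, the following is a minimal, framework-free sketch of the two mask terms named in the TL;DR, a Dice loss and a sigmoid focal loss. It is not the code from the repository: the function names, the smoothing constant, the `alpha` and `gamma` defaults, and the loss weights are illustrative assumptions, and plain Python is used only so the example runs without TensorFlow or a TPU.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dice_loss(logits, targets, smooth=1.0):
    """Dice loss for one flattened mask: 1 - (2*sum(p*t) + s) / (sum(p) + sum(t) + s).

    `smooth` is an assumed smoothing constant that avoids division by zero.
    """
    probs = [sigmoid(x) for x in logits]
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Mean sigmoid focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    `alpha` and `gamma` are commonly used defaults, assumed here.
    """
    total = 0.0
    for x, t in zip(logits, targets):
        p = sigmoid(x)
        p_t = p if t == 1 else 1.0 - p
        alpha_t = alpha if t == 1 else 1.0 - alpha
        # Clamp p_t to keep log() finite for extremely confident errors.
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, 1e-12))
    return total / len(logits)


def mask_loss(logits, targets, dice_weight=1.0, focal_weight=1.0):
    """Weighted sum of the two mask terms; the weights are placeholders."""
    return (dice_weight * dice_loss(logits, targets)
            + focal_weight * focal_loss(logits, targets))
```

In a differential test, a reference like this can be evaluated on the same small inputs as the TensorFlow and PyTorch losses to check that all three agree within a tolerance.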

Paper Structure (24 sections, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Description of the symbols used in this document.
  • Figure 2: Overview of the TensorFlow Model Garden (TFMG) architecture. The diagram illustrates the modular design of the TFMG framework, highlighting the key components such as 'task_factory' for initializing machine learning tasks, 'train_lib' for training operations, 'train_utils' for training utilities, 'core' for central functionalities, 'modeling' for model definitions, 'performance' for tracking and optimization, 'distribute_utils' for distributed training support, and 'tfm_flags' for configuration management. Arrows indicate the direction of dependencies between modules.
  • Figure 3: Illustration of the workflow components of a machine learning task, centered on 'PanopticTask'. It includes the configuration, model building, initialization, and input preparation stages, followed by training and validation steps. 'build_losses' and 'build_metrics' are essential for calculating model performance during training.
  • Figure 4: Illustration of the data pre-processing pipeline for the panoptic segmentation task, specifically for training the MaskFormer model on the 'COCO' dataset. The process begins with the raw image and annotation files (JSON format), which are serialized into TFRecord format, a TensorFlow-specific binary storage format. Subsequently, these TFRecords are deserialized, and the data is passed through a series of preprocessing steps including normalization, augmentation, and padding to ensure uniformity of the input data. Binary mask generation and label mapping are integral to preparing the segmentation targets. The output is then converted to a format compatible with the model’s data loader, which feeds the processed data into the training pipeline. (zoom in for a better viewing experience)
  • Figure 5: Illustration of the MaskFormer architecture (reproduced from cheng2021maskformer). The process begins with an input image (A), which is processed by a ResNet-50 backbone to extract feature maps. Multi-scale features (B) are generated from the backbone and then decoded by a Pixel Decoder (C) to create refined feature maps. These are fed into a Transformer Decoder (D), along with a set of learned queries that interact with the feature maps to generate mask embeddings (F). An MLP head (E) processes the mask embeddings to produce the final segmentation output, which consists of binary masks for each query, along with their corresponding output labels, indicating the presence of specific objects or regions within the input image.
  • ...and 5 more figures
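
The last step in the Figure 5 caption, turning per-query mask embeddings into binary masks, can be sketched as follows. The dot product with per-pixel embeddings followed by a sigmoid comes from the original MaskFormer paper, not from the caption itself. The function name `predict_masks`, the nested-list layout, and the fixed 0.5 threshold are simplifying assumptions for illustration, not the repository's implementation.

```python
import math


def stable_sigmoid(x):
    """Logistic function that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def predict_masks(mask_embeddings, pixel_embeddings, threshold=0.5):
    """Return one H x W binary mask per query.

    mask_embeddings:  N vectors of length C (one per learned query).
    pixel_embeddings: H x W grid of length-C vectors from the pixel decoder.
    threshold:        assumed cut-off applied to the sigmoid probability.
    """
    masks = []
    for query in mask_embeddings:
        mask = []
        for row in pixel_embeddings:
            out_row = []
            for pixel in row:
                # Mask logit = dot product of query and pixel embeddings.
                logit = sum(q * p for q, p in zip(query, pixel))
                out_row.append(1 if stable_sigmoid(logit) > threshold else 0)
            mask.append(out_row)
        masks.append(mask)
    return masks


# Tiny example: 2 queries, a 2 x 2 image, embedding size 2.
queries = [[1.0, 0.0], [0.0, 1.0]]
pixels = [
    [[2.0, -2.0], [-2.0, 2.0]],
    [[2.0, -2.0], [-2.0, 2.0]],
]
masks = predict_masks(queries, pixels)
```

Here the first query selects the left column of the image and the second query selects the right column. Each mask would additionally be paired with a class prediction for its query, which this sketch omits.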