torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP

Yoshitomo Matsubara

TL;DR

The paper addresses reproducibility in deep learning by upgrading torchdistill to be task-agnostic and compatible with third-party libraries. The upgrade introduces PyYAML-based instantiation, generalized modules, and reimplemented models and methods that support both NLP and CV tasks. The knowledge distillation (KD) objective is $L = \alpha L_{CE}(\hat{y}, y) + (1 - \alpha) \tau^2 L_{KL}(p, q)$. A GLUE-based NLP demonstration reproduces BERT fine-tuning and KD experiments with Hugging Face tools, and all weights and configurations are published. The work also provides Colab demos and starter scripts for NLP and CV that enable coding-free, reproducible experiments, and it publishes 27 NLP models and 14 CV models for reuse.
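
As a rough illustration of the KD objective in the TL;DR, the sketch below evaluates $L = \alpha L_{CE}(\hat{y}, y) + (1 - \alpha) \tau^2 L_{KL}(p, q)$ for a single example in plain Python. The summary does not define the symbols, so the sketch assumes the common formulation in which $p$ and $q$ are the teacher and student distributions softened by temperature $\tau$, and $\hat{y}$ is the student prediction at temperature 1. The function names and default values are illustrative and are not taken from torchdistill.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(student_logits, label):
    """L_CE: negative log-probability the student assigns to the true label."""
    return -math.log(softmax(student_logits)[label])


def kl_divergence(p, q):
    """L_KL(p, q) = sum_i p_i * log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, tau=4.0):
    """alpha * L_CE + (1 - alpha) * tau^2 * L_KL(p, q).

    Assumption: p is the softened teacher distribution and q is the
    softened student distribution.
    """
    hard = cross_entropy(student_logits, label)
    p = softmax(teacher_logits, tau)
    q = softmax(student_logits, tau)
    soft = kl_divergence(p, q)
    return alpha * hard + (1.0 - alpha) * tau ** 2 * soft


student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
print(kd_loss(student, teacher, label=0))
```

Under these assumptions, setting `alpha=1.0` reduces the loss to plain cross-entropy, and a student whose logits match the teacher's contributes no KL term.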

Abstract

Reproducibility in scientific work has become increasingly important in research communities such as machine learning, natural language processing, and computer vision, owing to the rapid development of these research domains supported by recent advances in deep learning. In this work, we present a significantly upgraded version of torchdistill, a modular-driven, coding-free deep learning framework whose initial release supported only image classification and object detection tasks for reproducible knowledge distillation experiments. To demonstrate that the upgraded framework can support more tasks with third-party libraries, we reproduce the GLUE benchmark results of BERT models using a script based on the upgraded torchdistill, harmonizing with various Hugging Face libraries. All 27 fine-tuned BERT models and the configurations to reproduce the results are published at Hugging Face, and the model weights have already been widely used in research communities. We also reimplement popular small-sized models and new knowledge distillation methods and perform additional experiments for computer vision tasks.

Paper Structure (12 sections, 2 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: Initial design of torchdistill (Matsubara, 2021) vs. v1.0.0 in this work.
  • Figure 2: Example of two different ways to build a sequence of transforms in torchvision (transform) for the CIFAR-10 dataset. The initial version (top, left) defines a torchvision-specific function, build_transform, in torchdistill and passes it the list of dict objects given as transform_params_config in the left PyYAML configuration. torchdistill in this work (right) builds exactly the same transform by instantiating each transform class step by step with !import_call, one of the pre-defined PyYAML constructors in the upgraded torchdistill.
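
The constructor-based approach described for Figure 2 can be illustrated with a small, self-contained sketch of how a custom PyYAML tag might import a class and instantiate it directly from a configuration. This is not torchdistill's implementation. The tag name `!sketch_call` and the `key`/`kwargs` layout are hypothetical stand-ins for `!import_call`, and standard-library classes replace the torchvision transforms so the example runs with only PyYAML installed.

```python
import importlib

import yaml  # PyYAML


def import_and_call(loader, node):
    """Hypothetical constructor: import the object named by `key`, call it with `kwargs`."""
    spec = loader.construct_mapping(node, deep=True)
    module_path, _, attr_name = spec["key"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    return target(**spec.get("kwargs", {}))


class SketchLoader(yaml.SafeLoader):
    """Dedicated loader so the custom tag is not registered on yaml.SafeLoader itself."""


SketchLoader.add_constructor("!sketch_call", import_and_call)

# Each list entry is instantiated step by step while the YAML is loaded,
# so no task-specific builder function is needed on the Python side.
CONFIG = """
steps:
  - !sketch_call
    key: fractions.Fraction
    kwargs: {numerator: 3, denominator: 4}
  - !sketch_call
    key: datetime.timedelta
    kwargs: {seconds: 90}
"""

config = yaml.load(CONFIG, Loader=SketchLoader)
print(config["steps"])
```

In this sketch, swapping in a different class only requires changing the `key` string in the configuration, which is the kind of coding-free flexibility the caption attributes to the constructor-based design.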