LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks

Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee

TL;DR

LIFT (Language-Interfaced Fine-Tuning) solves non-language tasks by converting data into natural-language prompts and fine-tuning a pretrained language model without changing its architecture or loss. Across classification and regression benchmarks, LIFT performs competitively with strong baselines, and the study characterizes its inductive biases, robustness, calibration, and data-generation capabilities. The study also shows that feature-name prompts, two-stage fine-tuning with synthetic data, and data augmentation can improve performance, especially in low-data regimes, while natural-language pretraining is essential for effectiveness. Overall, LIFT offers a promising no-code approach to broad-domain ML with language models, and the paper outlines its limitations and directions for future work.

Abstract

Fine-tuning pretrained language models (LMs) without making any architectural changes has become a norm for learning various language downstream tasks. However, for non-language downstream tasks, a common practice is to employ task-specific designs for input, output layers, and loss functions. For instance, it is possible to fine-tune an LM into an MNIST classifier by replacing the word embedding layer with an image patch embedding layer, the word token output layer with a 10-way output layer, and the word prediction loss with a 10-way classification loss, respectively. A natural question arises: Can LM fine-tuning solve non-language downstream tasks without changing the model architecture or loss function? To answer this, we propose Language-Interfaced Fine-Tuning (LIFT) and study its efficacy and limitations by conducting an extensive empirical study on a suite of non-language classification and regression tasks. LIFT does not make any changes to the model architecture or loss function, and it solely relies on the natural language interface, enabling "no-code machine learning with LMs." We find that LIFT performs comparably well across a wide range of low-dimensional classification and regression tasks, matching the performances of the best baselines in many cases, especially for the classification tasks. We also report experimental results on the fundamental properties of LIFT, including inductive bias, robustness, and sample complexity. We also analyze the effect of pretraining on LIFT and a few properties/techniques specific to LIFT, e.g., context-aware learning via appropriate prompting, calibrated predictions, data generation, and two-stage fine-tuning. Our code is available at https://github.com/UW-Madison-Lee-Lab/LanguageInterfacedFineTuning.
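
Because LIFT keeps the word-token output layer, a prediction arrives as generated text and has to be read back as a label or a number. The snippet below is a minimal sketch of that decoding step, not the authors' implementation: the helper names `parse_class` and `parse_number`, the regular expression, and the fallback to `None` for unparseable completions are all assumptions made for illustration.

```python
import re
from typing import Optional, Sequence


def parse_class(completion: str, classes: Sequence[str]) -> Optional[str]:
    """Map generated text to a known class name, or None if nothing matches.

    Hypothetical helper: longer names are tried first so that a class
    name which is a prefix of another one cannot shadow it.
    """
    text = completion.strip().lower()
    for name in sorted(classes, key=len, reverse=True):
        if text.startswith(name.lower()):
            return name
    return None


def parse_number(completion: str) -> Optional[float]:
    """Extract the first decimal number from generated text (regression)."""
    match = re.search(r"[-+]?\d+(?:\.\d+)?", completion)
    return float(match.group()) if match else None
```

A completion that matches no class or contains no number returns `None` here; how such invalid outputs are handled (retrying, or falling back to a default prediction) is a design choice this sketch leaves open.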

Paper Structure (81 sections, 1 equation, 23 figures, 25 tables)

This paper contains 81 sections, 1 equation, 23 figures, 25 tables.

Figures (23)

  • Figure 1: A high-level illustration of the Language-Interfaced Fine-Tuning (LIFT) framework. LIFT has a two-phase procedure: (1) converting the dataset into sentences and (2) fine-tuning the pretrained language model (e.g., GPT) on the obtained sentences. This figure visualizes how LIFT can be applied to the Iris classification task. We first convert the Iris dataset into plain English sentences (left). Since feature names and the task description are available for this task, one could incorporate them as part of the prompt (option 1 in the figure). (In Sec. \ref{sec:feature_names}, we show that adding such contextual information to prompts helps LIFT achieve higher predictive accuracy.) One may also choose a simpler prompt with a generic naming convention ($x_1, x_2, \ldots, x_p$) for $p$ features (option 2 in the figure). After the sentence conversion step, LIFT fine-tunes a pretrained LM on the sentence set without making any changes to the model architecture or loss. At inference time, we convert the test samples to sentence form using the same prompt, excluding the label part. LIFT performs surprisingly well in various non-language regression/classification tasks, and we summarize our main findings in Table \ref{tab:summary_basic_findings}. Note that to obtain a model for a given task, all we need is to design proper sentence templates for LIFT; no changes to the architecture or loss function are needed.
  • Figure 2: Approximating various functions with LIFT using GPT-J. We visualize the target functions (first row) and the predictor functions learned by LIFT on GPT-J (second row). Blue dots show the $1000$ training samples. One can observe that LIFT approximates the target functions well.
  • Figure 3: Decision boundary visualization. We use three snapshots of a trained network to construct datasets whose labels are the network's predictions (first column). Top to bottom: snapshots taken after more training epochs, corresponding to more complex boundaries. LIFT/GPTs adapt well to the different boundaries.
  • Figure 4: Given only the digit number.
  • Figure 5: Given the digit number and half of the image pixels.
  • ...and 18 more figures
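
The sentence-conversion step described in the Figure 1 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names `make_prompt` and `make_training_sentence` and the exact template wording are hypothetical. Only the overall idea comes from the caption: a prompt with feature names (option 1) or generic names $x_1, \ldots, x_p$ (option 2), with the label appended for training and omitted at inference.

```python
from typing import Optional, Sequence


def make_prompt(values: Sequence[float],
                names: Optional[Sequence[str]] = None) -> str:
    """Turn one feature vector into a question sentence (no label).

    Both templates below are assumed wordings for illustration only.
    """
    if names is None:
        # Option 2: generic feature names x1, ..., xp.
        pairs = ", ".join(f"x{i}={v}" for i, v in enumerate(values, start=1))
        return f"When we have {pairs}, what should be y?"
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    # Option 1: use the dataset's own feature names as context.
    pairs = ", ".join(f"{n} is {v}" for n, v in zip(names, values))
    return f"Given a sample whose {pairs}, what is its class?"


def make_training_sentence(values: Sequence[float], label: str,
                           names: Optional[Sequence[str]] = None) -> str:
    """Training sentence: the same prompt followed by the label."""
    return f"{make_prompt(values, names)} {label}"


row = [5.1, 3.5, 1.4, 0.2]
iris_names = ["sepal length", "sepal width", "petal length", "petal width"]
print(make_training_sentence(row, "setosa", iris_names))  # training time
print(make_prompt(row))                                   # inference time
```

Building the inference prompt and the training sentence from the same function keeps the two templates identical apart from the label, which is the property the caption asks for.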

Theorems & Definitions (1)

  • Remark