How to Query Language Models?

Leonard Adolphs, Shehzaad Dhuliawala, Thomas Hofmann

TL;DR

The paper investigates how to extract factual and commonsense knowledge from masked language models without fine-tuning. It introduces querying by example, appending demonstrations of the same relation to cloze prompts to disambiguate the information need, and evaluates on LAMA, TextWorld Commonsense, and BATS. Results show substantial gains on T-REx and ConceptNet with about 10 demonstrations, while Google-RE can be adversely affected, highlighting dataset-dependent dynamics; the improvements are mediated by embedding-space disambiguation and are achieved with a single forward pass. Overall, the work demonstrates a simple, efficient prompting strategy that uncovers more latent knowledge in LMs and informs practical prompt design for knowledge retrieval tasks.

Abstract

Large pre-trained language models (LMs) are capable of not only recovering linguistic but also factual and commonsense knowledge. To access the knowledge stored in mask-based LMs, we can use cloze-style questions and let the model fill in the blank. The flexibility advantage over structured knowledge bases comes with the drawback of finding the right query for a certain information need. Inspired by human behavior to disambiguate a question, we propose to query LMs by example. To clarify the ambivalent question "Who does Neuer play for?", a successful strategy is to demonstrate the relation using another subject, e.g., "Ronaldo plays for Portugal. Who does Neuer play for?". We apply this approach of querying by example to the LAMA probe and obtain substantial improvements of up to 37.8% for BERT-large on the T-REx data when providing only 10 demonstrations--even outperforming a baseline that queries the model with up to 40 paraphrases of the question. The examples are provided through the model's context and thus require neither fine-tuning nor an additional forward pass. This suggests that LMs contain more factual and commonsense knowledge than previously assumed--if we query the model in the right way.
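The prompt construction described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `build_query`, the `{s}`/`{o}` template placeholders, and the `[MASK]` token string are assumptions made for this example. It only assembles the text; the resulting string would then be passed to a masked LM in a single forward pass.

```python
def build_query(template, subject, demonstrations, mask_token="[MASK]"):
    """Prefix a cloze query with solved demonstrations of the same relation.

    `template` uses the hypothetical placeholders {s} (subject) and {o} (object).
    """
    # Each demonstration is a (subject, object) pair rendered with the same template.
    solved = [template.format(s=s, o=o) for s, o in demonstrations]
    # The actual query leaves the object slot masked for the model to fill in.
    query = template.format(s=subject, o=mask_token)
    # Demonstrations come first, so they sit in the model's context for the query.
    return " ".join(solved + [query])


prompt = build_query("{s} plays for {o}.", "Neuer", [("Ronaldo", "Portugal")])
print(prompt)  # Ronaldo plays for Portugal. Neuer plays for [MASK].
```

With an empty demonstration list the function falls back to the plain cloze query, which corresponds to the no-example baseline.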

Paper Structure

This paper contains 25 sections, 4 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: BERT's top-3 predictions with probabilities when prompted with the cloze-style question (top) versus when prompted with one additional example of the same relation (bottom).
  • Figure 2: P@1 score for T-REx over the number of examples provided. The dashed line shows the baseline value when no additional example is given.
  • Figure 3: BERT-large's [CLS]-token embedding of a subset of T-REx queries visualized in two dimensions using t-SNE (van der Maaten and Hinton, 2008). Each point is a single query and the color represents the corresponding relation class. The ellipses depict the 2-std confidence intervals. The individual images show the clustering for both the natural language and the ([s]; [o]) template with either no examples or ten examples provided.
  • Figure 4: Normalized score for the hard games of the TWC environment over the number of examples provided for albert-xxlarge. The dashed baselines are the static agent with a uniform prior and the TWC commonsense agent by Murugesan et al. (2020). The shaded regions depict the standard deviation over 10 runs.
  • Figure 5: P@1 score on BATS over the number of examples provided. The performance of the GloVe and SVD benchmark models by Gladkova et al. (2016) is shown by the black dashed lines.
  • ...and 2 more figures