Reverse Probing: Evaluating Knowledge Transfer via Finetuned Task Embeddings for Coreference Resolution

Tatiana Anikina, Arne Binder, David Harbecke, Stalin Varanasi, Leonhard Hennig, Simon Ostermann, Sebastian Möller, Josef van Genabith

TL;DR

The paper introduces reverse probing to evaluate knowledge transfer from embeddings of simple source tasks to a complex target task, coreference resolution. By extracting and aggregating embeddings from multiple fine-tuned source models and testing various layer truncations and aggregation schemes, the study investigates which sources, layers, and combination strategies best support coreference. Key findings show that semantic-similarity sources like MRPC and early-to-mid layer representations often yield the strongest transfer, that attention-based aggregation and embedding context from multiple layers improve results, and that combining multiple sources generally outperforms single-source approaches. The work offers practical guidance for embedding-level task transfer and suggests directions for more energy-efficient, knowledge-reuse-oriented NLP systems.

Abstract

In this work, we reimagine classical probing to evaluate knowledge transfer from simple source tasks to more complex target tasks. Instead of probing frozen representations from a complex source task on diverse simple target probing tasks (as is usually done in probing), we explore the effectiveness of embeddings from multiple simple source tasks on a single target task. We select coreference resolution, a linguistically complex problem requiring contextual understanding, as the focus target task, and test the usefulness of embeddings from comparatively simpler tasks such as paraphrase detection, named entity recognition, and relation extraction. Through systematic experiments, we evaluate the impact of individual and combined task embeddings. Our findings reveal that task embeddings vary significantly in utility for coreference resolution, with semantic similarity tasks (e.g., paraphrase detection) proving most beneficial. Additionally, representations from intermediate layers of fine-tuned models often outperform those from final layers. Combining embeddings from multiple tasks consistently improves performance, with attention-based aggregation yielding substantial gains. These insights shed light on relationships between task-specific representations and their adaptability to complex downstream tasks, encouraging further exploration of embedding-level task transfer.
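To make the contrast between mean and attention-based aggregation concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each source model yields one embedding per token, that all embeddings share the same dimensionality, and that attention weights come from a softmax over scaled dot products with a query vector. The function names, the toy embeddings, and the fixed `query` (which would be learned in a real model) are all hypothetical.

```python
import math

def mean_aggregate(source_embeddings):
    """Average one token's embeddings across all source models."""
    n = len(source_embeddings)
    dim = len(source_embeddings[0])
    return [sum(e[d] for e in source_embeddings) / n for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def attention_aggregate(source_embeddings, query):
    """Weight each source embedding by softmax(scaled dot product with query).

    Assumption: `query` stands in for a learned parameter; the paper's exact
    attention formulation is not given in this section.
    """
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, e)) / math.sqrt(dim)
        for e in source_embeddings
    ]
    weights = softmax(scores)
    combined = [
        sum(w * e[d] for w, e in zip(weights, source_embeddings))
        for d in range(dim)
    ]
    return combined, weights

# Toy per-token embeddings from three hypothetical source models.
token_embeddings = {
    "MRPC": [1.0, 0.0, 0.0, 0.0],
    "NER": [0.0, 1.0, 0.0, 0.0],
    "RE": [0.0, 0.0, 1.0, 0.0],
}
sources = list(token_embeddings.values())
query = [2.0, 0.0, 0.0, 0.0]  # hypothetical query that favors the MRPC direction

mean_vec = mean_aggregate(sources)
attn_vec, attn_weights = attention_aggregate(sources, query)
```

Mean aggregation treats every source equally, whereas the attention variant can shift weight toward whichever source aligns best with the query, which is one plausible reason a learned weighting could help when sources differ in usefulness.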

Paper Structure

This paper contains 15 sections, 3 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Probing workflow with Coreference Resolution (Coref) as target task and four different source tasks: Relation Extraction (RE), Question Answering (QA), Named Entity Recognition (NER), and Paraphrase Detection (MRPC).
  • Figure 2: Source task models: CorefTarget, BERT, MRPC, RE, QA, NER, SemTag, Chunking, NER-dslim, POS
  • Figure 3: Average cosine similarity between the embeddings of the source tasks and the target coreference task, averaged across all tokens for 15 batches
  • Figure 4: Mean vs attention aggregation (full setting)
  • Figure 5: Source task model performance truncated to the best layer (in parentheses) with mean aggregation
  • ...and 4 more figures
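The similarity measure named in the Figure 3 caption can be illustrated with a small sketch. It assumes that source and target embeddings are aligned token by token and have equal dimensionality; the function names and toy vectors are hypothetical, and averaging over batches is omitted for brevity.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def average_token_similarity(source_tokens, target_tokens):
    """Mean cosine similarity over token-aligned embedding pairs."""
    sims = [cosine_similarity(s, t) for s, t in zip(source_tokens, target_tokens)]
    return sum(sims) / len(sims)

# Two toy tokens: the first pair is parallel, the second is orthogonal.
source_tokens = [[1.0, 0.0], [0.0, 2.0]]
target_tokens = [[2.0, 0.0], [1.0, 0.0]]
avg = average_token_similarity(source_tokens, target_tokens)
print(avg)  # prints 0.5
```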