Evaluating Distributed Representations for Multi-Level Lexical Semantics: A Research Proposal

Zhu Liu

TL;DR

This proposal examines how distributed representations from pre-trained language models (PLMs) and large language models (LLMs) encode lexical semantics, assessed at three levels (local, global, and mixed) with cross-lingual benchmarks. It introduces a formal four-space model $(\mathcal{W}, \mathcal{M}, \mathcal{R}, \mathcal{C})$, covering the lexicon $\mathcal{W}$, prototypical meaning $\mathcal{M}$, corpus $\mathcal{R}$, and concept $\mathcal{C}$, together with level-specific likelihoods, e.g., $p(e)=p(e|w,s)$, $p(\mathcal{M})=p([e_i]_N|\mathcal{W})$, and $p(\mathcal{C})=p([c_i]_M|\mathcal{W}, \mathcal{R})$, to structure the evaluations. It develops analyses of local sense continuity, uncertainty in word sense disambiguation (WSD), semantic roles through minimal-language pairs, global word networks, and mixed conceptual spaces via Semantic Map Models, using both PLMs and LLMs. By exposing challenges in representation extraction, probe design, dataset bias, and scaling-related interpretability, the work aims to advance transparent, cross-linguistic lexical semantics and to close gaps between computational models and linguistic theory.

Abstract

Modern neural networks (NNs), trained on extensive raw sentence data, construct distributed representations by compressing individual words into dense, continuous, high-dimensional vectors. These representations are expected to capture multi-level lexical meaning. In this thesis, our objective is to examine how effectively distributed representations from NNs encode lexical meaning. We first identify and formalize three levels of lexical semantics: the local, global, and mixed levels. Then, for each level, we evaluate language models by collecting or constructing multilingual datasets, leveraging a range of language models, and applying theories of linguistic analysis. This thesis builds a bridge between computational models and lexical semantics, so that each can inform the other.

Paper Structure (19 sections, 3 equations, 2 figures)


Introduction
Research Proposal
Formalization
Basic Notions
Example 1
Example 2
Example 3
Three Levels
Local Level
Global Level
Mixed Level
Evaluation
Models and Representations
Local Level: Sense Continuity
Uncertainty in WSD
...and 4 more sections

Figures (2)

Figure 1: Graph models for four spaces: lexicon $\mathcal{W}$, corpus $\mathcal{R}$, concept $\mathcal{C}$, and prototypical meaning $\mathcal{M}$. The three levels and their respective scopes are identified. The shaded circle represents an observable variable, while the unshaded one indicates an unobservable variable.
Figure 2: Examples of the four spaces: lexicon $\mathcal{W}$, corpus $\mathcal{R}$, concept $\mathcal{C}$, and prototypical meaning $\mathcal{M}$.
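
As a rough illustration of how the local and global likelihoods relate to the four spaces, the sketch below uses a toy corpus standing in for $\mathcal{R}$ and bag-of-context count vectors standing in for the representation $e$ in $p(e|w,s)$. Everything here is an assumption made for illustration: the corpus, the function names (`build_vocab`, `local_vector`, `global_vector`, `cosine`), and the choice of count vectors and averaging are not taken from the proposal, which evaluates hidden states from PLMs and LLMs instead. The mixed level is not covered.

```python
import math

# Toy stand-in for the corpus space R (hypothetical sentences).
CORPUS = [
    "the bank approved the loan",
    "the bank raised interest rates",
    "we walked along the river bank",
]


def build_vocab(corpus):
    """Map every token in the corpus to a column index (toy lexicon W)."""
    vocab = {}
    for sentence in corpus:
        for token in sentence.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def local_vector(word, sentence, vocab):
    """Local level, loosely p(e | w, s): one vector per word occurrence.

    Assumption: counts of the other tokens in the sentence replace the
    contextual hidden state a real language model would provide.
    """
    tokens = sentence.split()
    if word not in tokens:
        raise ValueError(f"{word!r} does not occur in {sentence!r}")
    vec = [0.0] * len(vocab)
    for token in tokens:
        if token != word:
            vec[vocab[token]] += 1.0
    return vec


def global_vector(word, corpus, vocab):
    """Global level, loosely one type-level vector per word in W.

    Assumption: the type-level vector is the mean of the word's local
    vectors over all sentences that contain it.
    """
    vecs = [local_vector(word, s, vocab) for s in corpus if word in s.split()]
    if not vecs:
        raise ValueError(f"{word!r} does not occur in the corpus")
    return [sum(column) / len(vecs) for column in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vocab = build_vocab(CORPUS)
    occurrences = [local_vector("bank", s, vocab) for s in CORPUS]
    # Compare each occurrence of "bank" with its type-level average.
    type_vec = global_vector("bank", CORPUS, vocab)
    for sentence, vec in zip(CORPUS, occurrences):
        print(f"{cosine(vec, type_vec):.3f}  {sentence}")
```

The point of the split is only that a local representation depends on both the word and its sentence, whereas a global one is indexed by the word alone; swapping `local_vector` for a model-based extractor would leave the rest of the sketch unchanged.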
