Predicting Word Similarity in Context with Referential Translation Machines
Ergun Biçici
TL;DR
The paper reframes word similarity in context as a machine translation performance prediction (MTPP) problem using Referential Translation Machines (RTMs), enabling a unified training/test representation for context-sensitive similarity scoring. It constructs intra- and inter-context word-pair similarity features (wps) and trains unsupervised models to approximate Graded Word Similarity in Context (GWSC) signals, achieving top results. Beyond GWSC, the approach extends to emotion intensity prediction in multi-language tweets and to identifying discriminative semantic attributes, using stacked RTM models that integrate predictions from multiple MTPP predictors. The work demonstrates the versatility and transferability of RTMs across tasks, supported by a scalable data pipeline with interpretants and extensive feature engineering. Overall, RTMs offer a principled, data-driven path to automatic, context-aware semantic evaluation with strong empirical performance.
Abstract
We identify the similarity between two words in English by casting the task as machine translation performance prediction (MTPP) between the words given the context and the distance between their similarities. We use referential translation machines (RTMs), which allow a common representation for training and test sets and stacked machine learning models. RTMs can achieve the top results in the Graded Word Similarity in Context (GWSC) task.
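To illustrate what stacking predictors means in general terms, the following minimal sketch fits one simple regressor per feature column and then fits a meta-regressor on their held-out predictions. It is not the RTM implementation: the function names (`fit_line`, `fit_stack`, `predict_stack`), the one-feature least-squares base models, the averaging of base predictions, and the toy data are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on a single feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var if var else 0.0
    return a, my - a * mx


def predict_line(model, xs):
    a, b = model
    return [a * x + b for x in xs]


def fit_stack(columns, targets, split):
    """Hypothetical stacking: base models see rows [:split], meta-model sees rows [split:]."""
    # One base model per feature column, fit on the first part of the data.
    base = [fit_line(col[:split], targets[:split]) for col in columns]
    # Base predictions on the held-out rows become the meta-model's input.
    held = [predict_line(m, col[split:]) for m, col in zip(base, columns)]
    averaged = [mean(row) for row in zip(*held)]
    meta = fit_line(averaged, targets[split:])
    return base, meta


def predict_stack(stack, columns):
    base, meta = stack
    preds = [predict_line(m, col) for m, col in zip(base, columns)]
    averaged = [mean(row) for row in zip(*preds)]
    return predict_line(meta, averaged)


# Toy data (made up): two feature columns and a target that is linear in both.
columns = [[0, 1, 2, 3, 4, 5], [0, 2, 4, 6, 8, 10]]
targets = [1, 3, 5, 7, 9, 11]
stack = fit_stack(columns, targets, split=3)
print(predict_stack(stack, [[6], [12]]))
```

Fitting the meta-model on rows the base models did not see is the design choice that distinguishes stacking from simple averaging: the second stage learns how much to trust the first-stage predictions rather than their training fit.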
