Ukrainian Visual Word Sense Disambiguation Benchmark

Yurii Laba; Yaryna Mohytych; Ivanna Rohulia; Halyna Kyryleyza; Hanna Dydyk-Meush; Oles Dobosevych; Rostyslav Hryniv

Abstract

This study presents a benchmark for evaluating the Visual Word Sense Disambiguation (Visual-WSD) task in Ukrainian. The main goal of the Visual-WSD task is to identify, with minimal contextual information, the most appropriate representation of a given ambiguous word from a set of ten images. To construct this benchmark, we followed a methodology similar to that proposed by (CITATION), who previously introduced benchmarks for the Visual-WSD task in English, Italian, and Farsi. This approach allows us to incorporate the Ukrainian benchmark into a broader framework for cross-language model performance comparisons. We collected the benchmark data semi-automatically and refined it with input from domain experts. We then assessed eight multilingual and multimodal large language models using this benchmark. All tested models performed worse than the zero-shot CLIP-based baseline model (CITATION) used by (CITATION) for the English Visual-WSD task. Our analysis revealed a significant performance gap in the Visual-WSD task between Ukrainian and English.

Paper Structure (14 sections, 2 equations, 2 figures, 1 table)

Introduction
Related Works
Approach
Data sources
The methodology for constructing the benchmark
Evaluation
Evaluation metrics
Results
Conclusion
Future plans
Limitations
Ethical Statement
Bibliographical References
Language Resource References

Figures (2)

Figure 1: An illustration of a GPT4-Vision visual hallucination caused by an ambiguous target word.
Figure 2: Example of a benchmark entry. The word Коса (en: braid, transl: kosa) is ambiguous. Here it corresponds to the meaning Заплетене волосся; довге волосся (en: braided hair; long hair, transl: zapletene volossya; dovhe volossya). The word Волосся (en: hair, transl: volossya) is the trigger word. The image that corresponds to the intended meaning is b (underlined). The other three images are examples of negative samples. Note: although the task involves nine negative images, only three are displayed for simplicity.
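
The entry layout described in the Figure 2 caption, and the idea of picking one image out of ten candidates, can be sketched as follows. This is a minimal illustration only: the `BenchmarkEntry` field names, the `rank_candidates` helper, and the toy vectors are assumptions made for this example, not the authors' schema or pipeline. It assumes that text and image embeddings have already been produced by some multimodal encoder and simply ranks candidates by cosine similarity.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class BenchmarkEntry:
    """Hypothetical record layout; field names are illustrative only."""
    target_word: str       # ambiguous word, e.g. "Коса"
    trigger_word: str      # minimal context, e.g. "Волосся"
    gold_image: str        # id of the image matching the intended sense
    candidates: List[str]  # ten ids: the gold image plus nine negatives


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(text_vec: Sequence[float],
                    image_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return candidate ids ordered from most to least similar to the text embedding."""
    return sorted(image_vecs,
                  key=lambda image_id: cosine(text_vec, image_vecs[image_id]),
                  reverse=True)


# Toy usage with made-up 2-D vectors standing in for real embeddings.
entry = BenchmarkEntry("Коса", "Волосся", gold_image="b", candidates=["a", "b", "c"])
toy_image_vecs = {"a": [0.0, 1.0], "b": [1.0, 0.0], "c": [1.0, 1.0]}
ranking = rank_candidates([1.0, 0.0], toy_image_vecs)
prediction_is_correct = ranking[0] == entry.gold_image
```

In this toy case the gold image `b` is ranked first; with real embeddings the ranking depends entirely on the encoder used.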
