Deriving the Meaning of Out-of-Vocabulary Words – Bc. Samuel Gazda
Bc. Samuel Gazda
Master's thesis
Deriving the Meaning of Out-of-Vocabulary Words
Abstract:
This thesis deals with the classification and normalization of out-of-vocabulary (OOV) words. It presents two types of classifiers: the first is based on the sequential evaluation of given rules, and the second uses language models built on the RoBERTa architecture. It contains an annotated dataset consisting of OOV words and their contexts, used to train and evaluate the presented approaches. Finally, it presents a modular …
Language used: English
Date on which the thesis was submitted / produced: 21. 5. 2024
Identifier:
https://is.muni.cz/th/u58r4/
Thesis defence
- Date of defence: 19. 6. 2024
- Supervisor: doc. RNDr. Aleš Horák, Ph.D.
- Reader: RNDr. Pavel Šmerk, Ph.D.
Full text of thesis
Published in Theses:
- to the world
Other ways of accessing the text
Institution archiving the thesis and making it accessible: Masarykova univerzita, Fakulta informatiky (Masaryk University, Faculty of Informatics)
Master programme / field:
Artificial intelligence and data processing / Machine learning and artificial intelligence