Building parallel corpora from the Web – Mgr. Jan Pomikálek, Ph.D.
Mgr. Jan Pomikálek, Ph.D.
Advanced ('rigorózní') thesis
Building parallel corpora from the Web
Abstract:
Parallel corpora are a valuable resource for many fields in computational linguistics, e.g. machine translation, cross-language information retrieval (CLIR), and lexicography. Unfortunately, the sources of parallel texts are very limited. On the other hand, there is the World Wide Web with billions of Web pages, some of which are mutual translations. Though its potential for retrieving bilingual texts awaits …
Language used: English
Date on which the thesis was submitted / produced: 17. 6. 2008
Identifier:
https://is.muni.cz/th/j3ahd/
Thesis defence
- Date of defence: 23. 6. 2008
Full text of thesis
Contents of on-line thesis archive
Published in Theses: accessible to the world
Other ways of accessing the text
Institution archiving the thesis and making it accessible: Masaryk University, Faculty of Informatics
Advanced ('rigorózní řízení') programme / field:
Informatics / Informatics
Theses on a related topic
- Corpora from reddit.com texts
  Jan Brichta
- The use of "Once upon a time" in a corpus of fairy tales and in the British National Corpus
  Mária Kopecká
- Learner Translation Corpus: CELTraC (Czech-English Learner Translation Corpus)
  Kristýna Štěpánková
- Český Brown Corpus
  David Krňávek
- Il nuovo corpus di italiano L2 della Università Masaryk di Brno: raccolta e organizzazione dei dati.
  Petra Kaňoková
- Traducción de las formas del gerundio del español al checo: Análisis a través del corpus paralelo InterCorp
  Ilona Mužátková
- Funções comunicativas e textuais dos dois pontos. Análise do uso na escrita jornalística brasileira baseada no corpus Linguateca
  Andrea Podskalská
- Adaptation sémantique et orthographique des verbes empruntés à l’anglais : le rôle du corpus linguistique
  Klára Halodová