Building parallel corpora from the Web

Pomikálek, Jan

Mgr. Jan Pomikálek, Ph.D.

Rigorous thesis

Abstract:

Parallel corpora are a valuable resource for many fields of computational linguistics, e.g. machine translation, cross-language information retrieval (CLIR), and lexicography. Unfortunately, sources of parallel texts are very limited. On the other hand, there is the World Wide Web, with billions of Web pages, some of which are mutual translations. Though its potential for retrieving bilingual texts awaits …

Keywords

corpus, text corpora, web-derived corpora, parallel corpora

Language of the thesis: English

Date of creation / submission: 17 June 2008

Identifier: https://is.muni.cz/th/j3ahd/

Thesis defence

The defence took place on 23 June 2008.

Citation record

Citation according to ISO 690:

POMIKÁLEK, Jan. Building parallel corpora from the Web. Online. Rigorous thesis. Brno: Masarykova univerzita, Fakulta informatiky. 2008. Available from: https://theses.cz/id/1zmwl9/.

Full text of the thesis

Contents of the online thesis archive

Published in Theses: publicly, worldwide

Other ways to access the text

Institution archiving the thesis and making it available: Masarykova univerzita, Fakulta informatiky

Link to the directory in the institution's local repository

Masarykova univerzita

Fakulta informatiky

Rigorous procedure / field of study:
Informatics / Informatics

Theses on related topics

Better Web Corpora For Corpus Linguistics And NLP
Vít Suchomel
Learner Translation Corpus: CELTraC (Czech-English Learner Translation Corpus)
Kristýna Štěpánková
The use of "Once upon a time" in a corpus of fairy tales and in the British National Corpus
Mária Kopecká
Los corpus CREA y CORDE en el contexto de los corpus lingüísticos
Jitka Hrušková
Český Brown Corpus
David Krňávek
Traducción de las formas del gerundio del español al checo: Análisis a través del corpus paralelo InterCorp
Ilona Mužátková
Be going to – current usage based on the corpus analysis
Kateřina Milerová