Textové korpusy ze závěrečných prací

Šmíd, Martin

Bc. Martin Šmíd

Bachelor's thesis

Textové korpusy ze závěrečných prací

Text corpora from theses

Abstract:

The aim of this bachelor's thesis is to create a tool for downloading student theses from the Information System and to build text corpora from them. The first part introduces corpus linguistics and the uses of language corpora. The second part describes language identification, which is needed to filter out unwanted texts. The third part describes the implementation of the proposed tool.

Keywords

language corpus, parallel corpus, corpus linguistics, language identification, theses, Sketch Engine, Python

Language used: Czech

Date on which the thesis was submitted / produced: 19. 5. 2016

Identifier: https://is.muni.cz/th/anxmd/

Thesis defence

Date of defence: 23. 6. 2016
Supervisor: Mgr. et Mgr. Vít Baisa, Ph.D.
Reader: RNDr. Vít Suchomel

Citation record

ISO 690-compliant citation record:

ŠMÍD, Martin. Textové korpusy ze závěrečných prací. Online. Bachelor's thesis. Brno: Masaryk University, Faculty of Informatics. 2016. Available from: https://theses.cz/id/dq09ut/.

{{Citace kvalifikační práce
 | příjmení = Šmíd
 | jméno = Martin
 | instituce = Masaryk University, Faculty of Informatics
 | titul = Textové korpusy ze závěrečných prací
 | url = https://theses.cz/id/dq09ut/
 | typ práce = Bachelor's thesis
 | vedoucí = Mgr. et Mgr. Vít Baisa, Ph.D.
 | rok = 2016
 | počet stran =
 | strany =
 | citace = 2024-04-25
 | poznámka =
 | jazyk = 
}}

Full text of thesis

Contents of on-line thesis archive

Published in Theses:

to the world (publicly accessible)

Other ways of accessing the text

Institution archiving the thesis and making it accessible: Masarykova univerzita, Fakulta informatiky

Reference to the local database directory of the institution

Masaryk University

Faculty of Informatics

Bachelor programme / field:
Informatics / Artificial Intelligence and Natural Language Processing

Theses on a related topic

Identifikace sporného autorství ve forenzní lingvistice
Battseren BATERDENE
Translating (Ir)reversible Binomials: A Corpus Study
Tomáš Herlík
Dummy subjects in English, Norwegian and German. A parallel corpus study.
Bohumila Chocholoušová
Parallel Corpus from Wikipedia
Adéla Štromajerová
Parallel Corpus in Sketch Engine: Creation and Data Mining
Magdaléna VYVIJALOVÁ
Common Translation Errors in Wikipedia Articles: A Corpus-based Study
Adéla Štromajerová
A hybrid approach to parallel text alignment
Adam Obrusník
