Automatické určení jazyka dokumentu založené na zpracování slabik

Hegr, Jan

Bc. Jan Hegr

Master's thesis

Automatické určení jazyka dokumentu založené na zpracování slabik

Syllable based automatic language identification of documents

Abstract (translated from Czech):

The theoretical part of the thesis discusses methods for identifying the language of a document and then explains two approaches to the problem in detail. The first compares profiles of character N-gram occurrence frequencies; the second uses a Markov chain approximation to estimate the probability of occurrence of syllable N-grams. For that reason, space is also devoted to the problem of dividing words into syllables. The practical part then describes the implementation …
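As an illustration of the first approach named in the abstract (comparing profiles of character N-gram frequencies), the following is a minimal sketch using the widely known rank-based "out-of-place" distance. This record does not say which distance the thesis uses, so the measure, the profile size of 300, and all function names here are assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all character n-grams of the text; words are separated by single spaces."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_profile(text, max_n=3, size=300):
    """Return the `size` most frequent n-grams (n = 1..max_n), most frequent first."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(char_ngrams(text, n))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from the language profile get a fixed maximum penalty."""
    rank = {gram: i for i, gram in enumerate(lang_profile)}
    max_penalty = len(lang_profile)
    return sum(
        abs(i - rank[gram]) if gram in rank else max_penalty
        for i, gram in enumerate(doc_profile)
    )


def identify(text, lang_profiles):
    """Return the language whose profile is closest to the profile of `text`."""
    doc_profile = build_profile(text)
    return min(lang_profiles, key=lambda lang: out_of_place(doc_profile, lang_profiles[lang]))
```

In practice each language profile would be built from a large training corpus; `identify` then picks the language with the smallest distance.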

Abstract:

The first part of this thesis focuses on methods of automatic language identification, especially those using N-gram statistics and Markov chain models. Language-independent syllabification algorithms are also discussed. The next part deals with an implementation of the latter approach, the Markov chain model, for language identification. In addition, the algorithm was extended to be able to decide that the language …
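To make the Markov chain approach concrete, here is a minimal sketch of a first-order (bigram) model over syllables. It is not the thesis implementation: the vowel-group syllabifier, the vowel list, the `#` word-boundary symbol, the add-one smoothing, and all names are assumptions chosen to keep the example short.

```python
import math
import re
from collections import Counter

VOWELS = "aeiouyáéěíóúůý"  # assumed vowel set, enough for the toy examples
BOUNDARY = "#"             # assumed marker placed between words


def syllabify(word):
    """Rough split: each chunk is optional consonants plus a vowel group;
    trailing consonants are attached to the last chunk."""
    chunks = re.findall(rf"[^{VOWELS}]*[{VOWELS}]+", word)
    if not chunks:
        return [word]  # no vowel at all: keep the word as one unit
    chunks[-1] += word[len("".join(chunks)):]
    return chunks


def syllable_sequence(text):
    """Turn text into a sequence of syllables with boundary markers around words."""
    seq = [BOUNDARY]
    for word in re.findall(r"\w+", text.lower()):
        seq.extend(syllabify(word))
        seq.append(BOUNDARY)
    return seq


class BigramModel:
    """First-order Markov model over syllables with add-one smoothing."""

    def __init__(self, training_text):
        seq = syllable_sequence(training_text)
        self.context = Counter(seq[:-1])
        self.pairs = Counter(zip(seq, seq[1:]))
        # One extra slot reserves probability mass for unseen syllables.
        self.vocab_size = len(set(seq)) + 1

    def log_prob(self, text):
        """Log-probability of the syllable sequence of `text` under this model."""
        seq = syllable_sequence(text)
        total = 0.0
        for prev, cur in zip(seq, seq[1:]):
            numerator = self.pairs[(prev, cur)] + 1
            denominator = self.context[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def identify(text, models):
    """Return the language whose model gives `text` the highest log-probability."""
    return max(models, key=lambda lang: models[lang].log_prob(text))
```

A model is trained per language, for example `models = {"en": BigramModel(english_corpus), "cs": BigramModel(czech_corpus)}` with hypothetical corpus strings, and `identify(text, models)` returns the best-scoring key.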

Keywords

language identification, text categorization, N-gram, syllable, Markov chain

Language used: Czech

Date on which the thesis was submitted / produced: 10. 1. 2011

Identifier: https://is.muni.cz/th/a8yts/

Thesis defence

Date of defence: 8. 2. 2011
Supervisor: doc. Mgr. Pavel Rychlý, Ph.D.
Reader: RNDr. Radim Řehůřek, Ph.D.

Citation record

ISO 690-compliant citation record:

HEGR, Jan. Automatické určení jazyka dokumentu založené na zpracování slabik. Online. Master's thesis. Brno: Masaryk University, Faculty of Informatics. 2011. Available from: https://theses.cz/id/ek4ui7/.

{{Citace kvalifikační práce
 | příjmení = Hegr
 | jméno = Jan
 | instituce = Masaryk University, Faculty of Informatics
 | titul = Automatické určení jazyka dokumentu založené na zpracování slabik
 | url = https://theses.cz/id/ek4ui7/
 | typ práce = Master's thesis
 | vedoucí = doc. Mgr. Pavel Rychlý, Ph.D.
 | rok = 2011
 | počet stran =
 | strany =
 | citace = 2024-09-29
 | poznámka =
 | jazyk = 
}}

Full text of thesis

Contents of on-line thesis archive

Published in Theses:

publicly (to the world)

Other ways of accessing the text

Institution archiving the thesis and making it accessible: Masarykova univerzita, Fakulta informatiky

Reference to the local database directory of the institution

Masaryk University

Faculty of Informatics

Master programme / field:
Informatics / Informatics
