Klasifikace dokumentů v textových korpusech

Suchomel, Vít

Bc. Vít Suchomel

Master's thesis

Klasifikace dokumentů v textových korpusech

Document classification in text corpora

Abstract (translated from Czech):

The diploma thesis has two main aims. The first is to create a "model corpus" of web texts of 100 million words. The corpus will be suitably composed of various text types (e.g. newspaper articles, blogs, freely available prose) represented in precisely specified quantities. The second aim of the thesis is to create a classifier of the individual text types using machine learning methods. The student will become familiar with several …

Abstract:

This diploma thesis has two aims. The first is to create a "model corpus" of web texts containing 100 million words. The corpus is to consist of several types of text (e.g. newspaper articles, blogs, prose available online) in strictly defined quantities. The second is to create a classifier of the respective text types using machine learning methods …

Keywords

document classification, machine learning, corpus, classification, classifier, SVM, preprocessing

Language used: Czech

Date on which the thesis was submitted / produced: 11. 1. 2010

Identifier: https://is.muni.cz/th/wv40x/

Thesis defence

Date of defence: 10. 2. 2010
Supervisor: RNDr. Jan Pomikálek, Ph.D.

Citation record

ISO 690-compliant citation record:

SUCHOMEL, Vít. Klasifikace dokumentů v textových korpusech. Online. Master's thesis. Brno: Masaryk University, Faculty of Informatics. 2010. Available from: https://theses.cz/id/buir8w/.

{{Citace kvalifikační práce
 | příjmení = Suchomel
 | jméno = Vít
 | instituce = Masaryk University, Faculty of Informatics
 | titul = Klasifikace dokumentů v textových korpusech
 | url = https://theses.cz/id/buir8w/
 | typ práce = Master's thesis
 | vedoucí = RNDr. Jan Pomikálek, Ph.D.
 | rok = 2010
 | počet stran =
 | strany =
 | citace = 2024-05-28
 | poznámka =
 | jazyk = 
}}

Full text of thesis

Published in Theses: to the world

Other ways of accessing the text

Institution archiving the thesis and making it accessible: Masarykova univerzita, Fakulta informatiky

Masaryk University

Faculty of Informatics

Master programme / field:
Informatics / Artificial Intelligence and Natural Language Processing

Theses on a related topic

Klasifikace dokumentů s částečnou informací od učitele (Ondřej Macek)
Automatická klasifikace vícejazyčných dokumentů (Ladislav Hlom)
Detekce obsazenosti parkovacích míst pomocí algoritmu strojového učení bez učitele (Václav Bilský)
Detekce hlasivkových pulsů v řečovém signálu pomocí strojového učení (Michal Vraštil)
Tvorba korpusu novinových titulků a jeho analýza (Pavlína Sedlářová)