Bc. Jan Brichta
Bachelor's thesis
Corpora from reddit.com texts
Abstract:
The purpose of this thesis is to develop tools for processing data from the reddit.com website into text corpora and to demonstrate analysis of that data with Sketch Engine. The result is 10 corpora built from a dataset that spans the years 2005 to 2023.
Language used: English
Date on which the thesis was submitted / produced: 23. 5. 2024
Identifier:
https://is.muni.cz/th/nzmup/
Thesis defence
- Date of defence: 28. 6. 2024
- Supervisor: RNDr. Vít Suchomel, Ph.D.
- Reader: RNDr. Ondřej Herman
Full text of thesis
Published in Theses: accessible to the world
Other ways of accessing the text
Institution archiving the thesis and making it accessible: Masaryk University, Faculty of Informatics
Bachelor programme / field: Informatics / Informatics
Theses on a related topic
- Better Web Corpora For Corpus Linguistics And NLP (Vít Suchomel)
- Testing Zipf's Law with Corpus linguistics (Ivana Kyselová Košková)
- Český Brown Corpus (David Krňávek)