A tool for checking texts extracted from PDF – Bc. Samuel Benko
Bc. Samuel Benko
Bachelor's thesis
A tool for checking texts extracted from PDF
Abstract:
This thesis presents the development and evaluation of a text extraction tool for PDF files, focusing on maintaining context and handling common challenges associated with PDF text extraction, such as merging initials, continuous blocks, headers and footers, tables, multi-column documents, numbered lists, and hyphenation. The primary goal is to reduce the human effort required for overseeing extracted …
Language used: English
Date on which the thesis was submitted / produced: 18. 5. 2023
Identifier:
https://is.muni.cz/th/e233g/
Thesis defence
- Date of defence: 26. 6. 2023
- Supervisor: RNDr. Vít Suchomel, Ph.D.
- Reader: RNDr. Miloš Jakubíček, Ph.D.
Full text of thesis
Published in Theses: to the world
Institution archiving the thesis and making it accessible: Masarykova univerzita, Fakulta informatiky
Masaryk University, Faculty of Informatics
Bachelor programme / field: Programming and development / Programming and development