Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models

FARES, Bashar

Master's thesis

Abstract:

This thesis investigates topical clustering of unlabeled scientific text using several pre-trained large language models. The primary focus is on grouping the publications in the database of the Deggendorf Institute of Technology (DIT) according to their main topics.

Keywords

Transformers, Large Language Models, Data Clustering, Topic Modeling

Language used: English

Date on which the thesis was submitted / produced: 8. 2. 2024

Thesis defence

Supervisor: prof. Dr. Andreas Fischer

Citation record

ISO 690-compliant citation record:

FARES, Bashar. Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models. Online. Master's thesis. České Budějovice: University of South Bohemia in České Budějovice, Faculty of Science. 2024. Available from: https://theses.cz/id/zn85fp/.

@MastersThesis{FARES2024thesis,
AUTHOR = "FARES, Bashar",
TITLE = "Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models [online]",
YEAR = "2024 [cit. 2024-10-15]",
TYPE = "Master's thesis",
SCHOOL = "University of South Bohemia in České Budějovice, Faculty of Science, České Budějovice",
NOTE = "SUPERVISOR: prof. Dr. Andreas Fischer",
URL = "https://theses.cz/id/zn85fp/",
}

@MastersThesis{FARES2024thesis,
AUTHOR = {FARES, Bashar},
TITLE = {Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models},
YEAR = {2024},
TYPE = {Master's thesis},
INSTITUTION = {University of South Bohemia in České Budějovice, Faculty of Science},
LOCATION = {České Budějovice},
SUPERVISOR = {prof. Dr. Andreas Fischer},
URL = {https://theses.cz/id/zn85fp/},
URL_DATE = {2024-10-15},
}

{{Citace kvalifikační práce
 | příjmení = FARES
 | jméno = Bashar
 | instituce = University of South Bohemia in České Budějovice, Faculty of Science
 | titul = Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models
 | url = https://theses.cz/id/zn85fp/
 | typ práce = Master's thesis
 | vedoucí = prof. Dr. Andreas Fischer
 | rok = 2024
 | počet stran =
 | strany =
 | citace = 2024-10-15
 | poznámka =
 | jazyk = 
}}

Recommended form for citing the thesis as a source

FARES, Bashar. Exploring Semantic Homogeneity in Unlabeled Data Clustering Using Large Language Models. České Budějovice, 2024. Master's thesis (Mgr.). University of South Bohemia in České Budějovice, Faculty of Science.

Full text of thesis

Contents of on-line thesis archive

Published in Theses: to the world

Other ways of accessing the text

Institution archiving the thesis and making it accessible: University of South Bohemia in České Budějovice, Faculty of Science

Reference to the local database file of the institution

UNIVERSITY OF SOUTH BOHEMIA IN ČESKÉ BUDĚJOVICE

Faculty of Science

Master programme / field:
Artificial Intelligence and Data Science / Artificial Intelligence and Data Science

Theses on a related topic

Large Language Models (LLMs): Examining the quality of generated text with task specific data – Michal Caninec
Large Language Models as a tool for generating high-level features for text documents – Vojtěch Balek
Developing a Cybersecurity Domain Chatbot based on an Open Source Large Language Model – Shahrukh Azhar AHSAN
Think Twice Before You Answer: Mitigating Biases of Question Answering Models – Lukáš Mikula

Folder: Theses zn85fp zn85fp/2
Uploaded/Created: 8/2/2024

Files

Name: thesis Thesis_-_Bashar_Fares.pdf
Posted by: Bulánová, L.
Uploaded/Created: 9/2/2024
