Assessment of the State-of-the-art Benchmarks Used to Evaluate Social Reasoning and Theory of Mind in LLMs – Bc. Lucia Horníková
Bc. Lucia Horníková
Master's thesis
Assessment of the State-of-the-art Benchmarks Used to Evaluate Social Reasoning and Theory of Mind in LLMs
Abstract:
Large language models now achieve remarkable results across a range of applied tasks, which naturally raises the question of whether they are capable of reasoning. In the search for an answer, several evaluation sets (benchmarks) have been created, either by experts or through crowdsourcing. This master's thesis focuses on three frequently used sets from the area of theory of mind and social …
Abstract:
Large language models have demonstrated impressive performance on various downstream tasks, hinting at possible reasoning capabilities. To assess whether these models can reason beyond surface-level inference, various benchmarks have been created, whether collected from human experts or crowdsourced. This thesis focuses on three commonly used state-of-the-art benchmarks for the task of social reasoning …
Language used: English
Date on which the thesis was submitted / produced: 1. 12. 2025
Identifier:
https://is.muni.cz/th/f0evs/
Thesis defence
- Date of defence: 29. 1. 2026
- Supervisor: Seyed Mahed Mousavi, Ph.D.
- Reader: Mgr. Hana Žižková, Ph.D.
Citation record
ISO 690-compliant citation record:
HORNÍKOVÁ, Lucia. Assessment of the State-of-the-art Benchmarks Used to Evaluate Social Reasoning and Theory of Mind in LLMs. Online. Master's thesis. Brno: Masaryk University, Faculty of Arts. 2025. Available from: https://theses.cz/id/sbqdrg/.
Full text of thesis
- Published in Theses: to the world (publicly accessible)
Institution archiving the thesis and making it accessible: Masaryk University, Faculty of Arts
Master programme / field: Computational Linguistics / Computational Linguistics