Searching for Similar Documents Individually
- 1.How does the search for similar documents work in general?
The contents of files stored in the Theses.cz data repository that have an available plain-text version are continuously processed by automated analysis. Any thesis you open in the system can be compared with other files that contain similar text:
- Click the title of the thesis to display basic information about the document,
- scroll to the bottom of the page and click the row with the full-text file (left-click opens a details panel on the right; right-click opens a context menu),
- use the “Find similar documents” operation (the icon showing two eggs, after the Czech saying equivalent to “as alike as two peas in a pod”),
- use the displayed similarities (or adjust the settings and let the system recalculate them) to assess whether the document may be plagiarized.
1 The “Find similar documents” operation.
For each similar document found, Theses.cz displays the degree of similarity in percent. Each document includes a graphical interpretation of similarities in the form of a linear or 2D similarity map:
1 Interactive linear map with clickable navigation and display of the corresponding section in the document.
2 List of source documents to which the document is similar. Each shows the similarity percentage and its linear similarity map.
3 The magnifying-glass button allows you to compare the selected document directly with this one.
4 With the “minus” button, you can remove a source document from the similarity calculation if it is not relevant (e.g. the student was expected to cite it properly).
5 Clicking a highlighted passage in the text displays the documents with which that passage is similar. Both the passage and the selected documents are highlighted.
6 A clear overview of displayed, skipped, and excluded documents, including explanations.
7 Enable 2D similarity maps.
8 Option to change sensitivity from standard to high.
The user can smoothly switch between detected similarities and the detailed comparison of two documents using the magnifying-glass icon next to the selected similar document. Any similar document can be viewed immediately without losing the context of the controlled document. A two-document comparison appears as follows:
1 The controlled document and the similar document displayed side by side.
2 Return to the list of detected similarities.
3 If the source document is not public, words and sections without similarity are covered by a grey overlay.
4 The user can display a 2D similarity map instead of the linear similarity map.
Files are searched across the entire system, regardless of the institution to which the thesis belongs.
- 2.What does “Dokument asi není správně porovnán” (“The document was probably not compared correctly”) mean?
The comparison is based on the plain-text version of the file, and in this case the system did not find enough usable text in it. Check the text version of the file (extension .txt) to see whether it was generated correctly. If it appears incorrect, you may ask the file uploader to verify (e.g. in Word) that the document can be exported to plain text, or to re-upload a corrected version. HTML files can be problematic, as they often contain many non-text elements (formatting code, etc.) that complicate extraction. You may also report the problematic file to the administrators at theses@fi.muni.cz.
- 3.What does the “celková podobnost” (“overall similarity”) value represent?
This value indicates the overall percentage of similarity with all detected similar documents.
A detected similarity does not necessarily mean that either document is plagiarized. Every similarity must be evaluated by an expert in the given topic. There is no percentage threshold at which a document can automatically be considered plagiarism.
- 4.The system found similarities with my work — what does this mean?
Authors can verify whether their text has been used in another author’s work.
A similarity detected by the system between your work and another document in the database does not necessarily mean that the work is plagiarized. Each case (similarity) must be assessed individually, including checking citations, context, and discipline standards. There is no universal percentage above which a work can automatically be considered plagiarism.
What the similarity search interface displays and how to use it:
For each similar document found, Theses.cz shows the percentage of similarity. Each document includes a graphical representation of similarities in the form of a linear or a 2D similarity map:
1 Interactive linear map with clickable navigation showing the corresponding location in the document.
2 List of source documents similar to the controlled document, each with the similarity percentage and its linear similarity map.
3 The magnifying glass button allows detailed comparison of the selected document with the current one.
4 Using the “minus” button, you can exclude a source document that is not relevant to the similarity assessment (e.g., one that the student was expected to cite properly).
5 Clicking a selected passage in the text displays the documents with which that passage is similar; the passage and documents are highlighted.
6 A clear overview of displayed, skipped, and excluded documents, including explanations.
7 Enable 2D similarity maps.
8 Option to change sensitivity from standard to high.
The user can smoothly switch between detected similarities and the detailed comparison of two documents using the magnifying-glass button next to any selected document. They can immediately open any similar document without losing track of the controlled document. Two-document comparison is displayed as follows:
1 The controlled document and the similar document displayed side by side.
2 Return to the list of detected similarities.
3 For non-public documents, words and text sections without similarity are covered with a grey overlay.
4 The user may display a 2D similarity map instead of the linear similarity map.
- 5.How does the search algorithm work?
The system performs document-to-document comparison:
- A searchable plain-text version is generated for every document in the database. The algorithm analyses this text version, focusing on similar or paraphrased passages, and evaluates the degree of similarity across the entire shared database of documents, including online sources.
- Texts in Czech, English, and Slovak are compared, provided that they contain at least a few sentences or paragraphs (very small files do not contain enough text for analysis and similarity detection).
- Before the result is presented to the user, documents that overlap only in passages identical to those already found in earlier sources are skipped. In practice, this filters out—for example—hundreds of theses and online documents that all cite the same law. If fewer than 10 such similar sources exist, all are shown for clarity without skipping.
- The user is shown the most relevant documents that demonstrate significant similarity with the analysed document, along with the percentage of similarity.
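The skipping step described above can be illustrated with a minimal sketch. It is not the actual Theses.cz implementation: the function name `filter_redundant_sources`, the representation of matches as sets of passage indices, the ordering of sources by overlap size, and the reading of the 10-source rule as a threshold on the total number of similar sources are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def filter_redundant_sources(
    matches: Dict[str, Set[int]],
    min_sources_for_skipping: int = 10,  # assumed reading of the "fewer than 10" rule
) -> Tuple[List[str], List[str]]:
    """Split similar sources into (shown, skipped).

    `matches` maps a hypothetical source id to the set of passage indices
    of the checked document that the source overlaps with.
    """
    # Assumption: sources with the largest overlap are considered first;
    # ties are broken by id so the result is deterministic.
    ordered = sorted(matches, key=lambda s: (-len(matches[s]), s))

    # With only a few sources, show everything without skipping.
    if len(ordered) < min_sources_for_skipping:
        return ordered, []

    shown: List[str] = []
    skipped: List[str] = []
    covered: Set[int] = set()
    for source in ordered:
        passages = matches[source]
        if passages and passages <= covered:
            # Every overlapping passage was already found in an earlier source.
            skipped.append(source)
        else:
            shown.append(source)
            covered |= passages
    return shown, skipped
```

With one source covering passages 1 to 3, ten further sources that each overlap only in passages 1 and 2 (for example, the same quoted law), and one source overlapping in passage 7, this sketch shows the first and the last source and skips the ten redundant ones.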
Students should keep in mind that submitted theses are archived in Theses.cz and may be re-examined at any time in the future, for example with an improved version of the algorithm. The time saved by copying may later cost a great deal of effort spent repairing your reputation. The IS MU developers continue to improve the algorithm, and the database of searchable documents keeps growing with new sources. What the system does not detect today may be discovered tomorrow.
- 6.How can I compare two documents with each other?
The “Similar Documents” application includes a built-in function for comparing two selected documents. In the list of similar documents, use the button with the magnifying-glass icon to display the selected document alongside the controlled document. Wherever similarities were detected, the correspondence between similar passages in both documents is shown graphically, including the degree of similarity of each passage.
1 The magnifying-glass button allows detailed comparison of the selected document with the controlled one.
2 Indicator that the document is non-public, in which case only the overlapping sections are shown.
The application is functional even if the user does not have access to the source document. In such cases, only the beginnings of the similar words from the source document are shown, while the remaining text is intentionally obscured. This serves as guidance for a rough assessment of how serious the textual similarity is.
1 The controlled and the similar document displayed side by side.
2 Return to the list of detected similarities.
3 In the case of non-public documents, only common sections are displayed.
4 The user may also display a 2D similarity map (instead of the linear similarity map).
The application contains several graphical elements designed to simplify navigation through the detected similarities. Colors and the fill level of circles in different parts of the interface indicate the degree of similarity of a passage:
1 Almost verbatim match.
2 Only minor differences.
3 Partially similar passage.
4 Significantly rephrased passage.
5 Low degree of similarity; consider only in connection with surrounding passages.
Both the compared document and the source document are displayed in a similar fashion. The text is divided into smaller parts—approximately paragraph-sized—and two consecutive paragraphs of the compared document are checked against three paragraphs of the source document.
1 The timestamp of the document change helps determine which document is older.
2 The linear map of the document shows the occurrences of similar passages. If one location corresponds to multiple areas in the opposite document, the color of the strongest similarity is used.
3 Color-coded similarity indicators. Multiple circles may appear in one place because a single location may correspond to multiple parts of the opposite document.
4 The blue-highlighted block indicates selected paragraphs in both documents that are similar to each other.
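The paragraph-window comparison mentioned above (two consecutive paragraphs of the compared document against three paragraphs of the source document) can be sketched as follows. The source does not say how similarity between windows is scored; word-set Jaccard similarity is swapped in here purely as a stand-in, and all function names and the `threshold` value are hypothetical.

```python
import re
from typing import List, Tuple


def split_paragraphs(text: str) -> List[str]:
    """Split text into paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def windows(paragraphs: List[str], size: int) -> List[Tuple[int, str]]:
    """Return (start_index, joined_text) for each run of `size` paragraphs."""
    if len(paragraphs) <= size:
        return [(0, " ".join(paragraphs))]
    return [
        (i, " ".join(paragraphs[i:i + size]))
        for i in range(len(paragraphs) - size + 1)
    ]


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity score: shared words divided by all distinct words."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def compare(compared_text: str, source_text: str,
            threshold: float = 0.5) -> List[Tuple[int, int, float]]:
    """Return (compared_start, source_start, score) for similar window pairs."""
    hits = []
    for i, compared_window in windows(split_paragraphs(compared_text), 2):
        for j, source_window in windows(split_paragraphs(source_text), 3):
            score = jaccard(compared_window, source_window)
            if score >= threshold:  # hypothetical cut-off, not from the source
                hits.append((i, j, score))
    return hits
```

Because one window of the compared document can score above the threshold against several source windows, a single location may end up with multiple similarity indicators, which matches the multiple circles described in the legend.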
- 7.How do I read the 2D map when comparing similarities?
The 2D map visually displays the locations of similar passages in both documents.
1 The horizontal axis represents the compared document, written from left to right.
2 The vertical axis represents the source document, written from bottom to top.
3 Colored dots indicate similar passages in both documents; the color represents the degree of similarity. Clicking a dot highlights the corresponding passages in both documents.
For example, similarities between the concluding sections of both documents (typically the bibliography) may appear in the top-right corner.
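The axis convention above can be made concrete with a small sketch that places similar passage pairs on a grid whose origin is in the bottom-left corner. The character-offset input, the function names `map_points` and `render`, and the text grid itself are illustrative assumptions, not a description of how Theses.cz draws the map.

```python
from typing import List, Tuple


def map_points(pairs: List[Tuple[int, int]],
               compared_len: int, source_len: int) -> List[Tuple[float, float]]:
    """Convert (compared_offset, source_offset) pairs to (x, y) in [0, 1].

    x runs left to right along the compared document,
    y runs bottom to top along the source document.
    """
    if compared_len <= 0 or source_len <= 0:
        raise ValueError("document lengths must be positive")
    return [(c / compared_len, s / source_len) for c, s in pairs]


def render(points: List[Tuple[float, float]], size: int = 10) -> str:
    """Draw the points on a size x size text grid, origin at bottom-left."""
    grid = [["." for _ in range(size)] for _ in range(size)]
    for x, y in points:
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        # Row 0 of the grid is printed first (at the top), so flip the y axis.
        grid[size - 1 - row][col] = "#"
    return "\n".join("".join(line) for line in grid)
```

Normalizing each axis by its own document length is also why the slope of a diagonal in such a map depends on the relative lengths of the two documents: a match near the end of both documents lands in the top-right corner regardless of how long each document is.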
The 2D map provides a quick overview of the nature and distribution of similar passages. Several common patterns include:
- Strong diagonal
A strong red diagonal indicates a long continuous segment of text shared by both documents.
This example shows a dissertation composed of several articles, one of which serves as the source document. Additional similarities reflect longer established phrases typical of the field, used in other articles included in the dissertation. Note: the slope of the diagonal varies depending on the relative lengths of the documents.
- Author’s introductory statement
A short similarity in the bottom-left corner usually corresponds to standardized introductory acknowledgements or declarations written similarly across documents within the same institution.
- Bibliography
The cluster of dots in the top-right part represents longer phrases—citations in the bibliography. This usually indicates that both works draw on similar sources, not plagiarism. The more or less diagonal direction suggests that the references appear in a similar order, sorted according to the same criterion (e.g., year of publication).
- Several dots aligned vertically
This indicates multiple similarities between one passage of the compared document and several locations in the source document (or horizontally, vice versa). Typically, this corresponds to longer standard phrasing, definitions, or commonly used formulations in the field. Usually not a cause for concern.
- Weak similarity
If no strong clusters or diagonals appear, and similarities are mostly yellow or light orange, the resemblance is likely due to standard phrasing or a shared topic. This generally does not indicate plagiarism.
- Rephrased text
This is an actual example of plagiarism: the entire work is written as a strong rephrasing of another text. Red dots are nearly absent, but a wavy diagonal line indicates similarity across nearly the whole document. Gaps in the diagonal may mark areas where the system did not detect a similarity, or parts that were copied from other documents.
Dots near the top edge represent similarities in the bibliography: the source document cites all references at the end, while the plagiarized document cites progressively throughout (e.g., using footnotes).
- Copied chapter
This shows the same document as above; however, the source document is now a Wikipedia article. The plagiarized work uses the same structure but is heavily rephrased with omitted parts. Because the documents differ greatly in length, the diagonal is almost vertical. The copied portion spans less than two A4 pages—still large enough to be significant.
- Introduction of the thesis
In many fields, the introduction of a final thesis summarizes existing knowledge. Here, two secondary-school graduation theses of similar length from the same school clearly drew upon the same sources (or on each other) for the introductory section. The second part of the documents (right or top) shows no similarity, suggesting that the remainder, the “original work” section, may indeed be original.
If you have not found the information you were looking for, you can contact us at fi.muni.cz.