Self-Similarity in Texts: Fabulous Contributions to Computational Narratology


After spending some time introducing the theses of fellow graduates and researchers, the moment has finally arrived for me to humbly introduce my own master's thesis, which I submitted eighteen months ago.

The aim of my thesis was to measure self-similarity in texts reliably. To this end, tools and methods typical of various scientific fields were borrowed and carefully adapted to our context. In the final experiment, the Hurst exponent (as a self-similarity measure) and the dependency distance (as a complexity measure) are calculated for some fairy tales, in order to perform a comparative analysis across text units (phrase, sentence, word).
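The thesis itself remains the authority on how both measures were actually computed. As a rough illustration only, here is a minimal sketch that estimates a Hurst exponent with classical rescaled-range (R/S) analysis and computes the mean dependency distance from head indices in the CoNLL-U convention. Using sentence length in words as the text's time series, the function names and the window parameters are all assumptions made for this example, not details taken from the thesis.

```python
import re

import numpy as np


def sentence_lengths(text):
    """Hypothetical text-to-series mapping (an assumption): words per sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def rescaled_range(chunk):
    """R/S statistic of one window: range of cumulative deviations divided by std."""
    x = np.asarray(chunk, dtype=float)
    deviations = np.cumsum(x - x.mean())
    spread = deviations.max() - deviations.min()
    std = x.std()
    return spread / std if std > 0 else np.nan


def hurst_rs(series, min_window=8):
    """Estimate H as the slope of log(mean R/S) against log(window size)."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    sizes, rs_means = [], []
    size = min_window
    while size <= n // 2:
        # Split the series into non-overlapping windows of the current size
        values = [rescaled_range(x[i:i + size]) for i in range(0, n - size + 1, size)]
        values = [v for v in values if np.isfinite(v)]  # drop constant windows
        if values:
            sizes.append(size)
            rs_means.append(np.mean(values))
        size *= 2  # double the window size at each step
    if len(sizes) < 2:
        raise ValueError("series too short for the chosen min_window")
    slope, _intercept = np.polyfit(np.log(sizes), np.log(rs_means), 1)
    return slope


def mean_dependency_distance(heads):
    """heads[i] is the 1-based head of token i+1; 0 marks the root (CoNLL-U style)."""
    distances = [abs(h - (i + 1)) for i, h in enumerate(heads) if h != 0]
    return sum(distances) / len(distances) if distances else 0.0
```

For "The cat sat.", with heads `[2, 3, 0]` (The → cat, cat → sat, sat as root), `mean_dependency_distance` returns 1.0. An H around 0.5 suggests no long-range memory, while values clearly above 0.5 indicate persistence. Plain R/S is known to be biased upward on short series, which matters for texts with few sentences.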

The concept of self-similarity plays an important role in fields as diverse as physics, mathematics and the figurative arts. For this reason, trying to retrieve it in natural-language texts forced me to face several questions: How does self-similarity in structures and geometric shapes differ from self-similarity in time series? What is the trait d'union between a structure and a sequence? Which ordering best conveys relationships? What makes a certain ordering of episodes more beautiful than others? To what extent can texts be handled as realizations of specific schemata? Is there a relationship between the syntax of a language and the structure governing the development of a basic narrative written in that same language? If so, is it possible to retrieve this self-affinity at deeper linguistic levels as well?

The automatic detection of self-affinity in texts could be used to extract mind maps from texts automatically, or to produce ideal sequences from structured/mapped data (graphs).
This would have an enormous impact on the field of knowledge and information visualization.
Texts are not the only human expressions we can model as time series of information values; music is a very good example. If both kinds of expression are reduced to the same meta-representation, the range of applications unleashed could literally open the door to “trans-genre” translations.
Furthermore, interpreting these time series as signals would allow us to adopt other powerful mathematical tools, such as the Fourier transform, to decompose them and find out whether narratemes that are distant from one another, yet display evident causal relationships, still belong to the “same wave”.
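The Fourier idea can be illustrated, very loosely, on any numeric series derived from a text. This minimal sketch assumes a hypothetical series with one value per sentence (for example, words per sentence) and uses NumPy's real FFT to return the strongest periodic components; the function name and the `top_k` parameter are illustrative choices, not part of the thesis. The code only detects periodicities: whether they correspond to narratemes belonging to the “same wave” is the open question raised above, not something the code decides.

```python
import numpy as np


def dominant_periods(series, top_k=3):
    """Return (period, magnitude) pairs for the strongest non-constant components.

    Periods are measured in series steps (e.g. sentences, if the hypothetical
    series holds one value per sentence).
    """
    x = np.asarray(series, dtype=float)
    x = x - x.mean()  # remove the constant (DC) component
    magnitudes = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)  # cycles per step
    # Skip index 0 (frequency zero), then rank the remaining bins by magnitude
    ranked = np.argsort(magnitudes[1:])[::-1][:top_k] + 1
    return [(1.0 / freqs[i], float(magnitudes[i])) for i in ranked]
```

Using `rfft` keeps only the non-negative frequencies, which is all a real-valued series needs.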
If other scientific results confirm that self-affinity is a real feature of every literary work, then it is also conceivable to use it to determine whether undeciphered texts, as in the case of the Voynich Manuscript, are nevertheless written in a natural language and are therefore amenable to translation.

Click here to read the full text of my work (if you know German; if you do not, I suggest you learn it or contact me privately for a longer and more detailed summary in English).

Furthermore, I highly recommend visiting our page Theses and Papers to access other valuable studies.
Thank you for your attention, and have a sweet Monday!


Written by Cosimo Palma

I studied philosophy, historical philosophy, philosophical history and history in the redundant Naples, and computational linguistics and informatics in the city of Marx.
A language enthusiast and, in my free time, a chess player, I will spend my busy days until the end of September in Tower A of the European Parliament in Luxembourg, trying to do my best in both the IT and the Communication departments of the Terminology unit.