Names of places in Croatia
January 19, 2022 at 4:44 am
(This post was last modified: December 2, 2022 at 4:09 am by BrianSoddingBoru4.)
I have tried to support my alternative interpretation of the names of places in Croatia mathematically:
Administrator Notice
Link removed. Advertising.
(I am sorry that it is in Croatian, but it would take me a lot of time to translate that to English. If you cannot understand something there using Google Translate, I am willing to help you with that, but I am not willing to translate the whole text to English.)
Link removed. Advertising.
To summarize, I think I have found a way to measure the collision entropy of the different parts of the grammar. The entropy of the syntax can be measured by taking the entropy of a spell-checker word list, such as that of Aspell, and subtracting from it the entropy of a long text in the same language. I got that, for example, the entropy of the syntax of the Croatian language is log2(14)-log2(13)=0.107 bits per symbol (a symbol here is a consonant, as that is useful for the problem I am trying to solve), that the entropy of the syntax of the English language is log2(13)-log2(11)=0.241 bits per symbol, and that the entropy of the syntax of the German language is log2(15)-log2(12)=0.3219 bits per symbol. It was rather surprising to me that the entropy of the syntax of the German language is larger than that of the English language, given that German syntax seems simpler (it uses morphology more than the English language does, somewhat simplifying the syntax), but you cannot argue with the hard data.
The entropy of the phonotactics of a language can, I guess, be measured by measuring the entropy of consonant pairs (with or without a vowel between them) in a spell-checker word list, then measuring the entropy of single consonants in that same word list, and then subtracting the former from twice the latter. I measured that the entropy of the phonotactics of the Croatian language is 2*log2(14)-5.992=1.623 bits per consonant pair. I have taken the entropy of the phonotactics to be the lower bound of the entropy of the phonology, which is the only entropy that matters in ancient toponyms (the entropies of the syntax and morphology do not matter then, because the toponym was created in a foreign language).
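The measurements described above can be sketched roughly as follows. This is only a minimal illustration, assuming that "collision entropy" means the Rényi entropy of order 2, that is -log2 of the sum of squared probabilities. The helper names (`collision_entropy`, `consonant_counts`, `consonant_pair_counts`, `phonotactic_entropy`) and the simplified consonant set are assumptions made for the example, not the code that produced the numbers quoted in the post. The last lines only redo the arithmetic quoted above.

```python
import math
from collections import Counter

def collision_entropy(counts):
    """Renyi entropy of order 2 in bits: -log2(sum of squared probabilities)."""
    total = sum(counts.values())
    return -math.log2(sum((c / total) ** 2 for c in counts.values()))

# Assumed, simplified consonant set: single letters only, digraphs are ignored.
CONSONANTS = set("bcčćdđfghjklmnprsštvzž")

def consonant_counts(words):
    """Count single consonants over all words."""
    return Counter(ch for w in words for ch in w.lower() if ch in CONSONANTS)

def consonant_pair_counts(words):
    """Count neighbouring consonant pairs, skipping any vowels between them."""
    pairs = Counter()
    for w in words:
        cons = [ch for ch in w.lower() if ch in CONSONANTS]
        pairs.update(zip(cons, cons[1:]))
    return pairs

def phonotactic_entropy(words):
    """Twice the single-consonant entropy minus the consonant-pair entropy."""
    single = collision_entropy(consonant_counts(words))
    pair = collision_entropy(consonant_pair_counts(words))
    return 2 * single - pair

# Redoing the arithmetic quoted in the post (bits per symbol / per pair).
syntax_croatian = math.log2(14) - math.log2(13)
syntax_english = math.log2(13) - math.log2(11)
syntax_german = math.log2(15) - math.log2(12)
phonotactics_croatian = 2 * math.log2(14) - 5.992
```

With a real word list, `phonotactic_entropy` would be called on the list of words read from the spell-checker dump. If consonants were combined independently of each other, the result would be zero, so a positive value reflects constraints on which consonants may follow which.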
Given that the Croatian language has 26 consonants, the upper bound of the entropy of morphology, which does not matter when dealing with ancient toponyms, can be estimated as log2(26*26)-1.623-2*0.107-5.992=1.572 bits per pair of consonants. To estimate the p-value of the pattern that many names of rivers in Croatia begin with the consonants 'k' and 'r' (Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica), I did some birthday calculations, first setting the simulated entropy of phonology to 1.623 bits per consonant pair, and then setting it to 1.623+1.572=3.195 bits per consonant pair. The former gave the probability of that k-r pattern occurring by chance as 1/300, and the latter gave 1/17. So the p-value of that k-r pattern is somewhere between 1/300 and 1/17. I therefore concluded that the simplest explanation is that the river names Karašica, Krka, Korana, Krbavica, Krapina and Kravarščica are related and all come from the Indo-European root *kjers meaning horse (in Germanic languages) or to run (in Celtic and Italic languages). Do those arguments sound compelling to you?
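The arithmetic for the bound, and the general shape of a birthday-style calculation, can be sketched as below. The post does not say how many river names were considered or how a distribution with a given collision entropy was simulated, so the function `prob_shared_pair` and all of its inputs (`pair_probs`, `n_names`, `k`) are hypothetical, and the toy values in the usage lines are not the ones behind the 1/300 and 1/17 figures.

```python
import math
import random
from collections import Counter

# Upper bound on the entropy of morphology, as computed in the post
# (bits per consonant pair), and the resulting upper setting for phonology.
morphology_upper = math.log2(26 * 26) - 1.623 - 2 * 0.107 - 5.992
phonology_upper = 1.623 + morphology_upper

def prob_shared_pair(pair_probs, n_names, k, trials=20_000, seed=0):
    """Monte Carlo estimate of the chance that at least k out of n_names
    randomly drawn names share the same initial consonant pair.

    pair_probs maps each consonant pair to its (assumed) probability.
    """
    rng = random.Random(seed)
    pairs = list(pair_probs)
    weights = [pair_probs[p] for p in pairs]
    hits = 0
    for _ in range(trials):
        sample = rng.choices(pairs, weights=weights, k=n_names)
        if max(Counter(sample).values()) >= k:
            hits += 1
    return hits / trials

# Toy usage: 10 equally likely pairs, 5 names, at least 2 sharing a pair.
# The exact value for this toy case is 1 - (10*9*8*7*6)/10**5 = 0.6976.
toy_probs = {i: 0.1 for i in range(10)}
toy_estimate = prob_shared_pair(toy_probs, n_names=5, k=2)
```

A less uniform `pair_probs` (lower entropy) makes shared pairs more likely by chance, which is the direction in which the two entropy settings in the post differ.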
I don't know what to think. My mathematics professor Tomislav Rudec tells me my arguments sound compelling to him. My informatics professor Franjo Jović tells me my arguments sound very interesting and could be right. Yet Dubravka Ivšić, who is arguably the greatest expert in this part of linguistics, tells me my arguments are not compelling or even interesting. This is what she told me when I asked her via e-mail:
Administrator Notice
Link removed. Advertising.
Link removed. Advertising.
Also, what do you think: is it likely that this k-r root meaning "to flow" in the Croatian toponyms is not an etymological root, but rather a phonosemantic root? I don't know which claim is more extraordinary. If I claim it is an etymological root, then I am saying the mainstream interpretation of the Croatian toponyms is wildly wrong (which I am not sure is such an extraordinary claim, as the mainstream interpretation of Croatian toponyms seems to be based on groupthink more than on evidence). If I claim it is a phonosemantic root, then the mainstream etymology of Croatian toponyms can still be correct, but then I am saying that some weak form of the phonosemantic hypothesis is correct. As far as I understand, the vast majority of linguists would consider even a weak form of the phonosemantic hypothesis a very extraordinary claim.