R for Sutta Science

This is a very cool plot! It would be very instructive to plot these on a log-log scale. The distributions for MN, AN, and SN look like exponentials, whereas DN does not appear to be a pure exponential.
It would be interesting to see whether they follow a “power-law” distribution rather than an exponential. A power-law distribution has a heavy tail, so rare, extreme events (like large earthquakes) remain much more probable than they would under an exponential. On a log-log scale, a power law shows up as a straight line (with negative slope for a decaying distribution), and the slope gives the exponent of the power law, i.e. if the distribution is y = x^a, then ln y = a ln x, and the slope of the line is “a”.
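If the distribution is exponential, say y = e^(-bx), then ln y = -bx. That is a straight line (slope -b) only on a semi-log plot of ln y against x; on a log-log plot it bends downward instead of staying straight.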
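To make the slope argument concrete, here is a minimal Python sketch. It uses synthetic data, not the sutta counts from the plot, and the names `linear_fit`, `y_power`, and `y_exp` are just made up for illustration. It fits a straight line in log-log coordinates and in semi-log coordinates, then compares how well each one fits.

```python
import numpy as np

def linear_fit(u, v):
    """Least-squares line v ~ slope*u + intercept; returns (slope, r_squared)."""
    slope, intercept = np.polyfit(u, v, 1)
    resid = v - (slope * u + intercept)
    r2 = 1.0 - resid.var() / v.var()
    return slope, r2

# Synthetic curves (assumed parameters, chosen only for the demo).
x = np.arange(1, 101, dtype=float)
y_power = 5.0 * x ** -1.5        # power law with exponent a = -1.5
y_exp = 5.0 * np.exp(-0.1 * x)   # exponential with rate b = 0.1

# Power law: straight on log-log, slope recovers the exponent.
power_slope, power_r2 = linear_fit(np.log(x), np.log(y_power))

# Exponential: straight on semi-log (ln y vs x), slope recovers -b ...
exp_semilog_slope, exp_semilog_r2 = linear_fit(x, np.log(y_exp))

# ... but curved on log-log, so the straight-line fit is visibly worse.
exp_loglog_slope, exp_loglog_r2 = linear_fit(np.log(x), np.log(y_exp))

print("power law, log-log slope:", power_slope)
print("exponential, semi-log slope:", exp_semilog_slope)
print("exponential, R^2 semi-log vs log-log:", exp_semilog_r2, exp_loglog_r2)
```

With real, noisy counts a plain least-squares fit on logged data is only a rough first look, but it is usually enough to see which of the two plots gives the straighter line.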
Power law distributions have many fascinating and weird properties.

Edit: I did some searches, and the word frequency distribution is a famous power-law distribution called Zipf’s Law (see the Wikipedia article on Zipf's law).
Here is a review paper on the subject:
nihms579165.pdf (1.4 MB)
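If anyone wants to try the Zipf check on a text, a rough sketch might look like this. The tokenization (lowercase, split on word characters) is my own simplifying assumption and would probably need more care for Pali or for a translation, and `rank_frequency` and `zipf_exponent` are hypothetical helper names, not from any library.

```python
import re
from collections import Counter

import numpy as np

def rank_frequency(text):
    """Return (ranks, counts) with counts sorted from most to least frequent."""
    words = re.findall(r"\w+", text.lower())   # naive tokenizer (assumption)
    counts = sorted(Counter(words).values(), reverse=True)
    ranks = np.arange(1, len(counts) + 1, dtype=float)
    return ranks, np.array(counts, dtype=float)

def zipf_exponent(ranks, counts):
    """Estimate s in counts ~ C / rank**s from the log-log slope."""
    slope, _intercept = np.polyfit(np.log(ranks), np.log(counts), 1)
    return -slope

# Toy input only; a real test needs a full text with thousands of words.
ranks, counts = rank_frequency("thus have I heard thus have I thus")
print(list(zip(ranks, counts)))
```

For Zipf's law in its classic form the estimated exponent comes out close to 1, so a value far from that on a large text would be worth a closer look.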


One more interesting paper and analysis, in case anyone was interested:
wordfreq-eng-german.pdf (2.1 MB)
