Zipf’s (basic) law states that, across a corpus of natural language, the frequency of any word in that corpus is inversely proportional to its rank in the frequency table.
So the most frequent word, ranked first in the frequency table, sets the scale for all the other, less frequent words: the second most frequent word is half (1/2) as common, the third is one-third (1/3) as common, and so on. This is readily seen in the two graphs: the first uses linear scales on its axes, and the second uses logarithmic scales, which turn the curve into a straight line.


source: https://eclecticlight.co/2015/07/11/zipfs-law-deep-and-meaningful/
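The rank-frequency rule described above can be sketched in a few lines of Python. This is only an illustration, not something taken from the source: the function names and the top frequency of 60,000 are made up for the example. It computes the frequencies the basic law predicts and checks that, on logarithmic scales, they fall on a line with slope -1.

```python
import math


def zipf_frequencies(top_frequency, n_ranks):
    """Predicted frequency at ranks 1..n_ranks under the basic law: f(r) = f(1) / r."""
    return [top_frequency / rank for rank in range(1, n_ranks + 1)]


def loglog_slope(frequencies):
    """Slope of log(frequency) against log(rank), measured between the first and last rank."""
    last_rank = len(frequencies)
    rise = math.log(frequencies[-1]) - math.log(frequencies[0])
    run = math.log(last_rank) - math.log(1)
    return rise / run


# Hypothetical corpus: the most frequent word occurs 60,000 times (an assumed figure).
freqs = zipf_frequencies(top_frequency=60000, n_ranks=5)
slope = loglog_slope(freqs)

print(freqs)   # frequencies predicted for ranks 1 to 5
print(slope)   # approximately -1 for an exact Zipf distribution
```

A real corpus will not follow the rule exactly, so a measured slope is usually only near -1.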
"Zipf's law, the rank vs. frequency rule, also works if you apply it to the sizes of cities. The city with the largest population in any country is generally twice as large as the next-biggest, and so on. Incredibly, Zipf's law for cities has held true for every country in the world, for the past century."
source: https://io9.gizmodo.com/the-mysterious-law-that-governs-the-size-of-your-city-1479244159
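Zipf's law holds approximately even for the areas of the largest US states (in square miles). The last column is Alaska's area divided by the state's area, which Zipf's law predicts should equal the rank.
Rank | State | Area (sq mi) | Alaska's area / state's area
1 | Alaska | 570,641 |
2 | Texas | 261,914 | 2.18
3 | California | 155,973 | 3.66
4 | Montana | 145,556 | 3.92
5 | New Mexico | 121,365 | 4.70
6 | Arizona | 113,642 | 5.02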
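The ratios listed for the states above can be recomputed with a short Python sketch. The areas are the ones given in this post; the variable names are my own, and the "gap" printed at the end is just the ratio minus the rank, a rough indication of how far each state sits from the prediction.

```python
# Areas in square miles, as listed in this post.
areas_sq_mi = {
    "Alaska": 570641,
    "Texas": 261914,
    "California": 155973,
    "Montana": 145556,
    "New Mexico": 121365,
    "Arizona": 113642,
}

# Rank states from largest to smallest area.
ranked = sorted(areas_sq_mi.items(), key=lambda item: item[1], reverse=True)
largest_area = ranked[0][1]

# For each state: (rank, name, area, largest area / this area).
# Under Zipf's law the last value would equal the rank.
rows = [
    (rank, state, area, round(largest_area / area, 2))
    for rank, (state, area) in enumerate(ranked, start=1)
]

for rank, state, area, ratio in rows:
    gap = round(ratio - rank, 2)
    print(f"{rank} | {state} | {area:,} | {ratio:.2f} | gap {gap:+.2f}")
```

The gap is smallest for Arizona and largest for New Mexico and California, so the fit is close but not exact.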
Additional: https://en.wikipedia.org/wiki/Zipf%27s_law