Benford’s law describes the relative frequency distribution for leading digits of numbers in datasets. Smaller leading digits occur more frequently than larger ones. The law states that approximately 30% of numbers start with a 1, while fewer than 5% start with a 9. According to this law, leading 1s appear about 6.5 times as often as leading 9s! Benford’s law is also known as the First Digit Law.
If leading digits 1 – 9 had an equal probability, they’d each occur 11.1% of the time. However, that is not true in many datasets. The graph displays the distribution of leading digits according to Benford’s law.
Analysis of datasets shows that many follow Benford’s law. For example, analysts have found that stock prices, population numbers, death rates, sports statistics, financial and tax information, and billing amounts often have leading digits that follow this distribution.
Uses for Benford’s Law
Analysts have used Benford’s law extensively to look for fraud and manipulation in financial records, tax returns, applications, and decision-making documents. They compare the distribution of leading digits in these datasets to the distribution the law predicts. When the leading digits don’t follow that distribution, it’s a red flag for fraud in some datasets.
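The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names and the invoice amounts are made up, and the chi-square statistic is one common way to measure the gap between observed and expected counts, not necessarily the method any particular analyst uses.

```python
import math
from collections import Counter

def leading_digit(x):
    """Return the first nonzero digit of a nonzero number."""
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Scientific notation always starts with the leading digit, e.g. '4.2...e-02'.
    return int(f"{abs(x):.15e}"[0])

def observed_counts(values):
    """Count how often each digit 1-9 appears as a leading digit (zeros are skipped)."""
    counts = Counter(leading_digit(v) for v in values if v != 0)
    return {d: counts.get(d, 0) for d in range(1, 10)}

def chi_square_vs_benford(values):
    """Chi-square statistic of observed leading-digit counts against Benford's law."""
    counts = observed_counts(values)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one nonzero value")
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # expected count under Benford's law
        stat += (counts[d] - expected) ** 2 / expected
    return stat

# Hypothetical invoice amounts, invented purely for illustration.
amounts = [1250.00, 187.40, 93.10, 1420.75, 310.00, 27.99, 1105.20, 645.00, 118.30, 2260.00]
print(observed_counts(amounts))
print(round(chi_square_vs_benford(amounts), 2))
```

A larger statistic means a bigger departure from Benford’s law. With nine digit categories, a conventional follow-up is to compare it to a chi-square distribution with 8 degrees of freedom (critical value of about 15.51 at the 5% level). Ten values are far too few for a real analysis; the list is short only to keep the example readable.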
When Does Benford’s Law Apply and Not Apply
Benford’s law generally applies to data that fit some of the following guidelines:
- Quantitative data.
- Data that are measured rather than assigned.
- Data that range over several orders of magnitude.
- Data that are not artificially restricted by minimums or maximums.
- Mixed populations.
- Larger datasets are better.
Elaborations on Guidelines
Benford’s law often does not apply to assigned numbers, such as ID numbers, phone numbers, and zip codes.
It works best for data that range over multiple orders of magnitude, from very low to very high. Such data span the 10s, 100s, 1000s, and so on. For example, populations and incomes can range from very low to very high.
Conversely, if the range of values is restricted, it affects the leading digits, and Benford’s law is less likely to apply. For example, human characteristics naturally fall into restricted ranges. Consequently, this distribution doesn’t apply to human ages, heights and weights. Similarly, limits imposed on potential values can also invalidate this law. Awards in small claims courts have an upper limit, which can negate Benford’s law.
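To see the difference between wide-ranging and restricted data, here is a small sketch. The powers of 2 are a classic example of a sequence that spans many orders of magnitude; the height values are made up (every whole centimeter from 150 to 199), and the helper name is an assumption for this example.

```python
import math
from collections import Counter

def leading_digit_shares(values):
    """Share of positive integers that start with each digit 1-9."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}

# Wide range: the first 1,000 powers of 2 span roughly 300 orders of magnitude.
wide = leading_digit_shares(2 ** n for n in range(1000))

# Restricted range: heights in whole centimeters from 150 to 199 (made-up values).
restricted = leading_digit_shares(range(150, 200))

print("digit  Benford  powers of 2  heights")
for d in range(1, 10):
    benford = math.log10(1 + 1 / d)
    print(f"{d}      {benford:6.1%}   {wide[d]:6.1%}       {restricted[d]:6.1%}")
```

The powers of 2 track the Benford percentages closely, while every height in the restricted range starts with a 1, so the law cannot describe it.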
Interestingly, mathematicians have proven that numbers from mixed populations follow Benford’s law. Mixed populations are things like all numbers pulled from a magazine issue. Obviously, those numbers will represent various topics and types of values. Benford himself did that with Reader’s Digest and newspapers. You can also combine data from different sources to achieve the same effect.
As with any distribution, larger datasets produce observed relative frequencies that more closely approximate the theoretical values, in this case those of Benford’s law. Smaller datasets can show relatively large deviations due to random error. Some analysts say datasets as small as 100 observations are acceptable, but most think a minimum size of 500 or even 1,000 is necessary.
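A quick simulation illustrates the sample-size effect. The sketch below draws leading digits that follow Benford’s law exactly and reports the largest gap between the observed and theoretical frequencies. The function name, the seed, and the sample sizes are arbitrary choices for this example.

```python
import math
import random

BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def simulate_max_deviation(n, rng):
    """Draw n Benford-distributed leading digits and return the largest
    absolute gap between observed and theoretical relative frequencies."""
    counts = dict.fromkeys(range(1, 10), 0)
    for _ in range(n):
        # If U is uniform on [0, 1), the leading digit of 10**U follows Benford's law.
        digit = min(int(10 ** rng.random()), 9)
        counts[digit] += 1
    return max(abs(counts[d] / n - BENFORD[d]) for d in range(1, 10))

rng = random.Random(0)  # fixed seed so reruns give the same numbers
for n in (100, 500, 1000, 100_000):
    print(f"n = {n:>7,}: max deviation = {simulate_max_deviation(n, rng):.4f}")
```

Even though the simulated data follow the law perfectly by construction, the small samples will typically show gaps of several percentage points from chance alone, which is why small datasets make poor evidence either way.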
Curiously, Benford’s law works in some cases where the guidelines suggest it should not. For example, it applies to house numbers even though those are assigned.
Benford’s Law Formula
Benford’s law formula is the following:
$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$
Where d = the values of the leading digits from 1 to 9. The formula calculates the probability for each leading digit. The table below displays the probabilities that Benford’s law formula calculates for all digits.
Digit Probability
1 30.1%
2 17.6%
3 12.5%
4 9.7%
5 7.9%
6 6.7%
7 5.8%
8 5.1%
9 4.6%
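The table can be reproduced with a few lines of Python, assuming the standard form of the formula, P(d) = log10(1 + 1/d). The function name is just a choice for this sketch.

```python
import math

def benford_probability(d):
    """Probability that the leading digit equals d (1-9) under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / d)

print("Digit  Probability")
for d in range(1, 10):
    print(f"{d}      {benford_probability(d):.1%}")
```

Because the nine terms telescope to log10(10), the probabilities sum to exactly 1, which is a handy sanity check on the formula.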
source: https://statisticsbyjim.com/probability/benfords-law/