Benford Distribution Fitting Support

At Goodness-of-fit Benford Distribution, we describe several methods for determining whether data follows a Benford distribution. On this webpage, we describe various Real Statistics worksheet functions that make it easier to perform these analyses.

First significant digit

FIRST_SIG(x) = first significant digit of the positive numeric value x.

E.g. FIRST_SIG(45.17) = 4 and FIRST_SIG(.00004517) = 4.
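For readers working outside Excel, the idea behind FIRST_SIG can be sketched in Python. This is a minimal illustration, not the Real Statistics implementation; the function name first_sig is hypothetical. It goes through the decimal representation of x rather than a logarithm, to avoid rounding problems near powers of ten.

```python
from decimal import Decimal


def first_sig(x):
    """Return the first significant digit of a positive number x.

    Hypothetical helper for illustration; not part of Real Statistics.
    """
    if not x > 0:
        raise ValueError("x must be a positive number")
    # The Decimal coefficient is stored without leading zeros, so its
    # first digit is the first significant digit of x.
    digits = Decimal(str(x)).as_tuple().digits
    if not digits or digits[0] == 0:
        raise ValueError("x must be a finite positive number")
    return digits[0]


print(first_sig(45.17))       # same input as FIRST_SIG(45.17)
print(first_sig(0.00004517))  # same input as FIRST_SIG(.00004517)
```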

Kolmogorov-Smirnov test functions

KS_BSTAT(R1) = the KS statistic for the data in the column array or range R1 that is suspected of following a Benford distribution
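As a rough illustration of what such a statistic measures, the following Python sketch computes the largest absolute gap between the observed cumulative distribution of first digits and the Benford cumulative distribution, where the Benford probability of digit d is log10(1 + 1/d). It assumes this standard definition of a KS-type statistic for discrete data; the exact calculation used by KS_BSTAT may differ, and the function name ks_benford_stat is hypothetical.

```python
import math
from decimal import Decimal

# Benford probabilities for first digits 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def ks_benford_stat(data):
    """Max absolute difference between observed and Benford cumulative
    first-digit distributions (illustrative sketch, positive data only)."""
    n = len(data)
    counts = [0] * 9
    for x in data:
        # first significant digit via the decimal coefficient
        digit = Decimal(str(x)).as_tuple().digits[0]
        counts[digit - 1] += 1

    d_max = 0.0
    cum_obs = 0.0
    cum_exp = 0.0
    for c, p in zip(counts, BENFORD):
        cum_obs += c / n
        cum_exp += p
        d_max = max(d_max, abs(cum_obs - cum_exp))
    return d_max


# A sample whose values all start with 1 is far from Benford
print(ks_benford_stat([1, 10, 100, 1.5]))
```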

KS_BCRIT(n, alpha, interp) = the critical value of the KS test for a sample of size n that is suspected of following a Benford distribution, at significance level alpha (from .01 to .10, default .05), based on the table in Figure 1 of Goodness-of-fit Benford Distribution.

KS_BPROB(x, n, iter, interp, txt) = an approximate p-value for the KS test at x for a sample of size n, obtained by interpolating the values in the table in Figure 1 of Goodness-of-fit Benford Distribution, where iter = the number of iterations (default = 40) used to calculate the approximation.

KS_BTEST(R1, lab, alpha): returns a column array containing the KS statistic, p-value, and critical value for the KS test on the data in the column array or range R1 that is suspected of following a Benford distribution; alpha is the significance level (from .01 to .10, default .05).

If interp = TRUE (default) then the recommended interpolation is used; otherwise, linear interpolation is used.

If lab = TRUE (default FALSE) then a column of labels is appended to the output.

Note that the values for α in the table in Figure 1 range from .01 to .10. When txt = FALSE (default), if the p-value is less than .01 then the p-value is given as 0, and if the p-value is greater than .10 then the p-value is given as 1. When txt = TRUE, the output instead takes the form “< .01” or “> .10”.
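The reporting rule for out-of-range p-values can be summarized in a few lines of Python. This is only a sketch of the rule as stated above; the function name report_p_value and its parameters are hypothetical, and the text labels are taken from the description rather than from the add-in's actual output.

```python
def report_p_value(p, txt=False, lower=0.01, upper=0.10):
    """Apply the out-of-range rule: the table only covers alpha values
    between lower and upper, so p-values outside that range are not
    estimated and are reported as 0 or 1 (or as text when txt is True)."""
    if p < lower:
        return "< .01" if txt else 0
    if p > upper:
        return "> .10" if txt else 1
    return p  # inside the tabulated range: report the estimate itself


print(report_p_value(0.25))            # numeric form for a large p-value
print(report_p_value(0.25, txt=True))  # text form for the same p-value
print(report_p_value(0.070353))        # within range, returned unchanged
```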

Anderson-Darling test functions

For the Anderson-Darling test, the Real Statistics Resource Pack provides the following worksheet function:

AD_BSTAT(R1) = the Anderson-Darling statistic for the data in the column array or range R1 that is suspected of following a Benford distribution.
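This page does not give the formula behind AD_BSTAT. As an illustration only, the sketch below assumes one common discrete form of the Anderson-Darling statistic (the Cramér-von Mises type statistic for discrete distributions discussed in Lesperance et al.): A² = n Σ Z_j² t_j / (H_j (1 − H_j)), where Z_j is the difference between the observed and Benford cumulative proportions at digit j, H_j is the Benford cumulative probability, and t_j = (p_j + p_{j+1})/2. The function name ad_benford_stat is hypothetical, and the result is not guaranteed to match AD_BSTAT.

```python
import math
from decimal import Decimal


def ad_benford_stat(data):
    """Discrete Anderson-Darling-type statistic against Benford's law
    (illustrative sketch under the assumptions stated in the text)."""
    p = [math.log10(1 + 1 / d) for d in range(1, 10)]
    n = len(data)
    counts = [0] * 9
    for x in data:
        # first significant digit of a positive value
        counts[Decimal(str(x)).as_tuple().digits[0] - 1] += 1

    a2 = 0.0
    cum_obs = 0.0
    cum_exp = 0.0
    # Only digits 1..8 contribute: at digit 9 both cumulative
    # distributions equal 1, so that term vanishes.
    for j in range(8):
        cum_obs += counts[j] / n
        cum_exp += p[j]
        t = (p[j] + p[j + 1]) / 2
        z = cum_obs - cum_exp
        a2 += z * z * t / (cum_exp * (1 - cum_exp))
    return n * a2


# Counts chosen to be close to Benford proportions for n = 100
near_benford = [d for d, c in zip(range(1, 10),
                                  [30, 18, 12, 10, 8, 7, 6, 5, 4])
                for _ in range(c)]
print(ad_benford_stat(near_benford))  # small value
print(ad_benford_stat([1] * 50))      # much larger value
```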

AD_BSTAT takes the place of the ANDERSON function for the Benford distribution. The ADCRIT(n, alpha, dist, , interp), ADPROB(x, dist, n, iter, interp, txt), and ADTEST(R1, dist, lab, iter, alpha) functions, described at Anderson-Darling Test, also support the Benford distribution when dist = 13 or “benford”. The critical values and estimated p-values are based on the table in Figure 2 of Goodness-of-fit Benford Distribution.

Examples

For Example 1 of Goodness-of-fit Benford Distribution, KS_BSTAT(RESHAPES(B2:F11)) = .137121, KS_BCRIT(50,.05) = .159298, KS_BPROB(AD11,50,,,TRUE) = “>.1”. Also, KS_BCRIT(25,0.025) = .246041 and KS_BPROB(0.15,50,,,TRUE) = .070353.

For the AD test, we see that AD_BSTAT(RESHAPES(B2:F11)) = 1.162533, ADCRIT(,0.05,13) = 2.304, and ADPROB(P16,“benford”) = .217872.

Finally, the results of the KS and AD tests for Example 1 are displayed in Figure 1.

Figure 1 – ADTEST and KS_BTEST

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

Wikipedia (2022) Benford’s law
https://en.wikipedia.org/wiki/Benford%27s_law

Morrow, J. (2010) Benford’s law, families of distributions and a test basis
http://www.johnmorrow.info/projects/benford/benfordMain.pdf

Lesperance, M., Reed, W. J., Stephens, M. A., Tsao, C., Wiltons, B. (2016) Assessing conformance with Benford’s Law: Goodness-of-fit tests and simultaneous confidence intervals. PLoS ONE
https://doi.org/10.1371%2Fjournal.pone.0151235
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4809611/
