Kolmogorov Distribution

Basic Concepts

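For n sufficiently large, the two-tailed critical values D_n,α (see Kolmogorov-Smirnov Test) are approximately equal to the inverse of the Kolmogorov distribution (evaluated at 1 − α) divided by the square root of n. This holds even for values of α not found in the Kolmogorov-Smirnov Table. For x > 0, the cdf of the Kolmogorov distribution has the value

$$F(x) = 1 - 2\sum_{k=1}^{\infty} (-1)^{k-1} e^{-2k^2x^2} = \frac{\sqrt{2\pi}}{x}\sum_{k=1}^{\infty} e^{-(2k-1)^2\pi^2/(8x^2)}$$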

For sufficiently large n, $\sqrt{n}$ D_n approximately follows the Kolmogorov distribution, where D_n is the Kolmogorov-Smirnov test statistic for a sample of size n.
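As a rough illustration, the cdf of the Kolmogorov distribution can be evaluated by truncating its infinite sum. The Python sketch below assumes the standard series F(x) = 1 − 2 Σ (−1)^(k−1) exp(−2k²x²), together with an equivalent form that converges faster for small x. The function names and the default of 50 terms are illustrative choices and are not the Real Statistics implementation.

```python
import math


def kolmogorov_cdf(x, terms=50):
    """Alternating-series form: F(x) = 1 - 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return 1.0 - 2.0 * total


def kolmogorov_cdf_theta(x, terms=50):
    """Equivalent form: F(x) = sqrt(2*pi)/x * sum_{k>=1} exp(-(2k-1)^2 * pi^2 / (8 x^2))."""
    if x <= 0:
        return 0.0
    total = 0.0
    for k in range(1, terms + 1):
        total += math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * x * x))
    return math.sqrt(2.0 * math.pi) / x * total
```

With these definitions, kolmogorov_cdf(1.3581) evaluates to approximately .95. The alternating form needs more terms as x approaches 0, where the second form is the better choice.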

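Furthermore, for any value of n, the critical value D_n,α is approximately equal to

$$D_{n,\alpha} \approx \frac{D_\alpha}{\sqrt{n} + 0.12 + 0.11/\sqrt{n}}$$

where D_α is the critical value of the Kolmogorov distribution, i.e. F(D_α) = 1 − α where F(x) is the cdf of the Kolmogorov distribution, as described above. E.g. for α = .05, D_α = 1.3581 since F(1.3581) = .95. Thus, for example

$$D_{10,.05} \approx \frac{1.3581}{\sqrt{10} + 0.12 + 0.11/\sqrt{10}} = .4094$$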
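The approximation D_n,α ≈ D_α/(√n + .12 + .11/√n) is simple to evaluate directly. The following Python sketch is only an illustration: the function name and the `finite_n_correction` flag are hypothetical, and the value 1.3581 for α = .05 is taken from the text above.

```python
import math


def approx_critical_value(d_alpha, n, finite_n_correction=True):
    """Approximate two-tailed KS critical value for sample size n.

    d_alpha is the critical value of the Kolmogorov distribution
    (e.g. 1.3581 for alpha = .05). With finite_n_correction=False
    the large-sample form d_alpha / sqrt(n) is used instead.
    """
    root_n = math.sqrt(n)
    if finite_n_correction:
        denominator = root_n + 0.12 + 0.11 / root_n
    else:
        denominator = root_n
    return d_alpha / denominator
```

For n = 10 and D_α = 1.3581 this gives approximately .4094. For large n, the two variants differ only slightly.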

Worksheet Functions

Real Statistics Functions: The Real Statistics Resource Pack supplies the following functions:

KDIST(x, iter) = the value of the Kolmogorov distribution function F(x) where iter = the # of iterations used in calculating the infinite sum (default = 50).

KINV(p, iter0, iter) = the inverse of KDIST; i.e. KINV(p, iter0, iter) = x where 1 − KDIST(x, iter) = p. The inverse is calculated from KDIST using iter0 iterations (default 40).

KINV only returns values of x between 1.0 and 2.4, and so p must lie between 0.0000198590086116779 and 0.269999671677355.
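One way to invert the distribution numerically is bisection on the upper tail 1 − F(x), which decreases as x increases. The sketch below is an assumption about how such an inverse could be written in Python. It is not how KINV is implemented, and the names `kolmogorov_sf` and `kolmogorov_inv`, the search bracket, and the tolerance are all hypothetical.

```python
import math


def kolmogorov_sf(x, terms=50):
    """Upper tail 1 - F(x) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 x^2)."""
    if x <= 0:
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    return 2.0 * total


def kolmogorov_inv(p, lo=0.2, hi=5.0, tol=1e-10):
    """Return x with 1 - F(x) = p, found by bisection on [lo, hi]."""
    if not (kolmogorov_sf(hi) < p < kolmogorov_sf(lo)):
        raise ValueError("p is outside the range covered by the search bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_sf(mid) > p:
            lo = mid  # tail still too large, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For p = .05 this returns approximately 1.3581, the critical value quoted earlier. Unlike KINV, this sketch is not restricted to x between 1.0 and 2.4, but it still rejects values of p outside its bracket.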

Observation: Based on the approximations described in Basic Concepts, for n sufficiently large, D_n,α = KINV(α)/SQRT(n), which yields the same result as the last line in the Kolmogorov-Smirnov Table (although with greater accuracy), and, for any value of n,

D_n,α = KINV(α)/(SQRT(n)+.12+.11/SQRT(n))

More Worksheet Functions

To avoid having to handle the SQRT(n) terms in the above expression, you can instead use the following Real Statistics functions for samples that are sufficiently large:

Real Statistics Functions: The Real Statistics Resource Pack supplies the following functions:

KSDIST(x, n) = the p-value of the one-sample Kolmogorov-Smirnov test at x for samples of size n

KSINV(p, n) = the critical value at p of the one-sample Kolmogorov-Smirnov test for samples of size n

Actually, the first of these functions takes the form KSDIST(x, n, , b, iter) and the second takes the form KSINV(p, n, , b, iter0, iter), where

KSDIST(x, n, , TRUE, iter) = 1-KDIST(x*(SQRT(n) +0.12+0.11/SQRT(n)), iter)

KSDIST(x, n, , FALSE, iter) = 1-KDIST(x*SQRT(n), iter)

KSINV(p, n, , TRUE, iter0, iter) = KINV(p, iter0, iter)/(SQRT(n)+.12+.11/SQRT(n))

KSINV(p, n, , FALSE, iter0, iter) = KINV(p, iter0, iter)/SQRT(n)

b = TRUE (default) works better for small values of n. When b = FALSE, it is assumed that n is large enough for the approximation D_n,α = KINV(α)/SQRT(n) described previously to be adequate. Note that the third argument in the above functions (left blank here) is used for the two-sample Kolmogorov-Smirnov test.
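The relationship between the one-sample p-value and the Kolmogorov cdf can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `ks_p_value` and its `finite_n_correction` flag are hypothetical stand-ins for KSDIST and its argument b, and the series is truncated after a fixed number of terms.

```python
import math


def ks_p_value(d, n, finite_n_correction=True, terms=100):
    """Approximate p-value of the one-sample KS test for statistic d and sample size n.

    Computes 1 - F(x), where F is the Kolmogorov cdf and x is either
    d * (sqrt(n) + 0.12 + 0.11 / sqrt(n)) or, without the correction, d * sqrt(n).
    """
    if d <= 0:
        return 1.0
    root_n = math.sqrt(n)
    scale = root_n + 0.12 + 0.11 / root_n if finite_n_correction else root_n
    x = d * scale
    tail = 0.0
    for k in range(1, terms + 1):
        tail += (-1) ** (k - 1) * math.exp(-2.0 * k * k * x * x)
    # Clamp to [0, 1] to absorb truncation and rounding error.
    return min(1.0, max(0.0, 2.0 * tail))
```

For d = 0.011706 and n = 1000 with finite_n_correction=False, this evaluates to approximately .999167. For very small values of x the truncated sum becomes unreliable, so more terms would be needed.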

Observations

For Example 1 of Kolmogorov-Smirnov Test, where the sample size n = 1,000, we have

D_1000,.05 = KSINV(.05, 1000) = KINV(.05)/(SQRT(1000)+.12+.11/SQRT(1000)) = .04278

which is pretty close to the value shown in cell G16 of Figure 3 of Kolmogorov-Smirnov Test. Referring to this figure, we can also calculate the p-value as follows:

p-value = KSDIST(G15,B14) = 1-KDIST(0.011706*SQRT(1000)) = .999167

Note too that D_10,.05 = KSINV(.05, 10) = .4094, which agrees with the value calculated above via the formula =KINV(.05)/(SQRT(10)+.12+.11/SQRT(10)).

Reference

Ferguson, T. (2008) The Kolmogorov distribution
No longer available online

Wicklin, R. (2020) Kolmogorov D distribution and exact critical values
https://blogs.sas.com/content/iml/2020/06/24/kolmogorov-d-distribution-exact.html

Dimitrova, D. S., Kaishev, V. K. and Tan, S. (2017) Computing the Kolmogorov-Smirnov distribution when the underlying cdf is purely discrete, mixed or continuous
https://www.jstatsoft.org/article/view/v095i10

18 thoughts on “Kolmogorov Distribution”

Rashmi

March 11, 2020 at 7:36 am

Hi Charles
I need a favour from your side..
I have to compare two datasets with different metrics (they don't have common metrics) to find out the common metric among them, but I am not able to do that. Please help me to do that.
- Charles
  
  March 11, 2020 at 9:08 am
  
  Hello Rashmi,
  You can use the two-sample Anderson-Darling test to determine whether the two data sets come from a common population. See
  Two-sample Anderson Darling Test
  Charles
Steffen Hoernig

January 8, 2018 at 11:32 am

Dear Charles,

first my thanks for your very useful work and great helpfulness.
I have one small request: Since many of your functions are approximations, it would be very helpful if you could provide, alongside the definition, an indication of the set of arguments over which they are valid. As an example, I have been plotting the KDIST function to understand it better, and found that it explodes below x = 0.05 (while it behaves just fine for x > 0.05). I suppose KDIST was not meant to be used for such small arguments, but there is no indication of this on the site.
Similarly, since the set of images of KINV is [1;2.4], the relationship “KINV(p, m) = x where 1 − KDIST(x, m) = p” holds only for x in [1;2.4]; outside this interval the expression KINV(1-KDIST(x)) returns nonsensical results. Turning this around, the expression “1-KDIST(KINV(p))” only works correctly for p in [0;0.27], which leads me to the conclusion that KINV should indeed only be used for arguments smaller than 0.27 (i.e. the inverse for the left branch of the distribution function is not modeled) – but again I have found no indication of this on the site.
Thanks!
- Charles
  
  January 13, 2018 at 6:08 pm
  
  Steffen,
  Good suggestion. I have just added the correct limits to the webpage.
  Thanks for helping to make the website easier to understand.
  Charles
Julio Moreno

February 23, 2017 at 2:53 am

Hi Charles, I'm trying to emulate a function to calculate this distribution based on the result you have mentioned (KSDIST(G15,B14) = 1-KDIST(G15*SQRT(1000)) = .999167).
Nevertheless, I'm getting quite a different value when using my function.

Could you help me figure out what I'm doing wrong?

Public Function KSDIST(x As Variant, Optional n = 1000) As Variant
Dim check As Boolean
Dim t As Variant
t = Sqr(8)
check = False
If x <= 0 Then
check = True
Else
Dim F As Variant, k As Variant, R As Variant
F = 0
R = Application.SqrtPi(2) / x
For k = 1 To n Step 1
F = F + (1 / (Exp((2 * k - 1) * Pi) / (t * x)) ^ 2)
Next k
KSDIST = R * F
End If
If check = True Then
MsgBox "x must be a positive number"
Exit Function
End If

End Function

Result: KSDIST(0.011706,1000) = 234.7407 🙁
- Charles
  
  February 23, 2017 at 8:45 am
  
  Julio,
  Do you think that the result that is provided on the website is incorrect?
  Charles
  - Julio Moreno
    
    February 23, 2017 at 8:08 pm
    
    No Charles, not at all. I think I'm doing something wrong, but now I see it is an issue of parentheses.
    
    Thanks!
  - Julio Moreno
    
    February 23, 2017 at 9:53 pm
    
    I found a different definition of the Kolmogorov distribution when searching on Google, but it was in Russian.
    
    Could you please tell me if this function is equivalent to the one you used?
    
    Public Function KSDIST(x As Variant, Optional n As Variant = 1000) As Variant
    'Function that provides either KS Distribution p value or ks distribution value at x
    Dim F As Variant, i As Variant, y As Variant, R As Variant
    F = 0
    For i = -n To n Step 1
    F = F + (-1) ^ i * Exp((-2 * (i * x) ^ 2))
    Next i
    KSDIST = F
    End Function
David Harris

December 31, 2016 at 3:55 am

Hi Charles

How would I calculate the K-S D statistic against the random (Poisson) distribution, compare the calculated D with D_α for α = 0.05, and determine whether the point pattern is clustered or not?

Thank you
- Charles
  
  January 1, 2017 at 1:48 pm
  
  David,
  You do this as described on the referenced webpage. Do you have a specific question?
  Charles
Doc

January 18, 2016 at 11:37 pm

Hi Charles – I was looking to build my own KS distribution without the use of the KSDIST and KSINV functions. Would you have a quick guide on how to code it in Excel, or a formula?
- Charles
  
  January 19, 2016 at 10:51 am
  
  Hi Doc,
  You just need to program the formula given on the referenced webpage. Since it is an infinite sum, you will need to make a finite approximation.
  Charles
Sohrab

June 17, 2015 at 9:58 pm

Hi Charles,

Do you know how to do a 2D K-S test for two samples? Have you prepared such a function in Excel?
- Charles
  
  June 17, 2015 at 10:04 pm
  
  Sorry Sohrab, but I have not created a 2D Kolmogorov Smirnov test in Excel.
  Charles
  - Sohrab
    
    June 17, 2015 at 10:20 pm
    
    Thank you for responding but do you know if I can do the test in R or MATLAB or other software?