Basic Concepts
When the population mean and standard deviation are known, we can use the one-sample Kolmogorov-Smirnov test to test for normality, as described in Kolmogorov-Smirnov Test for Normality.
However, when the population mean and standard deviation are not known, but instead are estimated from the sample data, then the usual Kolmogorov-Smirnov test, based on the critical values in the Kolmogorov-Smirnov Table, yields results that are too conservative. Lilliefors created a related test that gives more accurate results in this case (see Lilliefors Test Table).
The Lilliefors test uses the same calculations as the Kolmogorov-Smirnov test, but the critical values come from the Lilliefors Test Table instead of the Kolmogorov-Smirnov Table. Since these critical values are smaller, the Lilliefors test is more likely to reject the null hypothesis of normality, i.e. less likely to conclude that the data is normally distributed.
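To make "the same calculations" concrete, here is a minimal Python sketch of the statistic Dn when the mean and standard deviation are estimated from the sample. It assumes the sample standard deviation (n − 1 denominator), as in Lilliefors' paper. The function names are illustrative only and are not Real Statistics functions.

```python
import math
import statistics


def normal_cdf(x: float, mean: float, sd: float) -> float:
    """CDF of the normal distribution N(mean, sd^2), computed via math.erf."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def lilliefors_statistic(data: list[float]) -> float:
    """KS statistic Dn with mean and sd estimated from the sample (assumption: n - 1 sd)."""
    xs = sorted(data)
    n = len(xs)
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)  # sample standard deviation, n - 1 denominator
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = normal_cdf(x, mean, sd)
        # Compare the fitted CDF with the empirical CDF just after (i/n)
        # and just before ((i - 1)/n) the jump at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

Only the critical value changes between the two tests. Dn is computed exactly as in the KS test, but Dn is now compared with a Lilliefors critical value instead of a KS critical value.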
Examples
Example 1: Repeat Examples 1 and 2 of the Kolmogorov-Smirnov Test for Normality using the Lilliefors test.
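For Example 1 of Kolmogorov-Smirnov Test for Normality (n = 1000, α = .05), the Lilliefors Test Table gives the critical value for n > 50 as .895/f(n), where f(n) = (.83 + n)/√n – .01. Thus f(1000) = (.83 + 1000)/√1000 – .01 ≈ 31.639, and so Dn,α = .895/31.639 ≈ .0283.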
Since Dn = 0.0117 < 0.0283 = Dn,α, once again we conclude that the data is a good fit for the normal distribution. (Note that the critical value of .0283 is smaller than the critical value of .043 from the KS Test.)
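As a quick check of this arithmetic, the following Python sketch evaluates the n > 50 formula f(n) = (.83 + n)/√n – .01 from the Lilliefors Test Table. It then computes the α = .05 critical value .895/f(n), since .895 is the only coefficient quoted on this page. The function names are illustrative, not Real Statistics functions.

```python
import math


def f(n: int) -> float:
    """Adjustment f(n) = (.83 + n)/sqrt(n) - .01, used by the table for n > 50."""
    return (0.83 + n) / math.sqrt(n) - 0.01


def lilliefors_crit_05(n: int) -> float:
    """Critical value for alpha = .05 and n > 50, namely .895 / f(n)."""
    if n <= 50:
        # For n <= 50 the critical values are read directly from the table.
        raise ValueError("formula applies only for n > 50; use the table instead")
    return 0.895 / f(n)


n = 1000
d_n = 0.0117  # Dn from Example 1
crit = lilliefors_crit_05(n)
print(round(f(n), 3), round(crit, 4))  # 31.639 0.0283
print("reject normality" if d_n > crit else "cannot reject normality")
```

The sketch raises an error for n ≤ 50 on purpose. In that range, critical values come straight from the table rather than from f(n), e.g. .1246 for n = 50 and α = .05.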
For Example 2 of Kolmogorov-Smirnov Test for Normality, using the Lilliefors Test Table with n = 15 and α = .05, we find that Dn = 0.1875 < 0.2196 = Dn,α. This is consistent with the data being normally distributed (more formally, we cannot reject the null hypothesis that the data is normally distributed).
Worksheet Functions
Real Statistics Functions: The following functions are provided in the Real Statistics Resource Pack to automate the table lookup:
LCRIT(n, α, tails, interp) = the critical value of the Lilliefors test for a sample of size n, for the given value of alpha (default .05) and tails = 1 (one tail) or 2 (two tails, default) based on the Lilliefors Test Table. If interp = TRUE (default) the recommended interpolation is used; otherwise linear interpolation is used.
LPROB(x, n, tails, iter, interp, txt) = an approximate p-value for the Lilliefors test when Dn = x, for a sample of size n and tails = 1 (one tail) or 2 (two tails, default). It is based on linear interpolation (if interp = FALSE) or the recommended interpolation (if interp = TRUE, default) of the critical values in the Lilliefors Test Table, using iter iterations (default = 40).
Note that the α values in the Lilliefors Test Table range from .01 to .2 for tails = 2 and from .005 to .1 for tails = 1. When txt = FALSE (default), a p-value less than .01 (tails = 2) or .005 (tails = 1) is given as 0, and a p-value greater than .2 (tails = 2) or .1 (tails = 1) is given as 1. When txt = TRUE, the output instead takes the form “< .01”, “< .005”, “> .2” or “> .1”.
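To illustrate the table lookup and the clamping rules for p-values outside the tabulated α range, here is a hypothetical Python sketch. It corresponds roughly to LPROB with interp = FALSE (plain linear interpolation). It does not reproduce the Resource Pack's recommended interpolation or its iter parameter. The caller must supply the α column headings and the row of critical values for the relevant n, copied from the Lilliefors Test Table.

```python
from bisect import bisect_left


def approx_p_value(x, alphas, crits, txt=False):
    """Hypothetical LPROB-style p-value via linear interpolation (not the real LPROB).

    alphas: alpha column headings in decreasing order (e.g. .2 down to .01)
    crits:  critical values for the chosen n, same order (hence increasing)
    """
    if x < crits[0]:
        # p-value above the largest tabulated alpha
        return f"> {alphas[0]}" if txt else 1.0
    if x > crits[-1]:
        # p-value below the smallest tabulated alpha
        return f"< {alphas[-1]}" if txt else 0.0
    # Locate the pair of tabulated critical values bracketing x.
    k = max(bisect_left(crits, x), 1)
    c0, c1 = crits[k - 1], crits[k]
    a0, a1 = alphas[k - 1], alphas[k]
    return a0 + (a1 - a0) * (x - c0) / (c1 - c0)
```

Linear interpolation is the simplest choice here. Because α is not linear in the critical value, results can differ somewhat from those of the default (recommended) interpolation.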
For Example 2 of Kolmogorov-Smirnov Test for Normality, Dn,α = LCRIT(15, .05, 2) = .2196 > .1875 = Dn and p-value = LPROB(0.1875, 15) > .05 = α, and so once again we can’t reject the null hypothesis that the data is normally distributed.
Real Statistics Support for KS Test
Click here for information about the Real Statistics functions that perform the Kolmogorov-Smirnov test both when the mean and standard deviation are specified and when they are estimated from the data. Both raw data and data in the form of a frequency table are supported.
Lilliefors Distribution
Especially for values of α not found in the Lilliefors Test Table, we can use an approximation to the Lilliefors distribution. Click here for more information about this distribution, including some useful functions provided by the Real Statistics Resource Pack.
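As a swapped-in alternative to that approximation, you can estimate a critical value for any α by Monte Carlo simulation. This is similar in spirit to how Lilliefors obtained his original table. The Python sketch below simulates normal samples, computes Dn with the estimated mean and standard deviation, and takes the upper-α quantile. Dn does not depend on the true mean and variance when both are estimated, so simulating from N(0, 1) is enough. The function names, the number of repetitions and the seed are illustrative assumptions.

```python
import math
import random


def lilliefors_statistic(xs):
    """KS statistic Dn with mean and sd (n - 1 denominator) estimated from xs."""
    xs = sorted(xs)
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def simulated_critical_value(n, alpha, reps=20000, seed=1):
    """Monte Carlo estimate of the Lilliefors critical value for sample size n."""
    rng = random.Random(seed)
    ds = sorted(
        lilliefors_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    # Empirical (1 - alpha) quantile of the simulated null distribution of Dn.
    k = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return ds[k]


print(round(simulated_critical_value(15, 0.05), 3))
```

For n = 15 and α = .05, the estimate should land close to the tabulated .2196. Increasing reps reduces the simulation error at the cost of run time.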
Examples Workbook
Click here to download the Excel workbook with the examples described on this webpage.
Reference
Lilliefors, H. W. (1967) On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown, Journal of the American Statistical Association, Vol. 62, No. 318, pp. 399-402.
https://pdfs.semanticscholar.org/4aad/1756e88dba86399a75891895e00b160f5460.pdf
Hi Charles,
Again, your site proves to be my best source!
I have to do a normality test for an unknown sample size, so I thought Lilliefors would be safe.
(Do you think I should also do a chi-square test for samples over 50 and use an IF function?)
I also do not understand where the .83 and .895 on this page come from. Can you help?
Hello
Thank you for your kind words and support.
1. You need to know the sample size to perform the Lilliefors test.
2. I don’t know enough about what sort of data you have to be able to comment on the use of a chi-square test.
3. Other commonly used alternatives to the Lilliefors test are the Shapiro-Wilk test, d’Agostino-Pearson test, and Anderson-Darling test. All of these are described on the Real Statistics website.
4. The .83 and .895 values come from the Lilliefors Table. See
https://www.real-statistics.com/statistics-tables/lilliefors-test-table/
Charles
For the record, I am using the normality test for the residuals of a single linear regression.
The data can be as few as 4 and as many as 100 values.
I could find .895 but not .83.
Is it some kind of constant, or can you look it up just as in the .895 case?
Is there a reason not to use the extended table (which goes up to 50)?
The formula for f(n) is written towards the end of the webpage, namely f(n) = (.83 + n)/sqrt(n) – .01.
The example uses n = 1000 which is larger than 50.
Charles
So .83 and .01 apply to every calculation (they are something like constants), while .895 is found in the table (including the calculated cases for n > 50).
The .895 value only applies in the case where alpha = .05 and n > 50.
The .83 and .01 apply for all alpha but where n > 50.
Charles
And what is the case for n < 50? Where do I look them up?
https://www.real-statistics.com/statistics-tables/lilliefors-test-table/
Charles
and what is the case for n50.
My question is whether the *.83* in f(n) = (.83 + n)/√n – .01 is a constant, i.e. whether it stays the same whatever the value of n, alpha, or any other parameter.
For n = 50 and alpha = .05, the critical value is .1246. You don’t use f(n).
Charles
Dear Charlie,
Thank you for your website, which is well written and particularly pedagogical.
I see a problem of principle in these normality tests. In fact, we don’t test the hypothesis H0 at significance level alpha; rather, we test the hypothesis H1 (rejection) at that level.
For the KS test, for example, the higher the percentage (0.95, 0.99, 0.995, …), the lower the chance of concluding H1 and rejecting H0, and so the “easier” it is to conclude that the distribution is Gaussian! That makes no sense.
When the test passes, that does not mean we are 95% (or more) sure that the distribution is Gaussian. It means that we can’t say with 95% confidence that it is something different. But the probability that it really is a normal distribution is not known.
So shouldn’t we always take at least 50% (meaning 50% or *less*) if we want to conclude that the distribution is Gaussian? Indeed, to fairly conclude that there is a “good” chance it is Gaussian, we should at least be able to say that there is not a 50% chance it is something else…
Hello Chris,
This is the sort of issue we have with all statistical tests (at least the non-Bayesian tests). We don’t know whether the data really comes from a normal distribution, regardless of whether the p-value is 50% or 2%. The value of 5% is an arbitrary, but commonly used, compromise. Since rejection occurs when the p-value is less than alpha, the lower the alpha value, the more likely you are to declare the data normally distributed. An alpha of 50% would increase the likelihood that you would declare the data not normally distributed.
Charles
For LCRIT, I can’t seem to get a value if n > 50. What am I doing wrong?
Mark,
I am not sure what you are doing wrong, but I just tried to use =LCRIT(60), and I got the value .114113. What version of Real Statistics are you using? You can find this out by entering the formula =VER()
Charles
Charles
I am using version 4.14 for Excel 2010.
When I use Excel 2013 with the corresponding Real Statistics version, it works OK.
I don’t like Excel 2013, so I guess this is a cost of that attitude.
Charles
Problem solved – I installed version 5.2 for Excel 2010 and LCRIT works for a sample size of 300.
Mark
Mark,
Good to hear.
Charles
Hey Charles,
If I’m not mistaken, Dn from the Kolmogorov-Smirnov Test for Normality page should be Dn = 0.1875, not Dn = 0.184.
Thanks.
David,
Yes, you are correct. Thanks for catching this mistake. I really appreciate your help in improving the Real Statistics website.
Charles
There are many test regimes for normality. Is there a list showing the order of preference for the test methods according to the type of data you have?
I mean, which test should I use for which type of data? It seems so easy to fudge a result as needed by choosing the test method.
Keith,
In general, I believe that the Shapiro-Wilk test is the best one to use. If you have a number of ties, then d’Agostino-Pearson is probably better.
Charles