Chi-square Test for Normality

The chi-square goodness of fit test can be used to test the hypothesis that data come from a normal distribution. In particular, we can use Property 2 of Goodness of Fit to test the null hypothesis:

H0: data are sampled from a normal distribution.

Example (known parameters)

Example 1: 90 people were put on a weight gain program. The following frequency table shows the weight gain (in kilograms). Test whether the data is normally distributed with a mean of 4 kg and a standard deviation of 2.5 kg.

Figure 1 – Frequency table and histogram for Example 1

We begin by calculating the probability that x < b for b = 0, 1, …, 8, assuming a normal distribution with a mean of 4 and a standard deviation of 2.5. This probability is NORM.DIST(b, 4, 2.5, TRUE). The probability that x is in the interval (a, b] is then

NORM.DIST(b, 4, 2.5, TRUE) – NORM.DIST(a, 4, 2.5, TRUE)

Multiplying these values by the sample size of 90 gives us the expected frequencies.
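
For readers working outside Excel, the same expected frequencies can be sketched in Python. This is only an illustration: it assumes Python's standard-library `statistics.NormalDist` as a stand-in for NORM.DIST, and the variable names (`edges`, `expected`) are not from the article.

```python
from statistics import NormalDist

# Parameters from Example 1: mean 4 kg, standard deviation 2.5 kg, n = 90.
dist = NormalDist(mu=4, sigma=2.5)
n = 90

# Interval boundaries b = 0, 1, ..., 8, with open-ended tails on both sides.
edges = [float("-inf")] + list(range(0, 9)) + [float("inf")]

def cdf(x):
    """Cumulative probability P(X <= x), with the infinite end points handled explicitly."""
    if x == float("-inf"):
        return 0.0
    if x == float("inf"):
        return 1.0
    return dist.cdf(x)

# The expected frequency of the interval (a, b] is n * (F(b) - F(a)).
expected = [n * (cdf(b) - cdf(a)) for a, b in zip(edges, edges[1:])]

for a, b, e in zip(edges, edges[1:], expected):
    print(f"({a}, {b}]: {e:.2f}")
```

The ten expected frequencies sum to the sample size, which is a quick sanity check on the interval boundaries.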

Figure 2 – Chi-square normality test with known parameters

We now perform the Chi-square goodness of fit test. Since the observed and expected frequencies of the first and last intervals are less than 5, it is better to combine the 1st and 2nd intervals as well as the last and second-to-last intervals, as shown on the right side of the figure.
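
The merging step can be sketched as follows. This is a minimal sketch that merges on the expected counts only; the function name `merge_small_tails` and the counts are hypothetical and are not the data from Figure 2.

```python
def merge_small_tails(observed, expected, minimum=5.0):
    """Merge the first and last bins into their neighbours while the
    expected count of that tail bin is below `minimum`."""
    obs, exp = list(observed), list(expected)
    # Left tail: fold bin 0 into bin 1 until it is large enough.
    while len(exp) > 2 and exp[0] < minimum:
        first_o, first_e = obs.pop(0), exp.pop(0)
        obs[0] += first_o
        exp[0] += first_e
    # Right tail: fold the last bin into the one before it.
    while len(exp) > 2 and exp[-1] < minimum:
        last_o, last_e = obs.pop(), exp.pop()
        obs[-1] += last_o
        exp[-1] += last_e
    return obs, exp

# Hypothetical counts for illustration only (not the article's data).
observed = [3, 7, 12, 20, 9, 4]
expected = [4.2, 8.1, 14.0, 16.5, 8.0, 4.2]

merged_obs, merged_exp = merge_small_tails(observed, expected)
print(merged_obs)
print([round(e, 1) for e in merged_exp])
```

Merging changes the number of intervals, so the degrees of freedom must be based on the merged table, not the original one.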

The chi-square test statistic is SUM(K4:K11) = 4.47 (cell K12), which is less than the critical value of CHISQ.INV.RT(0.05,K14) = 14.07 (cell K15), and so we conclude that there is a good fit. Note that df = number of intervals – 1 = 8 – 1 = 7 (cell K14), since the population mean and standard deviation are known and so no parameters need to be estimated from the data. We get the same result by observing that p-value = CHISQ.DIST.RT(4.47, 7) = .724 > .05 = α.
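
The same decision can be reproduced outside Excel. The following is a rough sketch using only the Python standard library; the helper names are illustrative, with `chi2_sf` playing the role of CHISQ.DIST.RT and `chi2_critical` that of CHISQ.INV.RT. The right-tail probability is computed from the series for the regularized lower incomplete gamma function, which is one of several possible approaches.

```python
import math

def chi2_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all intervals."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, df):
    """Right-tail probability P(X > x) for a chi-square variable with df
    degrees of freedom (series for the regularized lower incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def chi2_critical(alpha, df):
    """Right-tail critical value found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

statistic = 4.47   # value reported in the text (cell K12)
df = 8 - 1         # 8 intervals, no estimated parameters
alpha = 0.05

critical = chi2_critical(alpha, df)
p_value = chi2_sf(statistic, df)
print(f"critical value = {critical:.2f}, p-value = {p_value:.3f}")
print("reject H0" if statistic > critical else "do not reject H0")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent, so both checks always lead to the same decision.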

Example (estimated parameters)

Example 2: In the above example, the population mean and variance were known. This is usually not the case. This time we will simply ask whether the above data comes from a normal population.

We first calculate the sample mean and variance as described in Frequency Tables using the midpoint of each interval, although for the first and last intervals (-∞,0] and (8,∞) we need to guess at acceptable representative values, which we take as -1 (i.e. a weight loss of 1 kg) and 9 respectively.

Figure 3 – Calculation of parameters from data

We next test the null hypothesis that the data is normally distributed using the sample mean 3.74 (cell F16) and standard deviation 2.20 (cell F17) as estimates for the population mean and standard deviation. The mean is calculated by the formula =E14/B14 and the variance by the formula =(F14-B14*F16^2)/(B14-1); the standard deviation is the square root of the variance.
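
The two worksheet formulas appear to correspond to mean = Σfx / n and variance = (Σfx² – n·mean²) / (n – 1), where x is the representative value of each interval and f its frequency. A minimal sketch of that calculation follows. The midpoints are the ones described above, but the frequencies are hypothetical (the values in Figure 3 are not reproduced here), and the function name is illustrative.

```python
import math
import statistics

def grouped_mean_variance(midpoints, freqs):
    """Sample mean and variance from a frequency table, using one
    representative value (midpoint) per interval."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))
    mean = sum_fx / n
    variance = (sum_fx2 - n * mean ** 2) / (n - 1)
    return mean, variance

# Representative values: -1 and 9 for the open-ended tails, midpoints elsewhere.
midpoints = [-1, 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 9]
# Hypothetical frequencies, for illustration only.
freqs = [2, 4, 8, 12, 15, 14, 10, 6, 3, 1]

mean, variance = grouped_mean_variance(midpoints, freqs)
std_dev = math.sqrt(variance)
print(f"mean = {mean:.2f}, variance = {variance:.2f}, sd = {std_dev:.2f}")

# Cross-check against the raw-data formulas on the expanded sample.
raw = [x for x, f in zip(midpoints, freqs) for _ in range(f)]
print(math.isclose(variance, statistics.variance(raw)))
```

The cross-check at the end expands the table into individual values, which shows that the shortcut formula gives the same result as the usual sample variance applied to the representative values.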

As in Example 1, we combine the first two and the last two intervals so that all expected frequencies are at least 5, as shown in Figure 4. Once again we use a chi-square goodness of fit test based on 8 intervals, but this time since the mean and standard deviation are estimated parameters, per Property 1 of Goodness of Fit, we use df = 8 – 1 – 2 = 5.

Figure 4 – Chi-square test using estimated parameters

Since χ² = 1.35 < 11.07 = χ²_crit (or p-value = .930 > .05 = α), we again retain the null hypothesis that the data are normally distributed.
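
The only change from Example 1 is the degrees-of-freedom bookkeeping. The short sketch below makes it explicit, using the statistic and critical value reported in the text; the function name `degrees_of_freedom` is illustrative.

```python
def degrees_of_freedom(n_intervals, n_estimated):
    """df = number of intervals - 1 - number of parameters estimated from the data."""
    return n_intervals - 1 - n_estimated

df_known = degrees_of_freedom(8, 0)      # Example 1: mean and sd are given
df_estimated = degrees_of_freedom(8, 2)  # Example 2: mean and sd are estimated

# Values reported in the text for Example 2.
statistic, critical = 1.35, 11.07
reject = statistic > critical

print(df_known, df_estimated)
print("reject H0" if reject else "do not reject H0")
```

Each estimated parameter lowers df by one, which lowers the critical value and makes the test stricter than it would be with known parameters.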

100 thoughts on “Chi-square Test for Normality”

  1. Hi, I have seen another implementation where two was subtracted from your version of df, with the explanation that two parameters, namely the mean and standard deviation, were used.

    Which is ‘correct’, or, put differently, in which circumstances should each version be used?

    Reply
  2. Hi Charles, very neat explanation. You state that “since the observed and expected frequencies are less than 5…”. If this were a different distribution (let’s say an exponential one), would the threshold of 5 still hold?

    Reply
  3. Why a one-tailed test in Example 1? You wrote CHIINV(.05,7). I think a two-tailed test would be more appropriate, i.e. CHIINV(.025,7). What do you think?

    Reply
    • You can check for normality based on the midpoint of each interval. Such tests will be less accurate, but this is probably the best you can do without having the raw data.
      Charles

      Reply
  4. Hello Charles,
    Could you please provide some insights or point to reference work that would explain why midpoints are used in Example 2? I understand we use them because the population’s mean and stdev are unknown, but I’d like to understand the mathematical intuition behind this. Thanks very much for your time and your awesome contribution to statistics on the WWW!

    Reply
    • Martin,
      You need to pick some value and the midpoint seems a reasonable choice. If the data is heavily skewed, you might actually pick a different, more representative point.
      I am pleased that you appreciate my contribution to statistics on the web. I am trying to do my part.
      Charles

      Reply
  5. Hi,
    I have financial data for 80 firms for 10 years (2007-16), with 3 explanatory variables and 1 moderating variable.
    1- How can I check the normality of my data?
    2- My R2 is very low (0.04). I even increased the sample size from 60 firms to 80, but the result is still the same, while the p-value is less than 0.01.

    Reply
    • Moin,
      1. I suggest that you use the Shapiro-Wilk test to check for normality.
      2. It may be that the data is not a good fit for the regression that you are conducting. If you are conducting multiple linear regression, then you should draw scatter plots (e.g. each independent variable vs. dependent variable). If these don’t look linear, then you may have a problem.
      Charles

      Reply
  6. Now, I have 4 columns (4 categories), and each category has 10 points. Can we use the Chi-squared test for normality, and how can I do it (using only the Chi-square test)? Since my lecturer only taught the Chi-squared test, I cannot apply another method such as the Lilliefors test.

    Reply
    • David,
      Sorry, but I don’t understand the first sentence of your comment.
      Note that the Lilliefors test is the same as the Kolmogorov-Smirnov test using a different table of critical values. You should use the Lilliefors test when you are estimating the mean and standard deviation from the data and the Kolmogorov-Smirnov test when the mean and standard deviation are known.
      Charles

      Reply
      • For example, I have a biostatistics problem like this:

        A scientist determined the effectiveness of segmental wire fixation in athletes with spondylolysis. Between 1993 and 2000, 20 athletes (6 women and 14 men) with lumbar spondylolysis were treated surgically with the technique. The following table gives the Japanese Orthopedics Association evaluation score for lower back pain syndrome for men and women prior to the surgery. The lower score indicates less pain.

        Gender JOA scores
        Female 14,13,24,21,20,21
        Male 21,26,24,24,22,23,18,24,13,22,25,23,21,25

        Give conclusion for the evaluation of the segmental wire fixation treatment between male and female?

        So, this is the question. To solve this problem, I have to do 3 steps:
        – test the variance (F -test)
        – Normality test (Chi- square distribution) to determine the population is normally distributed or not.
        – After using the normality test and depending on the condition’s question to apply ANOVA or kinds of non-parametric test.

        I get stuck in question 2. Can I gather all data points in one group and use chi square test to find the population is normally distributed or not ?

        Reply
        • David,
          Each group should be tested for normality. I suggest that you use the Shapiro-Wilk test instead of the chi-square test. If you use the chi-square test, I suggest that you use the Lilliefors version of the test.
          Charles

          Reply
  7. Hello, I want to ask: if the data is normally distributed, which means it is parametric, can I use the chi-square test, which is actually for parametric testing?
    Thank you.

    Reply
  8. Hi Charles,
    I have data of students with age , gender , IQ scores and thumb size. I want to test normality can you guide how to proceed.

    Reply
  9. Hello Charles,
    Can we use the Chi-Squared test for normality when we have actual sample data ? I see the two cases you presented are:
    1)When data is presented in terms of frequency tables
    2)When we are testing against a specific pair (mean, st.dev)

    Now , I have 120 sample data points. Can I test whether these points come from a normal population by calculating the sample mean and sample deviation ( S.E ?) and applying method 2?
    Thanks in Advance.

    Reply
    • Guero,
      Yes, but in this case I suggest that you use the Lilliefors version of the test since you will get more accurate answers. See the webpage
      Lilliefors Test
      In general, I find that the Shapiro Wilk test for normality is more accurate than the chi-square approach. See the following webpage for information about Shapiro-Wilk
      Charles

      Reply
      • Thanks, Charles, another one, please:

        Say we have a multilinear regression:
        Y ~a1X1+a2X2+a3X3

        We want the residuals (Y- (a1X1+a2X2+a3X3)) ,
        to be normally distributed. If they are,

        does it follow that the residuals Y|X1, Y|X2, Y|X3 ; Y|Xi
        means Y restricted to Xi

        ( i.e., we regress Y against X1, holding X2=X3=0) are also
        normally -distributed?

        I guess this is equivalent to asking whether the residuals
        for Y~a1X1+a2X2+a3X3 are jointly normal?

        Hope I didn’t make this confusing and thanks again.
        Guero.

        Reply
  10. Hi Charles,

    Thanks for the excellent web page, extremely useful!

    I am getting slightly confused when using different significance levels and whether or not we would accept the null hypothesis.

    In your example, the test statistic is 1.35 and as this is less than the critical region CHIINV(0.05,5)=11.07 then we accept.

    Imagine our test statistic was 12. Under a 5% significance level we would reject H0. But if we used a 1% significance level the critical region would be CHIINV(0.01,5)=15.09. This would mean we would reject the null hypothesis under 5% but accept under 1%.

    However, I thought using a smaller significance level is ‘more reliable’. So I am confused that a sample could exhibit less Normal qualities, i.e. a higher test statistic, and still pass a ‘more robust test’.

    Thanks!

    Reply
    • Chris,
      Sorry for the delay in answering your question.
      The smaller the value of alpha, the smaller the critical region, i.e. the region where the null hypothesis is rejected. This means that for a lower alpha value, it is less likely that the null hypothesis would be rejected. This is consistent with your example. At 5% the data is not consistent with a normal population, while for 1% the data is consistent (enough) with a normal population.
      Perhaps another way of looking at this is that at 1% the acceptance region is larger than the acceptance region at 5%. Also, at 5% we can afford to be wrong 1 out of 20 times, while at 1% we can afford to be wrong only 1 out of 100 times.
      Charles

      Reply
    • Ramon,
      The critical value is CHIINV(alpha,df) = CHIINV(.05,7) using Excel 2007 or CHISQ.INV.RT(alpha,df) = CHISQ.INV.RT(.05,7) using more recent versions of Excel.
      Charles

      Reply
        • If you put =CHIINV(0.05,7) in Excel you get 2.16734991
          If you put =CHIINV(0.95,7) in Excel you get 14.0671404
          So, correct formula is the second
          Best regards

          Reply
          • Ramon,
            That’s interesting; when I enter =CHIINV(.05,7) on my computer I get 14.067… If I enter =CHISQ.INV(.05,7) I get 2.167… If I enter =CHISQ.INV.RT(.05,7) I get 14.067…
            Charles

  11. Hi, I’m a bit of a noob in stats and I’m stuck with the Chi-squared methods at the moment. I need to use it to test the normality of some data I’ve been supplied with (sample size of 40, sorted into 8 groups of 5). I’ve sorted it into ascending order, found the average values at the boundary of each group, and then used these to find the value I need to compare to a normal distribution curve. However, I’m stuck trying to find out how to do this in Excel. Any help would be great, thank you 🙂

    Reply
    • Ben,
      Is there some reason why you are testing for normality in this way? Why can’t you simply test normality on all 40 elements? (although for some tests — e.g. Anova — you need to check each group for normality) Also, generally chi-square is not the best test for normality. Shapiro-Wilk is usually a better test.
      Charles

      Reply
  12. Hi Charles,

    In example 1, when you say: “The probability that x is in the interval (a, b] is then NORMDIST(b, 4, 2.5, TRUE) – NORMDIST(a, 4, 2.5, TRUE)” can you please tell me what is the meaning of “a”?. I have tried to do the calculations taking “a” as the frequency or fx or fx^2 but none of those work. Thanks

    Reply
    • John,
      Here I am referring to cumulative probability, i.e. F(x). F(a) = the probability that the outcome is less than a. Thus, the probability that the outcome is between a and b is F(b) – F(a).
      Charles

      Reply
  13. Dear Charles
    I would like to ask how I can check the normality or the distribution of my data in Prism or Excel for biological data (e.g. western blotting data) in order to decide whether to use ANOVA or a nonparametric test.

    Reply
  14. I forgot to write that there are 50 participants, 18 boys and 32 girls. I want to compare them. Are they normal and homogeneous or not? If the data is not normal, what should I use for the comparison?

    Reply
  15. Hello Sir!
    Thank you for your help in advance,
    I have a question about the motivation of students in learning English.
    How can I assess normality and homogeneity in order to compare who is more motivated, girls or boys? Motivation consists of integrative and instrumental motivation, but I have to do it manually. How could I do this? Do you think I have to use chi-square or another way?

    Reply
    • Hedi,

      There are many tests for normality. In general, I suggest that you use the Shapiro-Wilk test. You should test both the boys sample and the girls sample for normality (separately). See the following webpage:
      Shapiro-Wilk

      There are also many tests for homogeneity of variances. I suggest that you use Levene’s test. See the webpage
      Levene’s Test

      If you use the t-test with unequal variances, then you don’t need to check for homogeneity of variances. See the webpage
      t test with unequal variances

      Charles

      Reply
  16. Hello Sir!
    Happy new year!
    Thank you for your help in advance,
    I have a question about the motivation of students in learning English.
    How can I assess normality and homogeneity in order to compare who is more motivated, girls or boys? Motivation consists of integrative and instrumental motivation, but I have to do it manually. How could I do this? Do you think I have to use chi-square or another way?

    Reply
    • There are many tests for normality. In general, I suggest that you use the Shapiro-Wilk test. You should test both the boys sample and the girls sample for normality (separately). See the following webpage:
      Shapiro-Wilk

      There are also many tests for homogeneity of variances. I suggest that you use Levene’s test. See the webpage
      Levene’s Test

      If you use the t-test with unequal variances, then you don’t need to check for homogeneity of variances. See the webpage
      t test with unequal variances

      Charles

      Reply
  17. I have learned that we use a normality test to check whether our data is normally distributed, and that this decides the method we use for hypothesis testing: a parametric or a non-parametric test. For a test on one sample, as in your example, we can easily reach a conclusion. However, what about using a normality test for two or more samples in a problem? We apply the normality test to each sample, right? So if one of the samples is not normally distributed, what can we conclude? I am confused about how to decide the method for hypothesis testing when we have two or more samples.
    Thank you so much.

    Reply
    • Laura,
      For tests such as ANOVA you need to test each group sample for normality. In a 3 x 3 design, this means that you need to test each of the 9 groups for normality. Remember though that ANOVA and many other tests are pretty robust for departures from normality. Happy New Year.
      Charles

      Reply
      • Happy New Year, sir
        In the case of 2 samples, if the population is normally distributed, I will use a test of two means for hypothesis testing. However, if the population is not normally distributed, my hypothesis test will be the Mann-Whitney U test for independent samples or the Wilcoxon Matched-Pairs Signed-Rank test for dependent samples.
        So, my question is: what if one sample is found to be normally distributed by the normality test and the other is not? Can this case happen? The same question applies for 3 or more means when deciding between ANOVA and the Kruskal-Wallis test. I’m so confused about when to use a parametric test and when to use a non-parametric test.
        Thank you so much, sir.

        Reply
    • Laura,
      When comparing two samples, each sample should be normal. If one is normal and the other is not, then the test may not be valid. Even so, a t test is pretty robust to violations of normality. Generally, a problem occurs when one or both samples are far from symmetric. If both samples are skewed to the right, then you are probably better off using a nonparametric test (Mann-Whitney).
      Charles

      Reply
  18. Please, next time, indicate how the mean, variance in figure 3 are computed. Or, better yet, show the formulas for every equation so that we wouldn’t have to make guesses as to how they were computed.

    Reply
  19. Dear Charles,

    I would like to thank you for this extremely useful resource !

    I have a question regarding the normality check via chi-square testing and sample size. I am applying your calculation to a case in which measurement of dust is involved. This means that there is a very large sample size. Since the dust grains measured are not really counted, but only weighed in classified sizes, the frequencies are given as percentages. Thus, I assume a sample size of 100, but I get extremely large X2 values which, compared with the critical X2 value (which is independent of the sample size and thus constant), always lead me to conclude that the distribution is NOT NORMAL. I fulfill all the criteria for the test (more than 5 classes, frequencies larger than 5 or grouped frequencies, etc.). I’ve cross-checked some of these distributions with the Shapiro-Wilk test and they are normally distributed.

    I tried to lower and to increase the population number for the Chi2-testing keeping the % fractions but still, I either get too low frequencies or too large X2…

    To rule out that I have overlooked something, I took your example “Norm Chi-sq 1” and multiplied the given frequencies by 10 or 100, and the same effect occurs. Is there any explanation for this phenomenon? Am I overlooking something? What would you recommend?

    Thanks.

    Reply
    • Dear Juan,
      I don’t completely understand the problem that you are having with the chi-square test, but this is not really a great test for normality. Shapiro-Wilk is usually one of the best tests for normality. I would also create a graph (e.g. Q-Q plot) to make sure.
      Charles

      Reply
      • Thank you for your quick answer !

        In short, what I mean is that the test seems very sensitive to sample size: if the sample size goes up, the calculated X2 goes up very much and it is then very easy to be out of normality… If you take your example and keep the ratios between frequencies (imagine that they were given as percentages) and you increase “n”, the test changes drastically… Is that a known effect?

        Reply
        • Juan,
          Most statistical tests are sensitive to sample size. With very big samples it is often easier to find a significant effect.
          Charles

          Reply
  20. Hey, great example!

    I’m trying to use the Chi-Squared Goodness of Fit test to see if I can assume normality for further tests on my 2 samples of data. Basically I recorded battery drain times for 2 popular brands of batteries, 20 samples per brand. I want to see if I can assume normality for the 2 samples. What would my Ho and Ha be?

    Thanks for the help!

    Reply
    • As stated on the referenced webpage, H0: data are sampled from a normal distribution, and so Ha: data are not sampled from a normal distribution.
      Charles

      Reply
  21. How can I test normality for a sample of 36 monthly returns in percentage for a stock?

    Is N = 36 a large enough sample to reasonably test normality or should I increase N to, say, 48 or 60…?

    Thanks Charles!

    Reply
    • N = 36 should be a big enough sample. I suggest that you use a test like Shapiro-Wilk instead of Chi-square to test for normality.
      Charles

      Reply
  22. Hello Mr.Charles,

    It is my understanding that using the Chi-Square test, I can check the goodness of fit of my data. So, I can check, for example, whether my data follows a binomial distribution with some probability of success.

    Now, suppose I believe my data follows a Chi-Square distribution. How would I check it? I hope it is not an absurd question; if it is, my apologies.

    Reply
    • The value of a is simply the value of x prior to b in the frequency table. For this example, if b = 3 then a = 2.
      Charles

      Reply
  23. I’m looking for a non-traditional way to explain GOF.
    In Example 2 with df=5 and Chi^2=1.35 is there about a 7% probability that we would be correct if we said the data were not normally distributed?
    Does that imply that there is a 93% probability that the data are normally distributed?
    Alternatively, if we try fits for several types of distributions we can say that there is a 7% chance that we are wrong if we reject normal; an x% chance that we are wrong if we reject uniform, etc.
    Do we need to make the negative statement or can we make a positive statement?

    Reply
    • John,

      No, this is not correct. Actually, you need to look at the conditional probabilities given that the null hypothesis is true.

      “Suppose we perform a statistical test of the null hypothesis with α = .05 and obtain a p-value of p = .04, thereby rejecting the null hypothesis. This does not mean that there is a 4% probability of the null hypothesis being true, i.e. P(H0)=.04. What we have shown instead is that assuming the null hypothesis is true, the conditional probability that the sample data exhibits the obtained test statistic is 0.04; i.e. the probability of D given that H0 is true = P(D|H0)=.04 where D = the event that the sample data exhibits the observed test statistic.”

      Charles

      Reply
  24. I have an unrelated question. I looked through the comments above and thought I would ask it here. I am performing a goodness of fit test and the mean and SD were given to me as percentages. I am not sure what to do with these values or how to convert them into a number usable for my expected values.
    Many thanks

    Reply
    • Jared,
      It really depends on what these percentages represent, but the likely answer is that you simply multiply the percentages by the sample size.
      Charles

      Reply
  25. Can I use this procedure to test whether a sample data set came from a chi-square distribution? If not how do I to test for the chi-square distribution?

    Reply
      • Thank you for your fast reply. I need a little further clarification. I wish to test a column of computed chi-square values that is 10,000 entries long. Applying Theorem 3 in Example 4, as you suggested, I would use CHIDIST(chi square, df) = CHIDIST(?,9999). What would be entered into the chi-square portion? I want to test the whole column, not just a single number as in Example 4, so would I just enter the column that contains the data? A histogram of the data leads me to believe that it does indeed fit the chi-square distribution. I just need a p-value to confirm it.

        Reply
  26. Hello Sir.
    I’m just wondering why we need to combine classes if the expected frequency is less than 5. Why 5 and not some other value? And how does it affect our results if we do not combine classes with an expected frequency of less than 5?

    Reply
  27. Hi Charles,

    Thank you for the great article.

    I’m confused. In example 2 you use a df of 5 (k-m-1 = 8-2-1). 2 since mean and variance are unknown but what causes the -1? I can see that you refer to Theorem 3 but according to wiki:
    http://en.wikipedia.org/wiki/Goodness_of_fit

    “where \nu is the number of degrees of freedom, usually given by N-n-1, where N is the number of observations, and n is the number of fitted parameters, ASSUMING THAT THE MEAN VALUE IS AN ADDITIONAL FITTED PARAMETER. ”

    I guess the “-1” is due to the mean and the “n” is the additionally fitted parameters.
    So for your example 2 it should be 8-1-1 = 6 as ONLY variance is an additional parameter?

    Please correct me if i’m wrong.

    Best Regards
    Gustav

    Reply
    • Gustav,

      I believe that in the example given in Wikipedia the population mean is unknown (and is estimated by the sample mean) and the population variance is known. Thus df = N-n-1 = N-1-1 = N-2. Here N = number of observations and n = number of fitted parameters = 1 in this case. If N were 8, then df = N-2 = 6.

      In Example 2 of the referenced webpage, both the population mean and the population variance are unknown, and so n = 2. Since N = 8, we have df = N-n-1 = 8-2-1 = 5.

      Charles

      Reply
  28. Hello sir, I asked about a lognormal distribution problem for the K-S test, to which you replied. Thank you very much.

    If possible, please explain one lognormal distribution problem using the chi-square test. It would be a great help to people like me who are new to statistics.

    thank you sir

    Reply
    • Hello Sandeep,
      To replicate Example 1 and 2 on the referenced page with the log normal distribution instead of the normal distribution, just replace formulas of the form =NORMDIST(x,mean,stdev,TRUE) by =LOGNORMDIST(x,mean,stdev) or LOGNORM.DIST(x,mean,stdev,TRUE).
      Charles

      Reply