Testing for Normality and Symmetry

Since a number of the most common statistical tests rely on the normality of a sample or population, it is often useful to test whether the underlying distribution is normal, or at least symmetric. This can be done via the following approaches:

Review the distribution graphically (via histograms, boxplots, QQ plots)
Analyze the skewness and kurtosis
Employ statistical tests (esp. Chi-square, Kolmogorov-Smirnov, Shapiro-Wilk, Jarque-Bera, D’Agostino-Pearson)
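Skewness and kurtosis can be computed directly from the data. The Python sketch below is a minimal illustration. It assumes the adjusted sample estimators used by Excel's SKEW and KURT functions. KURT reports excess kurtosis, so both values should be near 0 for normal data. The function names are hypothetical.

```python
import statistics


def sample_skewness(data):
    """Adjusted sample skewness (assumed to match the estimator behind Excel's SKEW)."""
    n = len(data)
    if n < 3:
        raise ValueError("skewness needs at least 3 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    total = sum(((x - mean) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * total


def sample_excess_kurtosis(data):
    """Adjusted sample excess kurtosis (assumed to match Excel's KURT); 0 for normal data."""
    n = len(data)
    if n < 4:
        raise ValueError("kurtosis needs at least 4 values")
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    total = sum(((x - mean) / s) ** 4 for x in data)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - correction


data = [1, 2, 3, 4, 5]  # hypothetical, perfectly symmetric data
print(sample_skewness(data))         # close to 0 for symmetric data
print(sample_excess_kurtosis(data))  # negative: flatter than a normal curve
```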

If the data is not symmetric, it is sometimes useful to transform it so that the transformed data is symmetric and can therefore be analyzed more easily.
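The sketch below is a hedged illustration of such a transformation. It assumes strictly positive, right-skewed data and applies a natural-log transform. This is only one of several common choices; square root and reciprocal transforms are others. The data and function names are made up for illustration.

```python
import math
import statistics


def skewness(data):
    """Adjusted sample skewness (same estimator as Excel's SKEW)."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)


def log_transform(data):
    """Natural-log transform; assumes every value is strictly positive."""
    if min(data) <= 0:
        raise ValueError("log transform requires positive values")
    return [math.log(x) for x in data]


# Hypothetical right-skewed data: each value doubles the previous one.
raw = [1, 2, 4, 8, 16, 32, 64]
transformed = log_transform(raw)

print(f"skewness before: {skewness(raw):.3f}")
print(f"skewness after:  {skewness(transformed):.3f}")
```

After the log transform, the doubling values become equally spaced, so the skewness drops to essentially zero.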

81 thoughts on “Testing for Normality and Symmetry”

Alana

December 3, 2025 at 8:55 pm

Please, how do I go about testing my data for homogeneity and autocorrelation?
- Charles
  
  December 4, 2025 at 8:54 am
  
  Hello Alana,
  For regression, see
  https://real-statistics.com/multiple-regression/heteroskedasticity/
  https://real-statistics.com/multiple-regression/weighted-linear-regression/
  https://real-statistics.com/multiple-regression/robust-standard-errors/
  Bootstrapping is also used.
  Charles
Mohadeseh

January 30, 2023 at 8:21 pm

Hello, I hope you are doing well.

I have a question about normalization regarding my dataset.
I have a data set that contains different groups of participants (for example, MRI data (Group 1), PET data (G2), and CT scan data (G3)), as well as one frequency band for each group. I’m debating how to import my data. One option is by frequency band, with one column for the frequency band and rows containing all participants from the different groups combined. The other is to have the first column hold the frequency band for group one, the second column the frequency band for group two, and the last column the frequency band for group three. Which way should I import my data in order to do the normalization?

I’m stuck, and if you know the answer to my question, I would really appreciate your help.
Thanks in advance.
Mohadeseh, a biomedical engineer.
Prat

August 16, 2020 at 11:52 am

Hi Charles

I have several sets of host weights, together with the number of parasites that have infected each host. I would like to check whether there is any association between the weight of the host and the number of parasites. To do this I need to run a Spearman’s test. Can you please let me know which quantities I would need for the test?
- Charles
  
  August 16, 2020 at 7:14 pm
  
  Hello Prat,
  For the Spearman’s test see https://real-statistics.com/correlation/spearmans-rank-correlation/
  Charles
- Jeffersson Flórez
  
  September 30, 2020 at 6:19 pm
  
  Good morning, dear Dr. Charles. Please excuse the question; I am new to these topics. I am running a normality test on a sample (more than 7 data points) using D’Agostino-Pearson. The data are modal data, and the test tells me that the data are not normal. What other test could I run to find normality in the data?
  Thank you.
Hajnalka Dancsi

April 22, 2020 at 4:40 pm

Hi, I was wondering if you could advise me. My data are not normally distributed (for almost all of my dependent variables), but the assumption of homogeneity of variance is met. In this case, would ANOVA be robust enough to cope with the non-normality, or should I just switch to a non-parametric test such as Kruskal-Wallis (I have 3 conditions)? Many thanks in advance for your advice.
Hajnalka
- Charles
  
  April 22, 2020 at 8:05 pm
  
  If the data is not too far from normality (especially if it is reasonably symmetric) then Anova should work. If not, Kruskal-Wallis is a good choice.
  Charles
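For readers weighing ANOVA against Kruskal-Wallis, here is a minimal Python sketch of the Kruskal-Wallis H statistic. It assumes no correction for ties and relies on the usual chi-square approximation with k - 1 degrees of freedom. For k = 3 groups (2 degrees of freedom), the chi-square tail probability is simply exp(-H/2). The function names are hypothetical, and with very small groups exact tables are preferable to this approximation.

```python
import math


def average_ranks(values):
    """Rank values 1..N, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for two or more groups (assumption: no tie correction)."""
    pooled = [x for group in groups for x in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def chi2_sf_df2(h):
    """Upper-tail chi-square probability, valid ONLY for exactly 2 degrees of freedom."""
    return math.exp(-h / 2)


# Three hypothetical conditions.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(f"H = {h:.2f}, approximate p = {chi2_sf_df2(h):.4f}")
```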
Nicole

April 10, 2019 at 12:14 pm

Dear Charles,
I was hoping you would be able to provide me with some information.
I have pre and post scenarios, and in each I have three separate conditions (the data contains 17 variables which I need to analyse). I need my data to be normal for parametric analysis. Do I normalise everything together (pre and post), or do I normalise separately? Some of my pre variables are normally distributed and some are not; the same goes for my post variables (but these are not the same variables). If I normalise separately I will end up with different values, which will make it impossible to analyse the effects on the variables. But if I normalise together I will be normalising some of the variables that would otherwise be normal if looked at separately.
Any advice would be greatly appreciated.
Nicole
- Charles
  
  April 11, 2019 at 8:45 am
  
  Dear Nicole,
  It is likely that you need to show that the differences between the pre and post values, i.e. the values z_i = x_i – y_i, are normally distributed.
  Charles
Elsa

April 9, 2019 at 12:12 pm

Sir, what is the difference between a test for symmetry and a test for normality?
- Charles
  
  April 9, 2019 at 7:49 pm
  
  Elsa,
  I have already answered this question for you, but here is my response again:
  You can test for symmetry using the Box Plot or Histogram graph. Alternatively you can test whether the skewness is zero. See the following webpage for the skewness test:
  https://real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/dagostino-pearson-test/
  There are a number of tests for normality. One of these is the d’Agostino-Pearson test, as described at the above webpage, but usually the best test is the Shapiro-Wilk test, as described at
  Shapiro-Wilk Test
  Charles
  - Elsa
    
    April 10, 2019 at 4:46 pm
    
    Thank you for the response sir.
  - Elsa
    
    April 11, 2019 at 4:00 am
    
    Hello sir,
    What is the formula that I should use for the test for symmetry?
    - Charles
      
      April 11, 2019 at 8:49 am
      
      Elsa,
      It is given on the webpage that I suggested earlier.
      Charles
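Elsa asked for the formula for the symmetry test. One standard way to test whether the skewness is zero is D’Agostino’s skewness test. The sketch below is a minimal Python version using D’Agostino’s normalizing transformation, which several statistics packages also use. It may differ in detail from the version on the referenced webpage, and the function name is hypothetical. It assumes at least 8 observations.

```python
import math


def dagostino_skewness_test(data):
    """Test H0: population skewness = 0. Returns (z, two-sided p-value).

    Uses the moment estimator sqrt(b1) = m3 / m2**1.5 and D'Agostino's
    normalizing transformation; assumption: n >= 8.
    """
    n = len(data)
    if n < 8:
        raise ValueError("this approximation needs at least 8 observations")
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    sqrt_b1 = m3 / m2 ** 1.5

    y = sqrt_b1 * math.sqrt((n + 1) * (n + 3) / (6.0 * (n - 2)))
    beta2 = (3.0 * (n * n + 27 * n - 70) * (n + 1) * (n + 3)
             / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = -1 + math.sqrt(2 * (beta2 - 1))
    delta = 1 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2 / (w2 - 1))
    z = delta * math.asinh(y / alpha)  # asinh(u) = ln(u + sqrt(u^2 + 1))
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value
    return z, p


# Hypothetical right-skewed sample.
z, p = dagostino_skewness_test([1, 1, 1, 1, 2, 2, 3, 5, 10, 20])
print(f"z = {z:.3f}, p = {p:.4f}")
```

A positive z indicates right skew and a negative z indicates left skew. A small p-value is evidence against symmetry.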
Evan

February 7, 2019 at 6:15 am

Dear Sir,
How would you describe the way the empirical rule and Chebyshev’s Theorem are used to test normality?
I understand the two models, but I am curious about the best way to describe this.

Thank you,
- Charles
  
  February 7, 2019 at 11:18 am
  
  Dear Evan,
  Chebyshev’s Theorem applies to all distributions and so I don’t think it would be useful as a test for normality.
  If you can determine that the data has a bell-shaped curve (e.g. via a histogram), then the Empirical Rule could be useful in testing for normality.
  Charles
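The sketch below makes this contrast concrete. It is written in Python, and the function name is hypothetical. It counts the fraction of observations within 1, 2 and 3 sample standard deviations of the mean. Under normality these fractions should be close to the Empirical Rule's 68%, 95% and 99.7%. Chebyshev’s bounds (at least 0%, 75% and about 88.9%) hold for any distribution, which is why they cannot separate normal from non-normal data. This is a rough descriptive check, not a formal test.

```python
import statistics


def within_k_sd_fractions(data, ks=(1, 2, 3)):
    """Fraction of observations within k sample standard deviations of the mean."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)
    return [sum(1 for x in data if abs(x - mean) <= k * s) / len(data) for k in ks]


EMPIRICAL_RULE = [0.68, 0.95, 0.997]  # approximate proportions for normal data
CHEBYSHEV_MINIMUM = [1 - 1 / k ** 2 for k in (1, 2, 3)]  # holds for ANY distribution

observed = within_k_sd_fractions([1, 2, 3, 4, 5])  # hypothetical data
for k, obs, emp, cheb in zip((1, 2, 3), observed, EMPIRICAL_RULE, CHEBYSHEV_MINIMUM):
    print(f"k={k}: observed {obs:.3f}, normal ~{emp}, Chebyshev >= {cheb:.3f}")
```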
Patricia Anne

January 13, 2019 at 2:52 am

How can we tell whether a given distribution is normal?
- Charles
  
  January 13, 2019 at 8:12 am
  
  Patricia,
  This part of the website is devoted to answering that question. Read especially about QQ plots and the Shapiro-Wilk test.
  Charles
ChiRho

March 31, 2018 at 4:24 am

Hi, can Real Statistics test the normality of the residuals generated by an ANOVA?
- Charles
  
  March 31, 2018 at 8:11 am
  
  Sorry, but I don’t understand your question.
  Charles
  - Gabriel
    
    March 31, 2019 at 6:12 pm
    
    I think what they are referring to is the Levene test, which is basically an ANOVA that uses the sum of the absolute values of the residuals (in place of the sum of the x’s), and the sum of the squared residuals (in place of the sum of the x’s squared).
    - Charles
      
      April 1, 2019 at 11:18 am
      
      Hello Gabriel,
      Thanks for helping out. Whose comment are you referring to?
      Charles
Rohit Nair

November 6, 2017 at 9:30 am

Hi,
Can a page view/unique visitor report over a period of time be considered normally distributed?
I have data on unique visitors for the last 52 weeks.
I also got the following values:
Kurtosis 7.89154432
Skewness -2.104581667

Is there any other test you would recommend for analysing this?
- Charles
  
  November 6, 2017 at 9:38 am
  
  The kurtosis is extremely high compared to a normal distribution, and the skewness is also well below zero. The skewness and (excess) kurtosis of a normal distribution are both zero. We could accept some variation from these values, but not the values you have found. Your data does not appear to be normally distributed. A variety of tests can be used to confirm these results. These tests are listed on the referenced webpage.
  Charles
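One rough way to quantify “too far from zero” is to divide each statistic by its standard error under normality. The sketch below assumes the values above were computed with Excel-style adjusted estimators (SKEW and KURT) and uses the commonly cited standard-error formulas for them. Treating |z| > 1.96 as significant at the 5% level is a rule of thumb, not one of the formal tests listed on the webpage.

```python
import math


def skew_kurt_z(skew, kurt, n):
    """z-scores for sample skewness and excess kurtosis using their standard
    errors under normality (assumption: Excel-style adjusted estimators)."""
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt


# Values reported in the question: 52 weeks of unique-visitor data.
z_skew, z_kurt = skew_kurt_z(-2.104581667, 7.89154432, 52)
print(f"z(skewness) = {z_skew:.2f}, z(kurtosis) = {z_kurt:.2f}")
```

Both z-scores land far outside ±1.96, which agrees with the conclusion that these data are not normally distributed.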
Anders

October 10, 2017 at 1:55 pm

Dear Charles,

Very good website. I found a lot of useful information.

I have a set of data with over 3000 data entries. I want to test whether the data is normally distributed. I understand (from your website) that the standard Shapiro-Wilk test can only handle a small data sample.

How do I determine normality, and what test do you recommend?

/Anders
- Charles
  
  October 10, 2017 at 9:41 pm
  
  Anders,
  Shapiro-Wilk can be used with a data set of 5,000 entries, as shown on the following webpage:
  https://real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/shapiro-wilk-expanded-test/
  Actually, this webpage shows how to support ever larger data sets.
  Charles
Lee

October 1, 2017 at 8:47 am

Hi Sir,

I have a biology background and am very new to statistical analyses. I have variable 1 and variable 2. For each variable, I have two timepoints, timepoint 1 and timepoint 2. For each timepoint, I have three treatments, treatment 1, 2 and 3.

E.g. for variable 1 or 2
Timepoint 1, treatment 1, 3 replicates
Timepoint 1, treatment 2, 3 replicates
Timepoint 1, treatment 3, 3 replicates
Timepoint 2, treatment 1, 3 replicates
Timepoint 2, treatment 2, 3 replicates
Timepoint 2, treatment 3, 3 replicates

My ultimate goal is to do statistical tests to investigate the effects of timepoint, treatment and their interactions on variables 1 and 2 (e.g. two-way ANOVA). Also, I want to do correlation test between variables 1 and 2 (e.g. Pearson).

My problems are:
(1) When I do a normality/homogeneity test, should I use the mean (of the three replicates) or all the values from the three replicates? And should these tests be done within variable (mean: 6 values; replicates: 18 values), within timepoint (mean: 3 values; replicates: 9 values) or within treatment (mean: 1 value; replicates: 3 values)?
(2) Similarly, when a transformation is required because of a non-normal distribution, should I transform the data for each replicate first and then calculate the mean and standard deviation? Or should I transform the mean directly?
(3) I understand that percentage data should be transformed prior to statistical analysis. Does ‘percentage data’ refer only to “%” with a range of 0-100, or also to other forms of ‘percentage’ such as “%/day” or “% d-1”?

Thank you in advance, and I hope that my questions are clear.
- Charles
  
  October 3, 2017 at 8:36 am
  
  Lee,
  (1) Which to do, depends on what hypothesis you are testing. I don’t really know what you mean by “I want to do correlation test between variables 1 and 2 (e.g. Pearson)”, since you have many different types of data for these two variables.
  (2) Generally you transform the data and not the mean.
  (3) You can transform any of these versions of percentage, as long as you use the same transformation for all the data in any one group.
  Charles
christopher delino

September 30, 2017 at 10:52 pm

1. Sir, I am going to correlate variable 1 with variable 2. But when I test the distribution of each variable, one is normally distributed while the other is not. Which should I use then, parametric (Pearson) or nonparametric (Spearman)?

2. And are there cases when normality testing is ignored?

Thank you very much.
- Charles
  
  October 1, 2017 at 8:25 am
  
  Christopher,
  1. The two samples don’t need to be normally distributed in order to compute their correlation. You may need normality for certain tests. What hypothesis are you trying to test?
  2. There are many tests for which normality is either not required or for which the test is pretty robust to violations of normality (e.g. Anova).
  Charles
  - christopher delino
    
    October 1, 2017 at 1:32 pm
    
    The question is “Is there a significant relationship between parental involvement (in weighted mean) and the academic performance (score in their test) of their children?”
Christopher Delino

September 24, 2017 at 8:27 am

Hi sir.

I am currently writing a research paper, and I found the following after using the KS test and Shapiro-Wilk. When the scores are compared among the different grade levels, they are normally distributed, so I will use ANOVA. However, when the scores are compared among different locations, they are not normally distributed, so I will use Kruskal-Wallis.

My question is this: I am using the same scores, only grouped differently, and the two groupings have resulted in different distributions (one normal and the other not). Would it be wise/proper to use ANOVA for the first and Kruskal-Wallis for the other, considering that all of this belongs to a single piece of research?

Thank you very much.
- Charles
  
  September 24, 2017 at 8:55 am
  
  Christopher,
  You can certainly use ANOVA for one test and Kruskal-Wallis for another test, even if they are for the same research area.
  Charles
David

August 8, 2017 at 11:26 pm

Hey Charles,

Do you have any guidance on when to rely on the central limit theorem (eyeballing a histogram or QQ plot) compared to a test statistic such as the Anderson-Darling, Lilliefors, or Shapiro-Wilk test?

I have read that the CLT is more applicable for a large number of data points, while test statistics become increasingly unreliable with large numbers of data points because even a small number of outliers could cause the null hypothesis of normality to be erroneously rejected.

Thanks.
- Charles
  
  August 9, 2017 at 7:52 am
  
  David,
  In general, I would test the data (using QQ plot, Shapiro-Wilk, etc.) and not rely on the CLT.
  Charles
Rob

February 10, 2017 at 10:58 am

I am confused. I thought that the residuals must be normally distributed and not the raw data.
- Charles
  
  February 10, 2017 at 11:05 am
  
  Rob,
  This depends on the specific test that you are trying to use. For the purposes of the normality tests described on the referenced webpage, you can think of the data as being the residuals (if that is what is required for the test you have in mind). Also, in some cases the residuals are normal if and only if the data is normally distributed.
  Charles
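Here is a minimal sketch of “thinking of the data as the residuals” for a one-way ANOVA, the setting of ChiRho’s earlier question. Under that model, each residual is an observation minus its group mean. These pooled residuals, rather than the raw data, would then go into a QQ plot or a Shapiro-Wilk test. The group values and the function name are hypothetical.

```python
import statistics


def one_way_residuals(groups):
    """Residuals of a one-way ANOVA model: each value minus its own group mean."""
    residuals = []
    for group in groups:
        group_mean = statistics.fmean(group)
        residuals.extend(x - group_mean for x in group)
    return residuals


# Hypothetical data for three groups.
groups = [[4.1, 5.0, 5.9], [6.2, 7.0, 7.8], [9.5, 10.0, 10.5]]
residuals = one_way_residuals(groups)
print([round(r, 2) for r in residuals])
```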
anil pardeshi

January 2, 2017 at 10:52 am

Hi sir,
I have collected statistical data in industry, but how can I project it, how can I check whether the data is normal or lognormal, and how should I analyse the data?
- Charles
  
  January 2, 2017 at 11:16 am
  
  Sorry, but I don’t understand your question.
  Charles
Luis Francisco Marty Matos

December 21, 2016 at 1:43 am

How can I calculate the sample size for linear and multiple regression?
- Charles
  
  December 21, 2016 at 6:30 am
  
  See https://real-statistics.com/multiple-regression/statistical-power-sample-size-multiple-regression/
  Charles
Pramod Desai

September 25, 2016 at 2:44 pm

I had enquired as to how to judge normality from a Q-Q plot. Is it visual only, or is there some mathematics involved? I did not see any response, so I am raising the query again for your kind attention.
- Charles
  
  September 26, 2016 at 7:20 am
  
  Pramod,
  Visual only.
  Charles
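The judgment itself is visual, but the points to plot are easy to compute. This Python sketch pairs each sorted observation with the matching standard normal quantile. It assumes the plotting positions (i - 0.5)/n, which is one of several conventions in use. You then plot these pairs (for example as an Excel scatter chart) and check whether they lie close to a straight line. The function name and data are hypothetical.

```python
from statistics import NormalDist


def qq_points(data):
    """(theoretical standard-normal quantile, observed value) pairs for a QQ plot.

    Assumption: plotting positions p_i = (i - 0.5) / n.
    """
    n = len(data)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return [(standard_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(sorted(data), start=1)]


for q, x in qq_points([3.1, 2.4, 5.0, 4.2, 3.7]):
    print(f"{q:7.3f}  {x}")
```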
Mei

September 8, 2016 at 6:16 am

Can I analyse the data using an independent samples t test for a sample size of less than 10?
Is a sample size of less than 10 considered a non-normal distribution?
- Charles
  
  September 8, 2016 at 8:50 am
  
  Mei,
  You can run a t test with samples of fewer than 10 elements, and in fact the data can still be normally distributed (or symmetric), although you still need to check whether the data is at least symmetric. The problem with such a small sample is low power, i.e. a high likelihood of a type II error.
  Charles
  - Mei
    
    September 8, 2016 at 9:32 am
    
    Dear Charles,
    How do I check whether the data is normally distributed (or symmetrical)?
    
    Thanks
    - Charles
      
      September 8, 2016 at 11:16 am
      
      Mei,
      This is described on the referenced webpage. Usually the best test for normality is the Shapiro-Wilk test, and a good way to check for symmetry is via a boxplot or histogram. All of these are described on the website.
      Charles
      - Mei
        
        September 9, 2016 at 12:49 pm
        
        Dear Charles,
        How can I run a Shapiro-Wilk test on my data with SPSS software?
        
        Thanks
      - Charles
        
        September 9, 2016 at 9:21 pm
        
        Mei,
        Sorry, but I don’t use SPSS. This site is about statistical analysis using Excel.
        Charles
    - Wang
      
      May 18, 2017 at 4:34 pm
      
      Google it. Or read a book. There’s so much useful info out there. Learn it by yourself first and then ask questions.
Takwa

August 25, 2016 at 10:24 pm

Dear Charles,
can I use a paired t-test when the samples are not normally distributed but their differences are?

thank you for your help
- Charles
  
  August 25, 2016 at 11:17 pm
  
  Takwa,
  Yes.
  Charles
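Here is a minimal sketch of why this works. The paired t test is computed entirely from the differences d_i = x_i - y_i, so only their distribution matters. The Python function below (hypothetical name) returns the t statistic and its degrees of freedom, n - 1. The p-value would then come from a t table or a function such as Excel's T.DIST.2T.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic from the differences d_i = x_i - y_i.

    Only the differences need to be (approximately) normal; x and y
    themselves need not be. Returns (t, degrees of freedom).
    """
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    t = statistics.fmean(d) / (statistics.stdev(d) / math.sqrt(n))
    return t, n - 1


# Hypothetical post (x) and pre (y) scores for three subjects.
t, df = paired_t([5, 7, 9], [4, 5, 6])
print(f"t = {t:.4f} with {df} degrees of freedom")
```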
Rosalind Cutler

May 28, 2016 at 4:15 pm

Hello
Thanks so much for your fabulous stats pages. I am at the planning stage for my project, and I have a question about what to do in the event that I end up with two very different sized groups of participants. At first they are all going to take the same tests. But I have asked the participants to identify as either type A or type B so that I can see if there is a difference in how they performed in the tests according to type. What happens if I have 100 of type A and only 6 of type B?
Is there something I can do? Is it possible to do a comparison with such a discrepancy in size?
- Charles
  
  May 28, 2016 at 5:47 pm
  
  Rosalind,
  You can perform certain tests with such unbalanced sample sizes. Generally, though, the statistical power of such tests depends more on the size of the smaller sample, and so the power of the test will likely be very poor.
  Charles
kamran

May 14, 2016 at 1:07 am

Thanks for a very useful website.
SigmaPlot software automatically performs a normality test and an equal variance test on the samples whenever a parametric test is run. When I run a two-way ANOVA on my data, it gives the following:
Normality test: failed
Equal variance test: passed

However, when I test the individual samples separately for normality, all of the samples pass the normality test. For the normality assumption, is it sufficient if all the samples pass the normality test separately?
Thanks again
- Charles
  
  May 14, 2016 at 8:33 am
  
  Kamran,
  Generally you care about the normality of each group, not all the data combined. Of course, the assumptions depend on the specific test you are conducting.
  Charles
atul jain

April 18, 2016 at 8:23 pm

Hi Charles;

Thanks for the wonderful website and free resource you have generated for all of us.
I am trying to test normality for a sample size of 3 (only three data points). Which Normality test (if any) should I be using?

Atul
- Charles
  
  April 19, 2016 at 2:31 pm
  
  Hi Atul,
  In general I tend to use Shapiro-Wilk. With only three sample elements you shouldn’t expect too much from any of the tests.
  Charles
Mateusz Psurski

April 8, 2016 at 11:09 pm

Hello, I have a question about the D’Agostino test. GraphPad Prism v.6 shows that the data is normal with p ~0,06, while Real Statistics with the same data shows a p-value of 0,312153. Why?
79,72; 136,27; 126,79; 78,46; 108,45; 139,75; 141,54; 129,32; 78,95; 81,30; 153,46
138,15; 116,89; 88,76; 101,43; 128,26; 156,63; 84,41; 122,12; 89,36; 116,83; 96,82
80,96; 136,80; 137,23; 85,54; 87,04; 114,28; 133,52; 104,68; 95,31; 95,24; 151,45
143,14; 70,59; 80,57 ;96,00; 105,72; 124,76; 93,53; 71,00; 140,88; 161,89; 159,13
78,69; 106,94; 73,93; 111,59; 88,73; 21,77; 29,10; 31,66; 32,30; 34,48; 40,00; 47,73
50,25; 53,24; 54,48; 55,48; 58,34; 59,31; 60,34; 61,97; 63,57; 66,54;
- Charles
  
  April 9, 2016 at 12:24 pm
  
  Hello Mateusz,
  I ran the D’Agostino test and got p = .312153, as you have stated. I also got a result of p = .191604 from the Shapiro-Wilk test. I looked at a QQ-plot and saw that the data looks like a good fit for normality. I don’t know why you are getting such a different result from GraphPad Prism. Finally, I used the online calculator at http://contchart.com/goodness-of-fit.aspx and got results consistent with the data being normally distributed.
  Charles
  - Mateusz Psurski
    
    April 9, 2016 at 5:43 pm
    
    Hello Charles, thanks for the reply,
    
    I also ran the SW test in GraphPad and received a p-value of .191604, the same as in your add-in.
    In GraphPad they used the D’Agostino-Pearson omnibus K2 normality test. I’m not sure (I’m not a mathematician), but I think your add-in applies a different version of the D’Agostino test (I know that he (she?) invented several).
    
    Mathew
Sage

April 5, 2016 at 12:16 pm

Dear Sir,
I am currently developing a model based on neural networks.
The performance analyses were completed successfully, but during graphical residual analysis I observed a somewhat linear trend, as shown below. While performing the residual analysis, I noticed that the percentile vs. residual plot is not linear. Even the residual vs. predicted plot shows a linear pattern rather than a random scatter.

I tried many data transformation methods but did not succeed. When I transform other data sets, it works; it is just my dataset that is difficult for me. Can you show me how to transform my data to achieve linearity and normality with Real Statistics? Below is my data:
Date Observed Predicted
15-Feb-15 1176.491943 1176.492483
19-Feb-15 1176.48291 1176.483679
20-Feb-15 1176.46582 1176.467308
25-Feb-15 1176.463379 1176.46493
2-Mar-15 1176.452515 1176.454374
7-Mar-15 1176.450439 1176.452346
12-Mar-15 1176.44165 1176.443764
17-Mar-15 1176.435913 1176.437807
22-Mar-15 1176.432251 1176.43359
27-Mar-15 1176.429688 1176.430538
1-Apr-15 1176.428101 1176.428278
6-Apr-15 1176.427002 1176.426561
11-Apr-15 1176.426147 1176.425223
16-Apr-15 1176.425659 1176.424153
21-Apr-15 1176.425293 1176.423277
26-Apr-15 1176.425049 1176.422545
1-May-15 1176.424805 1176.421921
6-May-15 1176.424805 1176.421381
11-May-15 1176.424805 1176.420909
16-May-15 1176.424805 1176.420491
21-May-15 1176.424805 1176.420119
26-May-15 1176.424805 1176.419786
31-May-15 1176.424805 1176.419486
1-Jun-15 1176.424805 1176.419216

Thank you for your time and help
- Charles
  
  April 5, 2016 at 4:37 pm
  
  Sage,
  Have you tried taking first differences to try to get a stationary time series? Then you can use time series approaches.
  Charles
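Here is a minimal Python sketch of taking first differences, using the first five “Observed” values from the table above. The function name is hypothetical. The first few dates in that table are not equally spaced, which a time-series model would need to account for. This sketch simply differences consecutive rows.

```python
def first_differences(series):
    """Return y[t] - y[t-1] for each consecutive pair (one fewer value than the input)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# First five "Observed" values from the table above.
observed = [1176.491943, 1176.48291, 1176.46582, 1176.463379, 1176.452515]
print([round(d, 6) for d in first_differences(observed)])
```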
E.W

March 23, 2016 at 10:43 pm

Sir,

I appreciate your time and effort for this useful website. I have a question on selecting a statistical test.

I am trying to assess the changing abundance of microbes in the oral cavity over time. There are two kinds of oral samples: samples with the dietary treatment of interest and samples with no treatment. I measured the relative abundances of microbes in the oral samples at multiple time points: 2, 4, 6 and 8 days for the treated samples, and 0 and 8 days for the untreated samples (control). The 0-day samples are measured before the treatment takes effect, so they are basically the same for both the treated and control groups. Each has 3 replicates, so there are 3 replicates × (0 day for both groups + 8 days for control + 2, 4, 6, 8 day time points for the treated group) = 18 samples in total.

Do you have any suggestion for analysing these data? I am thinking about calculating the difference of abundance between each time point and the 0 day samples and then doing statistical tests on these delta values. Samples from different time points are all independent because all samples are measured from different individuals. Thus I am planning to calculate delta between every possible combination of nonzero-day replicates and three different replicates of the 0 day samples. However, I am having difficulty choosing statistical tests.

Thanks!
- Charles
  
  March 24, 2016 at 4:30 pm
  
  E.W.,
  
  If you have factor A with two levels control vs treatment and factor B with two levels 0 vs 8 days (repeated measure) with 3 replicates each, then you can use a mixed repeated measures ANOVA to do the analysis.
  
  I don’t see how you can integrate the fact that you don’t have 0 days for A, but instead have 2,4,6 days. I guess you could try to see if you could use some sort of regression to predict a 0 days value, but I don’t know how effective this would be. You could also do separate analyses on the control and treatment samples, but this is probably not what you really want. In any case, with such a small sample, it is not clear what sort of useful result you could get anyway.
  
  Charles
Zohreh

February 28, 2016 at 9:41 pm

Salaam sir,
My study concerns running a correlation. To check the normality of the distribution, would checking the boxplot, skewness and kurtosis, and a one-sample KS test suffice?
Thanks for your time and your concern in running such a useful website.
Lina

January 22, 2016 at 4:46 am

Sir, I am conducting a study. I have 28 participants. There is a pre-test and a post-test, with one treatment group only and no control group. My plan is, in step 1, to use a QQ plot to check whether the differences between pre and post follow a normal distribution. If yes, I will just use a paired t-test. If no, I will use a non-parametric method. I plan to use the “Wilcoxon paired signed rank test”, but this test requires a symmetric population distribution. Is there any test I can use to detect whether the pre-data and post-data come from the same symmetric population distribution?
Thanks, Lina
- Charles
  
  January 22, 2016 at 4:25 pm
  
  Lina,
  To use the Wilcoxon paired signed ranks test you don’t need to detect whether the pre-data and post-data come from the same symmetric distribution population. You need to check whether the differences between the pairs, z_i = x_i – y_i, are symmetric. The best way to do this is to calculate the differences and look at a box plot or histogram to see whether the data is reasonably symmetric.
  Charles
  - Lina
    
    January 23, 2016 at 12:03 am
    
    Thank you so much!
  - Ian
    
    February 1, 2018 at 8:50 pm
    
    Charles,
    I’m dealing with data very similar to Lina’s. If, after calculating the differences, the data is NOT reasonably symmetric, what should you do then? Do you transform the data set of calculated differences, or do you transform both the pre- and post- data before running the Wilcoxon paired signed ranks test? If it is the former, I’ve run into the problem of having to transform a data set with negative, positive, and zero numbers. Any advice would be very helpful!
    Thanks, Ian
    - Charles
      
      February 2, 2018 at 9:24 am
      
      Ian,
      You can use (1) a nonparametric test that does not require symmetry, e.g. the sign test, (2) a resampling approach or (3) a transformation.
      You can transform both data sets or the differences between the data sets.
      If the data set contains negative values and you want to use a square root or log transformation, then let a = the smallest value in the data set and subtract a-1 from all the data elements. The smallest value then becomes 1, so all the data elements are positive.
      Charles
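The steps discussed in this thread can be sketched in Python under a few stated assumptions. The quartiles use the inclusive method, the same convention as Excel’s QUARTILE. A box plot is judged roughly symmetric when the median sits about midway between Q1 and Q3. The shift before the log transform follows the rule of subtracting a - 1. The function names and data are hypothetical, and whether a log transform of differences is sensible depends on the data.

```python
import math
import statistics


def paired_differences(x, y):
    """z_i = x_i - y_i for paired observations."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    return [a - b for a, b in zip(x, y)]


def box_summary(data):
    """(min, Q1, median, Q3, max); assumption: inclusive quartiles, like Excel's QUARTILE."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def shift_to_positive(data):
    """Subtract a - 1, where a is the smallest value, so the minimum becomes 1."""
    a = min(data)
    return [x - (a - 1) for x in data]


def shifted_log(data):
    """Log transform usable when the data contain negative values or zeros."""
    return [math.log(v) for v in shift_to_positive(data)]


# Hypothetical post and pre scores.
z = paired_differences([12, 9, 15, 11, 8, 14, 10], [10, 10, 11, 12, 9, 10, 10])
low, q1, med, q3, high = box_summary(z)
# Unequal gaps suggest the differences are not symmetric.
print(f"median - Q1 = {med - q1}, Q3 - median = {q3 - med}")
print(shifted_log(z))
```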
AG

November 3, 2015 at 1:22 pm

Dear Charles,

You have a very helpful website.

I have a question regarding the normality test. In particular, I would like to interpret the results of the Shapiro-Wilk test.

Assume the following setup. Given N data points, let P_N be the empirical distribution

P_N ( P (true distribution is normal) >= 1 – epsilon) >= 1-beta

Here, 1 – epsilon is a lower bound on the probability that the data are normally distributed, and 1 – beta is the confidence level with which the previous statement holds (given the data).

Can I use the Shapiro-Wilk test to compute the values for epsilon and beta?

Alternatively, is there a function, say f, such that we can make the statement
P (true distribution is normal) >= 1 – epsilon,
and epsilon = f(x1,…,xN)?

Best wishes,
AG
- Charles
  
  November 4, 2015 at 9:46 am
  
  It seems like you want to find a confidence interval for the W statistic in the Shapiro-Wilk test. Sorry, but I don’t know how to do this.
  Charles
Obitr

September 15, 2015 at 11:47 pm

Sir, should I test each of my independent variables, and how can I transform a non-normal variable so that it is normally distributed?
- Charles
  
  September 17, 2015 at 1:04 pm
  
  This depends on the test that you are trying to perform, but usually when normality is required you need to make sure that the data for each sample is normally distributed.
  
  Regarding transformations, please see the webpage
  Transformations for Symmetry
  Charles
Jerome

July 18, 2015 at 5:02 pm

Sir, can you answer questions on psychology courses? I am an undergraduate in psychology.
- Charles
  
  July 18, 2015 at 5:44 pm
  
  Jerome,
  Yes, if your questions are about statistics.
  Charles
  - Raj
    
    September 23, 2023 at 12:39 pm
    
    Hi All,
    
    My question is about the change from baseline (CFB) variable.
    
    I have two variables, i.e. pre and post, and from them I need to derive the change from baseline.
    
    Please confirm on which variable I need to perform the normality test.
    
    Thank you.
    - Charles
      
      September 25, 2023 at 10:52 am
      
      Hello Raj,
      It depends on what test you are running. If you are conducting a paired t-test (pre vs post), then you need to test the normality of the differences. E.g. if x = pre-test scores, y = the post-test scores, and w = y-x, then you need to test the normality of the w data.
      Charles
