Basic Concepts
As we can see throughout this website, most of the statistical tests we perform are based on a set of assumptions. When these assumptions are violated the results of the analysis can be misleading or completely erroneous.
Typical assumptions are:
- Normality: Data have a normal distribution (or at least are symmetric)
- Homogeneity of variances: Data from multiple groups have the same variance
- Linearity: Data have a linear relationship
- Independence: Data are independent
We explore in detail what it means for data to be normally distributed in Normal Distribution, but in general, it means that the graph of the data has the shape of a bell curve. Such data are symmetric around the mean and have an excess kurtosis equal to zero. In Testing for Normality and Symmetry we provide tests to determine whether data meet this assumption.
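As a rough illustration of the symmetry and kurtosis properties described above, the following Python sketch computes the skewness and excess kurtosis of two small made-up samples. It uses the simple moment-based (population) formulas, which is an assumption on our part; spreadsheet functions and statistics packages often apply small-sample adjustments and will give slightly different values.

```python
def skewness_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) from the population moment formulas."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0  # subtract 3 so a normal distribution scores 0
    return skew, excess_kurt


# Hypothetical samples, chosen only for illustration.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
right_skewed = [1, 1, 1, 2, 2, 3, 4, 8, 20]

for name, sample in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    skew, kurt = skewness_and_excess_kurtosis(sample)
    print(f"{name}: skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A skewness far from zero signals a lack of symmetry. Note that the symmetric sample here is flat rather than bell-shaped, so its excess kurtosis is negative: symmetry alone does not imply normality.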
Observations
Some tests (e.g. ANOVA) require that the groups of data being studied have the same variance. In Homogeneity of Variances we provide some tests for determining whether groups of data have the same variance.
Some tests (e.g. Regression) require that there be a linear correlation between the dependent and independent variables. Generally, linearity can be tested graphically using scatter diagrams or via other techniques explored in Correlation, Regression, and Multiple Regression.
We touch on the notion of independence in Definition 3 of Basic Probability Concepts. In general, independent data show no correlation with one another (see Correlation), although a lack of correlation does not by itself guarantee independence. Many tests require that data be randomly sampled, with each data element selected independently of the data previously selected. For example, if we measure the monthly weight of 10 people over 5 months, these 50 observations are not independent, since repeated measurements from the same person are correlated. Similarly, the IQ scores of 20 married couples don’t constitute 40 independent observations.
Most of the commonly used statistical tests assume that the data follow some specific distribution (such as the normal distribution). Such tests are called parametric tests. When one of the key assumptions of such a test is violated, a non-parametric test can sometimes be used instead. Such tests don’t rely on a specific probability distribution function (see Non-parametric Tests).
Another approach for addressing problems with assumptions is by transforming the data (see Transformations).
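To give a feel for how a transformation can help, here is a minimal Python sketch that draws a right-skewed sample and compares its skewness before and after a log transformation. The lognormal data, the seed, and the choice of the log transformation are all assumptions made for illustration; the appropriate transformation depends on the data at hand.

```python
import math
import random


def skewness(data):
    """Moment-based (population) skewness; 0 for perfectly symmetric data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated right-skewed data: lognormal values are positive with a long right tail.
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(500)]

# The log of a lognormal variable is normally distributed, so the
# transformed values should be roughly symmetric.
logged = [math.log(x) for x in raw]

print(f"skewness before transformation: {skewness(raw):.2f}")
print(f"skewness after log transformation: {skewness(logged):.2f}")
```

The log transformation only applies to strictly positive data, and any conclusions then refer to the transformed scale.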
Hi,
I am trying to learn and understand statistics by looking for step-by-step advice on how one should start with data analysis, but I could not find a blog or a tutorial that discusses it in a straightforward way.
Could you please tell me: once I have my data, should the first step be normality testing, to know whether I should perform parametric or non-parametric tests? Or should I first choose the statistical test that I need based on the question that I want my data to answer, and then do the normality testing or other assumption tests?
Hope you could give me guidance on this.
Miko,
The first step is to define your objective. This may consist of the hypotheses that you are trying to test or the events you are trying to predict. Then you determine how to accomplish this. This may include identifying the tests or analyses you need to run (and what assumptions need to be satisfied) and what data you need to collect (and how much).
Charles
Hi Charles,
Thank you for the reply.
After doing a lot of reading and researching, somehow, I have managed to put direction to what I am doing. Also this site of yours really helped me a lot in understanding statistics more. Just some clarifications about normality test although you already mentioned the typical assumptions:
Is normality testing confined to continuous data only? I mean, continuous data can be normal or non-normal.
Is discrete data automatically considered non-normal, so that it requires non-parametric statistical analysis?
Thanks heaps!
Charles, you’re the real MVP! One of the most helpful sites I’ve found!
Thanks 🙂
I want to compare means for 3 age groups. I have the raw data for two groups, while the raw data for the third is unavailable; however, I have the means for all the dependent variables (25 variables) in the 3 age groups. The sample for one age group has more than 175 participants, while the samples for the other two groups have 50 participants each. Also, the first age group’s sample contains a collapsed mean of male and female participants, while the gender variable is separated in the other two groups.
Would this data still be comparable, if NOT, what are the violations?
If yes, what would be the best statistical model?
Firas,
You seem to be missing information about the variance for the third group. This will make it difficult to perform the usual analyses.
Charles
Hi Charles,
What’s your take on this info regarding the CDC’s statistical models? Do you think the estimated numbers (of flu deaths) are generally reasonable?
CDC officials do not have exact counts of how many people die from flu each year. Flu is so common that not all flu cases are reported, and flu is not always listed on death certificates. So the CDC uses statistical models, which are periodically revised, to make estimates. Fatal complications from the flu can include pneumonia, stroke and heart attack.
Here’s the link to the article from which this text was extracted:
https://www.statnews.com/2018/09/26/cdc-us-flu-deaths-winter/
Hello David,
Since they can’t count all the cases, they are forced to extrapolate the number of deaths in some way. Whether or not the approaches they use are valid, I am unable to say.
Charles
What are the causes behind the violation of the assumptions of parametric tests?
If the assumptions are violated then the test may not be valid: e.g. the resulting p-value may not be correct.
Charles
Hi
For my report I have conducted a paired t test to look at differences between 2 methods of measuring body composition. My lecturer says they expect more than just normal distribution to be acknowledged in my results section. Confusing as I know equal variance does not come into it for paired t test. How do I acknowledge independence and linearity? Thanks in advance
Hello Ryan,
Linearity is not one of the assumptions. The four assumptions for this test are shown at
https://statistics.laerd.com/spss-tutorials/dependent-t-test-using-spss-statistics.php
Charles
Hi Charles, My name is Tom and I am a second year student revising for my stats test, I am wondering if you could explain the 4 steps of the parametric assumptions of data analysis, randomness, level of data, normal distribution and homogeneity of variance. Furthermore, could you explain to me in basic terms what an ANOVA and MANOVA are used for and the other types of them, Kind Regards,
Tom
Hello Tom,
These are all described elsewhere on the Real Statistics website. For example, see the following webpages:
Normal Distribution:
https://real-statistics.com/normal-distribution/
https://real-statistics.com/tests-normality-and-symmetry/
Homogeneity of Variances
https://real-statistics.com/one-way-analysis-of-variance-anova/homogeneity-variances/
ANOVA
https://real-statistics.com/anova/
MANOVA
https://real-statistics.com/multivariate-statistics/multivariate-analysis-of-variance-manova/
Charles
Hi, my name is Julia. I like all the links, this is great. Statistics is new to me, so all the links and info will help me greatly. I hope I am doing this correctly; this is a response to your discussion post. Keep sharing.
Julia
Hi Charles,
How can I contact you via email? I need your assistance in few things regarding the quantitative data analysis I’m currently conducting.
Thank you
See Contact Us
Charles
Hi Charles,
I am an ScD student working on a statistics project.
For citing purposes, do you have the date this page was created and updated (if applicable)?
Kind regards.
Jen,
Generally you would use the date when you referenced the webpage. See also
Citation.
Charles
Thank you for your prompt response. I’m sorry I missed that reference yesterday!
Kind regards.
Where shall we report the assumptions of the tests? In the methods section or as an introduction in the results section?
Probably in the results section, but this depends on the type of article you are writing.
Charles
Hi, I’m Amirah.
I want to ask about quasi-experimental design. In this design, random sampling is frequently ignored since we are dealing with intact groups. How would we satisfy the randomization assumption before proceeding with parametric tests?
Amirah,
If I understand you correctly, you are asking me how to satisfy the randomization assumption when randomization wasn’t used. On its face, that seems to be quite impossible, but…
In practice, however, most designs are not completely randomized, and yet the full randomization assumption is overlooked. While it is possible to split a sample into two or more groups randomly, it is usually practically impossible to create a truly random sample. You can only do the best you can. The question you need to ask yourself is how much impact the shortcomings of the randomization will have on your results. Often this means how much bias is introduced, especially regarding confounding variables, i.e. the variables (characteristics) that you are not studying but that may inadvertently impact your results. Ideally, you want the sample to have confounding factors similar to those found in the general population under study.
Charles
Hello,
My name is Fatin. For my research, I’ve run a normality test and a homogeneity test. The results show that the data fail the assumptions for both tests. My question: can I proceed with Welch’s ANOVA and the Games-Howell test and ignore the assumption of normality, since many tests are much more sensitive to violations of homogeneity of variances than to violations of normality?
Fatin,
While it is true that these sorts of tests are more sensitive to violations of homogeneity of variances than of normality, normality is still a requirement. How good the results are may depend on how far from normality the data are. There is no simple answer. E.g. if the problem with normality is the presence of outliers, then you might run the tests without the outliers (making sure to explain this when presenting the results). Another approach is to use a transformation to get the data to be more normally distributed. You could also try a non-parametric test, again making sure to report the results correctly. Yet another approach is to use bootstrapping, which is not dependent on these sorts of assumptions.
Charles
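As an illustration of the bootstrapping idea mentioned in the reply above, here is a minimal Python sketch of a percentile bootstrap confidence interval for the difference between two group means. The data, the function name `bootstrap_mean_diff_ci`, the number of resamples, and the choice of the percentile method are all assumptions made for this example; note that bootstrapping still presumes that the observations are independent.

```python
import random
import statistics


def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements; group_a contains one outlier (30.5).
group_a = [12.1, 14.3, 11.8, 30.5, 13.0, 12.7, 15.2, 13.9]
group_b = [10.2, 9.8, 11.1, 10.5, 9.9, 10.8, 11.4, 10.1]

low, high = bootstrap_mean_diff_ci(group_a, group_b)
print(f"95% bootstrap CI for the difference in means: ({low:.2f}, {high:.2f})")
```

If the interval excludes zero, that suggests a difference between the group means without relying on a normality assumption.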
hi, what are the assumptions about the data that underlie both parametric and non-parametric tests?
May,
The actual assumptions depend on the specific test being used. In general, there are fewer assumptions for a non-parametric test than a similar parametric test.
Charles
Hello,
Thank you for developing this website. I’ve been taught that the assumption of normal distribution applied to the *errors*, not the underlying data. Is this incorrect?
Shelby,
This depends on the test being used.
Charles
Hi i am a student and i need your help regarding following.
Parametric testing based on t-test requires three assumptions:
1. Assumption of normality
2. Homogeneity of variance
3. Data independence
These are required so that the sampling distribution of t follows the theoretical t-distribution with the corresponding degree of freedom. The goal is to verify the role of first two assumptions. To that end, obtain the sampling distribution of t in four cases, and analyze the role of the said assumptions: (a) normal samples with similar variances, (b) non-normal samples with similar variances, (c) normal samples with very different variances, (d) non-normal samples with very different variances.
Your analysis should provide answers for the following:
1. is sample normality or population normality required for t-test?
2. is homogeneity of variance necessary?
3. does your answer to previous question depend on whether the sample sizes are equal or not?
4. what are the implications if one performs t-test and one or both assumptions are violated. (hint: observe the experimental and theoretical t-distribution and see how the deviations between the two affects the decision about null hypothesis)
Dhruv,
1. The t test is reasonably robust to violations of normality, but fails if the data is too far from normality.
2. There is a workaround in case this assumption is violated
3. You can use the paired t test if this assumption fails.
See the following webpage for more details https://real-statistics.com/students-t-distribution
Charles
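The four cases in the question above can be explored with a small simulation. The Python sketch below estimates how often the pooled two-sample t test rejects a true null hypothesis (equal means) at the 5% level. The sample sizes, the standard deviations, the use of a shifted exponential as the non-normal distribution, and the hard-coded critical value for 18 degrees of freedom are all choices made for this illustration.

```python
import math
import random

T_CRIT_DF18 = 2.101  # two-sided 5% critical value of t with 18 degrees of freedom


def pooled_t(x, y):
    """Two-sample t statistic that assumes equal variances (pooled)."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss1 = sum((v - m1) ** 2 for v in x)
    ss2 = sum((v - m2) ** 2 for v in y)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))


def normal(rng):
    return rng.gauss(0.0, 1.0)  # mean 0, sd 1


def skewed(rng):
    return rng.expovariate(1.0) - 1.0  # shifted exponential: mean 0, sd 1


def rejection_rate(draw, sd1, sd2, n1, n2, sims=4000, seed=1):
    """Share of simulated samples in which the true null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(sims):
        x = [sd1 * draw(rng) for _ in range(n1)]
        y = [sd2 * draw(rng) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF18:
            rejections += 1
    return rejections / sims


cases = {
    "(a) normal, similar variances": (normal, 1.0, 1.0),
    "(b) non-normal, similar variances": (skewed, 1.0, 1.0),
    "(c) normal, different variances": (normal, 4.0, 1.0),
    "(d) non-normal, different variances": (skewed, 4.0, 1.0),
}

for label, (draw, sd1, sd2) in cases.items():
    equal_n = rejection_rate(draw, sd1, sd2, 10, 10)
    # With n1 = 5 and n2 = 15, the smaller group has the larger variance in (c) and (d).
    unequal_n = rejection_rate(draw, sd1, sd2, 5, 15)
    print(f"{label}: equal n = {equal_n:.3f}, unequal n = {unequal_n:.3f}")
```

Both designs have 18 degrees of freedom, so the same critical value applies. A rejection rate close to 0.05 means the test behaves as advertised; rates well above 0.05 show where a violated assumption distorts the result.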
Hi am a master student. I wanna ask the following questions pls help me.
1. List statistical assumptions when analyzing data using test statistics, discuss each of them and indicate how they are related.
2. In the study related to malnutrition among under-five children, protein supplement was provided to 120 under weight children. Baseline weight was taken for each before the supplement began; and six months after the supplement was provided the second weight was taken. Therefore we have paired data for each child. It is required to determine whether the supplement helped to increase the weight of children. There are parametric and non-parametric methods often used for such studies. Your responsibility is thus: i) identify such methods (whether we discussed in class or not) and describe how they can be used for such study, ii) discuss how the methods are related or differ.
3. In standard anova used for comparison of a groups, the model can be written in two ways:
a. Yij = μ + ai + eij
b. Yij = a’i + eij
Where i indicate ith group, j indicate jth patient
i. Show how total variation is being disaggregated in both cases (and construct anova table for each of them)
ii. Is ai and a’i the same?
iii. Discuss similarity and differences between the two models
Getachew,
Sorry, but I don’t have enough time to answer homework assignments. In any case, question 1 is addressed on this webpage and also at
https://real-statistics.com/students-t-distribution/one-sample-t-test/
The parametric test for Question 2 is the paired t test and the nonparametric test is Wilcoxon’s Signed Ranks test. Both of these are described on the Real Statistics website.
Charles
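To make the two tests named in the reply above more concrete, here is a minimal Python sketch that computes the paired t statistic and the Wilcoxon signed-ranks statistic for a small made-up sample of before/after weights. The data (in grams, for 12 hypothetical children) are invented for illustration. The p-value for the Wilcoxon test uses the normal approximation without tie or continuity corrections, which is a simplification; for a sample this small an exact table or a statistics package would be preferable.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for the paired differences after - before."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1


def wilcoxon_signed_rank(before, after):
    """Return (W+, approximate two-sided p-value from the normal approximation)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    n = len(diffs)
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average rank
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    sd_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, p_value


# Hypothetical weights in grams, before and after six months of supplement.
before = [10200, 9800, 11100, 10500, 9900, 10800, 11400, 10100, 9700, 10600, 11000, 10300]
after = [10900, 10100, 11900, 10400, 10800, 11300, 12000, 10700, 9600, 11500, 11450, 10950]

t_stat, df = paired_t_statistic(before, after)
w_plus, p_approx = wilcoxon_signed_rank(before, after)
print(f"paired t = {t_stat:.2f} with {df} degrees of freedom")
print(f"Wilcoxon W+ = {w_plus}, approximate two-sided p = {p_approx:.4f}")
```

The t statistic would be compared with the t distribution with n - 1 degrees of freedom. The paired t test assumes that the differences are roughly normally distributed, while the signed-ranks test assumes only that they are symmetric about their median.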
Dear Charles
I have a dependent variable (Scores received from a multiple-choice test/computerized reading comprehension test), and two independent variables (received from two Likert-Scale questionnaires/Computer Familiarity Scale and Attitudes towards Computer Scale). I want to probe the effect OR relationship of two independent variables with the dependent one. I’ve reached the conclusion that the Linear Regression may be the best statistical test to examine my research hypothesis. What’s your idea? is it the most appropriate test to run? If yes, which assumptions should be met for running Linear Regression statistical test?
Somebody says that Normality and Durbin-Watson are the only two assumptions that must be met, and that this is the accepted procedure for statisticians. But I read somewhere that 8 assumptions should be met, and the most important ones are:
- Linear relationship
- Multivariate normality
- No or little multicollinearity
- No auto-correlation
- Homoscedasticity
Would you please tell me am I required to meet all the assumptions for conducting Linear Regression test or Not?
Warm Regards
Which Assumption is very important Normality or homogeneity?
I mean, if I have data (Scale) and the assumption of normality is not valid but the assumption of homogeneity is valid, can I analyze this data?
And if the assumption of normality is valid and the assumption of homogeneity is not valid, can I analyze this data?
Ahmed,
The specific assumptions depend on the specific test, but many tests are much more sensitive to violations of homogeneity of variances than to violations of normality.
Charles
Dear Charles
Thanks a bunch for your constructive and practical comments. I see that I already found the most appropriate solution to my problem based on your directions. You were so helpful, thanks. Would you mind if I ask for some more information? You know, finding the proper statistical test for one’s methodology looks like a puzzle: you should put all the pieces together to complete the whole puzzle. Anyway, I’d really appreciate your help if you could explain every step of running the exact statistical test based on my study design.
one testing group takes the paper-based version of a test. after some weeks interval, it takes the computer-based version of the test with item review option. after some weeks interval the testing group takes the same computer-based version of the test without item review possibility.
By comparing the scores obtained from the first and second testing sessions, I want to examine the effect of testing administration mode (paper or computer) as my independent variable on the testing scores (performance). In this phase, age and gender are examined (the performance difference of males and females on both versions). By comparing the scores obtained from the second and third testing sessions (computer-based test with item review and computer-based test without item review), I want to examine the effect of item review on the performance of test takers on computer-based test. Besides, three different questionnaire were distributed to the test takers of the testing groups to examine the correlation of computer familiarity, computer attitude and testing mode preference with the testing performance (the scores from the first computer-based testing).
Now, I think you have a clearer mental image of my research design and methodology. As you said, I think the most suitable test is the ANOVA with repeated measures to examine the effect of administration mode, age, and gender on the testing performance (by comparing the paper-based test and the first computer-based test scores) as well as the effect of item review on computer-based testing performance (by comparing the first and second computer-based tests).
If yes, please give me more details on how to meet the required assumptions underlying this kind of test (such as sphericity correction you mentioned in your explanation) and how to run this test to examine the effect of different moderator variables on testing performances.
Then, what kind of statistical test may I use to investigate the correlation of computer familiarity, attitude and testing mode preference with the first computer-based testing performance?! I mean the relationship of some independent variables, moderator factors or variables with one or two dependent variables.
and the last point, You believe that I’ m not supposed to check the normality and homogeneity? They are not required at all ?! OR……. I may ignore them to ease the difficult or unpleasant statistical situation, although it’s better to do them to strengthen my research results?
Anyway, Richards, thanks a million in advance for your time and expertise. I’m looking forward to hearing from you ASAP.
Jigaretooooo dadaaaaash, bede biad etelaato
Truly Yours
It seems like you have a mix of fixed between subject factors (Age and Gender) and within subject factors (the various test and test environments). This is precisely the situation described on the webpage:
One between subjects factor and one within subjects factor
This webpage explains how to correct for sphericity since this is easier than testing for sphericity.
The webpage deals with one between subjects factor (Age or Gender), but not both. Dealing with both at the same time probably makes the analysis complicated. I believe there are other tools that handle this case, but Real Statistics doesn’t. You can combine the two factors into one (e.g. Male-Young, Male-Middle, Male-Older, Female-Young, Female-Middle, Female-Older), in which case Real Statistics will handle it.
Note that you don’t need to perform all the analysis manually. Instead you can use the Real Statistics software tool.
You do need to test for normality. Although these tests are pretty robust to violations of strict normality, you still should make sure that the data doesn’t depart greatly from normality (especially due to the presence of outliers).
Charles
I should confess that your last explanation made me more confused
Sorry about that. I suggest that you read the webpages about Repeated Measures Anova.
https://real-statistics.com/anova-repeated-measures/one-within-subjects-factor/
Charles
Then, to run statistical tests, four assumptions are required to be met. How can I test normality distribution and homogeneity of variances when I have just one group, although I obtained three score sets from three testing sessions for this testing group?
Thanks
Morteza,
This looks to be a fit for repeated measures ANOVA. You need to check for normality for each of the three levels. The requirement for homogeneity of variances (between the levels) is not sufficient. Instead you need to satisfy the stronger assumption of sphericity. Fortunately, generally you can ignore this assumption and simply use a sphericity correction. See the following webpage for details:
ANOVA with Repeated Measures
Charles
Hello
would you please help me to find an appropriate statistical test for my methodology based on my research design?
Testing group/Pre-test/X1 (Administration Mode)/Post-test1/X7 (Item Review)/Post-test2
X2 (Age)
X3 (Gender)
X4 (Mode Preference)
X5 (Computer Familiarity)
X6 (Computer Attitude)
I have one testing group that has to take three tests in three testing sessions. Meanwhile, I want to examine the effects of the three testing administration modes, age and gender on their testing performance (this is possible by comparing the scores of the pre-test and post-tests). Then I examine the effect of the item review variable on their performance by comparing the scores of the first post-test and the second post-test. It should be mentioned that the paper-based test is administered to the participants in the first testing session, the computer-based test with item review is implemented in the second testing session (first post-test), and the computer-based test without item review is administered in the third testing session (second post-test).
is it right to use Two-way ANOVA or repeated- measures……?!
I’m looking forward to your reply as soon as possible
Warm Regards
If the same subjects are taking the three tests, you should use repeated measures ANOVA. If you have different groups of users, then you want to use two factor repeated measures ANOVA (one within subject factor and one between subject factor). See
One between subjects factor and one within subjects factor.
Charles
Hello Mr. Charles
My name is Ibrahim:
My advisor says that I should produce a comprehensive and precise layman’s guide on the topic ‘Parametric Statistic’, but my problem is how to design the content. Kindly help me with it. Thank you in advance.
Ibrahim,
This is a huge topic which is covered throughout the website. I am happy to answer specific questions, but your question is too broad for me to provide a suitable answer in a brief space.
Charles
Thank you
Dear Mr. Zaiontz,
Thank you for providing such a great website. I am currently taking a statistics course and I often wonder about the applicability of statistical inference tools to the real world, considering the assumptions that are required. As far as I understand, the assumptions guarantee the validity of the conclusions, since when the assumptions are met the validity of the conclusions can be proved mathematically. But a mathematical proof uses clear-cut logic which assumes that those assumptions are perfectly met, which in the real world is probably rarely if ever the case. So my question is whether most statistical inference tools, such as hypothesis tests or confidence intervals, still give appropriate conclusions even if, for example, an underlying normality assumption is not met perfectly but at least approximately, or if my data is not perfectly symmetric (as assumed, for example, in the Wilcoxon Rank Test)?
Sebastian,
Glad you like the website.
You pose a great question. Statistics is not just a mathematical discipline; it is also supposed to provide practical application to real-world problems. In general, statistical tests are reasonably robust to small departures from the assumptions. Robust means that if you test at a significance level of, say, .05, the actual type I error rate is close to .05 (and not, say, .08). Also, some tests are more sensitive to violations of some assumptions than of others. E.g. ANOVA requires that the data be normally distributed and that the variances of all the groups be equal. The test is quite robust to violations of the first assumption. Even when the data are not so normally distributed (especially if the data are reasonably symmetric), the test generally gives reliable results. ANOVA is much more sensitive to violations of the second assumption, especially when the group sizes are different. If you have very different group sizes, you probably want to use a different test. But even here, if the group sizes are the same and the largest group variance is no more than 3 or 4 times the smallest group variance, then the test is likely to be quite reliable.
Charles
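The rule of thumb about group variances mentioned in the reply above is easy to check. The following Python sketch computes the ratio of the largest to the smallest sample variance for three made-up groups; the data and the function name `variance_ratio` are hypothetical, and the threshold of 3 to 4 is the rough guideline quoted above, not a formal test.

```python
from statistics import variance


def variance_ratio(groups):
    """Ratio of the largest to the smallest sample variance across groups."""
    variances = [variance(g) for g in groups]
    return max(variances) / min(variances)


# Hypothetical data for three groups of equal size.
groups = [
    [23, 25, 27, 22, 26],
    [30, 35, 28, 33, 31],
    [40, 38, 45, 41, 36],
]

equal_sizes = len({len(g) for g in groups}) == 1
ratio = variance_ratio(groups)
print(f"equal group sizes: {equal_sizes}")
print(f"largest / smallest variance: {ratio:.2f}")
```

A ratio below roughly 3 or 4 with equal group sizes is the situation in which ANOVA is described above as quite reliable; a larger ratio, or very unequal group sizes, would suggest looking at an alternative test.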
Thank you for the fast response
HELLO Mr Charles
I am a 200L student of Anchor University Lagos, Nigeria. I was given an assignment, my first experience in this course, and was asked: what should you do to your data if the parametric assumptions are not met?
My delight,
Tomi.
Tomi,
The two main approaches are:
1. Use a data transformation
2. Use a non-parametric test instead
The Real Statistics website describes both approaches.
Charles
Hello Charles Zaiontz,
Firstly, thank you for this educative and enlightening post.
Please, if my data should violate the four statistical assumptions (with particular interest in the normality assumption), what remedies are at my disposal?
Best regard,
Emmanuel
Emmanuel,
It depends on what hypothesis you are trying to test.
Charles
Please, can you represent these assumptions in a graph for me?
You need to look at the assumptions for the specific tests.
Charles
Good day. I am researching the topic ‘Home and School Factors as Determinants of Secondary School Students’ Enrollment in Financial Accounting’. The School variables are: 1. school type (Public and Private), 2. school location (Urban and Rural), 3. teachers’ qualifications, 4. teachers’ methods of teaching.
The Home variables are: 1. Parents’ occupations, 2. Parent’s educational background, 3. Parents’ aspirations.
The dependent variable is enrollment in Financial Accounting.
Please what statistical analysis instrument can i use to analyse my data. Please help me because i have been having serious headache on this. Please its urgent sir.
Thanks a lot
Sorry, but you haven’t provided the type of information necessary for me to answer your question.
What hypothesis are you trying to test?
Charles
thanks so much.
however, my dilemma is on the issue of skewness. any more info about it?
See
https://real-statistics.com/tests-normality-and-symmetry/analysis-skewness-kurtosis/
https://real-statistics.com/tests-normality-and-symmetry/statistical-tests-normality-symmetry/dagostino-pearson-test/
Charles
Which are the assumptions of Non-parametric tests ?
Aumi,
It depends on the nonparametric test, but usually there are fewer assumptions than for a corresponding parametric test.
Charles
I am trying to understand the true meaning behind Kurtosis ? Can you define and explain its overall purposes for layman like me, please, thanks.
Kurtosis describes how heavy the tails of a distribution are compared with those of a normal distribution:
1. Leptokurtic: Positive excess kurtosis; the data have heavier tails (more extreme values) than a normal distribution.
2. Platykurtic: Negative excess kurtosis; the data have lighter tails than a normal distribution (the uniform distribution is one example).
3. Mesokurtic: Excess kurtosis near zero, as for any normal distribution, not just the standard normal distribution.
Kurtosis does not depend on the size of the standard deviation.
Good day, my name is Akeem. I am testing for normality and independence in multivariate data, but I am confused about which test must be done first. My question is: should the normality test come before the independence test?
Hello,
I’m a PhD student and I want to analyze my results. I have 5 independent factors (each one has three levels) and one dependent variable. I want to select a suitable statistical analysis. I have checked only normality, and the data showed a normal distribution. I feel confused by the number of tests. Could you please help me with that?
Many Thanks,
Anwer,
You need to determine what sort of hypothesis you want to test before you can decide what is the suitable statistical analysis.
Charles
Charles,
Thanks for reply.
I want to see the relationship between 5 independent variables (4 numerical and 1 categorical) and the dependent value, and find the optimum values; in other words, the effects of the parameters on the output and which one is the most significant.
Thanks,
Anwer
Anwer,
This sounds like a regression-type scenario. I suggest that you start by looking at the Regression part of the website.
Charles
hi,
I’m a student doing research on the relationship between communication factors and job satisfaction among PB staff.
My sample size is 56 because the population is very small.
The normality test I’ve done shows that the data are not normal.
My question is: if I use non-parametric tests, does it mean I don’t have to run the hypothesis tests, correlation, and regression analysis (which parametric methods usually cover)?
thank you 🙂
Hani,
Two observations:
1. Just because data isn’t normal doesn’t necessarily mean that you can’t use a parametric test. It usually depends on how far from normal the data is. You can sometimes apply a transformation which makes the data normal.
2. Nonparametric tests can often perform very similar analyses as parametric tests; it depends on the type of analysis you want to perform.
Charles
So, if I used the one-sample K-S test for normal distribution,
can I still continue the other analyses using parametric tests? But it depends on how far the data are from normal?
Hani,
It also depends on what other analyses you want to do.
Charles
Hi, I’m doing a lab report right now, and my data meet two of the three assumptions for a parametric test such as an ANOVA or linear regression.
The data are normally distributed and independent. However, the variances are not equal: Levene’s test gave a significance value of 0.039.
So can i still use an ANOVA or regression, and if so, how do i justify this?
Thank you
Katy,
If the homogeneity of variance assumption is not met, Welch’s ANOVA is a commonly used substitute.
For linear regression you can use robust standard errors.
Both of these approaches are covered on the website and are included in the Real Statistics Resource Pack.
Charles
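For readers curious about what Welch’s ANOVA computes, here is a minimal Python sketch of the test statistic and its degrees of freedom, written from the standard textbook formula. The data and the function name `welch_anova` are hypothetical, and the sketch stops short of the p-value, which would come from the F distribution with the two degrees of freedom shown (for example via a statistics package).

```python
from statistics import fmean, variance


def welch_anova(groups):
    """Return (F, df1, df2) for Welch's ANOVA, which does not assume equal variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [fmean(g) for g in groups]
    # Each group is weighted by n / s^2, so noisier groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    total_w = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / total_w
    between = sum(w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / total_w) ** 2 / (n - 1) for w, n in zip(weights, sizes))
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical data: the middle group is far more variable than the others.
groups = [
    [20, 21, 19, 22, 20, 21],
    [25, 31, 18, 35, 22, 28],
    [30, 29, 31, 30, 32, 28],
]

f_stat, df1, df2 = welch_anova(groups)
print(f"Welch F = {f_stat:.2f}, df1 = {df1}, df2 = {df2:.2f}")
```

The second degrees-of-freedom value is generally not a whole number, which is expected for this test.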
Hi
I have some data (the x axis represents fouling resistance and the y axis represents organics) and I simply calculated a correlation between x and y using Excel. A reviewer wanted to know what assumption was made regarding the normality of the data distribution. Can you please give an answer?
Kind regards
Biplob
You don’t need to assume normality to calculate a correlation coefficient. Depending on which statistical test you use, you may need the normality assumption when you test whether this correlation is significantly different from zero. See the following webpage for details
Correlation
Charles
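As the reply above notes, a correlation coefficient can be calculated without any normality assumption. The Python sketch below computes Pearson’s coefficient and, as a rank-based alternative that we introduce here for comparison, Spearman’s coefficient for made-up data with a monotonic but non-linear relationship. The data are hypothetical, and the simple ranking helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; assumes no tied values, to keep the sketch short."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: y grows exponentially with x (monotonic but not linear).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

print(f"Pearson r = {pearson(x, y):.3f}")
print(f"Spearman rho = {spearman(x, y):.3f}")
```

Pearson’s coefficient measures linear association, so it falls below 1 here, while the rank-based coefficient reflects the perfectly monotonic relationship.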
Please, I am working on the immunological assessment of HIV and Hepatitis B in pregnant women. What kind of assumptions and statistical study should I employ? Is my research retrospective, prospective or cross-sectional? What statistical analysis am I expected to use: ANOVA, t test, z test, correlation or regression?
Thanks in Advance.
Sorry, but it is not possible for me to answer your question without more details.
Charles
Do all of these tests have an assumption of independence?
T test
Paired t test
CRD ANOVA
Mario,
For the two sample t test or CRD ANOVA, the group samples must be independently drawn.
For the paired t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.
For
What are the statistical assumptions made about the population when testing a hypothesis?
It depends on the test.
Charles
Hello, I am Sahibzadi from Pakistan.
Kindly tell me: when we say that observations should be independent in a parametric test, is that possible in a repeated measures t test?
For the paired / repeated measures t test, the pairs of observations are independent, but clearly each observation in the pair is not independent of the other observation in the pair.
Charles
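One way to see this is that the paired t test is a one-sample t test on the within-pair differences, so it is the differences (one per pair) that need to be independent of each other. A minimal sketch, with a hypothetical function name and made-up data:

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic, computed as a one-sample t test on the
    within-pair differences. Returns (t, degrees of freedom)."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical before/after measurements on the same four subjects
before = [10, 12, 9, 11]
after = [12, 14, 10, 14]
t, df = paired_t(before, after)
print(f"t = {t:.3f}, df = {df}")
```

The two measurements within a pair are allowed to be correlated; that dependence is absorbed when the difference is taken.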
State the assumptions for testing the difference between two means. If those assumptions are met or not met, what tests are used in multivariate data analysis?
Please answer this question.
You can find this information by looking at the webpages on the t test. If the assumptions are not met, then the usual substitutes are the Mann-Whitney and Wilcoxon Signed Ranks tests (or occasionally the Sign test). These tests are also described on the website. Enter the name of the appropriate test in the Search box.
Charles
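To give a feel for why a test such as Mann-Whitney does not rely on normality, the sketch below computes its U statistics directly from pairwise comparisons, so only the ordering of the values matters. The function name and data are hypothetical, and the p-value (from tables or a normal approximation) is not computed here.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y): U_x counts the pairs (a, b) with a from x and
    b from y where a > b, counting ties as one half."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y

# Hypothetical samples from two independent groups
group_a = [3, 5, 7]
group_b = [4, 5, 6]
u_a, u_b = mann_whitney_u(group_a, group_b)
print("U statistics:", u_a, u_b, "test statistic:", min(u_a, u_b))
```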
I am not sure if a variable is creating an endogeneity bias in a regression. I collected the residuals from the estimated regression and there is no correlation between the potential endogenous variable and the errors. Is this an adequate test?
Jerry,
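This seems like a reasonable first check, although note that if the variable is itself one of the regressors in the model, the OLS residuals are uncorrelated with it by construction, so the check cannot detect endogeneity in that case. This issue has been studied, and other approaches such as Hausman’s Test and instrumental variables can be used as well. The following is a paper which may be useful to you.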
www-2.dc.uba.ar/alio/io/pdf/claio98/paper-12.pdf
Charles
Hi Charles,
If you’re researching 2 ways of working by comparing 2 factors (say costs and duration) with each other using data from 80+ projects (half being projects done the new way of working, half done the traditional way), should you use a z-test, or always add ANOVA and Pearson/Spearman to the analysis?
Thank you in advance!
Rick,
If you want to take the interaction of cost and duration into account, you should probably use ANOVA. If the interaction is not important, then two t tests seem to be a reasonable way to go. In either case, you need to make sure that you satisfy the assumptions for that test.
Charles
Thank you!
Assumptions of the following statistic or statistical tool:
Classify whether parametric or non-parametric.
• z-test of mean difference
• t-test of mean difference
• z-test of correlated means
• t-test of correlated means
• Pearson Product-Moment correlation Coefficient
• Spearman Rank Correlation Coefficient(rho)
• Chi-square goodness-of-fit
• Chi-square of Independence
• One-Way ANOVA (Analysis of Variance)
The first 4 are parametric. The 5th is not a test, but the usual tests are parametric. The next 3 are non-parametric and the last is considered to be parametric.
Charles
Hi, I just want to ask you: is it right to test for a significant difference (i.e. use a parametric test) with convenience samples?
thanks in advance 🙂
You can use all the usual statistical tests with a convenience sample, but you should be cautious about your conclusions since the nature of the sampling technique introduces all sorts of biases in comparison to random sampling.
Charles
hello 🙂
I am a master’s student from Malaysia.
My advisor asked me to include the assumptions in my thesis.
Can you tell me in which chapter I should include the assumptions?
Is it under the research methodology or under the findings?
thanks in advance 🙂
Probably under research methodology, but this depends on the organization of your thesis.
Charles
Are there any other statistical assumptions to be aware of?
Pete,
I have listed the principal types of assumptions for statistical tests on the referenced webpage. Not all tests use all these assumptions. Other assumptions are made for certain tests (e.g. sphericity for repeated measures ANOVA and equal covariance for MANOVA). For each test covered in the website you will find a list of assumptions for that test.
Charles
What do assumptions mean in statistics? What do they provide?
Soniya,
Many statistical tests give valid results only when certain assumptions are met, e.g. that the data are normally distributed or that the variances of the groups are equal.
Charles
Hello
My name is Bahram, from Iran. I am currently a Ph.D. student in watershed management in Malaysia.
Regarding my thesis, my supervisory committee has a question:
– Explain the reason for using ANOVA. Does the data collected meet the parametric statistical assumptions?
Thank you
Hello Bahram,
The reason for using ANOVA is given on the webpage https://real-statistics.com/one-way-analysis-of-variance-anova/
The assumptions for ANOVA are given on the webpage https://real-statistics.com/one-way-analysis-of-variance-anova/assumptions-anova/
Charles