In this part of the website we explore the concepts of correlation and association (especially using Pearson’s correlation coefficient) and how to perform one- and two-sample hypothesis testing, especially to determine whether the correlation between populations is zero (in which case the populations are independent, assuming the data are bivariate normal) or whether two correlations are equal. We also briefly explore alternative measures of correlation, including Spearman’s rho and Kendall’s tau, as well as how the t-test and the chi-square test for independence relate to the correlation between dichotomous variables.
Topics
- Basic Concepts
- Scatter Diagrams
- One Sample Hypothesis Testing
- Two Sample Hypothesis Testing
- Multiple Correlation
- Spearman’s Rank Correlation
- Kendall’s Tau Correlation
- Relationship with t-test
- Relationship with Chi-square Test for Independence
- Resampling for Correlation Testing
- Biserial Correlation
- Real Statistics Correlation Data Analysis Tool
- Box-Cox Transformation
- Tetrachoric and Polychoric Correlation
- Ordered Chi-square Test of Independence
- Lambda Measure of Asymmetric Association
- Gamma Measure of Symmetric Association
- Somers’ d Measure of Asymmetric Association
References
Howell, D. C. (2010) Statistical methods for psychology (7th ed.). Wadsworth, Cengage Learning.
https://labs.la.utexas.edu/gilden/files/2016/05/Statistics-Text.pdf
Siegel, S., Castellan, N. J. (1988) Nonparametric statistics for the behavioral sciences, 2nd ed.
https://psycnet.apa.org/record/1988-97307-000
Hello Charles,
I am trying to determine the association between disease stage in three ordered categories and severity of surgical complications that is also in three ordered categories. The Ordered Independence Test Tool calculates a “non-ordered” Chi-square test with p = 0,047, an Ordered Analysis with p = 0,039, and Kendall’s correlation with p = 0,051.
How should this result be interpreted?
If I am not mistaken, as Kendall’s correlation is sensitive to the order of categories, I would expect it to pick up dependencies in the frequencies that the Chi-square test does not “see”, i.e. to provide lower p-values than the Chi-square test, and here it is the opposite?
Thank you for your time and effort!
Your website is really useful!
Hello Svetoslav,
With ordered data, you are probably better off using the ordered chi-square test of independence rather than the non-ordered version. If you want to determine whether the correlation is zero, you can use Kendall’s correlation test. Both results are pretty close to p = .05, which is the usual cutoff, and so the test is quite marginal in either case.
Charles
Hi Charles,
Thank you for this, though, I have already determined the relationship between the time-points of the protein samples and did the same for the miRNA samples. But what I am really interested in is to see if there’s a correlation (or any similar test that shows relatedness/association) between all the values (top values in previous comment) of protein Vs those at the bottom (miRNA levels). As I mentioned before, values at the top and bottom are from different samples. I just think there is some relationship because when I plot the values on a scatter plot, there appears to be one. That’s why I wanted to know if there’s any alternative way to do a correlation if my observations aren’t paired? Basically, I need stats to back up the relationship when placing all 900TMZ protein values (4-36h) on the y-axis and all the 900TMZ miRNA values (4-36h) on the z-axis. Note 900TMZ just means 900 units of temozolomide
Thank you for your time and patience.
Hartwig
Hi Hartwig,
I am sorry, but I don’t know how to show that this sort of association exists.
Charles
Hi Charles,
Thank you for all your advice. It is greatly appreciated. Perhaps I can try something different. I have two questions:
1) Can I correlate protein levels with drug dose used (obviously this would mean the X-axis has fixed values)? In this case there shouldn’t be a problem of ‘relatedness’ between variables because the dose and the protein level are inextricably linked. I thought to use Spearman as the data is likely to be monotonic.
2) Can I correlate protein levels with time post-exposure (all of which are either 4, 18, or 36 h)? As with the case above, protein and sampling time (time post-exposure). I was just not sure if I can use time post-exposure as the x-axis variable considering that they are fixed values i.e. they can only be 4, 18, or 36 h. I would also like to do Spearman here for the same reasons as above, but also, there are some extreme values.
Thank you for your advice and time, hopefully this is an easier question than the previous ones :).
Cheers
Hartwig
Perhaps I can even use Kendall’s tau as this might be more appropriate to handle ties that result from the fact that using 4 h, 18 h, and 36 h as my one variable, each time-point will have multiple measurements?
Hello Hartwig,
Kendall’s tau may indeed be a better choice than Spearman’s correlation, but the real question is how you will use any of the correlation metrics.
Charles
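To illustrate the point about ties, here is a minimal Python sketch of Kendall's tau-b, the variant of Kendall's tau that adjusts for ties, applied to three fixed time points with repeated measurements. The function name kendall_tau_b and all data values are made up for illustration (they are not Hartwig's measurements), and the sketch only computes the coefficient, not a p-value.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)).

    C and D count concordant and discordant pairs; Tx and Ty count pairs
    tied only in x or only in y. Pairs tied in both are ignored.
    """
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + ties_x) * (untied + ties_y))


# Hypothetical data: two measurements at each of the 4, 18 and 36 h time points.
time_h = [4, 4, 18, 18, 36, 36]
level = [1.0, 2.0, 2.0, 3.0, 4.0, 5.0]

tau = kendall_tau_b(time_h, level)
print(f"Kendall's tau-b = {tau:.3f}")
```

Because every time point is repeated, many pairs are tied on the time variable; tau-b shrinks the denominator accordingly instead of treating those pairs as informative. A library routine such as scipy.stats.kendalltau would normally be used in practice, since it also reports a p-value.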
Hello Hartwig,
1) If you have the protein level and drug dose for each subject, then you can calculate the correlation. Why do you think that the data won’t be monotonic and so not normally distributed?
2) You can use fixed values such as time as one of the variables when calculating correlation. In fact, this has a special name when there are only two fixed values, namely the point biserial correlation. See https://www.real-statistics.com/correlation/dichotomous-variables-t-test/. The real question here is why you would want to calculate this sort of correlation.
Charles
Hi Charles,
Thank you kindly for your response.
I think I’ll go with Kendall’s tau since I have 3 fixed time points (so point biserial correlation probably won’t work) and I suspect there will be many ties. I think the data will be monotonic and not linear because it appears that way when I plot time and protein levels on a scatter plot. Ultimately I just want to see if there’s a significant association (and the direction thereof) between time and protein level. And likewise dose and protein level. This is basically my alternative approach to my previous problem of correlating miRNA Vs protein (which I couldn’t do, as you might recall, because samples were not paired). So now, I’ll do individual correlations with Kendall’s tau for time Vs protein and then time Vs miRNA (both after exposure to the same drug) and if the direction of association is opposite to each other, I will ‘loosely’ speculate that protein levels might be driving the miRNA levels. I think this is probably the best I’ll be able to do given the data. What do you think?
Cheers
Hartwig
Hartwig,
Good luck with your research.
Charles
Hi Charles,
Apologies for the confusion. No, I’m referring to my question on 2 Sept. This is the message:
Hi Charles,
Some context as to why I want to use averages. The variables I’m correlating are only linked in the sense that they come from the same cell line and underwent the same drug treatment. As a consequence, with 3 repeats there is no meaningful way to pair the variables, i.e. drug 1 cells repeats 1, 2, 3 (measuring miRNA) vs drug 1 cells repeats 1, 2, 3 (measuring protein). Because the protein and miRNA didn’t result from the same experiment, I can’t see a way to pair the variables (e.g. 1,2,3 vs 1,2,3 or any variation thereof, with the caveat that the correlation coefficient changes each time). Note that the experiments run over 3 time-points. So my thought was to ‘force’ a paired relationship that would be consistent. I could average the drug 1 miRNA repeats (1,2,3) for each time-point and do the same with the drug 1 protein repeats (1,2,3), thus doing away with the varying correlations that I get when individual data points are used, e.g. data ordered 1,2,3 gives a different correlation from data ordered 3,2,1. Using individual points is not ideal or really reproducible, hence my suggestion of using an average. Does this make any sense? Do you have any advice? I’m at my wit’s end with this.
Thank you for your help.
Hello Hartwig,
If you have multiple samples from the same subject, you can take the average and use this as your measurement.
You seem to want to use correlation, although from all your responses it doesn’t seem like correlation is a fit for your situation. In any case, perhaps it is better for me to start at the beginning and ask what hypothesis you want to test.
Charles
Hi Charles,
You are correct. Unfortunately, they aren’t multiple responses from the same sample; instead they are biological replicates. I want to test whether a specific miRNA is related to the expression of a specific protein. This wasn’t our original plan; we just thought of it after seeing how the data started to take shape. That’s why we didn’t think to sample miRNA and protein from the same samples. I am convinced there is a relationship between miRNA and protein because as miRNA went down, the protein went up. Now I’m just looking for a statistical test to show this, within the confines of our statistical limitations (i.e. miRNA and protein not coming from the exact same sample).
Thank you for your help.
Cheers
Hartwig
Can you give me an example of data that showed that “as miRNA went down, the protein went up”?
Charles
Hi Charles,
Of course:
These are the protein levels. 900TMZ is the dose, followed by the time post-exposure. Each row is a biological repeat.
900TMZ 4h    900TMZ 18h    900TMZ 36h
1.877        2.687         3.839
3.169        2.557         3.797
3.229        2.371         9.443
2.481        2.755         6.997
2.415        2.936         9.574
2.450        2.117         4.232
These are the miRNA levels, each row being a biological replicate, but independent of the above protein experiments.
900TMZ 4h    900TMZ 18h    900TMZ 36h
2.131        0.576         0.424
1.330        0.783         0.254
0.949        0.915         0.018
0.456        0.748         0.057
0.723        0.247         0.412
1.316        0.945         0.303
So what I did was line them up (protein vs miRNA) in order 4 h – 36 h into two columns, and when I look at the patterns it appears there is a decrease in miRNA vs protein over time.
Cheers
Hartwig
Hi Hartwig,
While I don’t understand how you obtained the figures
900TMZ 4h    900TMZ 18h    900TMZ 36h
2.131        0.576         0.424
1.330        0.783         0.254
0.949        0.915         0.018
0.456        0.748         0.057
0.723        0.247         0.412
1.316        0.945         0.303
I did conduct a Repeated Measures ANOVA on the data and obtained a significant result with p = .01 (using a GG correction). Follow-up analysis using Tukey HSD showed that 4h vs 36h was significant (p = .004), while 4h vs 18h and 18h vs 36h were not significant, although all three comparisons had very large Cohen’s d effect sizes (2.5 for the significant result and 1.2 for the other two comparisons).
Charles
Hi Charles,
Can I average individual data points before doing a Spearman correlation? For example, if my X variable is (1, 2, 3) (each representing a biological replicate) and my Y variable is (4, 5, 6) (each representing a biological replicate), can I do a Spearman correlation on X = 2 vs Y = 5?
Cheers
Hartwig
Hello Hartwig,
Probably not, but I suggest that you try using the specified process to calculate Spearman’s correlation and see whether this value agrees with the approach that you are suggesting.
Charles
Hi Charles,
I’m not sure I understand what you mean?
Cheers
Hartwig
Hi Charles,
Some context as to why I want to use averages. The variables I’m correlating are only linked in the sense that they come from the same cell line and underwent the same drug treatment. As a consequence, with 3 repeats there is no meaningful way to pair the variables, i.e. drug 1 cells repeats 1, 2, 3 (measuring miRNA) vs drug 1 cells repeats 1, 2, 3 (measuring protein). Because the protein and miRNA didn’t result from the same experiment, I can’t see a way to pair the variables (e.g. 1,2,3 vs 1,2,3 or any variation thereof, with the caveat that the correlation coefficient changes each time). Note that the experiments run over 3 time-points. So my thought was to ‘force’ a paired relationship that would be consistent. I could average the drug 1 miRNA repeats (1,2,3) for each time-point and do the same with the drug 1 protein repeats (1,2,3), thus doing away with the varying correlations that I get when individual data points are used, e.g. data ordered 1,2,3 gives a different correlation from data ordered 3,2,1. Using individual points is not ideal or really reproducible, hence my suggestion of using an average. Does this make any sense? Do you have any advice? I’m at my wit’s end with this.
Thank you for your help.
Hi Charles,
I haven’t yet heard back from you so just wondered if you’ve had a moment to consider my question?
Cheers
Hartwig
Hi Hartwig,
If you are referring to your comment from June, I responded the same day. Here is the response.
Charles
Hi Hartwig,
Spearman’s correlation is based on paired data. You can calculate Spearman’s correlation for any two sets with the same number of elements, but it won’t have any meaning unless the elements from the two sets can be paired in a meaningful way. If not you can scramble the order of one or both of the sets and get a completely different correlation value.
The pairs don’t need to be from the same subject, but they need to reflect some relationship. E.g. Depression score before and after therapy. Here the subjects for both members of the pair are the same. The pair could, however, be the Depression score for a husband and his wife. Here the subjects for both members of the pair are different, but related.
Charles
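The remark about scrambling can be checked with a small Python sketch. It computes Spearman's correlation as the Pearson correlation of the ranks, first for one pairing and then for the same two sets with one of them reordered. The helper names (average_ranks, pearson, spearman) and the numbers are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values; the second list is the same set of numbers, reordered.
x = [1.2, 2.5, 3.1, 4.8, 5.0]
y = [10.0, 12.0, 15.0, 19.0, 25.0]
y_scrambled = [25.0, 10.0, 19.0, 12.0, 15.0]

rho_paired = spearman(x, y)
rho_scrambled = spearman(x, y_scrambled)
print(rho_paired, rho_scrambled)
```

The two sets contain exactly the same values in both calls; only the pairing differs, and the coefficient moves from a perfect positive correlation to a weak negative one. This is why the pairing has to reflect a real relationship between the elements.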
Hi Charles,
Just a quick question, when doing a Spearman correlation let’s say p53 protein levels vs miRNA levels in TK6 cells, do those variables (p53 and miRNA) have to come from the exact same cell sample? Or can I do an experiment, collect protein, then another experiment and collect miRNA and then correlate the two? The experiments will be done using the same cell type and they are of the same lineage, but they will physically be different samples. However, they will be the same number of samples and treated under the same conditions. I just ask because one of the Spearman assumptions is that samples should be paired. Ultimately, I just want to know whether there is a correlation between the protein (p53) and miRNA levels in TK6 cells.
Any advice will be greatly appreciated.
Kind regards,
Hartwig
Hi Hartwig,
Spearman’s correlation is based on paired data. You can calculate Spearman’s correlation for any two sets with the same number of elements, but it won’t have any meaning unless the elements from the two sets can be paired in a meaningful way. If not you can scramble the order of one or both of the sets and get a completely different correlation value.
The pairs don’t need to be from the same subject, but they need to reflect some relationship. E.g. Depression score before and after therapy. Here the subjects for both members of the pair are the same. The pair could, however, be the Depression score for a husband and his wife. Here the subjects for both members of the pair are different, but related.
Charles
Hi Charles,
Thank you so much for the information.
Based on what you said, the correlation I want to do should be appropriate? The only thing that I am somewhat uncertain of is whether the relationship between my datasets is ‘sufficient’ – this is based on your comment ‘The pairs don’t need to be from the same subject, but they need to reflect some relationship’. So as it stands, I have two datasets – in one experiment I collected protein levels 4 h, 18 h, and 36 h after ‘high’ dose treatment. In the other experiment I collected miRNA levels following the same treatment and times. Each time-point had the same number of data points. I then just lined up all the data from the 1st (protein) experiment in one column in order 4 h, 18 h, 36 h, then did the same for the 2nd (miRNA) experiment and performed the correlation. Does this seem appropriate?
Thank you for your help:)
Kind regards
Hartwig
Hi Hartwig,
If I understand correctly, the relationship is protein level vs. mRNA level. I further understand that these are measured at the same time and using the same treatment. So far, so good. But now I need to understand the link between the protein level and mRNA level.
If for example, I measure the protein level on monkey A but mRNA level on monkey B, then things fall apart unless there is some relationship between monkey A and monkey B (e.g. one is the mother of the other).
Charles
Hi Charles,
Just to clarify, the experiments used the same time-frame (i.e. 4, 18, 36 h) (just in case it came across that I sampled experiment 1 and 2 at the same time, I didn’t, I simply meant both experiments were treated the same and sampled the same (i.e. at 4, 18, 36 h) :). In case it’s relevant, I measured miRNA (microRNA) not mRNA. Anyway, to your actual question…well, all these cells are from the same lineage (i.e. we bought a single vial and cultured them into multiple vials). Each one of these vials is then used for a single biological repeat of an experiment. So in this case, I guess you could say they are siblings? So ultimately, I extracted miRNA from one experiment and protein from another, using cells from the same lineage. The reason I want to correlate miRNA with protein level is because I suspect the protein is what drives the observed miRNA response (I know correlation is not causation, but it’s a useful starting point for my argument).
Is this good enough for Spearman correlation? If not, do you have any alternative to what I can do?
Cheers
Hartwig
Since there is a familial relationship, correlation would make sense.
Depending on what you want to test, you could use Pearson’s correlation. Spearman’s correlation is typically used for data that is not normally distributed, since it is calculated from the ranks of the raw data rather than the raw values themselves and so does not depend on normality.
Charles
Or at the very least distant relatives 🙂 but ultimately had the same stock tube origin.
Hi Charles,
Apologies, it occurred to me that I just assumed you are familiar with cell culture (when working with it for so long I often forget it’s not everyone’s field). So to clarify my experimental model – I use human cells (TK6). These cells are purchased as one stock that we then grow into a massive batch (kind of like offspring of the original tube). Each one of the resulting ‘offspring’ tubes is then used for 1 biological repeat of an experiment. So if n=3, we use 3 tubes. When the batch of offspring tubes starts to run low, we culture another batch from the first offspring batch and so on and so forth. So if we put this in terms of human pedigree, it would be (e.g.) great-grandparents-grandparents-parents-children (depending on how many times the original tube was sub-cultured). That’s why I think the samples are related because they all came from the same original stock tube, they may just be at different generations (e.g. grandparents vs children) as the protein and miRNA experiments were done quite a while apart.
Apologies if this is over-explaining it, just thought you might need/want some context in case you were unfamiliar with the process :).
Kind regards
Hartwig
Hi Charles,
Suppose we want to study the effects of a drug on the blood pressure and heart rate of individuals.
The drug is thought to reduce blood pressure and heart rate. We got a control group (not using the drug) and a treated group using the drug. So from each individual, we obtained 2 values for the blood pressure and heart rate (2 continuous responses).
1. Now if we would like to perform a correlation test for the two response variables, would it be ideal to test the correlation between (the blood pressure) and (heart rate) of each of the groups (control & treatment) separately OR to test the correlation between (control group’s blood pressure minus treated group’s blood pressure) and (control group’s heart rate minus treated group’s heart rate)? What is the difference?
2. Should I use MANOVA or Fit General Linear Model?? Because I don’t know if my response variables are correlated or not. Should my responses be statistically correlated or just related to each other based on the conclusions of scientific papers?
Thanks in advance.
Nafis
Hello Nafis,
Before looking at the appropriate tests to use, please tell me the hypothesis (or hypotheses) you are trying to test. Why are you interested in correlation in the first place? I am interested in understanding the clinical objective without using statistical terminology.
Charles
Hi,
I was wondering why you have used correlation in “corr A” sheet of “Real-Statistics-Examples-Correlation-Reliability” for life expectancy and the number of cigarettes smoked instead of regression!!
Does life expectancy affect the number of smoked cigarettes??!!
I thought when only one variable can affect another variable, we would use regression!
I would appreciate any guidance on choosing between regression and correlation AND covariance.
Thanks,
Nafis Akbari
Hi Nafis,
Correlation and Regression are clearly related subjects.
If there are two variables x and y, then the correlation between x and y is also shown in the report on the regression of y on x.
Note that if x and y are correlated it does not mean that x affects y or y affects x.
If there are three variables, then
Which to use, correlation or regression, depends on what your objective is. At the risk of oversimplifying, if you are looking to make predictions then regression is a good choice; otherwise, it is probably correlation that you want to use, although regression will give the same results.
Charles
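The link between the two can be sketched in a few lines of Python: for two variables, the squared correlation equals the R-square of the simple regression, the slope has the same sign as the correlation, and the correlation is the same whichever variable is called x. The function name and the data values below are hypothetical, loosely modelled on the cigarettes and life expectancy example, and are not taken from the Real Statistics workbook.

```python
from math import sqrt


def correlation_and_regression(x, y):
    """Return (r, slope, intercept, r_squared) for the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R-square from the residuals of the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_res / syy
    return r, slope, intercept, r_squared


# Hypothetical data: cigarettes smoked per day and life expectancy in years.
cigarettes = [0, 5, 10, 20, 30]
life_expectancy = [82, 80, 77, 73, 68]

r, slope, intercept, r_squared = correlation_and_regression(cigarettes, life_expectancy)
r_reversed = correlation_and_regression(life_expectancy, cigarettes)[0]
print(f"r = {r:.4f}, r^2 = {r * r:.4f}, R-square = {r_squared:.4f}")
print(f"predicted life expectancy = {intercept:.2f} + ({slope:.3f}) * cigarettes")
```

Swapping the roles of the two variables changes the regression line but not the correlation, which is one way to see that correlation by itself says nothing about which variable affects the other.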
Thanks a lot, Prof.Zaiontz.
Dear Dr.Zaiontz,
I have 4 independent sets of data (one control and 3 treatments) for a group and another 4 independent sets (one control and 3 treatments) for another independent group. There are 16 replications per set, and there is no specific order for the replications in each set (values in each set DO NOT relate to values in the other 3 sets, two by two). I then transformed the data into logs since they are large numbers and are usually reported in logs. Then I identified outliers from each set and deleted them since I didn’t want to include outliers in my study. Now I have unequal sample sizes (14, 15, or 16 samples in each set).
1. I’d like to perform a one-way ANOVA and Tukey for every 4 sets of data of the 2 groups. Is there any probable problem with that? Any suggestions?
2. I want to do a correlation test (Pearson’s) to see whether and how the reduction or increase of the values compared to CONTROL in group 1 relates to that of group 2. So I have to subtract each value of each of the 3 treatments from values of control, right?? And do the subtraction between variables two by two, right?? Then I got 2 issues: the order of replications is independent within each set, and I got different sample sizes. helppp
Please suggest any other tests if you see will fit my purpose or correct me if I’m wrong.
I would appreciate any help with this.
Sincerely yours,
Nafis Akbari
Hello Nafis,
1. There is no problem performing one-factor Anova with Tukey HSD follow-up even with unequal samples. I don’t quite understand why you have decided to use one-way Anova when you have two factors.
2. Sorry, but I don’t understand your question.
Charles
Hi Prof.Zaiontz,
First, I apologize for my poor English, which made my grouping information unclear.
So I would correct it; I have 4 different populations of chickens (one of them is my control group and 3 treated groups (their diets treated) ). I sample 16 individuals from each population. From each individual, I obtained 2 values (one value as bacterial enumeration of the intestine and the other value as the same type of bacteria’s enumeration of meat). But unfortunately, we have lost the information on which of the two pairs of values are obtained from which individual. So I got 16 values for each intestine and meat enumeration of each of my 4 groups (8 sets of data, 16 values each, 4 sets for intestine and 4 sets for meat). I transformed data into logs. Then I deleted outliers from each set, leaving me unequal sample sizes (I, fortunately, got the raw data).
1. I want to see if there is a significant difference between 4 groups of intestinal bacterial enumeration and ALSO make the same comparison with 4 groups of meat bacterial enumeration separately. Then perform a post-hoc test. (values are between 2 and 9 logs) Which tests do you recommend?
2. In our study, I want to determine if decrease or increase of the intestinal bacteria in each of the 3 treated groups in comparison to the control group has any correlations (if there is, then how much it correlates?) and anything to do with the same bacteria’s increase or decrease in meat samples. In fact, I want to see how much the treated groups (diets) could reduce or increase the meat bacterial population by reducing or increasing the intestinal bacterial population. (A part of meat’s bacterial population is caused by cross-contamination between intestinal contents and meat of animals in slaughterhouses.)
Which tests would you recommend?
3. I believe your contribution to my research as a statistician coauthor will much improve the study. I’m looking forward to hearing from you.
Best regards,
Nafis Akbari
Hello Nafis,
1. Normally you could use MANOVA, but since you have lost the connections between the intestinal and meat data, the only thing that I can think of is to perform two separate ANOVAs, one for intestinal and one for meat. Since you have two tests, you should use an experimentwise correction. E.g. a Bonferroni correction would mean that you should use an alpha value of .05/2 = .025 for each ANOVA. Since the data for both ANOVAs are likely to be correlated, this value is probably a bit low.
2. I see that you want to “determine if decrease or increase of the intestinal bacteria in each of the 3 treated groups in comparison to the control group has any correlations…” How to do this depends on what sort of data you have that can be used to investigate this.
Charles
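The Bonferroni arithmetic mentioned in point 1 can be sketched as follows. The Šidák formula is included only for comparison; it is not mentioned in the reply above and assumes the tests are independent. The function names are hypothetical.

```python
def bonferroni_alpha(family_alpha, num_tests):
    """Per-test significance level under a Bonferroni correction."""
    return family_alpha / num_tests


def sidak_alpha(family_alpha, num_tests):
    """Per-test significance level under a Sidak correction (independent tests)."""
    return 1 - (1 - family_alpha) ** (1 / num_tests)


# Two separate ANOVAs (intestine and meat) with an experimentwise alpha of .05.
per_test_bonferroni = bonferroni_alpha(0.05, 2)
per_test_sidak = sidak_alpha(0.05, 2)
print(f"Bonferroni: {per_test_bonferroni:.4f}, Sidak: {per_test_sidak:.4f}")
```

Each ANOVA's p-value would then be compared with the per-test level rather than with .05. Bonferroni gives the smaller (more conservative) of the two cutoffs.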
I have monthly time series data of consumption. What steps do I need to take to forecast future values using Excel? Which tests are involved and which model is suitable? Please guide me.
Abbas,
There are lots of approaches and so it depends on the nature of your data. See
Time Series
Charles
Hi Charles ,
I have time series data (forces, speed, current) from a machine as it performs an operation on metal pieces to straighten them. I want to compare the data for each piece with the others to find anomalies during the operation and so detect defects. I also want to compare against the best possible pattern for performing the operation with the same settings, to maintain high reliability.
Can you suggest some method?
Thanks
I would need more details to really be able to answer your question, but first of all, what criteria (i.e. metrics) do you use to detect an anomaly or defect?
Charles
Hi Charles, I am doing a comparative study of the quality of life in breast cancer patients in two different health situations: illness-free patients and metastatic patients.
I have administered several questionnaires in order to measure their quality of life, and now I don’t know how to proceed and what analyses to do. My intention is to see if there are differences in the quality of life of the two groups of patients depending on the health situation.
I really hope you can help me,
Thanks very much,
Emma
Emma,
The usual approach is to perform a two independent sample t test (provided the assumptions are met). See
Two independent sample t test
Charles
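As a rough illustration of the two independent sample t test, the Python sketch below computes the pooled-variance t statistic and its degrees of freedom for two groups of questionnaire scores. The function name pooled_t and the scores are hypothetical, and the sketch assumes equal variances in the two groups; turning t into a p-value requires the t distribution, for which a library routine such as scipy.stats.ttest_ind can be used.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    df = na + nb - 2
    # Weighted average of the two sample variances (equal-variance assumption).
    pooled_var = ((na - 1) * variance(sample_a) + (nb - 1) * variance(sample_b)) / df
    std_err = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(sample_a) - mean(sample_b)) / std_err, df


# Hypothetical quality-of-life scores for the two groups of patients.
disease_free = [72, 65, 80, 77, 69, 74]
metastatic = [58, 63, 70, 55, 61]

t_stat, df = pooled_t(disease_free, metastatic)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

If the variances of the two groups look very different, the unequal-variance (Welch) version of the test is the usual alternative.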
Hi there,
I hope you can help me, I am doing my final university project and now I have to compare my results from different questionnaires that I have administered.
In the first step, I want to look for correlations between the results of a questionnaire that give a score of emotional regulation with another score from another test of emotional intelligence in sport. So I can show that if you have more emotional regulation it is normal to have more emotional intelligence. Can I do this with coef. Pearson? And if I want a graphic, is it possible to make one?
Sorry for my English, and thank you
Hello Monica,
If you are trying to see whether there is a correlation, then Pearson’s correlation seems appropriate. See the following webpage to create a graphic representation:
https://real-statistics.com/correlation/scatter-diagrams/
Charles
Hi, I need help regarding calculating a correlation between multiple variables.
So, I have the following:
ACCEPTANCE (7 statements – 5 points Likert scale)
PERCEPTION (8 statements – 5 points Likert scale)
PURCHASE INTENTION (8 statements – 5 points Likert scale)
LOYALTY (6 statements – 5 points Likert scale)
So, for example, my question is related to calculating the correlation between acceptance and perception. So, how do I “merge” all results from acceptance’s sentences (there are 7 statements) into one per participant?
Thank you in advance.
Iva
Hello Iva,
Suppose that you have two variables x and y and 10 subjects each of whom provides one value for x and one value for y. If these values are inserted in the range A1:B10, then =CORREL(A1:A10,B1:B10) would yield the correlation coefficient.
Now if instead of one value for x and one value for y, you have 7 for x and 8 for y, then you need to decide how to combine these values so that the correlation will make sense. This is not really a statistical question (at least at first). You first need to decide why you are combining the values and what you plan to do with the result. E.g. in many standardized tests, you simply add up the x values to create one x value and do the same for the y values.
Charles
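The "add up the values" approach can be sketched in Python as follows: each participant's 7 acceptance items and 8 perception items are summed into one score per scale, and the two columns of totals are then correlated, which is what =CORREL would do in Excel. The responses, the helper names and the choice of summing (rather than some other way of combining items) are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (the quantity Excel's CORREL returns)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def row_mean(row):
    return sum(row) / len(row)


def scale_scores(item_rows, combine=sum):
    """Collapse each participant's item responses into a single score."""
    return [combine(row) for row in item_rows]


# Hypothetical 5-point Likert responses, one row per participant.
acceptance_items = [  # 7 statements
    [4, 5, 4, 3, 4, 5, 4],
    [2, 1, 2, 3, 2, 2, 1],
    [3, 3, 4, 3, 2, 3, 3],
    [5, 5, 4, 5, 5, 4, 5],
    [1, 2, 2, 1, 3, 2, 2],
]
perception_items = [  # 8 statements
    [4, 4, 5, 3, 4, 4, 5, 4],
    [2, 2, 1, 3, 2, 3, 2, 2],
    [3, 4, 3, 3, 3, 2, 3, 4],
    [5, 4, 5, 5, 4, 5, 5, 4],
    [2, 1, 2, 2, 3, 1, 2, 2],
]

acceptance = scale_scores(acceptance_items)
perception = scale_scores(perception_items)
r_sum = pearson(acceptance, perception)

# Averaging instead of summing only rescales each score, so r is unchanged.
r_mean = pearson(scale_scores(acceptance_items, row_mean),
                 scale_scores(perception_items, row_mean))
print(f"r from sums = {r_sum:.4f}, r from means = {r_mean:.4f}")
```

When every participant answers every statement, sums and means give the same correlation, so the choice between them only matters for how the scores are reported.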
Thanks very much. I appreciate your help.
Joel
Hello, Sir.
In my experiment I have 2 dependent variables (TOT & MET) with a 2k factorial design that gives 4 conditions. I want to know whether there is any correlation between TOT & MET. Should I do a correlation test for each condition or just do it all in one test? If I have to do the test for each condition, and 3 of the 4 conditions show no statistically significant correlation between TOT & MET, what should I conclude for all 4 conditions?
Thank you
Hi Charles,
I would appreciate your advice on the most appropriate statistical test for the following data. I’d like to correlate the subjective grade for severity of a disease, graded on an ordinal scale of 1-5, with an objective measure of the severity of the disease, namely visual acuity, graded on a continuous objective scale of -0.3 (best possible acuity/visual function) to +2.0 (worst possible acuity/visual function).
Which statistical test would you recommend?
Thanks very much.
Joel
Hello Joel,
What hypothesis are you trying to test?
Can’t you use a correlation coefficient to accomplish what you are looking for?
Charles
I’m trying to test the hypothesis that the subjective grading of disease severity correlates positively with the objective measure of disease severity. The problem, I believe, is that the subjective scale, which ranges from 1-5, is not continuous, i.e. subjects are (subjectively) graded as an integer from 1-5. Can I use a correlation coefficient for that data?
Thanks very much.
Joel
Hello Joel,
Yes, you can use the correlation coefficient in this case as long as you accept that the differences between adjacent scores from 1 through 5 are all equal; e.g. the difference between 4 and 3 is the same as the difference between 2 and 1. This may seem obvious, but is not always the case when 1 to 5 represents a Likert scale. Here we are treating the scores as if they were continuous.
Charles
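The equal-spacing assumption can be made concrete with a short Python sketch. It correlates hypothetical 1-5 grades with hypothetical visual acuity values, then recodes the grades with unequal spacing (1, 2, 4, 8, 16) while keeping their order. Pearson's correlation changes under the recoding, whereas a rank-based coefficient such as Spearman's rho does not. All names and numbers here are made up for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: subjective grade (1-5) and visual acuity (-0.3 to +2.0).
grade = [1, 2, 2, 3, 4, 5, 5, 3]
acuity = [-0.1, 0.0, 0.2, 0.4, 0.9, 1.6, 2.0, 0.5]

# Same ordering of grades, but with unequal gaps between adjacent scores.
unequal_spacing = {1: 1, 2: 2, 3: 4, 4: 8, 5: 16}
recoded = [unequal_spacing[g] for g in grade]

print("Pearson :", pearson(grade, acuity), pearson(recoded, acuity))
print("Spearman:", spearman(grade, acuity), spearman(recoded, acuity))
```

If the equal-spacing assumption seems doubtful for the grading scale, this invariance is one reason a rank-based coefficient is often reported alongside, or instead of, Pearson's.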