Correlation and Association

In this part of the website we explore the concept of correlation and association (especially using Pearson’s correlation coefficient) and how to perform one-sample and two-sample hypothesis testing, especially to determine whether the population correlation is zero (for bivariate normal data, this means the variables are independent) or whether two population correlations are equal. We briefly explore alternative measures of correlation, including Spearman’s rho and Kendall’s tau, as well as how the t-test and the chi-square test for independence relate to correlation, including the correlation between dichotomous variables.

174 thoughts on “Correlation and Association”

Graham Jervis

June 19, 2019 at 5:56 pm

Hello Charles,

I have two data sets: one is pressure, the other is water level. They cover the same time frame; however, one is recorded every hour and the other every 10 minutes. Can I find the correlation between them as is, or do I need to convert the smaller time step to hourly values first and then find the correlation?
- Charles
  
  June 22, 2019 at 11:40 am
  
  Hello Graham,
  You need paired data values (one value per hour for each variable). For each hour, you can take the average of the six 10-minute readings, the first reading in the hour, the last reading in the hour, or something similar.
  Charles
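A minimal Python sketch of the pairing Charles describes, assuming that each block of six 10-minute readings covers the same hour as the hourly reading with the same index. All data values are made up for illustration. The aggregation function is a parameter, so the hourly mean, the first reading, or the last reading can be used.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient (the quantity Excel's CORREL returns)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def to_hourly(readings, per_hour=6, agg=mean):
    """Collapse consecutive blocks of `per_hour` readings into one value per hour."""
    if len(readings) % per_hour != 0:
        raise ValueError("the series must cover a whole number of hours")
    return [agg(readings[i:i + per_hour]) for i in range(0, len(readings), per_hour)]


# Hypothetical data covering the same 3 hours
pressure_hourly = [1012.0, 1010.5, 1009.0]
water_level_10min = [2.10, 2.12, 2.11, 2.15, 2.14, 2.16,
                     2.20, 2.22, 2.21, 2.25, 2.24, 2.26,
                     2.30, 2.31, 2.33, 2.35, 2.34, 2.36]

water_level_hourly = to_hourly(water_level_10min)                            # hourly means
water_level_last = to_hourly(water_level_10min, agg=lambda block: block[-1])  # last reading
r = pearson_r(pressure_hourly, water_level_hourly)
```

Whichever aggregation you choose, use it consistently so that every pair refers to the same hour.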
muhammad nasir

April 3, 2019 at 11:58 pm

Good morning, Sir.
May I ask: how do I test the instrument scores from three raters?
- Charles
  
  April 4, 2019 at 8:46 am
  
  I used Google to create the following translation, but I don’t understand the question or what it is referring to.
  Morning Mr.
  May I ask: how is the test score instruments of rater amounted to three people
  Charles
Cheryln

December 24, 2018 at 2:50 am

Hi, may I ask: if the p-value is 0.000 and the correlation coefficient is 75.247, how do I interpret this?
- Charles
  
  December 24, 2018 at 8:52 am
  
  Cheryln,
  I assume that by 75.247 you mean 75.247%, i.e., r = 0.75247. This shows a high level of association between the two variables. A p-value of 0.000 (i.e., less than .0005 after rounding) means that the population correlation is significantly different from zero. This is not surprising since the correlation is so high.
  Charles
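For reference, the usual test of whether the population correlation is zero uses the t statistic below with n - 2 degrees of freedom. Cheryln did not report the sample size, so the worked line assumes a hypothetical n = 30; it only illustrates how large t becomes for r = 0.75247.

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2

% Hypothetical n = 30 with r = 0.75247:
t = \frac{0.75247\sqrt{28}}{\sqrt{1-0.75247^{2}}} \approx \frac{3.982}{0.6586} \approx 6.05
```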
GERARDO ARDILA DUARTE

April 16, 2018 at 8:28 pm

Dr. Zaiontz, good afternoon. I am very grateful for your invaluable contribution, which helps researchers give scientific rigor to their work. I would like to know whether you have thought about implementing canonical regression, and when you would publish it.

Thank you very much

- Charles
  
  April 17, 2018 at 8:13 am
  
  Gerardo,
  Nice to hear from you again and thanks for your continued support.
  I assume that you are referring to canonical correlation analysis. Many of the concepts are already included on the website, but I haven’t explicitly included this topic yet. I plan to add support for this topic. I don’t have a specific timeframe, but it should be added later this year.
  Charles
  - GERARDO ARDILA DUARTE
    
    April 17, 2018 at 10:09 pm
    
    Dr. Thank you very much
Khaled Rifaat

April 11, 2018 at 10:22 pm

Thanks, Charles, for your great add-in. I could not find any module in the add-in to perform a partial correlation analysis. Under both correlation analysis and multiple regression, I get only two ranges of data to input, leaving no space for additional independent variables to include in the analysis.
Thanks
- Charles
  
  April 12, 2018 at 6:06 pm
  
  Khaled,
  You can calculate the partial correlation matrix as described at
  https://real-statistics.com/multiple-regression/multiple-correlation-advanced/
  The Real Statistics array function PCORR calculates the partial correlation matrix.
  Charles
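For a single control variable, the first-order partial correlation can be computed from the three pairwise correlations. This is a minimal Python sketch of that textbook formula with made-up correlations; it is not the implementation of PCORR, which works on a full matrix of variables.

```python
from math import sqrt


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation between x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical pairwise correlations
r_xy, r_xz, r_yz = 0.5, 0.4, 0.3
r_xy_given_z = partial_corr(r_xy, r_xz, r_yz)   # about 0.435
```

When z is uncorrelated with both x and y, the partial correlation reduces to the ordinary correlation r_xy.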
James

March 22, 2018 at 12:39 pm

Hi Charles!
How do I correlate my data? I really need help.
I used a Likert scale (about attitude towards school rules and regulations) and “yes or no” questions (knowledge about school rules and regulations).
The attitude questions are negative, while the knowledge questions are positive…
- Charles
  
  March 22, 2018 at 1:38 pm
  
  James,
  You can use the CORREL function to calculate the correlation coefficient between two samples. It doesn’t really matter what sort of data you have as long as it is numeric. The specific properties of the data, however, are relevant when you interpret the meaning of the correlation coefficient.
  Charles
Himani

March 1, 2018 at 7:02 am

Hello Charles ,
I have one question

My clinical data consist of different values for dengue-positive and dengue-negative patients that correspond to different clinical symptoms.
For example, the values corresponding to fever for dengue-positive patients are:
10010011101111100011, and the values corresponding to fever for dengue-negative patients are: 11001100100001.
How can I find significant differences between dengue-positive and dengue-negative patients with respect to fever?

Hoping for a favorable reply

Thanks
- Charles
  
  March 1, 2018 at 8:40 am
  
  Himani,
  Are values like 10010011101111100011 for one patient or for a number of patients (one digit for each patient)?
  If for one patient, does each digit position convey a different meaning?
  Charles
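If each digit is one patient (1 = fever, 0 = no fever), which is one of the readings Charles asks about, the data form a 2 × 2 table and a chi-square test of independence applies. The phi coefficient, i.e., the Pearson correlation between group membership and fever coded 0/1, gives the same statistic via chi-square = n·phi². This is a minimal stdlib sketch under that assumption; with counts this small, Fisher's exact test is a reasonable alternative.

```python
from math import erfc, sqrt


def counts_2x2(positive_digits, negative_digits):
    """a, b = fever / no fever among dengue-positive; c, d = same among dengue-negative."""
    a, b = positive_digits.count("1"), positive_digits.count("0")
    c, d = negative_digits.count("1"), negative_digits.count("0")
    return a, b, c, d


def phi_chi_square(a, b, c, d):
    n = a + b + c + d
    phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
    chi2 = n * phi ** 2            # Pearson chi-square, no continuity correction
    p = erfc(sqrt(chi2 / 2))       # upper-tail p-value for chi-square with 1 df
    return phi, chi2, p


a, b, c, d = counts_2x2("10010011101111100011", "11001100100001")
phi, chi2, p = phi_chi_square(a, b, c, d)
# a, b, c, d = 12, 8, 6, 8; chi2 is about 0.97, so p is well above .05
```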
fatima

January 31, 2018 at 11:32 pm

Hi, I need your help urgently!

Maybe I’m naive to ask this, but I’m really in trouble analysing the data I collected through a survey:
I have three dependent variables (idea, risk, proactiveness), and each of these three is measured by the level of agreement (Likert scale 1 to 5) with a statement (separate statements for all three).
I also have two independent variables (climate, support): climate is measured by the level of agreement with 5 statements and support by the level of agreement with 4 statements (both on a Likert scale).
As I’m a newbie, I don’t know how to analyse this data. I want to use the simplest techniques. I have to perform a correlation analysis and I don’t know how to do it. Kindly help me out; any kind of suggestions are welcome.
Ilana

January 29, 2018 at 10:30 pm

Amazing website, thank you!

Two questions:

1. I have a sample of n = 31 with non-normal distributions for both continuous variables (number of hours at which an event took place, and volume). I think I should use Spearman’s rho or Kendall’s tau to test the correlation; is that correct? Is there any reason to choose one over the other? If I collected more data and n were, e.g., 50, would this make any difference? Also, is there an equivalent of R squared in these tests?

2. I have a sample n=32 with one ordinal variable (number of times a day an action was done) and one continuous variable (volume), neither are normally distributed. Again, I think I can use Spearman’s rho or Kendall’s tau, is that correct?
- Charles
  
  January 30, 2018 at 8:34 am
  
  Ilana,
  1. You can calculate Pearson’s correlation even if the data are not normally distributed. If you want to test whether the correlation is significant, however, normality is a requirement for the usual test, and so with non-normal data you should use Spearman’s rho or Kendall’s tau instead. See
  https://real-statistics.com/correlation/spearmans-rank-correlation/
  https://real-statistics.com/correlation/kendalls-tau-correlation/
  Kendall’s tau has some advantages over Spearman’s rho (e.g. you can calculate a confidence interval).
  I don’t think raising n = 31 to n = 50 makes any difference in the choice between Kendall’s tau and Spearman’s rho.
  2. Yes, this is correct.
  Charles
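To make the two statistics concrete, here is a minimal pure-Python sketch: Spearman's rho computed as Pearson's r on the ranks (tied values get average ranks), and Kendall's tau-b computed from concordant and discordant pairs. The data are made up, and no significance test is included.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b, which adjusts the denominator for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue                  # tied on both variables: ignored
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    n_cd = concordant + discordant
    return (concordant - discordant) / sqrt((n_cd + tied_x_only) * (n_cd + tied_y_only))


x = [1, 2, 3, 4, 5]          # hypothetical data
y = [2, 1, 4, 3, 5]
rho = spearman_rho(x, y)     # 0.8
tau = kendall_tau_b(x, y)    # 0.6
```

As the example shows, tau is typically smaller in magnitude than rho for the same data, so the two values should not be compared on the same scale.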
alfred obligado

October 16, 2017 at 9:12 am

Excuse me, I would like to ask about the steps for creating a table that shows whether there is a significant relationship.
- Charles
  
  October 16, 2017 at 11:15 am
  
  Alfred,
  Click on One Sample Hypothesis Testing on the referenced webpage.
  Charles
Anika

May 15, 2017 at 6:08 pm

Hey,
I need to know: is it possible to use correlation between two variables when one is measured on a Likert-type scale and the other is dichotomous?
- Charles
  
  May 16, 2017 at 8:01 am
  
  Anika,
  Yes.
  Charles
Megan Smith

March 5, 2017 at 1:25 pm

Hello, I am currently undertaking my dissertation. I am looking at whether there is a relationship between organisations changing their employee benefit packages and their financial performance, using their profits from before and after the change was put in place. Could you please suggest what the best method of analysis would be?
- Charles
  
  March 5, 2017 at 3:27 pm
  
  Megan,
  You might be able to use a paired t test for this purpose, but the devil is in the details.
  Charles
  - Megan Smith
    
    March 5, 2017 at 5:06 pm
    
    I am using 3 organisations which have all changed their employee benefit packages in recent years. I am going to look at the profit from the year before the change and the year after, to see whether this change has impacted profits. For example:
    Organisation A changed its employee package in 2009, so I wanted to look at its profits in 2008 and 2010.
    Organisation B changed its benefits in 2013, so I was going to use data for 2012 and 2014.
    Organisation C changed its benefits in 2012, so I want to look at 2011 and 2013 profits.
    
    From this data I want to see whether there is a correlation between employee benefit changes and organisation profits, but I am struggling to find the best analysis tool.
    Thanks
    - Charles
      
      March 5, 2017 at 5:17 pm
      
      Megan,
      A paired t test looks like a good choice (or its nonparametric equivalent, the Wilcoxon signed-ranks test), but you only have a sample of size 3, and so you can’t expect much of a result.
      Charles
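A minimal sketch of the paired t statistic, using invented before/after profits for three organisations (not Megan's data). With n = 3 there are only 2 degrees of freedom, so the two-tailed critical value at α = .05 is about 4.30, and even a consistent increase may fail to reach significance, which illustrates Charles's caution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired t statistic and degrees of freedom for the differences after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical profits (year before change, year after change) for organisations A, B, C
before = [100, 200, 150]
after = [110, 215, 155]
t, df = paired_t(before, after)   # t is about 3.46 with df = 2, below the critical 4.30
```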
micheal

February 19, 2017 at 6:54 am

Hi Charles,
Kindly tell me what kind of statistical test I can use for the hypotheses below.
I am using a Likert scale of 1-5 on the questionnaire to collect data.
Thanks

H1: Change orders on building construction will lead to changes in the procurement system
• H1A: changing orders in construction project alters procurement options
• H1B: Modifying orders in construction project will cause adjustment to the contracts towards time and cost
• H1C: variation in orders affects all systems in project construction
• H1D: the higher the extent client, contractors and project managers are jointly involved in project planning and initiation the better the outcome of the project towards quality, time and cost
• H1E : choice of procurement system implemented remains the same irrespective of variation and project

H2: The procurement system used in construction will determine the project success
• H2A: the higher the understanding of the ideal choice of procurement system to be used the better the economic and time performance.
• H2B: procurement system used provides opportunity to cut cost and time in construction
• H2C: selecting a suitable procurement system relies heavily on project constraint, cost, time and quality
• H2D: decision makers will improve quality of the project by having a good understanding of the procurement system and selecting the appropriate procurement method
• H2E: procurement systems implemented on any project have a significant impact on cost in pre and post contract phases.

H3: Procurement system changes in construction will lead to changes in project parameters
• H3A: the higher the focus on procurement systems affecting project parameters the better the cost, time and quality performance.
• H3B: the better the collaboration between procurement team and construction team the better the quality, time and cost performance
• H3C: decision makers decide on the choice of procurement systems to be implemented.
• H3D: Project parameters are characteristics and features that define a project.
• H3E: utilizing different procurement systems on a project affects project parameters
- Charles
  
  February 19, 2017 at 10:25 am
  
  Michael,
  You haven’t provided enough detail for me to be able to answer your question.
  Charles
Haxel

January 9, 2017 at 4:00 pm

I am working on a research paper on “PERCEPTION OF UNIVERSITY OF KARACHI STUDENTS REGARDING QUALITY OF EDUCATION”. I took student perception as the dependent variable and quality of education as the independent variable. I took 5 variables as indicators of quality of education, i.e.
RESPONSIVENESS
COMPETENCE
SECURITY
TANGIBLE
COST
I generated 5 hypotheses:
Ho: responsiveness impacts positively on quality of education
Ha: responsiveness impacts negatively on quality of education

Ho: competence impacts positively on quality of education
Ha: competence impacts negatively on quality of education

Ho: security impacts positively on quality of education
Ha: security impacts negatively on quality of education

Ho: tangibles and infrastructure impact positively on quality of education
Ha: tangibles and infrastructure impact negatively on quality of education

Ho: cost impacts positively on quality of education
Ha: cost impacts negatively on quality of education

I chose a dichotomous scale (YES/NO) for the questionnaire. I am doing this research for the first time. Could you please tell me what tool should be used to accept or reject the hypotheses, and how I should analyze the relationship between students’ perception and quality of education?
- Charles
  
  January 10, 2017 at 9:29 am
  
  Haxel,
  
  Let’s look at the first hypothesis:
  Ho: responsiveness impacts positively on quality of education
  Ha: responsiveness impacts negatively on quality of education (actually, you need to say “doesn’t impact positively”)
  
  If the only responses allowed are 1 for yes (presumably to H0) and 0 for no, then you can use a binomial distribution for hypothesis testing with p = .5, indicating that there is no difference.
  
  Since you are running 5 different tests, you should reduce the alpha value to .05/5 = .01 to take care of experimentwise error. You should also make sure that the sample size is big enough to get reasonable statistical power.
  
  Charles
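A minimal sketch of the exact binomial test Charles describes, together with the Bonferroni-adjusted alpha. The counts (60 "yes" answers out of 100 respondents) are hypothetical, and the p-value computed is one-tailed; because the distribution is symmetric when p = .5, doubling it gives the two-tailed value.

```python
from math import comb


def binomial_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the one-tailed exact p-value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


alpha = 0.05 / 5                 # Bonferroni correction for 5 tests
yes, n = 60, 100                 # hypothetical responses to one hypothesis
p_one_tailed = binomial_upper_tail(yes, n)   # about 0.028
significant = p_one_tailed < alpha           # False: 0.028 is not below 0.01
```

Note that the same result would be significant at the unadjusted .05 level, which is exactly why the correction matters when running five tests.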
Nathan Nioda

December 30, 2016 at 2:01 pm

Hey Charles!
I’m a 4th-year Environmental Science student, currently working on my undergraduate thesis. My proposed study is about Economic Loss on Tourism and Fishermen due to Marine Debris in Mayo Bay and Pujada Bay, City of Mati, Davao Oriental, Philippines. The objectives of my study are:
1) Determine the economic loss on tourism due to marine debris
2) Determine the economic loss of fishermen due to marine debris
I’ll be using an interview method. My respondents are tourists and fishermen, and I planned on about 200 tourist respondents for each bay and 100 fishermen for each bay. My concern is that one of my panel members recommended that I use a statistical tool to determine the number of respondents, because the number I proposed is too big for me as a student. Do you know any statistical tool that can help solve my problem? Thank you!
- Charles
  
  December 30, 2016 at 11:43 pm
  
  Nathan,
  I can’t say for sure what your panelist had in mind, but perhaps he/she wanted you to determine the minimum sample size needed to conduct the statistical test that you have in mind. This is part of power analysis, which is covered on the Real Statistics website.
  Charles
  - Nathan Nioda
    
    January 3, 2017 at 8:39 am
    
    Thank you, Charles!
    Where can I locate this power analysis?
    - Charles
      
      January 3, 2017 at 10:20 am
      
      Nathan,
      
      If you are using the Real Statistics Resource Pack, then you need to press Ctrl-m and choose the Statistical Power and Sample Size option. See
      https://real-statistics.com/hypothesis-testing/statistical-power/
      https://real-statistics.com/hypothesis-testing/real-statistics-power-data-analysis-tool/
      
      You can also use the G*Power tool that is available free online.
      
      Charles
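As one illustration of what a power analysis produces, here is a minimal sketch for the specific case of testing a correlation, using the common Fisher z approximation n = ((z_{1-α/2} + z_{1-β}) / atanh(ρ))² + 3. It is an approximation (exact tools such as G*Power may differ by a participant or so), and the right calculation for Nathan's study depends on which test he actually runs.

```python
from math import atanh, ceil
from statistics import NormalDist


def n_for_correlation(rho, alpha=0.05, power=0.80, two_tailed=True):
    """Approximate sample size to detect population correlation rho (Fisher z method)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / atanh(rho)) ** 2 + 3)


n_medium = n_for_correlation(0.3)   # 85 with alpha = .05 and power = .80
n_large = n_for_correlation(0.5)    # a larger effect needs far fewer subjects
```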
Chayme Cundiman

September 25, 2016 at 2:18 pm

Hi Charles, I hope I can solicit ideas from you… My study is on gender differences in the perception of infidelity.

My hypothesis is “there is no gender (male/female) difference in the perception of infidelity”.

I am using a 7-point Likert scale for level of agreement in the “attitudes toward infidelity scale”.

What is the best statistical treatment for this study?
Hope to hear feedback from you soon. Thank you
- Charles
  
  September 26, 2016 at 7:19 am
  
  Chayme,
  It really depends on the details of the study, but it seems likely that you will use a two-sample t test or, if the assumptions of the t test are not met, probably a Mann-Whitney test (assuming independent samples).
  If you are comparing a husband with his wife, then you would probably use a paired t test or a Wilcoxon Signed Ranks test.
  All these tests are described on the Real Statistics website.
  Charles
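A minimal sketch of the Mann-Whitney U statistic with a large-sample normal approximation, using hypothetical Likert scores for two independent groups. The z formula below omits the correction for ties, which makes it slightly conservative when there are many tied Likert scores; the tiny sample is for illustration only, since exact tables are preferable at this size.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(sample1, sample2):
    """U for sample1: pairs where sample1 is larger, with ties counted as 1/2."""
    u1 = 0.0
    for a in sample1:
        for b in sample2:
            if a > b:
                u1 += 1
            elif a == b:
                u1 += 0.5
    u2 = len(sample1) * len(sample2) - u1
    return u1, u2


def normal_approx_p(u1, n1, n2):
    """Two-tailed p-value from the normal approximation (no tie correction)."""
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


males = [4, 5, 6, 6, 7]       # hypothetical scores
females = [2, 3, 3, 4, 5]
u1, u2 = mann_whitney_u(males, females)   # u1 = 23.0, u2 = 2.0
z, p = normal_approx_p(u1, len(males), len(females))
```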
Takwa

September 15, 2016 at 10:56 pm

Dear Charles,
I posited a hypothesis of association, in which I will test whether there is a significant relationship between vocabulary size test (VST) scores and success of inferencing. There are two inferencing times. It is worth noting that 2 groups followed the same procedure (VST, incidental time 1, incidental time 2); the only difference is that the language of inferencing was L1 for group 1 and L2 for group 2. My questions are:
Should I run Pearson’s product-moment correlation for both groups and both times together or separately? What about the language effect: should it be studied as an interaction or held as a control variable? A related issue is that I have some missing data in both time 1 and time 2; is the “sum” function a good option for this case?
- Charles
  
  October 1, 2016 at 3:19 pm
  
  Takwa,
  Which approach to use: Sorry, but I don’t have enough information to answer these questions. Often the approach to use depends on what you are trying to test (here “the devil is in the details”).
  Missing data: Please explain how you would use the sum functions.
  Charles
Jenny

September 12, 2016 at 12:23 am

Hi Sir Charles. I want to measure the awareness and perception of solo parents regarding their government benefits. I’m using a Likert scale for this study. I want to know which statistical test I need to use. Please help.
- Charles
  
  September 12, 2016 at 6:49 am
  
  Jenny,
  You need to provide more details about the situation and your objective before I am able to answer your question.
  Charles
  - Jenny
    
    September 12, 2016 at 10:22 am
    
    – Our study aims to find out the level of awareness of the solo parents in our town of Republic Act 8972 (a Republic Act here in the Philippines, the Solo Parents Welfare Act of 2000)
    – Determine their perception of Republic Act 8972
    – Explain the relationship between the level of awareness and perception of the solo parents regarding Republic Act 8972.
    
    There is a total population of 75 solo parents in our town. What statistical tool should I use for this study? Thank you in advance.
Farzad

July 28, 2016 at 3:41 pm

Hi dear Charles,
I am trying to investigate the frequency of use and perceived effectiveness of memorization vocabulary learning strategies among Iranian EFL university students. I used two questionnaires, one in which the students say which strategy they frequently use and the other is to check which strategy they think most effective. Now, the question is: how can I calculate the correlation between these two sets of data?
In fact, my problem is entering the data: how can I get the correlation between two sets of data measured on a Likert scale?
I was wondering if you could help me.
Thank you very much
- Charles
  
  July 28, 2016 at 7:49 pm
  
  Farzad,
  You simply use Excel’s CORREL function. Suppose you have the following data in range A1:J2
  1 3 5 2 2 5 4 2 1 3
  2 3 4 3 1 4 4 3 2 3
  Then the correlation coefficient can be calculated using the formula CORREL(A1:J1,A2:J2).
  Charles
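CORREL computes Pearson's correlation coefficient. This Python sketch reproduces the calculation for the two rows in Charles's example (A1:J1 and A2:J2); for these values r ≈ 0.818.

```python
from math import sqrt
from statistics import mean


def correl(x, y):
    """Pearson's r, the quantity Excel's CORREL(x, y) returns."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


row1 = [1, 3, 5, 2, 2, 5, 4, 2, 1, 3]   # A1:J1
row2 = [2, 3, 4, 3, 1, 4, 4, 3, 2, 3]   # A2:J2
r = correl(row1, row2)   # about 0.818, i.e. 10.8 / sqrt(19.6 * 8.9)
```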
  - Farzad
    
    July 29, 2016 at 11:51 am
    
    Hi dear Charles,
    Thanks for replying.
    I have done what you said, but it works for one item.
    What should I do for the rest of the items?
    I have 27 items in each questionnaire, all on a Likert scale.
Rowena Dsouza

June 2, 2016 at 8:44 am

Hello Charles,
Our study is about the penetration of core values among the employees for which we have designed a questionnaire with likert scale- Strongly disagree(1) , Disagree(2), Agree (3), Strongly agree(4) for superiors and subordinates commenting about each other. Now we want to analyse the data using statistical tools. Kindly suggest the most apt statistical tools for the same. Our sample size is 50.
- Charles
  
  June 2, 2016 at 10:37 am
  
  Rowena,
  First you need to decide what hypotheses you want to test. What is the objective of your research?
  Charles
dhruv

April 19, 2016 at 10:37 am

Hello Charles,

Your website is really helpful. I have a problem for which i need certain answers.

Background: I have to analyze a database to find the driving parameter(s) (like industry, cognitive function, etc.) that affect my Y variable most strongly. My Y variable is continuous. I have four parameters, and each has about 4-5 categories (for example, Industry has: oil, manufacturing, coal, nuclear, marine).
I want to know a few things:
1) Which correlation should I use in this case?
2) For each parameter, I will be coding the categories with numbers, like oil = 1, manufacturing = 2, coal = 3, nuclear = 4, marine = 5. Will this coding have an effect on the correlation? That is, if I change the order of coding, will the correlation change?
If it does, is there any correlation I can use that is independent of coding?
3) And lastly, what tests can I use to test the correlation values?

I would really appreciate your help on this.

Best regards,

Dhruv
Hanis

April 7, 2016 at 9:44 am

Hi Charles,
I have a question about the project I am doing. What analyses other than the correlation coefficient can I use to predict stock price movement? I have different factors that may affect the stock price, but I don’t know what analysis I should use. One of the factors that affects stock price is the Consumer Price Index.

Please reply. Thanks (:
- Charles
  
  April 7, 2016 at 10:12 pm
  
  Hanis,
  Generally, some form of regression is used to address these sorts of issues, more specifically some form of time series analysis.
  Charles
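As a starting point before any time-series modelling, a simple least-squares regression of one variable on another can be sketched as below. The CPI and price values are hypothetical, and ordinary least squares ignores the autocorrelation typical of time series, which is why Charles points to time series analysis. With a single predictor, R² equals the square of Pearson's r.

```python
from statistics import mean


def simple_regression(x, y):
    """Least-squares slope and intercept for y = slope*x + intercept, plus R^2 (= r^2)."""
    mx, my = mean(x), mean(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


cpi = [100.0, 101.2, 102.1, 103.5, 104.0]     # hypothetical index values
price = [50.0, 51.0, 51.5, 53.0, 53.2]        # hypothetical stock prices
slope, intercept, r_squared = simple_regression(cpi, price)
```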
Timothy

March 31, 2016 at 7:32 pm

Sir,
I have a problem with my hypothesis testing. My hypothesis is: application of value management at the briefing stage significantly minimises the challenges involved in developing the client’s brief. I have computed the means of problems in the client’s brief and the means of solutions provided by value management, using a 4-point Likert scale for each. Now I want to test the correlation between the two groups in order to accept or reject the hypothesis. How can I approach this? Secondly, how should I link the MOST appropriate solution to each problem using correlation?
- Charles
  
  March 31, 2016 at 10:28 pm
  
  Timothy,
  I don’t completely understand your null hypothesis, but it is likely that you want to perform one-sample or two sample hypothesis testing of the correlation coefficient. These are described on the following webpages:
  One sample correlation testing
  Two sample correlation testing
  Charles
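For the two-sample case, one standard large-sample approach for comparing correlations from two independent samples uses Fisher's z transformation. This is a minimal sketch with hypothetical correlations and sample sizes; it shows the generic test and is not necessarily identical to what the linked page implements.

```python
from math import atanh, sqrt
from statistics import NormalDist


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed z test of H0: rho1 = rho2 for two independent samples."""
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical: r = 0.6 in a sample of 50 versus r = 0.3 in another sample of 50
z, p = compare_independent_correlations(0.6, 50, 0.3, 50)   # z is about 1.86, p about 0.06
```

Even a difference as large as 0.6 versus 0.3 is not significant at the .05 level with 50 subjects per group.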
Peter Lynch

March 30, 2016 at 6:10 pm

Hello Charles, I wonder if you can help me decide on which statistic to use.

I have Likert scale (1-5) data (40 questions representing 8 factors) for two types of employees (A and B). I have calculated the means for A and B for each of the 8 factors and made a scatter plot of the data (A on the X-axis and B on the Y-axis). This has given me a linear plot, with each point on the graph representing one of the 8 factors. I have calculated the correlation coefficient (Pearson’s), which shows a very strong positive correlation.

I just don’t think this is the correct way to do this, but can’t reason why. I feel I should concentrate on one of the 8 factors only, taking the actual results and plotting accordingly for each one. It seems to me in the approach above, I would have a linear plot for 8 factors which in effect could be used to predict the scores for all factors in any future study provided I had one score only – this seems ridiculous to me.

Can you help me work out where I’m going wrong and what better statistic I could use?

Thanks in advance
Peter
- Charles
  
  March 30, 2016 at 6:27 pm
  
  Peter,
  Let’s start at the beginning. Before I can help you decide on which statistic to use, I (and you) need to understand what real-world problem you are trying to address or what hypothesis you are trying to test. This drives the selection of statistic to use.
  I wait for further information from you.
  Charles
  - Peter Lynch
    
    March 30, 2016 at 8:44 pm
    
    Many thanks Charles
    
    The real world problem is the measure of safety climate in an organisation, comparing two groups of employees. Safety Climate is measured by responses to 40 questions that are grouped to address 8 specific factors (5 questions per factor). Each question is scored by Likert scale choice of 1-5. The questions are established and already validated etc. by safety organisations, so they themselves are good to use.
    
    I have taken the means of the Likert scores for each question (over 100 respondents) and then calculated the means for each specific factor (so 8 results). This I have done for both employee populations, giving a total of 16 results. I have then scatter plotted the results for each factor/population and calculated the correlation coefficient for the plot, which is strongly positive (0.91).
    
    I wish to look at the relationship between safety climate and the perceptions of the two employee populations. I want to see if they are correlated/related in any way. But the measure of safety climate is multi-factorial (8 factors).
    
    Many thanks for any help you can give.
    
    Peter
    - Charles
      
      March 31, 2016 at 5:25 pm
      
      Peter,
      You can correlate one factor with many others. This is essentially what you are doing in regression analysis. To calculate this sort of correlation, please look at the following webpage
      Advanced Multiple Correlation.
      Charles
      - Peter Lynch
        
        March 31, 2016 at 6:44 pm
        
        Thank you Charles, much appreciated.
Leo

November 23, 2015 at 12:37 pm

Good day Charles,

Here are the details of my research: I have 4 IVs and 1 DV.

Each IV has several statements on a 5-point Likert scale.

Now I want to run an SPSS bivariate correlation test, but the test correlates every statement in each IV with the DV.

Can I just group all the statements in each IV into one score, and the DV’s statements into one score?
- Leo
  
  November 23, 2015 at 12:42 pm
  
  Charles,
  
  Thanks! I have found the solution.
  
  I had to compute the variables before doing the bivariate correlation.
  - linta
    
    January 21, 2016 at 7:56 pm
    
    Can you please explain?
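Leo's fix presumably means combining each variable's Likert statements into one composite score per respondent and then correlating the composites. A minimal sketch of that idea with invented responses; the mean keeps composites on the original 1 to 5 scale, and when every respondent answers all items, a sum gives exactly the same correlation.

```python
from statistics import mean


def composite(item_rows):
    """One score per respondent: the mean of that respondent's item responses."""
    return [mean(row) for row in item_rows]


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical 5-point Likert responses: one row per respondent
iv_items = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [5, 5, 5], [1, 2, 2]]   # 3 IV statements
dv_items = [[4, 4], [2, 2], [3, 4], [5, 4], [2, 1]]                  # 2 DV statements

iv_score = composite(iv_items)
dv_score = composite(dv_items)
r = pearson_r(iv_score, dv_score)
```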
Belle Perez

November 15, 2015 at 2:27 am

Hi Sir!

I would like to ask for advice on which statistical treatment to use in this study. We would like to look at the correlation between local government unit (LGU) assistance and the self-concept of indigenous people (IP) teachers. There are 3 subcategories for LGU assistance with 5 questions each (Likert-type scale). The same holds for self-concept. Is it alright to use Pearson or Spearman? I’m getting confused about Likert-type scale data, because I’ve read some studies that treated such data as interval data. Thanks!
- Charles
  
  November 17, 2015 at 9:20 pm
  
  Hi Belle,
  If, for example, you have a Likert scale of 1, 2, 3, 4, 5, the real question is whether you can assume that the intervals between the scores are equally spaced. If so, you can treat the data as continuous (although a 7-point Likert scale is better than a 5-point one), and so Pearson’s is probably OK. If the intervals between the scores are not equal, then you should probably use Spearman’s rather than Pearson’s.
  Charles
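A small demonstration of Charles's point with hypothetical responses: if the five Likert categories are re-spaced unequally but kept in the same order, Pearson's r changes, while Spearman's rho (Pearson's r on ranks) stays the same, because it only uses the ordering. The re-spacing values are an arbitrary illustration.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical 5-point Likert responses from 7 people
x = [1, 2, 2, 3, 4, 5, 5]
y = [1, 1, 2, 3, 3, 5, 4]

# Same order of categories, but unequal (hypothetical) spacing between them
respaced = {1: 1, 2: 2, 3: 3, 4: 6, 5: 10}
x_respaced = [respaced[v] for v in x]

pearson_before, pearson_after = pearson_r(x, y), pearson_r(x_respaced, y)   # about 0.94 vs 0.92
spearman_before, spearman_after = spearman_rho(x, y), spearman_rho(x_respaced, y)  # identical
```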
david

November 13, 2015 at 9:03 pm

Hello Sir, I used two soils for antibiotic uptake studies. I have determined the soil properties and want to run a correlation matrix to see whether there is any correlation between the soils and the antibiotics, but I am not able to do it. Can correlation be done with two data sets? Which statistical tool can I use to find out whether there is any relationship between the soils and the uptake of antibiotics?
- Charles
  
  November 15, 2015 at 9:18 am
  
  David,
  
  Just use the CORREL function as described on the webpage
  Basic Concepts of Correlation
  
  If necessary, you can do hypothesis testing as described on the webpage
  Correlation Hypothesis Testing
  
  Charles
Abe

October 12, 2015 at 4:15 pm

Hello, I have a problem. I need to validate my 12-item questionnaire with a 7-point Likert scale against a 19-item questionnaire with a 6-point Likert scale. I know how to run a correlation test between similar scales but not when one is a 7-point and the other a 6-point Likert scale. How would I do this in SPSS, please? Many thanks.
- Charles
  
  October 12, 2015 at 6:26 pm
  
  Sorry, but I don’t use SPSS. This website is about statistical analysis using Excel. In any case you can use the correlation coefficient even if the scales are different.
  Charles
Ashad

July 28, 2015 at 6:52 pm

I have a problem: in my analysis the correlation is positive, but in the t-test the null hypothesis is accepted. Is there any problem with that? Please answer me.
- Charles
  
  July 29, 2015 at 6:38 am
  
  Ashad,
  There is no problem if the sample correlation is positive. The t test is designed to test whether this value is significantly different from zero; if it is not (i.e., it is statistically indistinguishable from zero), you cannot reject the null hypothesis. This is quite likely when the positive value is relatively small, especially with a small sample. Just because the null hypothesis is that the population correlation is zero doesn’t mean that the sample correlation will be exactly zero.
  Charles
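To see why a positive sample correlation can still be non-significant, here is a minimal sketch that computes the t statistic for H0: ρ = 0 and an approximate two-tailed p-value via Fisher's z transformation (a normal approximation, not the exact t-based p-value). The same hypothetical r = 0.3 is not significant with n = 20 but is with n = 200.

```python
from math import atanh, sqrt
from statistics import NormalDist


def t_statistic(r, n):
    """t for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


def approx_p_value(r, n):
    """Two-tailed p-value via Fisher's z transformation (normal approximation)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


r = 0.3                           # hypothetical positive sample correlation
t_small = t_statistic(r, 20)      # about 1.33, below the critical t(18) of about 2.10
p_small = approx_p_value(r, 20)   # about 0.20: not significant
p_large = approx_p_value(r, 200)  # far below .05: significant
```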
mukhtar

July 19, 2015 at 7:21 pm

I used simple linear regression in my thesis, but unfortunately the model of the question is Y = ax + b. Please help me if you have any idea.
- Charles
  
  July 19, 2015 at 9:03 pm
  
  Sorry, but I don’t understand your question.
  Charles
Rosa

July 14, 2015 at 10:49 pm

Hello Sir,

I would appreciate your help.
My problem is that I would like to test the relationship between an ordinal variable (5-point Likert scale: strongly agree, agree, neutral, disagree, strongly disagree) and a dichotomous one (Yes/No question). Is it appropriate to use Spearman’s rho test? If not, which test would you suggest?

The hypothesis is that consumers’ attitude has a positive relationship with their purchase intention.

Thank you.
- Charles
  
  July 15, 2015 at 9:03 am
  
  Rosa,
  If I understand what you are trying to do correctly, you can use a correlation test (Pearson’s or Spearman’s) or an equivalent t test (or Mann-Whitney test). This is explained on the webpage Correlation in relationship to t test.
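The equivalence Charles mentions can be checked numerically: coding the yes/no answer as 1/0 and computing Pearson's r (the point-biserial correlation) gives t = r·sqrt(n - 2)/sqrt(1 - r²), which equals the pooled two-sample t statistic. A minimal sketch with hypothetical Likert scores.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pooled_t(group1, group0):
    """Two-sample t statistic assuming equal variances."""
    n1, n0 = len(group1), len(group0)
    sp2 = ((n1 - 1) * variance(group1) + (n0 - 1) * variance(group0)) / (n1 + n0 - 2)
    return (mean(group1) - mean(group0)) / sqrt(sp2 * (1 / n1 + 1 / n0))


# Hypothetical Likert attitude scores split by a yes/no purchase-intention answer
yes_scores = [4, 5, 3, 5, 4]
no_scores = [2, 3, 3, 1, 2]

coded = [1] * len(yes_scores) + [0] * len(no_scores)   # yes = 1, no = 0
scores = yes_scores + no_scores
n = len(scores)

r_pb = pearson_r(coded, scores)                         # point-biserial correlation
t_from_r = r_pb * sqrt(n - 2) / sqrt(1 - r_pb ** 2)
t_two_sample = pooled_t(yes_scores, no_scores)          # same value as t_from_r
```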
  Reply
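A minimal Python sketch of the equivalence mentioned above: if the Yes/No answer is coded 1/0, Pearson’s r between that code and the Likert score (the point-biserial correlation) gives, through t = r·√(n − 2)/√(1 − r²), exactly the same t as a two-sample t test with pooled (equal) variances. The data are made up and the helper names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pooled_t(group1, group0):
    """Two-sample t statistic assuming equal variances (pooled variance)."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss = sum((v - m1) ** 2 for v in group1) + sum((v - m0) ** 2 for v in group0)
    sp2 = ss / (n1 + n0 - 2)
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


# Made-up data: attitude on a 5-point Likert scale, purchase intention Yes = 1, No = 0
attitude = [5, 4, 4, 3, 5, 2, 1, 2, 3, 4]
intention = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]

n = len(attitude)
r_pb = pearson_r(intention, attitude)          # point-biserial correlation
t_from_r = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)

yes = [a for a, i in zip(attitude, intention) if i == 1]
no = [a for a, i in zip(attitude, intention) if i == 0]
t_direct = pooled_t(yes, no)

print(f"r = {r_pb:.3f}, t from r = {t_from_r:.3f}, pooled t = {t_direct:.3f}")
print(math.isclose(t_from_r, t_direct))  # True
```

Coding Yes = 0 and No = 1 instead only flips the sign of r and t.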
alsim

June 25, 2015 at 5:11 pm

Hi sir,

Please help me. I’m using a Likert scale, and I have 3 variables (2 independent and 1 dependent). The 2 independent variables have 9 questions each, and the dependent variable has 5 questions, all on a 5-point Likert scale. How can I do a correlation analysis between them? The maximum score for each independent variable is 9 questions × 5 = 45, and the maximum score for the dependent variable is 5 questions × 5 = 25. Do I need to make these variables have the same maximum score? For example, max independent score divided by max dependent score = 45/25 = 1.8, so should the total dependent score of each respondent be multiplied by 1.8?

Thank you
Reply
- Charles
  
  June 25, 2015 at 8:49 pm
  
  On what basis have you decided which variables are dependent and which are independent? Why do you want to do correlation analysis?
  Charles
  Reply
  - alsim
    
    June 29, 2015 at 2:25 pm
    
    It is based on a model that has been used in many studies. Correlation analysis is used to see whether there is a correlation between them and whether the independent variables affect the dependent variable. I also want to know how big the effect is, in percent.
    Reply
    - Charles
      
      June 29, 2015 at 3:23 pm
      
      I am not 100% sure I understand the question, but assuming that you use the total score for the 9 questions for each independent variable and the total score for the 5 questions for the dependent variable (or average score), you can use the multiple correlation coefficient (calculated as described on the Multiple Correlation webpage). Alternatively you can perform a multiple linear regression (see Multiple Regression). You can use R^2 as the effect size.
      Charles
      Reply
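A minimal Python sketch of the multiple correlation for the two-predictor case, using the standard formula R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12)/(1 − r_12²) built from the pairwise Pearson correlations; this R² is the same as the R² of a multiple linear regression with an intercept. The totals are made-up data and the helper names are hypothetical. Because correlations are unaffected by multiplying a variable by a positive constant, rescaling the dependent totals (e.g. by 1.8) would not change R².

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def multiple_r_squared(x1, x2, y):
    """R^2 for predicting y from x1 and x2 (two-predictor formula)."""
    r_y1 = pearson_r(x1, y)
    r_y2 = pearson_r(x2, y)
    r_12 = pearson_r(x1, x2)
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Made-up totals per respondent: two 9-item scales (max 45), one 5-item scale (max 25)
indep1 = [30, 35, 40, 25, 38, 42, 28, 33]
indep2 = [28, 36, 39, 30, 35, 44, 26, 31]
dep = [15, 18, 22, 13, 20, 23, 14, 17]

r2 = multiple_r_squared(indep1, indep2, dep)
print(f"R^2 = {r2:.3f}, multiple R = {math.sqrt(r2):.3f}")
```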
      - alsim
        
        June 30, 2015 at 9:02 am
        
        Thank you, sir, for your help. I’m new to statistics, so your answer helps me a lot. By the way, may I ask one more thing: what is the relationship between demographics and regression analysis? As far as I know, demographics are used to map respondents. Should this map be used to populate and calculate all measures and put them into the regression analysis? I want to make predictions from a sample (collected with a questionnaire) where the independent variable is user satisfaction and the dependent variable is the user impact of using the services. I mapped the demographics of all respondents based on age, and my question is: should I use those demographics (i.e. sorted by age), calculate each variable, and do the analysis?
      - Charles
        
        June 30, 2015 at 2:29 pm
        
        I don’t completely understand your question about whether to use the demographics sorted by age and calculate each variable for the analysis. But you can certainly perform regression using demographic data plus the other types of data that you have listed.
        Charles
leizel

June 18, 2015 at 12:38 am

I used a Likert-type scale.
Reply
leizel

June 18, 2015 at 12:37 am

Hello,
Please help me: what statistical test should I use if I want to describe the profile of my respondents and then find out whether there is a significant relationship?
Reply
- Charles
  
  June 18, 2015 at 6:52 am
  
  You might use the correlation coefficient, but you need to describe what you are trying to accomplish in more detail.
  Charles
  Reply
John Leung

June 17, 2015 at 8:26 pm

Hi sir,
I would like to know what type of test (e.g. ANOVA, t test) is suitable for my questionnaire. I want to compare Qn 1 and Qn 2 with a suitable test. What test should I use?
Qn1) If the product weighs between 500 g and 1 kg, would you accept the weight range for this product? Data collected using a Likert scale: Likely: 4 male, 1 female. Neutral: 25 male, 7 female. Unlikely: 7 male, 4 female. Mostly unlikely: 2 male.
Qn2) Would you accept the weight of the product if it is above 1 kg? Likely: 3 male. Neutral: 23 male, 6 female. Unlikely: 9 male, 8 female. Most unlikely: 3 male.
Thank you for your help
Reply
Patni

June 6, 2015 at 9:21 am

Hello Sir,
I am glad I found this blog.
I want to analyze questionnaire data about students’ attitudes for a study. I distributed a questionnaire of 20 questions to 50 students, and the questions are grouped into 5 categories (variables). The overall Cronbach’s alpha is 0.87. But when analyzed per group, Cronbach’s alpha for variables 1, 2, 3, 4 and 5 is 0.61, 0.70, 0.65, 0.81 and 0.80 respectively. If I delete two of the 6 questions in variable 1, Cronbach’s alpha becomes 0.73. However, Cronbach’s alpha does not increase if any one of the 3 questions in variable 3 is deleted. I have several questions:
1. How do I calculate the inter-item correlations in the questionnaire, so that I have a justification for still using variable 3?
2. How do I know whether the data is normally distributed? Should I check each question item, each student, or all the data? How?
3. If I want to see the relationship between variables, do I have to calculate the average score of all the questions in each variable, so that the result becomes the score of that variable for each student?

Thank you so much for your help.
Reply
- Charles
  
  June 7, 2015 at 7:01 am
  
  1. I am not sure why you want to do this, but in any case you can look at the Intraclass Correlation webpage to find out how to do this.
  
  2. The webpage Testing for Normality and Symmetry provides a variety of methods for testing whether a data set is normally distributed. You should test the specific data sets for normality based on the requirements of the analysis tool that you are planning to use. Some tests don’t require normality at all.
  
  3. It really depends on what you want to do with this information.
  
  Charles
  Reply
  - Patni
    
    June 7, 2015 at 9:21 am
    
    Thank you for replying to my question.
    Actually, I want to study students’ attitudes towards e-learning. And honestly, I do not know whether it requires a normality test.
    
    I distributed a 5-point Likert-type questionnaire containing 20 questions, and then grouped these questions into 5 variables. Variable 1 (design of website) contains 6 questions, variable 2 (efficacy in using e-learning) contains 3 questions, variable 3 (enjoyment of using e-learning) contains 3 questions, variable 4 (usefulness of e-learning) contains 6 questions, and variable 5 (intention to use e-learning) contains 2 questions.
    I want to calculate the correlation between variable 2 and variable 5, variable 3 and variable 5, and variable 4 and variable 5.
    
    Because each variable has more than one question, and thus more than one response, should I calculate the average response over all the questions in each variable, so that the result becomes the value of the corresponding variable, in order to use the formula for the Pearson product-moment correlation r?
    
    Many thanks.
    Reply
Lucy

June 5, 2015 at 9:33 am

Dear sir, greetings.
Please, I need your help on how to conduct a correlation analysis in Excel. I have rainfall and water flow data, and I need to know whether there is any relationship between rainfall and water flow.
Many thanks
Reply
- Charles
  
  June 5, 2015 at 10:40 am
  
  Just use the formula CORREL(R1, R2), where R1 contains the rainfall data elements and R2 contains the corresponding water flow data elements. You can also do hypothesis testing as described on the website.
  Charles
  Reply
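For readers working outside Excel, a minimal Python sketch of what CORREL(R1, R2) computes: the Pearson correlation of the paired values. The `correl` helper and the rainfall/flow numbers are hypothetical; the important assumption is that the i-th rainfall value and the i-th flow value refer to the same period.

```python
import math


def correl(r1, r2):
    """Pearson correlation of two paired ranges, as Excel's CORREL(R1, R2) computes it."""
    if len(r1) != len(r2):
        raise ValueError("R1 and R2 must contain the same number of paired values")
    n = len(r1)
    m1, m2 = sum(r1) / n, sum(r2) / n
    num = sum((a - m1) * (b - m2) for a, b in zip(r1, r2))
    den = math.sqrt(sum((a - m1) ** 2 for a in r1) * sum((b - m2) ** 2 for b in r2))
    return num / den


# Made-up paired observations for the same periods
rainfall_mm = [12.0, 30.5, 5.2, 44.1, 20.3, 0.0, 15.8]
flow_m3s = [3.1, 6.8, 2.0, 9.5, 4.9, 1.2, 3.9]

r = correl(rainfall_mm, flow_m3s)
print(f"r = {r:.3f}")
```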
anita

May 30, 2015 at 8:46 pm

Hello Charles,
I introduced a new chart for nurses to use when practicing documentation. After its introduction, I distributed a Likert-scale questionnaire to find out whether the new chart was useful and easy to use, with a few more questions, such as whether it was evidence-based practice and whether it added to the burden of nursing documentation. It is more like an audit. The chart was introduced in two different wards, where nurses with different years of experience work. What type of data analysis should I use? Please suggest something that I can do with an Excel spreadsheet; I’m really confused.
It would be a great help. Thank you.
Reply
- Charles
  
  May 31, 2015 at 9:37 pm
  
  Anita,
  The data analysis tool to use depends on what you are trying to demonstrate, i.e. what hypothesis you are trying to prove or disprove.
  E.g. suppose you want to test whether the responses to the question “is the new chart easy to use?” are different for nurses with more than 5 years of experience than for those with less than 5 years of experience; then a t test with two independent samples might be the right analysis.
  You need to first decide what you want to analyze. Then you can determine which is the best test to use.
  Charles
  Reply
zanzi

May 26, 2015 at 11:21 am

Hi,
SA means strongly agree, A means agree, U means undecided, D means disagree and SD means strongly disagree. The numbers in brackets represent the proportion of the sample with the same response choice. As I wrote earlier, the questionnaire had 25 questions in total and was administered to 165 people.
Many thanks.
Reply
- Charles
  
  May 26, 2015 at 3:18 pm
  
  Sorry, but you haven’t really provided enough information for me to give you a definitive answer. Knowing how many responses fall in each Likert category doesn’t really help. It looks like you want to perform a correlation test. Why?
  Charles
  Reply
zanzi

May 23, 2015 at 8:19 pm

Many thanks for your reply. OHSMS means Occupational Health and Safety Management System. I got my data from questionnaires (containing 4 sections with 25 questions in total) administered to a sample of 165. As I mentioned earlier, I used a Likert scale and have summed the responses to each question to get data sets in this format: SA(69), A(46), U(6), D(16), SD(3). I don’t want to rely only on the median and interquartile analysis.
Reply
- Charles
  
  May 24, 2015 at 7:31 am
  
  What do SA(69), A(46), U(6), D(16), SD(3) represent?
  Charles
  Reply
zanzi

May 22, 2015 at 4:36 pm

Sir,
I would appreciate your help. I am carrying out research on the impact of an effective OHSMS on work performance. Can I do a correlation analysis on the following data from my questionnaire (I used a Likert scale): SA(69), A(46), U(6), D(16), SD(3)?
Reply
- Charles
  
  May 23, 2015 at 7:28 am
  
  Sorry, but I don’t know what OHSMS stands for and you haven’t provided enough detail for me to answer your question.
  Charles
  Reply
Jamie

May 18, 2015 at 11:10 am

Hi Charles,
I just want to ask what method I should use if my research is about determining the awareness of eclampsia among women aged 21 to 45.
Reply
- Charles
  
  May 18, 2015 at 2:25 pm
  
  Jamie,
  You need to supply more information before I am able to answer your question. In particular, what are you trying to demonstrate?
  Charles
  Reply
Ezin

April 12, 2015 at 2:09 pm

Sir, how do I compute the correlation between gender and level of awareness (poor, average and good)? Do I need to code female as 1 and male as 2? I have 100 respondents; 86 answered the gender question and 4 respondents left it blank.
Reply
- Charles
  
  April 13, 2015 at 7:15 pm
  
  Ezin,
  Yes, you could code female as 1 and male as 2.
  Charles
  Reply
zach

April 5, 2015 at 6:00 pm

Hello, I want to ask about a specific method for my case. My objective is to assess the relationship between the socio-demographics of visitors and the attitude of visitors. Visitors’ attitude was measured on a Likert scale from 1 to 5 (1 = strongly agree … 5 = strongly disagree), but I do not know how to use my data to do the test, i.e. whether I should use correlation or some other method. Thank you.
Reply
- Charles
  
  April 6, 2015 at 7:25 am
  
  It really depends on what you mean by “assess relationship”. It sounds like you want the correlation coefficient as described on the referenced webpage.
  Charles
  Reply
Godspower

March 21, 2015 at 8:20 pm

Please, I have a problem with split-half reliability: I don’t know how to compute the “r” in the formula 2r/(1+r).
Reply
- Charles
  
  March 22, 2015 at 10:32 pm
  
  r is the correlation coefficient between the data in the two halves. Once you split the data in half (into ranges R1 and R2) you can use Excel’s CORREL(R1, R2) function to calculate r. See webpage Split Half Methodology for more details.
  Charles
  Reply
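A minimal Python sketch of the split-half calculation: sum each respondent’s scores on the two halves, correlate the two half-scores (what CORREL(R1, R2) returns), then apply the Spearman-Brown formula 2r/(1+r). Splitting into odd- and even-numbered items is just one common choice, assumed here; the item scores are made up and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up item scores: one row per respondent, 6 items each
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 3, 2, 2, 3, 3],
    [3, 3, 4, 3, 3, 4],
    [5, 5, 5, 4, 5, 5],
    [1, 2, 2, 1, 2, 1],
    [3, 4, 3, 3, 4, 3],
]

# Half 1 = items 1, 3, 5; half 2 = items 2, 4, 6 (odd/even split)
half1 = [sum(row[0::2]) for row in responses]
half2 = [sum(row[1::2]) for row in responses]

r = pearson_r(half1, half2)       # correlation between the two halves
reliability = 2 * r / (1 + r)     # Spearman-Brown: 2r/(1+r)
print(f"r between halves = {r:.3f}, split-half reliability = {reliability:.3f}")
```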
Certainty

February 25, 2015 at 5:40 pm

Sir, please, which method of analysis and statistical tool should I use to analyze the “relationship between parental variables and academic achievement of secondary schools”?
Reply
- Charles
  
  February 25, 2015 at 10:57 pm
  
  Sorry, but I would need more information to answer your question.
  Charles
  Reply
Danielle

February 11, 2015 at 5:29 pm

Hello Charles,
I’m having a problem analyzing my data. We polled 5 experts and asked them to rank 6 tests for 82 scenarios. They were asked to rank the tests in order of how likely they were to use each test in a particular scenario. My issue is that one expert gave one test the same rank across all scenarios. When using the correlation function from Excel’s data analysis package, this “constant” gives a #DIV/0! error. I’m trying to see how the experts’ overall responses correlate. Do they agree for the most part? Is there a different statistical test I can use to find my answer? My statistics skills are not very strong and I’m becoming lost in the details. Any help is greatly appreciated.
Thank you,
Danielle
Reply
- Charles
  
  February 11, 2015 at 5:59 pm
  
  Danielle,
  Yes, the correlation coefficient will be undefined if all the elements in one data set are the same. Generally you can use measures such as Cohen’s kappa, but this too will give disappointing results (or zero, no matter what elements are in the other data set).
  Charles
  Reply
  - Danielle
    
    February 11, 2015 at 6:14 pm
    
    Charles,
    I appreciate your quick reply. Do you have a recommendation for analyzing the data in another manner? Or will the results always be disappointing because all of the elements in one array are the same?
    Thank you,
    Danielle
    Reply
    - Charles
      
      February 12, 2015 at 10:41 am
      
      Danielle,
      I don’t have another recommendation for you. I would guess that all the results will be disappointing because all the elements in one sample are the same.
      Charles
      Reply
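A minimal Python sketch of why the #DIV/0! appears: the denominator of Pearson’s r is √(Sxx·Syy), and the sum of squared deviations of a constant column is zero. The hypothetical helper below returns NaN in that case instead of dividing; the rankings are made-up data.

```python
import math


def pearson_r_or_nan(x, y):
    """Pearson r, or NaN when either variable has zero variance (Excel shows #DIV/0!)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")   # the denominator sqrt(sxx * syy) would be zero
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Made-up ranks that two experts gave one test across 5 scenarios
expert_a = [2, 3, 1, 2, 4]
expert_b = [3, 3, 3, 3, 3]   # same rank in every scenario: zero variance

print(pearson_r_or_nan(expert_a, expert_b))  # nan
```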
Natalie

February 9, 2015 at 11:19 am

Hello,
I really hope you can help me solve this problem,

I have calculated the correlations of returns between 10 stock sectors using Excel.
As a result, the correlation between the Manufacturing and Miscellaneous sectors is around 87%. I want to vary the correlation over a range from 75% to 95% and see how it affects the sectors’ mean and standard deviation. Can I use a data table for that? Can you explain how to create the data table?

Please help me.
Thank You
Reply
- Charles
  
  February 9, 2015 at 8:06 pm
  
  Hi Natalie,
  I need more information before I can answer your question.
  Charles
  Reply
Anisah

November 2, 2014 at 5:24 pm

Hi! I really need your help.
I want to know the appropriate statistical analysis used to test my hypotheses.
Here are my hypotheses:
Ho1: There is no significant relationship (independent) between the business profile of the SMEs and the level of awareness of climate change and related business risks.
Ha1: There is a significant relationship (dependent) between the business profile of the SMEs and the level of awareness of climate change and related business risks.
Ho2: There is no significant relationship (independent) between the level of awareness of climate change and the related business risks, and the adaptive measures employed by the SMEs.
Ha2: There is a significant relationship (dependent) between the level of awareness of climate change and the related business risks, and the adaptive measures employed by the SMEs.

The content of my questionnaire:
I. Business profile, composed of:
Type of ownership: Sole Proprietorship, Partnership, Corporation
Number of years operating: 0-10 years, 11-20 years, 21-30 years, above 30 years
Number of employees: 0-10 employees, 11-50 employees, 51-250 employees
Initial capitalization: 0-3,000,000, 3,000,001-15,000,000, 15,000,001-100,000,000

II. Level of Awareness about Climate Change
10 questions answerable by Aware (rating scale: 1) or Unaware (rating scale: 0)

III. Level of Awareness about Business Risk Associated with Climate Change
A total of 21 questions (7 risks: financial, logistics, legal and regulatory, market, people, operational and physical; 3 questions for each risk)
Also answerable by Aware (rating scale: 1) or Unaware (rating scale: 0)

IV. Adaptive Measures
A total of 21 statements about adaptive measures (7 aspects: financial, logistics, legal and regulatory, market, people, operational and physical; 3 statements for each aspect)
Answerable by Adapted (rating scale: 1) or Not Adapted (rating scale: 0)

Please help me. Thank you.
Reply
- Charles
  
  November 4, 2014 at 8:39 am
  
  Based on a very quick and preliminary review of what you wrote, my first thought is to use MANOVA. The business profile is the independent variable, and the level of awareness and business risk are the dependent variables.
  Charles
  Reply
  - Anisah
    
    November 4, 2014 at 4:41 pm
    
    Thank you, Sir.
    Reply
farah

June 22, 2014 at 1:32 am

Hello sir,
I really hope you will help me with this problem.

I have 19 questions that use a Likert scale from 1 to 4 (1 = never, 2 = rarely, 3 = sometimes, 4 = always).
Among these 19 questions, I chose only 6 that I consider positive (e.g. question 1: do you use a seat belt?) to indicate good driving practice, and similarly for the rest of the questions. Moreover, this questionnaire doesn’t have a total score.

So now, how can I analyze this data?
My research questions are:
1) Is there a significant difference in good practice between genders?
2) Is there a significant difference in good practice by years of driving (1: 1-2 years, 2: 3-4 years, 3: 5-6 years, 4: 7 years and above)?
Reply
- Charles
  
  June 28, 2014 at 9:12 am
  
  Farah,
  
  For the first research question, I understand that you want to determine whether there is a significant difference between the good practice scores for males and females. The typical test used in this case is the two-sample t test with independent samples, or the Mann-Whitney test if the data is not normally distributed. See the webpages https://real-statistics.com/students-t-distribution/two-sample-t-test-equal-variances/, https://real-statistics.com/students-t-distribution/two-sample-t-test-uequal-variances/ and https://real-statistics.com/non-parametric-tests/mann-whitney-test/.
  
  This test can also be accomplished using the correlation coefficient as described in the webpage https://real-statistics.com/correlation/dichotomous-variables-t-test/
  
  For the second question you could use one-way ANOVA or chi-square testing of independence. See the webpage https://real-statistics.com/chi-square-and-f-distributions/independence-testing/ for information about independence testing.
  
  This test can also be accomplished using the correlation coefficient as described in the webpage https://real-statistics.com/correlation/dichotomous-variables-chi-square-independence-testing/.
  
  Charles
  Reply
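A minimal Python sketch of the correlation/chi-square link for the simplest case, where both variables are dichotomous (a 2 × 2 table, e.g. gender vs. a yes/no good-practice indicator; this is an illustrative simplification, since years of driving has four categories). The phi coefficient computed from the cell counts equals Pearson’s r on the 0/1 codes, and the Pearson chi-square statistic (without continuity correction) equals n·phi². The counts are made up and `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up 2x2 counts: rows = gender (0 = female, 1 = male),
# columns = good practice (0 = no, 1 = yes)
table = [[12, 8],
         [6, 14]]
(a, b), (c, d) = table
n = a + b + c + d

# Phi coefficient directly from the cell counts
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Pearson chi-square statistic (no continuity correction)
rows = [a + b, c + d]
cols = [a + c, b + d]
chi2 = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
           for i in range(2) for j in range(2))

# Expand the table into one (x, y) pair per respondent and correlate the 0/1 codes
x, y = [], []
for i in range(2):
    for j in range(2):
        x += [i] * table[i][j]
        y += [j] * table[i][j]
r = pearson_r(x, y)

print(f"phi = {phi:.4f}, r on 0/1 codes = {r:.4f}")
print(f"chi-square = {chi2:.4f}, n * phi^2 = {n * phi ** 2:.4f}")
```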
  - farah
    
    July 21, 2014 at 4:44 pm
    
    Thank you very much.
    You have helped me a lot. :)
    Reply
