Correlation Data Analysis Tool

Real Statistics Data Analysis Tool: The Real Statistics Resource Pack provides the Correlation data analysis tool. This tool calculates the Pearson’s, Spearman’s (rho) and Kendall’s (tau) correlation coefficients, as well as various versions of a one-sample correlation test.

Example 1: Repeat Example 1 of Correlation Testing via the t Test (regarding Pearson’s correlation) using the Correlation data analysis tool.

To use this tool, press Ctrl-m and select Correlation from the menu of choices that appears. Fill in the dialog box that appears as shown in Figure 1 and press the OK button.

Correlation test dialo box

Figure 1 – Correlation data analysis tool dialog box

The output for the two-tailed test is shown on the right side of Figure 2 (starting at cell J3). The output contains the Pearson’s, Spearman’s and Kendall’s correlation coefficients (cells K5, K6 and K7). In addition, both the two-tailed t-test and normal test using the Fisher transformation are displayed where the hypothetical population Pearson’s correlation ρ = 0.

If you want to conduct a one-sample correlation test where ρ is equal to some value that is not zero, you would simply perform that same steps as described above and then change the value in cell N11 to this value. In this case, you would need to ignore the results of the t-test since they would not be accurate.

You can also change the value for alpha and/or the number of tails, and the output values will change automatically.

Pearson's correlation analysis tool

Figure 2 – Correlation data analysis (Pearson’s)

Example 2: Repeat Example 3 of Spearman’s Correlation using the Correlation data analysis tool.

Press Ctrl-m and select Correlation as before, but when the dialog as in Figure 1 appears, select the Spearman’s option. The result is shown in Figure 3.

Spearman's rho analysis tool

Figure 3 – Correlation data analysis (Spearman’s)

We see from the figure that Spearman’s rho is -.67442 (cell N14). Since p-value = .005821 < .05 = α, we are pretty confident that rho is not zero. If you change the values in cells N11 and/or N12 the other values of rho, t-stat and p-value will change automatically.

Example 3: Repeat Example 1 of Kendall’s Tau using the Correlation data analysis tool.

Press Ctrl-m and select Correlation as before, but when the dialog as in Figure 1 appears, select the Kendall’s option. The result is shown in Figure 4.

Kendall's tau analysis tool

Figure 4 – Correlation data analysis (Kendall’s)

We see from the figure that Kendall’s tau is -.52381 (cell Y13) and that we are 95% confident that the actual value is in the interval (-.901, -.147). Since p-value = .006493, we are pretty confident that tau is not zero.

If you change the values in cells Y10 and/or Y11 the values of tau, s.e., etc. will change automatically.

Observation: The ties correction factor is used for the Kendall’s Tau option of the Correlation data analysis tool.

74 thoughts on “Correlation Data Analysis Tool”

  1. Dear Dr. Zaiontz,

    Would it be appropriate to use correlation to check for trends in submerged aquatic vegetation change over time?
    For example: average seagrass density in an area measured each year
    (Year, Average Seagrass Density)
    2002 2.12
    2003 2.11
    2004 3.2
    2005 2.99
    2006 2.78
    2007 2.63
    2008 2.34
    2009 2.38
    2010 2.84
    2011 2.4
    2012 2.43
    2013 2.72
    2014 2.62
    2015 2.72
    2016 2.95
    2017 2.79
    2018 2.81
    2019 2.89
    2020 2.9
    2021 2.7
    2022 2.91

    If so would Pearsons or Kendalls tau-b be better? If I shouldn’t be using correlation here, should I be using a linear regression or a Mann-Kendall test instead?
    I can’t use this data to forecast because it is Braun-Blanquet data and the goal is just to track if there are any changes over time not predict the future values. The hypothesis is there is no trend, the null would be that there is some trend.

    Thank you for your time and your whole website it is very helpful.

    Kate

    Reply
  2. Correlations
    AVERAGE AVERAGE (Criterion Test)
    AVERAGE Pearson Correlation 1 .789**
    Sig. (2-tailed) .000
    N 123 123
    AVERAGE (Criterion Test) Pearson Correlation .789** 1
    Sig. (2-tailed) .000
    N 123 123
    ** Correlation is significant at the 0.01 level (2-tailed).

    How will I interpret this in validity?

    Reply
    • Hannah,
      A correlation of .789 is quite high. A significance of .000 probably means that it is very unlikely that the real population correlation is zero (a significant result). Without more context, I can’t say much more.
      Charles

      Reply
  3. Dear Charles,
    whe do I need to use 1 Tail and when do I need to use 2 Tails in the spearman´s correlation?
    Best regards
    Hannah

    Reply
  4. how to calculate (std err – Upper – Lower), of Pearson Correlation and linear regression manually? step by step in excel

    Reply
  5. Table 4. Relationship between socio-economic characteristics of the respondents and their buying behavior
    SOCIO-ECONOMIC CHARACTERISTICS COEFFICIENCT P-VALUE

    Age
    .301**
    .002

    Civil Status
    .364**
    .000

    **. Correlation is significant at the
    *. Correlation is significant

    Table 5. Relationship between Islam culture and their buying behavior of the respondents
    VARIABLES COEFFICIENCT P-VALUE

    Islam culture and buying behavior
    .775**
    .000

    **. Correlation is significant at the
    *. Correlation is significant
    dicuss for me

    Reply
  6. Hi, i wana know that i wana check the correlation between the management and their practices…..I mean i have 20 practices as a example and i have 4 management parts and i wana check the correlation of those 20 practices comes under which domain of management out of those 4 parts.??

    Reply
  7. please dear sir, how to correct rainfall data from remote sensing or satellite data in excel sheet with ground station data, for example annual rainfall time series data record in ground station and same time get remote sensing data how to correct the remote sensing data according to ground station data

    Reply
      • sorry, i mean correction data, for example
        time series rainfall data which is download from any remote sensing system should de corrected and calibration with ground station rainfall data, anyway you have two type of data, the first one it is original as ground record and the second type it is satellite data, i want correct time series data which is get it from satellite depending on ground station data
        thanks dear sir
        best regard

        Reply
    • Please sir in a situation whereby one’s research topic consist of relationship between four Independent variables and one dependent variable can correlation design be used in such research?

      Reply
  8. Dear Dr. Zaiontz,
    thank you very much for this versatile, efficient, easy to use and free add-on. I am still an amateur and the more I’m using it, the more I understand.
    I am confused, though, by “upper and lower” values: what do they represent?
    Here’s the example: I have determined a correlation between two columns of values, with
    Spearman Index of -0,28
    p value= 0,073
    lower= -0,55
    upper 0,023

    How would you interpret this results?

    Reply
    • Serge,
      The upper and lower refer to the confidence interval for the parameter being estimated (Spearman’s rho in your case). Assuming alpha = .05, the a 95% confidence interval for rho of (.55,.023) would mean that if you repeated the sample experiment a large number of times, you would expect that the true value of rho would occur in this interval 95% of the time. This is a way of looking at the population (true) value of rho as an interval instead of as a single value. It does not mean that the probably that the population rho is in this interval is 95%.
      Charles

      Reply
  9. Dear Charles,
    I have two data sets each consisting of 17 means±S.D.
    How can I calculate coefficient of correlation in this case?
    Many thanks,
    Juozas

    Reply
    • Juozas,
      I don’t completely understand your question, but you can’t calculate the correlation coefficient from just the means and standard deviations.
      Charles

      Reply
  10. Great article Charles. I am working with a series of non-random numbers (stock price changes on a given day). On 30 random days, I have the returns on 10 different stocks. I would like to find out of A, these companies are correlated with each other, and B, which of these companies are the most correlated with each other.

    There is logical data behind why they should be correlated, but I’m not sure that part is relevant to the explanation here. Thank you

    Reply
    • Ross,
      Assuming that “companies” and “stocks” refer to the same thing, I understand that you have data that can be organized into the range A1:J30 as follows: Each column contains the data for one of the 10 stocks and each row contains the data for one of the 30 days.
      You can now create a correlation matrix using the Real Statistics formula =CORR(A1:J30). This will show all the pairwise correlations. The values that are the highest are the most correlated.
      Charles

      Reply
  11. Hi,
    I need to compare correlations of testscores from different test-results of the same Test/differents Tests (same construct) from same/different populations.
    I think I need to to standardize the results via Fishers’ Z? Is there any tool her?

    Reply
  12. Hey Charles wonderful job. But here is one problem that correlation is of 2 types; phenotypic and genotypic. What about calculation of genotypic correlation?

    Reply
    • From a statistical point of view there is only “correlation”. I am not a biologist or geneticist and so don’t have specific knowledge about how this term might be used in those fields.
      Charles

      Reply
  13. Hi Charles,

    I am into random numbers testing for online gambling. I need to perform below tests:-
    1. Serial Correlation Test : The Serial Correlation Test determines whether or not the numbers picked for the current draw in any way determines subsequent plays. We calculate a correlation coefficient r based on pairs of data taken from the same number position on consecutive draws.
    2. Interplay Correlation Test : This test determines whether or not there is any relation between two positions of numbers chosen within the same draw. It is essentially the same analysis that is performed in the Serial Correlation Test, but it is performed on pairs of data taken from different number positions within the same draw rather than between draws.

    I had figured out approach for serial correlation test, but not sure about calculating correlation coefficient for second test. Can you please help ? Below the example of number drawn:-
    A B C D E
    3 1 3 0 2
    0 2 4 1 1
    2 2 4 3 1
    2 2 1 1 0
    2 1 1 0 1
    2 0 3 1 2
    3 2 1 1 2
    1 1 5 0 2

    Reply
  14. Hi, Sir.

    Can I use correlation test (Pearson’s) to check the correlation between two dependent/response variables? Here’s a background of the study: I have 3 factors with 2 levels each then there are about 5 response variables measured after the study but I am interested to know the correlation between the 2 only. I used anova (three fixed factors using realstats addin) to check the significance of the 3 factors. How would I know if this significance means positive or negative? Also, regarding the 2 response variables obtained from the study, can I use correlation test to check correlation between them?

    Ming

    Reply
    • Ming,
      You can use a correlation test to test the correlation between two variables.
      Anova tests whether there is a significant difference between two or more levels in a factor or various interactions. I don’t know what you mean by positive or negative significance. Significance is not directional; there is only significance.
      Charles

      Reply
      • So I can use correlation test between the 2 response variables measured, right? Thanks. “Anova tests whether there is a significant difference between two or more levels in a factor or various interactions.” – Only for the levels in a factor? Can I also conclude from Anova that a factor has a significant effect to the response measured if for example, I get a “yes” in the ‘sig column’ referring to ‘A’ factor in the Anova table. Can I also conclude that there is an interacting effect between two factors if I get a “yes” for ‘AxB’ for the response measured? By positive, I mean if I increase the magnitude of the factor (in case of my study it is metal concentration), the response measured (metal uptake in the study) also increases. In negative, if I decrease the factor, the response also decreases. Does any input in the anova table say something about it? Thanks for the help!

        Reply
        • Ming,
          The answer to your first to questions is yes.
          Regarding your third question, about positive/negative, what do you mean by increasing/decreasing the magnitude of the factor? Does this mean, for example, doubling (or halving) all the sample data values?
          Charles

          Reply
          • For example, the ‘A’ factor is metal concentration with two levels of concentrations 0 and 50, if I increase the concentration from 0 to 50, the response which is metal uptake by the plants also increases. That is if I increase the amount of metal in the water, the amount of metal taken by the plant also increases. Is that what it means if in the anova table it says ‘yes’ in the ‘sig column’ for ‘A’?

  15. Apreciado Dr. Zaionits, reciba un cordial saludo, Dr excuseme una pregunta, si yo hago una tranformación de Box Cox Normal para mis datos, como calculo la inversa de la tranformación para interpretarlos?

    Dear Dr. Zaionits, Yours sincerely, Dr Excuse me a question, if I make a transformation of Box Cox Normal for my data, as I do calculating the inverse of the transformation in order to interpret?

    Reply
  16. Hello,

    I was wondering, if I have a survey and want to see the correlation between gender and political affiliation, can that be done with your instructions? I was thinking of coding the responses, (0=female, 1=male) and doing the same with question two. If I have this coded data in excel, can I run a correlation? Thanks.

    Reply
    • Diana,
      Yes, you can do that. keep in mind that since one of the variables is dichotomous (Male/Female), essentially your test is equivalent to testing whether the mean of question 2 for males is significantly different from that of females (i.f. a t test or Mann-Whitney).
      Charles

      Reply
  17. Hi. I am doing the correlation between no of dengue cases and maximum temperature. Correlation is done by using Spearman, Kendall, and Pearson. My problem is, how to compare which method is best to used? I don’t know what performance evaluation tools should i use. Thank you.

    Reply
    • Generally you should use the Pearson correlation coefficient. Only when you want to see whether there is a significant difference between this coefficient and some hypothetical value would you consider using one of these other coefficients. This would be the case if the test assumptions are not met (e.g. normality). These issues are discussed on the website.
      Charles

      Reply
  18. sir
    is there difference in the correlation value calculated from excel and the values calculated values from statsitistical software , in case in those data which have been applied formula in excel sheet

    Reply
  19. Hi,

    I want to represent results of two workshops using correlation
    i.e. correlation between time spent on writing user stories and accuracy/ decrease in errors related to that would be time spent in workshop 1 and 2,
    Thanks
    Vandana

    Reply
  20. Hi! I want to ask you on what correlation data analysis tool I should use for my study on self-esteem’s relationship with academic performance. I used the Sorensen Self-Esteem Test to evaluate the respondents’ self-esteem and took their GPA’s from last year for the academic performanc. In the Sorensen Self-Esteem Test, you can get four results: Fairly Good, Mild Low, Moderately Low, and Severely Low. What statistical tool should I use to get the correlation coefficient? Thanks.

    Reply
    • Jimmy,
      You can simply use Excel’s CORREL function to calculate the correlation coefficient. The specific correlation data analysis tool to use really depends on what hypothesis you are trying to test.
      Charles

      Reply
  21. Hello Charles,

    If I am analyzing survey results which contain Likert scale answers, is it better to input them as:
    strongly disagree = -2
    disagree = -1
    agree = 1
    strongly agree = 2

    OR

    strongly disagree = 1
    disagree = 2
    agree = 3
    strongly agree = 4

    Also wondering something the same thing for a question that asks how long a call took, with answer choices being Less Time than Expected, About as Much Time as Expected, and More Time than Expected. Less=1, About=2, More=3 OR Less=-1, About=0, More=1?

    Thank you!

    Reply
  22. Pls I have data to analyze using spearman rank correlation but don’t know how to go about.

    Pls help me as how to go about it. Thanks

    Reply
  23. Dear Mr. Zaiontz

    I have a question regarding the Kendall Tau function in the Correlation Data Analysis Tool.

    Above you state the following: “Observation: The ties correction factor is used for the Kendall’s Tau option of the Correlation data analysis tool.”

    Does this “ties correction factor” correspond to Kendalls Tau-b, Kendalls-Tau-c or is it something completely different?

    Thanks in advance.
    N.

    Reply
  24. Hi. If i have two variables and data points on only one of them is available. However the correlation cofficient between the two variables is given. How do i calculate the data points of the second variable?

    Reply
    • You can’t. There are an infinite number of possible values for the second variable that yield the same correlation coefficient.
      Charles

      Reply
  25. i have a problem with converting my original survey results in to statistical analysis. i designed a questionnaire to determine relationship between service quality & promotional effort impact on brand equity.the like-rt scale define results between strongly agree( +2) to strongly disagree(-2).( eg; leasing company should maintain all necessary modern equipment – (agree 3 )which in like-rt scale is 0).how do i convert this results to correlation analysis data.

    Reply

Leave a Comment