Spearman’s Rank Correlation

Basic Concepts

When data is not normally distributed or when the presence of outliers gives a distorted picture of the association between two random variables, Spearman’s rank correlation is a non-parametric test that can be used instead of the Pearson’s correlation coefficient.

On this webpage, we will explain the basic concepts about Spearman’s correlation and how to calculate it. Click here to find out how to perform hypothesis testing to determine whether Spearman’s correlation statistically equal to zero (i.e. to determine whether two samples are independent).

Definition 1: The Spearman’s rank correlation (also called Spearman’s rho) is the Pearson’s correlation coefficient on the ranks of the data.

Example

Example 1: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample with the number of hours they listen to rock music per month. Determine the strength of the correlation between IQ and rock music using both Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.

Spearman's rho

Figure 1 – Data for Example 1

To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rock scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the Real Statistics function RANK_AVG instead of RANK.AVG (as explained in Ranking).

We now calculate both correlation coefficients as follows:

Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036

Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115

We see that there isn’t much of a correlation between IQ and listening to rock music based on the sample.

Outliers

When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:

Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.

If there are clear differences then you will need to be cautious about how you treat the outliers.

Similarly, if you test the sample data for the x and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use Spearman’s coefficient rather than Pearson’s.

Figure 2 displays a scatter diagram for the data in Example 1. We see that the data is pretty randomly scattered although there is a potential outlier where the rap music listening spikes to 45. This gives some evidence that Spearman’s rho might be a better choice.

Scatter plot Spearman's rho

Figure 2 – Scatter diagram for data from Example 1

Worksheet Formulas

Spearman’s rho for the data in ranges R1 and R2 can be calculated in Excel via the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

For versions of Excel prior to Excel 2010, the following formula will do the job.

=CORREL(RANK(R1,R1,1)+(COUNTIF(R1,R1)-1)/2,RANK(R2,R2,1)+(COUNTIF(R2,R2)-1)/2)

Real Statistics Function: The Real Statistics Resource Pack supplies the following function:

SCORREL(R1, R2) = Spearman’s rho for the data in ranges R1 and R2

For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.

No Ties

When there are no ties in the ranking, there is alternative way of calculating Spearman’s rho using the following property.

Property 1: When there are no ties, Spearman’s rho is equal to

Spearman's rho

where di = rank xi – rank yi.

Example 2: Calculate Spearman’s rho for the data from Example 1 using Property 1.

Spearman's rho quick version

Figure 3 – Alternative way of calculating rho for Example 1

Using Property 1 and the data in Figure 3:

image3655

To find out how to test whether Spearman’s correlation coefficient is statistically equal to zero (i.e. to determine whether two samples are independent) click here.

Examples Workbook

Click here to download the Excel workbook with the examples described on this webpage.

References

eGyanKosh (2017) Unit 2: other types of correlation
http://egyankosh.ac.in/bitstream/123456789/20956/1/Unit-2.pdf

Wikipedia (2015) Spearman’s rank correlation coefficient
https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient

152 thoughts on “Spearman’s Rank Correlation”

  1. Hi Good Day! we are conducting a research with this spearman’s rho, I have already the data but we just don’t know how we will start. because there are so many data.

    Reply
  2. Thank you for your explanation! It is very helpful. I would like to ask if I will use spearman correlation if i have likert five-point scale datas? or it would be pearson r?

    Reply
  3. Hello,
    I am doing a correlational study using the ordinal scale (strongly agree/disagree) on the relationship between a generational work value and a parent aspirations. The generations will most likely include Boomers, Generation X, and Generation Y. How would I use the Spearman Rank? I am very bad at math and need some advice on how to calculate it and if I could with four different independent variables (the generations).

    Reply
    • Daniella,
      It really depends on what you are trying to accomplish and the data that you have.
      I understood that your objective is to determine the relationship between a generational work value and parents’ aspirations. I, therefore, make the following comments:
      1. Do you have data for work value for Boomers, Generation X, and Generation Y?
      2. Do you have data for parents’ aspirations for Boomers, Generation X, and Generation Y?
      3. For item #2 do you have data for the parents of the subjects in item #1?
      4. Do you want to be able to predict item #2 based on item #1 (regression)?
      5. Do you only need the relationship between items #1 and #2 for each of the 3 x 3 = 9 possibilities (correlation)?
      6. Do you have sufficient data to accomplish your objectives?
      Charles

      Reply
  4. Hi Charles and thanks again for all of this.

    I’ve been doing Spearman’s rho and Pearson’s r, and no matter which one I chose, Real Statistics output displayed both, plus Kendall. I guessed that’s how it is designed, so didn’t pay attention to that and was just taking the result I needed — i.e. used the value next to the cell Pearson if I had the “Pearson” button ticked in the Real Statistics pop-up, and used the value next to the cell Spearman if I have turned raw data into ranked data beforehand and had the “Spearman” button ticked. That’s until I encountered the following…

    One particular dataset was naturally ranked. Respondents were allowed to choose 1, 2 or 3 as their individual characteristic (so I did not need to rank them myself for Spearman’s rho), and I wanted to see if there’s any correlation between that characteristic and their answers to a question in Likert scale 1-5. Just for fun I first tried a calculation choosing Real Statistics “Pearson” button, then did another calculation for the same pair of ranges this time choosing “Spearman” button. For these two calculations Pearson, Spearman and Kendall came out exactly the same each (about -0.16, -0.13, and -0.12 respectively). However, p-values were different: 0.20 for “Pearson” calculation and 0.12 for “Spearman”. So here are my questions:
    – I thought if the data is already ranked, Pearson and Spearman should result in the same number. Am I wrong? As you see I got -0.16 for Pearson and -0.13 for Spearman.
    – Related to the previous, I also believed Excel function CORREL will result in Spearman’s rho if the data is already ranked. Am I wrong? Because trying Excel’s CORREL with my data resulted -0.16, i.e. the same as Pearson in Real Statistics.
    – Why the p-values for “Pearson” and “Spearman” calculations with Real Statistics are different, when for both calculations Pearson, Spearman and Kendall coefficients are exactly the same?

    Sorry for a long message, just trying to make it clear.

    Reply
  5. Hi Charles,
    First of all, thank you for your beautiful narration. However, there is something I want to ask. I will apply it to my rainfall data. And data with the same values ​​in my dataset, what to do in such a situation? For example;

    {3,6,5,3,4,7,2,6,5}

    In such a case, will data with the same value get the same rank number?

    Reply
    • like this i have same data values on my problem and they competing to that current rank for example 1rank you need to do is add the 1 and 2 rank then identify how many numbers are the same for example just two of them are competing just do 1+2/ 2= the answer will be the rank of the of them.

      Reply
      • Hi, as dr. Charles said you can use the rank.avg function for this case (or just use the scorrel directly. But I’d like to know why we always need to take the average rank, why not take the same rank. In your example you put 1+2/2=1.5, why we can’t take rank 1 for both observations just like in sports when 2 players gets the same rank (10th rank for example and the next player will take the 12th rank) just like in rank.eq function.

        Reply
        • You can use rank instead of average rank. It depends on what you are trying to accomplish and which is the metric for this. Usually average rank is the better choice.
          Charles

          Reply
  6. Hi, Charles,
    suppose we have 1000+ data, what if on column rock, there are some empty cells.
    how do you get the rank ? and how do you calculate Spearman’s coeff ?

    Thanks

    Reply
    • Hello Ian,
      Suppose that you have 5 variables and 8 data elements for each variable. The data could be stored in an 8 x 5 range, say A1:E8. Follow the following steps:
      1. Insert the array formula =RANK.AVG(A1:A8,A1:A8,1) in range G1:G8 (this creates a column range with the ranks of the first column of data)
      2. Highlight range G1:K8 and press the key sequence Ctrl-R (this yields the ranks of each of all 5 columns)
      3. Insert the Real Statistics array formula =CORR(G1:K8) in range M1:Q5
      Charles

      Reply
  7. Hi Charles,

    Thanks for your invaluable content!

    If you have time, could you take a look at my statistics issue?
    Purpose is to calculate how much these two sets of variable values are alike.

    The first variable could have zero, one or more data points (numerical value) for very day day. The same for the other variable (also numerical value).

    I am trying to compare these two variables that could have a different number of data points. So variable1 could have 100 data points and variable2 127 data points.

    Please advise? Thanks.
    1. Should I sum the values per day and set the value to zero when there is no data point. To get the same number of “data points” for both variables?
    2. Or is there a good method to work with different number of data points?

    Thanks again.
    All the best.

    Reply
  8. Hello, Charles

    I would like to ask you regarding Spearman’s rank correlation on three variables, if it is okay with you.

    I have three variables: speed, accuracy, and performance. Could Spearman’s rank correlation be used? The problem is that there would be so many d’s to use, and the formula gets all jumbled. I am sorry if I sound a little ignorant on my part, I just really want to know.

    Thank you!

    Reply
    • Just calculate the ranks of the data in each of the three variables individually. Then either use the Real Statistics MCORREL function on the ranked data or perform linear regression (which outputs the correlation coefficient) on the ranked data.
      Charles

      Reply
      • Hello Charles,
        I have the same problem: I would calculate the Spearman’s correlation rho between more than three variables (seven or more) and I used the Real Statistics MCORRELL function, but it didn’t work: the program gave me an error message: How can I do?
        Thanks so much,
        Davide

        Reply
        • Davide,

          If you are looking for the pairwise Spearman’s rho between the 7 variables, then you can use the following Real Statistics array formula
          =CORR(RANKCOL(R1))
          Here R1 is a range that contains your data, one column for each variable.
          If, for example, you have 7 variables with the data in range A1:G100 (i.e. each sample contains 100 elements), then you need to highlight a 7 x 7 range enter the formula =CORR(RANKCOL(A1:G100)) and press Ctrl-Shft-Enter to get the result.

          If instead you are looking for one number, i.e. the correlation between one of the 7 variables (the dependent variable) and the other 6 (the independent variable), you would typically conduct a linear regression and use the square root of the R-square value. Since you want Spearman’s correlation, then you would first need to rank the data. You have two choices: (1) rank all the data together or (2) rank the data for each variable separately. In this later case, the result would be the same as the output from the following array function:
          =RSquare(RANKCOL(R1),1)
          In this case the first column would contain the data for the dependent variable (if the 5th column corresponds to the dependent variable then you would replace the 1 by a 5). Even though this produces a single value, it is still an array formula and so you need to press Ctrl-Shft-Enter.
          Charles

          Reply
  9. Hello Charles

    Could you please clarify something for me.. (I am statistically very naïve)

    I would like to correlate 2 continuous variables of accuracy and efficiency. The efficiency score is the quotient of the accuracy score and time taken. The raw data accuracy values vary from 1 to 50 in increments of 1, whereas the efficiency score varies from 0.01 to 0. Can I correlate these two measures even though an increase in 1 for accuracy might result in an increase of only 0.005 in the efficiency score?

    Am I totally off on the wrong track? I considered methods comparison as an alternative but am worried about the same concept as described above

    Thank you

    Reply
    • You can certainly correlate these two variables provided they have the same number of elements. Note that if you multiply all the data for one of the variables by a fixed amount, the correlation doesn’t change.
      Once you calculate the correlation coefficient, how do you plan to use the result? What are you trying to test?
      Charles

      Reply
    • Hello Maurice,
      Excel’s RANK.AVG handles ties. E.g. the values 2, 4, 6, 7, 9, 20 have ranks 1, 2, 3, 4, 5, 6. But 4, 4, 5, 5, 5, 6 have ranks 1.5, 1.5, 4, 4, 4, 6. In this last example the two 4’s have ranks 1 and 2, and so the ranks are reassigned as 1.5 and 1.5 (the average). Similarly, the three 5’s have ranks 3, 4, 5, and so these ranks are reassigned as the average, namely 4, 4, 4 (here 4 = (3+4+5)/3).
      Spearman’s rank correlation is simply =CORREL(R1,R2) where R1 = the ranks of the first data set and R2 = the ranks of the second data set.
      Charles

      Reply
  10. Hello!

    First of all, thank you for building up this website!! your didactic style is really good.

    Could you please help me out with my questions?
    I calculated a spearman’s rho of 0,6 in a sample of 50. Could one say that there is a probable correlation/there could be a correlation between the two tested variables?
    And how do I calculate a significance for Spearman’s rho?

    Thank you very much in advance!

    Greetings,

    Ruth

    Reply
  11. Hi Charles

    I did the example that is in the web site and the results of the Spearman’s formula vs the Pearson’s correlation of the ranks are the same, only if no exist one or more observations with the same value (for each variable).

    In your example, if you change the 6th observation of the “IQ” variable (105) with the same value of the 5th observation (123) the result using Spearman’s formula is -0.0212121, but if we do the calculation using the Pearson’s correlation of the ranks [ with excel formula is CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) ] the result is -0.0243162.

    Reply
      • Hi Charles

        I can’t see the SCORREL() code, but I suppose that this methodology uses the Pearson correaltion in the ranks [as CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))] because the results are the same that you showed. Now, if you use the “tratidional” Spearman’s formula, not using SCORREL(), the results are different.

        Reply
          • Sorry, I haven’t been so clear. I’ll try to explain it.

            In your example, all the observations of “IQ” are different (even the observations of “Rock”, but the analysis is in “IQ”); with this values the Spearman’s correlation calculated with the “traditional” formula [1 – (6*sum(d^2))/(n^3 – n)] is -0.11515152, and if it calculated with the Pearson’s correlations of the ranks [CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))] the value is the same (-0.11515152).

            But, if one or more observations has the same value the results are not the same. In the “IQ” variable, I changed the 6th observation (105) for the same value of the 5th observation (123); with the “traditional” formula [1 – (6*sum(d^2))/(n^3 – n)] the Spearman’s corrlation is -0.02121212 and, with the Pearson’s correlations of the ranks [CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))] the result is -0.02431622.

            Thank you for your time.

  12. Hello Charles,

    I’m trying to run a Spearman correlation analysis on certain data. I’ve read that Spearman isn’t recommended for non-monotonic trends. Based on scatterplot analyses, I noticed that some of my variables do not have a monotonic relationship.

    Is there a way to transform the data, to allow me to run a Spearman analysis? Or should I just proceed with Spearman?

    Thank you.

    Reply
    • Gem,
      It probably depends on why you want to use Spearman’s correlation in the first place. I don’t which transformation you would use since the most common ones preserve monotonicity.
      Charles

      Reply
      • Thank you for your reply.

        I’m using Spearman because one of my variables are ordinal, and my variables don’t exhibit a linear relationship and there are outliers. My aim is to just determine correlation among y (a continuous variable) and my different x variables. My only concern is the lack of monotonicity.

        Would the lack of monotonicity be a big issue? Thank you again.

        Reply
        • Gem,
          Correlation is a measure of association when the data is more or less linear (and so monotonic). Spearman’s correlation is on the ranks, but the ranking function is monotonic. I would say that monotonicity is important, but as usual the devil is in the details — i.e. minor violations of monotonicity are probably not very important; more important are trends and major departures from monotonicity (actually in linearity in the ranks).
          Charles

          Reply
  13. Hey can you help me out on a project on how to do the R2 spearman rho?
    I’m doing a correlation study on bullying and self-esteem level.
    I have 40 participants who took a survey to determine their level of self-esteem and to determine if they’ve been bullied before or not.
    So i have 2 variables, Bullied and self-esteem. The range of the self esteem is Moderate=(21-33) and High=(34-45)
    For bullying, (Yes=28 people) (No=12 people)
    For self-esteem (Moderate Self-esteem=23 people)(High S.E.=17 people)
    The chi-square statistic is 1.7586. The p-value is .184797. This result is not significant at p < .05.

    So how can I use my data and use it on R-squared spearman rho?
    Thanks.

    Reply
  14. Hi Charles,

    I have 100 randomly selected variables(v1) and I need a set of 100 variables (v2) which has Spearman’s rho of 0,4. I mean, I know the correlation(Spearman’s rho) between two sets of data(0,4) but for each of the v1, I need to find a correlated v2 value. I would appreciate if you can explain how I can formulate this on excel?

    Thank you for your kind help..

    Reply
    • I assume that you have a sample S1 with 100 elements and you are looking for another sample S2 such that Sprearman’s rho(S1, S2) = .4.

      I can solve the problem if you use Pearson’s correlation instead of Spearman’s correlation in the following manner:
      1. Assume that the data in S1 is located in range A1:A100
      2. Place a zero in all cells of range B1:B100 (actually you place any values you like in these cells)
      3. Insert the formula =CORREL(A1:A100,B1:B100) in cell D1
      4. Use Excel’s Goal Seek capability (Data > What-if Analysis|Goal Seek) and enter D1 for Set cell, .4 for To value and B1 for By changing cell, and then press the OK button.
      Range B1:B100 contains the values that you are looking for.

      For Spearman’s rho you can use the Real Statistics function SCORREL instead of CORREL, but the Goal Seek capability may not converge to an answer unless you are more clever about the values you put in range B2:B100.

      Charles

      Reply
  15. For those who have been asking about how to calculate Spearman’s rho when there are ties, this is how you would do it manually.

    1) Rank the values in each of the two groups (call them X and Y) individually, representing tied values by the average of the ranks they would have been assigned had they not been tied.

    2) Calculate the average of the ranks for the two groups (X-bar and Y-bar).

    3) Calculate (X minus X-bar)*(Y minus Y-bar) for each paired data point and add these values together. Note that some of these products may be negative if one value is above its mean and the other is below it.

    4) Divide the total in step 3 by n-1, where n is the number of pairs. This is the covariance of the ranks.

    5) Divide the above covariance by the product (Sx * Sy), where Sx and Sy are the sample standard deviations (using n-1) of the ranks of the two groups X and Y.

    The result of this division is the Spearman’s rho value. Note that it may not be equal to the value of SR given by the formula with the 6; as a. Matter of fact, if there are extensive ties, there may be quite a difference. But if there are no ties, it is mathematically equivalent to the formula with the 6.

    Good luck, everyone!

    Reply
    • I have since learned of perhaps a simpler way to perform Spearman’s rank when there are ties, so here it is for those who are interested. It certainly requires less calculation.

      For each case of n-way ties involved for both variables, calculate their respective weights (n^3-n)/12. So a 3-way tie in one variable would have a weight of (3^3 – 3)/12, or 2. Add the weights for each individual case of ties to the sum of the squared rank differences, then complete the rest as usual (multiplying by 6, dividing by n^3-n, and subtracting from 1). Perhaps this will be easier to implement.

      Reply
  16. Hey, I’m running some calculations in Excel and have very strange results: is it possible that for the same dataset Pearson is positive and Spearman is negative?

    Reply
  17. Hi All,
    If i have such data (Below). How do i calculate the Spear man’s Rank order for the

    Statistical Parameter D R A S T I C
    Mean 6.5 4.5 7.0 8.0 6.0 8.5 1.5
    Std Error 3.5 1.5 3.0 2.0 3.0 1.5 0.5
    Median 6.5 4.5 7.0 8.0 6.0 8.5 1.5
    Minimum 3.0 3.0 4.0 6.0 3.0 7.0 1.0
    Maximum 10.0 6.0 10.0 10.0 9.0 10.0 2.0
    SD 4.95 2.12 4.24 2.83 4.24 2.12 0.71
    CV (%) 76.15 47.14 60.61 35.36 70.71 24.96 47.14

    NB: DRASTIC is a model used for groundwater Vulnerability Assessment.

    Any body with an idea kindly assist.

    Reply
  18. Hi Charles,

    Thanks for your brilliant information.

    I have 150 rows of data across 20 variants to compare. I am using ranked data. My plan is to use Spearman Rho, and I need to do the complex method as I have quite a lot of tied data. Your example is very clear and I have this under control.

    Can you tell me what is the difference between the Excel CORRELL function, and doing all the calculations needed for Spearman Rho? They do give slightly different answers, but when I looked up the CORRELL function it suggested to me it was doing what I’m doing to get Spearman Rho? Sorry I am probably confused, but would love to know the answer.

    Many thanks,
    Claire

    Reply
    • Claire,
      The CORREL function outputs the Pearson’s correlation coefficient. Spearman’s rho is Pearson’s correlation coefficient, not on the original data, but on the ranked data.
      Charles

      Reply
  19. Hi Charles,

    Thank you for your explanations.

    I need to do a Spearman correlation with correction for multiple testing of 2 variables but instead of getting the rho coeficient of the 2 variables, I need to get rho for each pair of data, i.e., the table should look like this:

    gene var1 var2 rho p-value FDR
    SFRP1 -0.44 -8.96 -0.0130 0.7130 0.9258
    PCDH10 -0.28 -7.21 0.2530 0.3630 0.7256
    STMN2 -0.10 -6.76 -0.7050 0.0050 0.0158
    LCE3D -0.35 -6.00 0.0590 0.8340 0.9657
    LY6K 0.64 -5.64 -0.5970 0.0190 0.6852

    By “pair” I mean each individual gene considering the var1 and var2 values.

    Any light on how I could do this analysis?

    Thank you very much in advance.

    Best
    Gema

    Reply
    • Gemma,
      It sounds like you are looking for the correlation matrix using Spearman’s correlation instead of Pearson’s correlation. You can do this by ranking the data (as for Spearman’s correlation). Then create a correlation matrix based on the ranked data. E.g. you can use the Real Statistics CORR function for this.
      Charles

      Reply
  20. Hi charles I am wondering because I am doing a likert scale analysis using herzeberge two factor theory, do i use the mean for my central tendency and sd? How about pearson and anova? I am very new at this TIA

    Reply
    • Zak,
      Spearman’s rank correlation is used because ranked data is normally distributed, and so has some desirable mathematical properties.
      Charles

      Reply
      • Sorry, I probably wasn’t very good at asking what I meant, I was wondering as to exactly why the formula for Spearman’s Rank itself actually works. Why ρ = 1 − (6 ∑ d^2)/(n ( n 2 − 1 )) is a formula that gives you a correlation between to sets of data and why this digit is between 1 and -1. Thank you, though, for the reply.

        Reply
        • Zak,
          That formula was popular when people used to do these calculations by hand. Today with computers you don’t need to use this formula, but simply calculate the correlation coefficient of the ranks.
          As to why the formula works, you need to do a little algebra to show that this formula is equivalent to calculating the correlation coefficient of the ranks (at least when there are no ties). I haven’t included this algebra because I thought that there wouldn’t be much interest in it.
          Charles

          Reply
    • Revathy,
      The approach is exactly as explained on the referenced webpage. The RANK.AVG function takes care of ties. If ROCK 45 is repeated then each will have a rank of 9.5.
      Charles

      Reply
  21. Dear Charles,

    Thank you for this! I would just like to ask if there is any way this could be done when comparing data across time. Thank you!

    Reply
      • Sorry to hijack this, but I would like to do the same. You see, I am comparing data in which the correlation would be lagged eg weather data that effect population further down the line. How would I do that? Also, how does one get a p-value from here?

        Reply
        • Chisara,
          Please explain what you want to do in more detail. I don’t completely understand. Are you looking for a ranked version of autocorrelation?
          Charles

          Reply
          • Does ranked autocorrelation allow you to see where the best correlation would occur? I am trying to see if weather affects on earlier generations affect future ones and I need to match up the the variables to see where there is the most correlation.

          • Chisara,
            I still don’t really understand your questions. Why can’t you simply calculate the correlation matrix and see which coefficients are highest?
            Charles

  22. Hi
    I would like to calculate spearman’s rank order correlation coefficients for 30 variables and then create a matrix to determine which variables are highly correlated with each other. May I have any appropriate example for that?

    Reply
      • Hi
        I was wondering if this will be still spearman rank order correlation coefficient?
        Are you suggesting to use this formuala CORR(R1) = MMULT(TRANSPOSE((R1-R2)/R3),(R1-R2)/R3)/(ROWS(R1)–1)

        Reply
        • Yask,
          Yes, where R1 contains the ranks of the original data and R2 and R3 are derived from the ranked data.
          You can check to see whether this works by using the SCORREL function on any pair of columns in the original data.
          Charles

          Reply
  23. Dear Charles,
    First I want to thank you for your efforts on the website. Now my question:compared to using the formula for calculating spearman,using =CORREL (…) is a very easy way to get the same result. Unfortunately, in some rare cases I achieve the same result for spearman rho through both formula and =CORREL. This happens mostly in a sample of 10 respondents. In all other cases I’ve two different results. Meanwhile I checked every step so many times but unfortunately I’m not able to achieve the same outcome through both ways consistently. I hoop my problem is clear.

    Thank you.
    Lily

    Reply
    • Dear Lily,
      I don’t completely understand the problem that you are identifying. Can you send me an Excel file with an example? You can send it to my email addres, which is listed on the Contact Us webpage.
      Charles

      Reply
    • Lily,

      Thanks for sending me the Excel file with your data.

      The formula (1-6*sum/n(n^2-1)) is only valid when there are no ties. In my example there were no ties; in your example there are ties. Using CORREL on the ranked data will always work. Using SCORREL on the original data will always work.

      Charles

      Reply
  24. Would you mind changing the example above? I often find that rap music is a stand in for race in many circles. Psychology has such a tainted history of manufacturing differences in IQ according to racial lines. I’m sure you didn’t mean to create a racist example, but I think there are so many other scenarios you could use that don’t reinforce the experience of so many people with stereotypes.

    Reply
  25. Good afternoon,
    Thank you for the page, it is really helpful.
    I just have comment: how can I deal with missing observations? I mean, I need to calculate a Spearman correlation matrix, and unfortunately I need to deal with missing observations which are different for each of the variables.
    Can you please help?
    Thanks!

    Reply
    • Henry,
      When you say that you have a Spearman correlation matrix, are you referring to a square matrix whose elements are pairwise Spearman’s correlation coefficients based on ranks (and not Pearson’s correlation coefficients)? How many variables are there (i.e. what is the dimension of the correlation matrix)? How many subjects are in the sample?
      Charles

      Reply
      • Yes I have a set of 5 ordinal variables for which I would calculate all the correlations, something like the normal square correlation matrix but using Spearman rho instead.
        My problem is that each variable has different missing observations (for different units, so the size of the sample is different for each variable, going from 87 to 98). I am not sure on how to deal with this in excel.
        Thanks for your help!
        Henry

        Reply
        • Henry,
          You can calculate the correlation coefficient for each pair (with listwise deletion of missing data for each pair). This will give you C(5,2) = 10 correlation coefficients plus 5 variances. Put these into a 5 x 5 matrix (here 10 x 2 + 5 = 25) and you have a pseudo correlation matrix. This matrix won’t have all the properties of a real correlation matrix, but if you don’t have too much missing data it will be pretty close.
          Charles

          Reply
    • Hello Faith,
      Just change the first IQ value in Figure 1 of the referenced webpage from 99 to 90 and you will get a tied value. You can do this on the Excel worksheet that you can download from Examples Workbooks.
      Charles

      Reply
  26. Hi, Charles.

    Using the Resource Pack, the value Ive got for Spearman for the above unranked data is 0.1244 and NOT -0.115. However, the Spearman value is the same for unranked and ranked dataset in Example 1 of Spearman’s Rank Correlation Hypothesis Testing page (https://real-statistics.com/correlation/spearmans-rank-correlation/spearmans-rank-correlation-detailed/). As I understand it, the Resource Pack correlation option does the ranking work for us such that we do not have to rank the dataset by ourselves and we just put the range of the unranked datasets in the correlation menu of the Resource Pack to get the correlation coefficient, is this so? Thanks very much.

    Reply
    • Allan,

      Unfortunately, there is an error in the SCORREL function. The correct Spearman’s correlation value is -0.115 and not 0.1244. This error occurs when one of the data elements is zero. I will correct this in the next release, which I will issue in the next day or so.

      The Resource Pack does the ranking for you. You don’t need to do it yourself.

      Thanks for your comment. Thanks to following up on your comment I was able to discover that the SCORREL has a bug.

      Charles

      Reply
    • Lukku,
      R1 and R2 represent two data ranges. If you want the correlation coefficient corresponding to the data in A1:A10 and B1:B10, you can think of R1 as representing A1:A10 and R2 as representing B1:B10.
      Charles

      Reply
  27. I want to know the correlation of the weighted means of my two set of respondents, the parents and students… I used five-point likert scale. I ranked responses based on the statements given on the two groups of respondents according to the weighted means. Now, my question is how am I going to correlate it? Thank you.

    Reply
  28. Hello,

    I have tried using =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) but Excel really doesn’t like the (R1,R1,1) or (R2,R2,1) bits.

    I’ve substituted them for (A2,A12,1) and (B2,B12,1) because my data is in the first two columns, 11 pieces each from rows 2-12. It doesn’t help. I don’t know why there are three terms in the brackets anyway – what is the “1” for at the end?

    I’ve also tried to do it the long way round – do a RANK.AVG on each column and then do another column for d and one for d². But I can’t work out what to type after RANK.AVG. The Help function doesn’t really help.

    (I know it would be easier to rank by hand, but ultimately I want to enter 120 pairs of data).

    Thanks to anyone who can help me…

    Reply
    • Madeline,

      R1 and R2 are just abbreviations for range 1 and range 2. You can’t enter these letters in the formulas unless you are actually referring to cells R1 and R2.

      More importantly, when you enter a range into any Excel formula you need to specify the range by its end points separated by a colon (not a comma). Thus your two ranges are probably A2:A12 and B2:B12. The formula you are probably trying to use if therefore

      =CORREL(RANK.AVG(A2:A12,A2:A12,1),RANK.AVG(B2:B12,B2:B12,1))

      Charles

      Reply
    • Phoenix,
      Sorry, but I don’t understand what you mean by “solve two weighted means” nor why you would want to use spearman’s to do this.
      Charles

      Reply
  29. Dear Sir
    What can I do if i have 3 repetitions.According to your table if student have the same IQ score how can you rank them and how can you calculate?
    Best Regard

    Reply
    • As described on the referenced webpage, you can calculate Spearman’s rank correlation by using the formula
      =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

      The RANK.AVG function takes care of any ties.

      You can also use the Real Statistics SCORREL(R1,R2) function.

      Charles

      Reply
  30. Hi Charles,

    I am completing an elective project and am trying to establish if there is a correlation between being a member of a certain organisation and whether or not someone thinks a subject matter should be included in training and development (answer options to this are simply yes and no).

    Firstly, is this possible to do? And if so how do I do it?

    Stacey

    Reply
    • Stacey,
      You can calculate the correlation using the CORREL function. It is hard to tell from the summary you have given, but a t test may be sufficient for your purposes.
      Charles

      Reply
  31. Is it possible to use Spearman’s rank correlation between 3 and 5 point likert scale questions?
    In my survey I am asking question on attitude and purchase behavior, but both of them have questions in different likert scale. Attitude questions are in 5 point likert scale and purchase behavior question are in 3 point likert scale. So is it possible to use Spearman Rank correlation between these two different likert scale question in testing the Hypothesis?

    Reply
    • Yes, you can calculate a Spearman’s rank correlation coefficient from different Likert scales. What hypothesis do you want to test?
      Charles

      Reply
  32. Dear Charles!
    Is there any way to find Spearman’s Rank Correlation, when number of variables are not same?
    example X has 35 values and Y has 30 values.

    Reply
    • Just like Pearson’s correlation, this statistic performs a calculation based on pairs of data elements from X and Y. Thus the X and Y counts need to be the same.
      Charles

      Reply
  33. Charles,

    Once again – great website!

    How do you conclude with a spearman rho of -0.115 and a Pearson of -0.036 that ‘the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.’? Isn’t Spearman indicating they are more negatively correlated than Pearson?

    Reply
    • Thanks for identifying this mistake. I have removed this phrase, which is clearly in error, from the webpage.
      Charles

      Reply
  34. Hello,
    This is a very helpful explanation, thank you.
    You mention that a different approach is needed with a set of data with many ties, and I hope you can help me out, as I am no mathematician, but trying to analyse data from some language tests. I have 46 participants with three test results each (they have scored between 1-6 in each test). I would like to compare the results in two sets of test results to see if they correalate, but there is a large number of ties – what would you suggest that I do?
    Thanks in advance, Tina

    Reply
    • Tina,
      Spearman’s handles ties by averaging the ranking values; this is the equivalent of RANK.AVG. I haven’t been able to find a ties correction factor for Spearman’s. In any case, Kendall’s tau is usually recommended over Spearman’s and Kendall’s tau does have a ties correction factor as explained on the webpage Kendall’s Tau with Ties.
      Charles

      Reply
  35. Would you please explain..
    1. In which order we rank the data? Ascending or descending? Which value is labelled as no 1, highest or lowest???

    2. Why it is multiply by 6?

    Thanks.

    Reply
    • 1. As you can see from Figure 1, the largest data gets the highest ranking.
      2. I don’t know why the formula has a multiplier of 6. I guess the mathematics just worked out that way.
      Charles

      Reply
      • Hi Charles,

        Here’s the reasoning behind why there is a 6 in the Spearman’s rank formula…it’s actually pretty insightful. If the two rankings are identical in every place, obviously the sum of their squared differences will be 0, the minimum. If the rankings are in reverse order, the sum of squared differences is a maximum, and it can be shown to equal (n*(n^2-1))/3, where n equals the number of pairs. We want a “common” scale to judge these differences by; that is, we do not want this scale to depend on the number of pairs. In addition, we want the scale to range from -1 (perfect negative correlation) to 1 (perfect positive correlation). Therefore, we are looking for a transformation that takes 0 (minimum sum of squares) to 1 and that takes (n*(n^2-1)/3) to -1. If you find the slope of the line connecting the two points (0,1) and (n(n^2-1)/3, -1), you will find that it equals -6/(n(n^2-1)). There’s your 6!

        Reply
  36. Thanks a lot. I see in method 2 above you have ranked the values with the smallest being assigned position 1. I tried the opposite so that the biggest value is number 1 and was getting the same answer. is that right? I have tried with other sets of data and the answer is the same whatever the ranking used.

    Reply
      • Yes, this is correct. As long as you are consistent in your rankings of both groups, it doesn’t matter whether 1 corresponds to the highest or lowest value in the group. But if 1 corresponds to the highest value in group 1, it must also correspond to the highest value in group 2, and vice versa.

        Reply
  37. Thank you, your information was very helpful. I am Civil engineer and my information about statistic is low. I want to use the first paragraph of this page in my thesis our article, I would appreciate if you give me a reference which includes the meaning of first paragraph.

    Reply

Leave a Comment