Spearman’s Rank Correlation

When data is not normally distributed, or when outliers give a distorted picture of the association between two random variables, Spearman’s rank correlation is a non-parametric measure that can be used instead of Pearson’s correlation coefficient.

On this webpage, we explain the basic concepts of Spearman’s correlation and how to calculate it. Click here to find out how to perform hypothesis testing to determine whether Spearman’s correlation is statistically equal to zero (i.e. to determine whether two samples are independent).

Definition 1: The Spearman’s rank correlation (also called Spearman’s rho) is the Pearson’s correlation coefficient on the ranks of the data.
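Definition 1 can be sketched in a few lines of Python. This is only an illustration: the sample values and the helper names (average_ranks, pearson, spearman) are assumptions made for this sketch, not the data from Figure 1 and not part of Excel or the Real Statistics Resource Pack.

```python
from math import sqrt

def average_ranks(values):
    # Ascending rank of each value; tied values share the average rank,
    # which is the behaviour of Excel's RANK.AVG(x, range, 1).
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    # Pearson's correlation coefficient of two equal-length samples.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    # Spearman's rho: Pearson's correlation applied to the ranks.
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical sample: y increases with x, but not linearly.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(pearson(x, y))   # below 1, because the relationship is not linear
print(spearman(x, y))  # 1.0, because the two rankings agree exactly
```

The sketch uses a simple counting approach to ranking, which is quadratic in the sample size; that is fine for small samples like the one in Example 1.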

Example 1: The left side of Figure 1 displays the association between the IQ of each adolescent in a sample with the number of hours they listen to rock music per month. Determine the strength of the correlation between IQ and rock music using both Pearson’s correlation coefficient and Spearman’s rank correlation. Compare the results.

Figure 1 – Data for Example 1

To calculate Spearman’s rho, we need to determine the rank for each of the IQ scores and each of the Rock scores. E.g. the rank of the first IQ score (cell A4 in Figure 1) is =RANK.AVG(A4,A$4:A$13,1), and so we put this formula in cell C4. If you are using Excel 2007 you would use the Real Statistics function RANK_AVG instead of RANK.AVG (as explained in Ranking).

We now calculate both correlation coefficients as follows:

Pearson’s correlation = CORREL(A4:A13,B4:B13) = -0.036

Spearman’s rho = CORREL(C4:C13,D4:D13) = -0.115

We see that there isn’t much of a correlation between IQ and listening to rock music based on the sample.

Observation: When conducting an analysis, if you discover the presence of outliers (e.g. via a histogram or scatter diagram), proceed as follows:

Calculate the Pearson’s correlation coefficient for the sample with and without the outliers. If there isn’t much difference, then you can be pretty confident that the outliers are not influencing the results. You can also calculate the Spearman’s rank coefficient. If this is pretty similar to Pearson’s correlation coefficient, this is also a good indicator that the outliers are not substantially influencing the results.

If there are clear differences then you will need to be cautious about how you treat the outliers.
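The comparison described in this observation can be sketched as follows. The data are hypothetical (eight well-behaved points plus one outlier), and the helper names are assumptions made for this illustration.

```python
from math import sqrt

def average_ranks(values):
    # Ascending ranks, with tied values sharing their average rank.
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data; the last pair (9, 90) plays the role of the outlier.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [2, 1, 4, 3, 6, 5, 8, 7, 90]

pearson_with = pearson(x, y)
pearson_without = pearson(x[:-1], y[:-1])
spearman_with = spearman(x, y)
spearman_without = spearman(x[:-1], y[:-1])

print("Pearson  with / without outlier:", pearson_with, pearson_without)
print("Spearman with / without outlier:", spearman_with, spearman_without)
```

In this made-up sample, Pearson’s coefficient changes noticeably when the outlier is dropped while Spearman’s rho barely moves, which is the kind of difference the observation says should make you cautious.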

Similarly, if you test the sample data for the x and y variables and see that either one of them is not roughly normal (using the techniques described in Testing for Normality and Symmetry), then you will need to use Spearman’s coefficient rather than Pearson’s.

Figure 2 displays a scatter diagram for the data in Example 1. We see that the data is pretty randomly scattered, although there is a potential outlier where the rock music listening spikes to 45. This gives some evidence that Spearman’s rho might be a better choice.

Figure 2 – Scatter diagram for data from Example 1

Observation: Spearman’s rho for the data in ranges R1 and R2 can be calculated in Excel via the formula

=CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))

For versions of Excel prior to Excel 2010, the following formula will do the job.

=CORREL(RANK(R1,R1,1)+(COUNTIF(R1,R1)-1)/2,RANK(R2,R2,1)+(COUNTIF(R2,R2)-1)/2)
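To see why the pre-2010 formula works, here is a small Python sketch of the same idea: RANK(x, R, 1) gives the lowest rank among tied values, and adding (COUNTIF(R, x) - 1)/2 shifts it to the average rank. The function names and the sample data are assumptions made for this illustration.

```python
def rank_min(value, values):
    # Mirrors RANK(value, range, 1): one plus the number of smaller entries,
    # so tied values all receive the lowest rank in their group.
    return 1 + sum(v < value for v in values)

def rank_avg(value, values):
    # Mirrors RANK(value, range, 1) + (COUNTIF(range, value) - 1) / 2.
    ties = sum(v == value for v in values)
    return rank_min(value, values) + (ties - 1) / 2

data = [10, 20, 20, 30]  # hypothetical sample with one two-way tie
print([rank_min(v, data) for v in data])  # [1, 2, 2, 4]
print([rank_avg(v, data) for v in data])  # [1.0, 2.5, 2.5, 4.0]
```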

Real Statistics Function: The Real Statistics Resource Pack supplies the following function:

SCORREL(R1, R2) = Spearman’s rho for the data in ranges R1 and R2

For Example 1, SCORREL(A4:A13,B4:B13) = -0.115.

Observation: When there are no ties in the ranking, there is an alternative way of calculating Spearman’s rho using the following property.

Property 1: When there are no ties, Spearman’s rho is equal to

$$\rho = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$$

where d_i = rank x_i – rank y_i and n is the number of pairs.

Example 2: Calculate Spearman’s rho for the data from Example 1 using Property 1.

Figure 3 – Alternative way of calculating rho for Example 1

Using Property 1 and the data in Figure 3, we obtain ρ = -0.115, the same value as in Example 1.

To find out how to test whether Spearman’s correlation coefficient is statistically equal to zero (i.e. to determine whether two samples are independent) click here.

146 thoughts on “Spearman’s Rank Correlation”

Sheena

June 25, 2021 at 6:34 am

Hi, good day! We are conducting research using Spearman’s rho. I already have the data, but we don’t know how to start because there is so much of it.
Reply
- Charles
  
  June 25, 2021 at 10:25 am
  
  Hello Sheena,
  How much data do you have? Why did you choose Spearman’s rho instead of Pearson’s correlation or Kendall’s tau?
  Charles
  Reply
Sheena

April 19, 2021 at 6:30 am

Thank you for your explanation! It is very helpful. I would like to ask whether I should use Spearman’s correlation if I have five-point Likert scale data, or would it be Pearson’s r?
Reply
- Charles
  
  April 19, 2021 at 8:21 am
  
  Sheena,
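  For Likert data, Spearman’s correlation is usually the more appropriate choice since the data is ordinal. Pearson’s and Spearman’s correlations are generally not equal in this case, because Spearman’s rho is calculated on the (average) ranks rather than on the raw scores.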
  Charles
  Reply
Daniella Weinstein

November 8, 2020 at 12:44 am

Hello,
I am doing a correlational study using an ordinal scale (strongly agree/disagree) on the relationship between a generational work value and parents’ aspirations. The generations will most likely include Boomers, Generation X, and Generation Y. How would I use Spearman’s rank correlation? I am very bad at math and need some advice on how to calculate it, and on whether I can use it with several independent variables (the generations).
Reply
- Charles
  
  November 8, 2020 at 9:15 am
  
  Daniella,
  It really depends on what you are trying to accomplish and the data that you have.
  I understood that your objective is to determine the relationship between a generational work value and parents’ aspirations. I, therefore, make the following comments:
  1. Do you have data for work value for Boomers, Generation X, and Generation Y?
  2. Do you have data for parents’ aspirations for Boomers, Generation X, and Generation Y?
  3. For item #2 do you have data for the parents of the subjects in item #1?
  4. Do you want to be able to predict item #2 based on item #1 (regression)?
  5. Do you only need the relationship between items #1 and #2 for each of the 3 x 3 = 9 possibilities (correlation)?
  6. Do you have sufficient data to accomplish your objectives?
  Charles
  Reply
Vahe Odabashian

September 3, 2020 at 9:00 pm

Hi Charles and thanks again for all of this.

I’ve been doing Spearman’s rho and Pearson’s r, and no matter which one I chose, Real Statistics output displayed both, plus Kendall. I guessed that’s how it is designed, so didn’t pay attention to that and was just taking the result I needed — i.e. used the value next to the cell Pearson if I had the “Pearson” button ticked in the Real Statistics pop-up, and used the value next to the cell Spearman if I have turned raw data into ranked data beforehand and had the “Spearman” button ticked. That’s until I encountered the following…

One particular dataset was naturally ranked. Respondents were allowed to choose 1, 2 or 3 as their individual characteristic (so I did not need to rank them myself for Spearman’s rho), and I wanted to see if there’s any correlation between that characteristic and their answers to a question in Likert scale 1-5. Just for fun I first tried a calculation choosing Real Statistics “Pearson” button, then did another calculation for the same pair of ranges this time choosing “Spearman” button. For these two calculations Pearson, Spearman and Kendall came out exactly the same each (about -0.16, -0.13, and -0.12 respectively). However, p-values were different: 0.20 for “Pearson” calculation and 0.12 for “Spearman”. So here are my questions:
– I thought if the data is already ranked, Pearson and Spearman should result in the same number. Am I wrong? As you see I got -0.16 for Pearson and -0.13 for Spearman.
– Related to the previous, I also believed Excel function CORREL will result in Spearman’s rho if the data is already ranked. Am I wrong? Because trying Excel’s CORREL with my data resulted -0.16, i.e. the same as Pearson in Real Statistics.
– Why the p-values for “Pearson” and “Spearman” calculations with Real Statistics are different, when for both calculations Pearson, Spearman and Kendall coefficients are exactly the same?

Sorry for a long message, just trying to make it clear.
Reply
- Charles
  
  September 4, 2020 at 9:17 am
  
  If you email me an Excel file with your data and the results, I will try to see why you are getting an unexpected result.
  Charles
  Reply
TUĞÇE HIRCA

August 19, 2020 at 9:56 am

Hi Charles,
First of all, thank you for your beautiful narration. However, there is something I want to ask. I will apply it to my rainfall data. What should I do when some values in my dataset are the same? For example:

{3,6,5,3,4,7,2,6,5}

In such a case, will data with the same value get the same rank number?
Reply
- Charles
  
  August 20, 2020 at 9:49 am
  
  I am sorry, but I don’t understand your question.
  Charles
  Reply
- vincent
  
  October 21, 2020 at 3:19 am
  
  I have the same situation in my problem: several data values are equal and compete for the same rank. For example, if two values are competing for rank 1, add ranks 1 and 2 and divide by the number of tied values, i.e. (1+2)/2 = 1.5. This is the rank of both of them.
  Reply
  - hghwy
    
    November 8, 2020 at 8:13 am
    
    Hi, as Dr. Charles said, you can use the RANK.AVG function for this case (or just use SCORREL directly). But I’d like to know why we always need to take the average rank instead of the same rank. In your example you used (1+2)/2 = 1.5. Why can’t we take rank 1 for both observations, just like in sports when two players get the same rank (e.g. both get 10th place and the next player gets 12th), as the RANK.EQ function does?
    Reply
    - Charles
      
      November 8, 2020 at 9:17 am
      
      You can use rank instead of average rank. It depends on what you are trying to accomplish and which is the metric for this. Usually average rank is the better choice.
      Charles
      Reply
Abqori

June 25, 2020 at 3:13 am

Hi, Charles,
Suppose we have 1000+ data elements and some of the cells in the Rock column are empty.
How do you get the ranks, and how do you calculate Spearman’s coefficient?

Thanks
Reply
Ian

November 6, 2019 at 12:18 am

Does anyone know if it is possible to create a correlation matrix using the spearman’s correlation in Real Statistics?
Reply
- Charles
  
  November 7, 2019 at 6:54 pm
  
  Hello Ian,
  Suppose that you have 5 variables and 8 data elements for each variable. The data could be stored in an 8 x 5 range, say A1:E8. Proceed as follows:
  1. Insert the array formula =RANK.AVG(A1:A8,A1:A8,1) in range G1:G8 (this creates a column range with the ranks of the first column of data)
  2. Highlight range G1:K8 and press the key sequence Ctrl-R (this yields the ranks of each of all 5 columns)
  3. Insert the Real Statistics array formula =CORR(G1:K8) in range M1:Q5
  Charles
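The same idea (rank each column separately, then compute the pairwise correlations of the ranks) can be sketched in Python. The three columns below are hypothetical data and the helper names are assumptions for this illustration; the sketch does not use or reproduce the Real Statistics CORR function.

```python
from math import sqrt

def average_ranks(values):
    # Ascending ranks, with tied values sharing their average rank.
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_matrix(columns):
    # Step 1: rank every column on its own.
    ranked = [average_ranks(col) for col in columns]
    # Step 2: Pearson's correlation for every pair of ranked columns.
    return [[pearson(a, b) for b in ranked] for a in ranked]

# Hypothetical data: three variables with eight observations each.
speed = [12, 15, 11, 19, 14, 18, 13, 17]
accuracy = [0.9, 0.7, 0.95, 0.5, 0.8, 0.6, 0.85, 0.65]
performance = [3, 5, 2, 8, 4, 6, 1, 7]

matrix = spearman_matrix([speed, accuracy, performance])
for row in matrix:
    print([round(value, 3) for value in row])
```

The result is a symmetric matrix with ones on the diagonal. A column in which all values are equal would cause a division by zero in this sketch, so it assumes every variable has some variation.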
  Reply
Aad

September 23, 2019 at 11:30 am

Hi Charles,

Thanks for your invaluable content!

If you have time, could you take a look at my statistics issue?
Purpose is to calculate how much these two sets of variable values are alike.

The first variable could have zero, one or more data points (numerical values) for every day. The same goes for the other variable (also numerical values).

I am trying to compare these two variables that could have a different number of data points. So variable1 could have 100 data points and variable2 127 data points.

Please advise? Thanks.
1. Should I sum the values per day and set the value to zero when there is no data point. To get the same number of “data points” for both variables?
2. Or is there a good method to work with different number of data points?

Thanks again.
All the best.
Reply
- Charles
  
  September 24, 2019 at 9:22 am
  
  It really depends on what criteria you are using for the variables being alike. The two sample t test may be used even if the samples have different sizes.
  See https://real-statistics.com/students-t-distribution/two-sample-t-test-equal-variances/
  Charles
  Reply
  - Aad
    
    September 24, 2019 at 10:37 am
    
    Hi Charles,
    
    Thanks for your reply.
    
    Can you clarify what criteria I can choose from?
    
    I have tried a t-test, but because the variables are time series I hoped there would be a better method.
    
    Thanks again
    Reply
vichr

August 24, 2019 at 5:24 am

Hello, Charles

I would like to ask you regarding Spearman’s rank correlation on three variables, if it is okay with you.

I have three variables: speed, accuracy, and performance. Could Spearman’s rank correlation be used? The problem is that there would be so many d’s to use, and the formula gets all jumbled. I am sorry if I sound a little ignorant on my part, I just really want to know.

Thank you!
Reply
- Charles
  
  August 24, 2019 at 9:06 am
  
  Just calculate the ranks of the data in each of the three variables individually. Then either use the Real Statistics MCORREL function on the ranked data or perform linear regression (which outputs the correlation coefficient) on the ranked data.
  Charles
  Reply
  - DAVIDE
    
    December 12, 2019 at 5:26 pm
    
    Hello Charles,
    I have the same problem: I would like to calculate Spearman’s rho between more than three variables (seven or more). I used the Real Statistics MCORREL function, but it didn’t work; the program gave me an error message. What can I do?
    Thanks so much,
    Davide
    Reply
    - Charles
      
      December 13, 2019 at 9:31 am
      
      Davide,
      
      If you are looking for the pairwise Spearman’s rho between the 7 variables, then you can use the following Real Statistics array formula
      =CORR(RANKCOL(R1))
      Here R1 is a range that contains your data, one column for each variable.
      If, for example, you have 7 variables with the data in range A1:G100 (i.e. each sample contains 100 elements), then you need to highlight a 7 x 7 range, enter the formula =CORR(RANKCOL(A1:G100)) and press Ctrl-Shift-Enter to get the result.
      
      If instead you are looking for one number, i.e. the correlation between one of the 7 variables (the dependent variable) and the other 6 (the independent variables), you would typically conduct a linear regression and use the square root of the R-square value. Since you want Spearman’s correlation, you would first need to rank the data. You have two choices: (1) rank all the data together or (2) rank the data for each variable separately. In this latter case, the result would be the same as the output from the following array function:
      =RSquare(RANKCOL(R1),1)
      In this case the first column would contain the data for the dependent variable (if the 5th column corresponds to the dependent variable then you would replace the 1 by a 5). Even though this produces a single value, it is still an array formula and so you need to press Ctrl-Shift-Enter.
      Charles
      Reply
      - DAVIDE PELLEGRINO
        
        December 16, 2019 at 12:38 pm
        
        Hi Charles,
        Please, might I send you an email with an attached Excel sheet containing the dataset?
      - Charles
        
        December 16, 2019 at 7:04 pm
        
        You can send me an email with an attached Excel sheet with your data.
        Charles
mayuri

April 13, 2019 at 8:01 pm

Hello Charles

Could you please clarify something for me.. (I am statistically very naïve)

I would like to correlate 2 continuous variables of accuracy and efficiency. The efficiency score is the quotient of the accuracy score and time taken. The raw data accuracy values vary from 1 to 50 in increments of 1, whereas the efficiency score varies from 0.01 to 0. Can I correlate these two measures even though an increase in 1 for accuracy might result in an increase of only 0.005 in the efficiency score?

Am I totally off on the wrong track? I considered methods comparison as an alternative but am worried about the same concept as described above

Thank you
Reply
- Charles
  
  April 14, 2019 at 10:07 am
  
  You can certainly correlate these two variables provided they have the same number of elements. Note that if you multiply all the data for one of the variables by a fixed positive amount, the correlation doesn’t change (a negative multiplier only flips the sign).
  Once you calculate the correlation coefficient, how do you plan to use the result? What are you trying to test?
  Charles
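The scale invariance mentioned in this reply can be checked numerically. The two samples below are hypothetical, and the pearson helper is written out only for this illustration.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical samples measured on very different scales.
accuracy = [3, 7, 1, 9, 4]
efficiency = [0.2, 0.9, 0.1, 0.7, 0.5]

original = pearson(accuracy, efficiency)
rescaled = pearson(accuracy, [200 * v for v in efficiency])  # positive factor
negated = pearson(accuracy, [-v for v in efficiency])        # negative factor

print(original, rescaled, negated)
```

Up to floating-point rounding, the first two values agree and the third is the same number with the opposite sign. Since multiplying by a positive constant does not change the ranks, Spearman’s rho is unaffected as well.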
  Reply
Maurice Kioko

February 23, 2019 at 10:36 am

Hello, what happens when so many values are repeated, how can we rank them and how do we calculate coefficient.
Reply
- Charles
  
  February 23, 2019 at 11:02 am
  
  Hello Maurice,
  Excel’s RANK.AVG handles ties. E.g. the values 2, 4, 6, 7, 9, 20 have ranks 1, 2, 3, 4, 5, 6. But 4, 4, 5, 5, 5, 6 have ranks 1.5, 1.5, 4, 4, 4, 6. In this last example the two 4’s have ranks 1 and 2, and so the ranks are reassigned as 1.5 and 1.5 (the average). Similarly, the three 5’s have ranks 3, 4, 5, and so these ranks are reassigned as the average, namely 4, 4, 4 (here 4 = (3+4+5)/3).
  Spearman’s rank correlation is simply =CORREL(R1,R2) where R1 = the ranks of the first data set and R2 = the ranks of the second data set.
  Charles
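The tie handling described in this reply can be sketched in Python: sort the values, assign the provisional ranks 1, 2, 3, ..., and then give every group of equal values the average of the provisional ranks it occupies. The function name is an assumption for this illustration; it is meant to mimic RANK.AVG with ascending order, not to reproduce Excel’s implementation.

```python
def average_ranks(values):
    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the group while the next value equals the current one.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Provisional ranks are start+1 .. end+1; their average is shared.
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

print(average_ranks([2, 4, 6, 7, 9, 20]))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(average_ranks([4, 4, 5, 5, 5, 6]))   # [1.5, 1.5, 4.0, 4.0, 4.0, 6.0]
```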
  Reply
Ruth

August 16, 2018 at 2:51 pm

Hello!

First of all, thank you for building up this website!! your didactic style is really good.

Could you please help me out with my questions?
I calculated a spearman’s rho of 0,6 in a sample of 50. Could one say that there is a probable correlation/there could be a correlation between the two tested variables?
And how do I calculate a significance for Spearman’s rho?

Thank you very much in advance!

Greetings,

Ruth
Reply
- Charles
  
  August 16, 2018 at 4:12 pm
  
  Ruth,
  See https://real-statistics.com/correlation/spearmans-rank-correlation/spearmans-rank-correlation-detailed/
  Charles
  Reply
Mauricio

October 2, 2017 at 4:37 pm

Hi Charles

I worked through the example on the website. The result of Spearman’s formula and the Pearson’s correlation of the ranks are the same only if no two observations of a variable have the same value.

In your example, if you change the 6th observation of the “IQ” variable (105) with the same value of the 5th observation (123) the result using Spearman’s formula is -0.0212121, but if we do the calculation using the Pearson’s correlation of the ranks [ with excel formula is CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) ] the result is -0.0243162.
Reply
- Charles
  
  October 3, 2017 at 8:27 am
  
  Mauricio,
  When I do what you have suggested, I get the same values, namely SCORREL(R1,R2) = -0.0243162.
  Charles
  Reply
  - Mauricio
    
    October 3, 2017 at 10:36 pm
    
    Hi Charles
    
    I can’t see the SCORREL() code, but I suppose that this methodology uses the Pearson correlation on the ranks [as CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))] because the results are the same as the ones you showed. Now, if you use the “traditional” Spearman’s formula instead of SCORREL(), the results are different.
    Reply
    - Charles
      
      October 4, 2017 at 9:14 am
      
      Mauricio,
      What is the traditional Spearman’s formula? Is it the formula that is only valid when there are no ties?
      Charles
      Reply
      - Mauricio
        
        October 4, 2017 at 10:22 pm
        
        Sorry, I haven’t been so clear. I’ll try to explain it.
        
        In your example, all the observations of “IQ” are different (as are the observations of “Rock”, but the analysis is on “IQ”). With these values, the Spearman’s correlation calculated with the “traditional” formula [1 – (6*sum(d^2))/(n^3 – n)] is -0.11515152, and if it is calculated as the Pearson’s correlation of the ranks [CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))], the value is the same (-0.11515152).
        
        But if one or more observations have the same value, the results are not the same. In the “IQ” variable, I changed the 6th observation (105) to the value of the 5th observation (123). With the “traditional” formula [1 – (6*sum(d^2))/(n^3 – n)], the Spearman’s correlation is -0.02121212, and with the Pearson’s correlation of the ranks [CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))] the result is -0.02431622.
        
        Thank you for your time.
      - Charles
        
        October 5, 2017 at 8:30 am
        
        Mauricio,
        Yes, you are correct. The traditional formula is only valid when there are no tied values.
        Charles
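The point of this exchange can be illustrated with a short Python sketch. The data are hypothetical (the values from Figure 1 are not reproduced here), and the helper names are assumptions for this illustration. Without ties the two calculations agree; with a tie they differ.

```python
from math import sqrt

def average_ranks(values):
    # Ascending ranks, with tied values sharing their average rank.
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def rho_on_ranks(x, y):
    # Definition: Pearson's correlation of the (average) ranks.
    return pearson(average_ranks(x), average_ranks(y))

def rho_shortcut(x, y):
    # "Traditional" formula 1 - 6*sum(d^2)/(n^3 - n); valid only without ties.
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n ** 3 - n)

y = [2, 1, 4, 3, 5]
x_no_ties = [1, 2, 3, 4, 5]
x_with_tie = [1, 2, 2, 4, 5]  # the third value now repeats the second

print(rho_on_ranks(x_no_ties, y), rho_shortcut(x_no_ties, y))    # both approx. 0.8
print(rho_on_ranks(x_with_tie, y), rho_shortcut(x_with_tie, y))  # approx. 0.667 vs 0.675
```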
Gem

July 31, 2017 at 6:15 pm

Hello Charles,

I’m trying to run a Spearman correlation analysis on certain data. I’ve read that Spearman isn’t recommended for non-monotonic trends. Based on scatterplot analyses, I noticed that some of my variables do not have a monotonic relationship.

Is there a way to transform the data, to allow me to run a Spearman analysis? Or should I just proceed with Spearman?

Thank you.
Reply
- Charles
  
  July 31, 2017 at 6:56 pm
  
  Gem,
  It probably depends on why you want to use Spearman’s correlation in the first place. I don’t know which transformation you would use since the most common ones preserve monotonicity.
  Charles
  Reply
  - Gem
    
    July 31, 2017 at 7:33 pm
    
    Thank you for your reply.
    
    I’m using Spearman because one of my variables is ordinal, my variables don’t exhibit a linear relationship, and there are outliers. My aim is just to determine the correlation between y (a continuous variable) and my different x variables. My only concern is the lack of monotonicity.
    
    Would the lack of monotonicity be a big issue? Thank you again.
    Reply
    - Charles
      
      August 1, 2017 at 7:18 am
      
      Gem,
      Correlation is a measure of association when the data is more or less linear (and so monotonic). Spearman’s correlation is on the ranks, but the ranking function is monotonic. I would say that monotonicity is important, but as usual the devil is in the details — i.e. minor violations of monotonicity are probably not very important; more important are trends and major departures from monotonicity (actually in linearity in the ranks).
      Charles
      Reply
Harrison

May 20, 2017 at 7:41 am

Hey can you help me out on a project on how to do the R2 spearman rho?
I’m doing a correlation study on bullying and self-esteem level.
I have 40 participants who took a survey to determine their level of self-esteem and to determine if they’ve been bullied before or not.
So i have 2 variables, Bullied and self-esteem. The range of the self esteem is Moderate=(21-33) and High=(34-45)
For bullying, (Yes=28 people) (No=12 people)
For self-esteem (Moderate Self-esteem=23 people)(High S.E.=17 people)
The chi-square statistic is 1.7586. The p-value is .184797. This result is not significant at p < .05.

So how can I use my data and use it on R-squared spearman rho?
Thanks.
Reply
- Charles
  
  May 20, 2017 at 7:59 am
  
  Harrison,
  Use the raw data to calculate Spearman’s rho as described on the referenced webpage and then square the result.
  Charles
  Reply
Meggymegg

February 23, 2017 at 11:42 am

Hi Charles,

I have 100 randomly selected variables(v1) and I need a set of 100 variables (v2) which has Spearman’s rho of 0,4. I mean, I know the correlation(Spearman’s rho) between two sets of data(0,4) but for each of the v1, I need to find a correlated v2 value. I would appreciate if you can explain how I can formulate this on excel?

Thank you for your kind help..
Reply
- Charles
  
  February 23, 2017 at 5:13 pm
  
  I assume that you have a sample S1 with 100 elements and you are looking for another sample S2 such that Spearman’s rho(S1, S2) = .4.
  
  I can solve the problem if you use Pearson’s correlation instead of Spearman’s correlation in the following manner:
  1. Assume that the data in S1 is located in range A1:A100
  2. Place a zero in all cells of range B1:B100 (actually, you can place any values you like in these cells)
  3. Insert the formula =CORREL(A1:A100,B1:B100) in cell D1
  4. Use Excel’s Goal Seek capability (Data > What-if Analysis|Goal Seek) and enter D1 for Set cell, .4 for To value and B1 for By changing cell, and then press the OK button.
  Range B1:B100 contains the values that you are looking for.
  
  For Spearman’s rho you can use the Real Statistics function SCORREL instead of CORREL, but the Goal Seek capability may not converge to an answer unless you are more clever about the values you put in range B2:B100.
  
  Charles
  Reply
- Meggymegg
  
  February 24, 2017 at 7:21 am
  
  Thanks Charles for your prompt response..
  Reply
Kevin Bluxome

January 28, 2017 at 9:26 pm

For those who have been asking about how to calculate Spearman’s rho when there are ties, this is how you would do it manually.

1) Rank the values in each of the two groups (call them X and Y) individually, representing tied values by the average of the ranks they would have been assigned had they not been tied.

2) Calculate the average of the ranks for the two groups (X-bar and Y-bar).

3) Calculate (X minus X-bar)*(Y minus Y-bar) for each paired data point and add these values together. Note that some of these products may be negative if one value is above its mean and the other is below it.

4) Divide the total in step 3 by n-1, where n is the number of pairs. This is the covariance of the ranks.

5) Divide the above covariance by the product (Sx * Sy), where Sx and Sy are the sample standard deviations (using n-1) of the ranks of the two groups X and Y.

The result of this division is the Spearman’s rho value. Note that it may not be equal to the value given by the formula with the 6; as a matter of fact, if there are extensive ties, there may be quite a difference. But if there are no ties, it is mathematically equivalent to the formula with the 6.

Good luck, everyone!
Reply
- Kevin
  
  July 21, 2017 at 3:06 pm
  
  I have since learned of perhaps a simpler way to perform Spearman’s rank when there are ties, so here it is for those who are interested. It certainly requires less calculation.
  
  For each case of n-way ties involved for both variables, calculate their respective weights (n^3-n)/12. So a 3-way tie in one variable would have a weight of (3^3 – 3)/12, or 2. Add the weights for each individual case of ties to the sum of the squared rank differences, then complete the rest as usual (multiplying by 6, dividing by n^3-n, and subtracting from 1). Perhaps this will be easier to implement.
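A small Python sketch of this adjustment follows, using hypothetical data and helper names that are assumptions for this illustration. On this sample, adding the tie weights to the sum of squared differences gives 0.65, while Pearson’s correlation of the ranks is about 0.667. As far as the algebra goes, the two agree exactly only when the denominator is adjusted for ties as well, which is what the second function does.

```python
from collections import Counter
from math import sqrt

def average_ranks(values):
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def tie_weight(values):
    # Sum of (t^3 - t)/12 over every group of t equal values (0 if no ties).
    return sum((t ** 3 - t) / 12 for t in Counter(values).values())

def sum_d2(x, y):
    return sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))

def rho_weights_added(x, y):
    # Adjustment as described above: tie weights added to sum(d^2) only.
    n = len(x)
    return 1 - 6 * (sum_d2(x, y) + tie_weight(x) + tie_weight(y)) / (n ** 3 - n)

def rho_fully_adjusted(x, y):
    # Tie weights also applied to the denominator (sketch of the exact form).
    n = len(x)
    a = (n ** 3 - n) / 12 - tie_weight(x)  # sum of squared rank deviations, x
    b = (n ** 3 - n) / 12 - tie_weight(y)  # sum of squared rank deviations, y
    return (a + b - sum_d2(x, y)) / (2 * sqrt(a * b))

x = [1, 2, 2, 4, 5]  # hypothetical sample with one two-way tie
y = [2, 1, 4, 3, 5]

print(rho_weights_added(x, y))    # approx. 0.65
print(rho_fully_adjusted(x, y))   # approx. 0.667
print(pearson(average_ranks(x), average_ranks(y)))  # approx. 0.667
```

With only a few ties the numerator-only adjustment is usually close, but calculating Pearson’s correlation on the average ranks avoids the question entirely.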
  Reply
  - Charles
    
    July 31, 2017 at 10:16 pm
    
    Thanks Kevin for your suggestion.
    Charles
    Reply
Marcin

January 20, 2017 at 7:24 pm

Hey, I’m running some calculations in Excel and have very strange results: is it possible that for the same dataset Pearson is positive and Spearman is negative?
Reply
- Charles
  
  January 20, 2017 at 10:18 pm
  
  Yes
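A small hypothetical example shows how this can happen: five pairs that move in opposite directions plus one extreme pair that is large in both variables. The data and helper names are assumptions made for this illustration.

```python
from math import sqrt

def average_ranks(values):
    return [sum(v < x for v in values) + (sum(v == x for v in values) + 1) / 2
            for x in values]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# The first five pairs decrease together; the last pair is an extreme outlier.
x = [1, 2, 3, 4, 5, 100]
y = [5, 4, 3, 2, 1, 100]

print(pearson(x, y))   # positive: the outlier dominates the raw values
print(spearman(x, y))  # negative: in the ranks the outlier is just one point
```

Here Pearson’s coefficient is close to +1 while Spearman’s rho is -1/7, so opposite signs are indeed possible.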
  Reply
Alex

December 30, 2016 at 12:30 pm

Hi All,
If I have data such as the following, how do I calculate Spearman’s rank order for the

Statistical Parameter   D      R      A      S      T      I      C
Mean                    6.5    4.5    7.0    8.0    6.0    8.5    1.5
Std Error               3.5    1.5    3.0    2.0    3.0    1.5    0.5
Median                  6.5    4.5    7.0    8.0    6.0    8.5    1.5
Minimum                 3.0    3.0    4.0    6.0    3.0    7.0    1.0
Maximum                 10.0   6.0    10.0   10.0   9.0    10.0   2.0
SD                      4.95   2.12   4.24   2.83   4.24   2.12   0.71
CV (%)                  76.15  47.14  60.61  35.36  70.71  24.96  47.14

NB: DRASTIC is a model used for groundwater Vulnerability Assessment.

Any body with an idea kindly assist.
Reply
- Charles
  
  December 30, 2016 at 11:45 pm
  
  Alex,
  I believe that you will need the raw data and not just the summary data that you have provided.
  Charles
  Reply
Claire Welburn

October 25, 2016 at 10:54 pm

Hi Charles,

Thanks for your brilliant information.

I have 150 rows of data across 20 variants to compare. I am using ranked data. My plan is to use Spearman Rho, and I need to do the complex method as I have quite a lot of tied data. Your example is very clear and I have this under control.

Can you tell me what the difference is between the Excel CORREL function and doing all the calculations needed for Spearman’s rho? They give slightly different answers, but when I looked up the CORREL function, it seemed to me that it was doing what I’m doing to get Spearman’s rho. Sorry, I am probably confused, but I would love to know the answer.

Many thanks,
Claire
Reply
- Charles
  
  October 26, 2016 at 7:57 am
  
  Claire,
  The CORREL function outputs the Pearson’s correlation coefficient. Spearman’s rho is Pearson’s correlation coefficient, not on the original data, but on the ranked data.
  Charles
  Reply
Gema Sanz

October 7, 2016 at 11:43 am

Hi Charles,

Thank you for your explanations.

I need to do a Spearman correlation with correction for multiple testing of 2 variables but instead of getting the rho coeficient of the 2 variables, I need to get rho for each pair of data, i.e., the table should look like this:

gene     var1    var2    rho       p-value   FDR
SFRP1    -0.44   -8.96   -0.0130   0.7130    0.9258
PCDH10   -0.28   -7.21    0.2530   0.3630    0.7256
STMN2    -0.10   -6.76   -0.7050   0.0050    0.0158
LCE3D    -0.35   -6.00    0.0590   0.8340    0.9657
LY6K      0.64   -5.64   -0.5970   0.0190    0.6852

By “pair” I mean each individual gene considering the var1 and var2 values.

Any light on how I could do this analysis?

Thank you very much in advance.

Best
Gema
Reply
- Charles
  
  October 7, 2016 at 12:36 pm
  
  Gema,
  It sounds like you are looking for the correlation matrix using Spearman’s correlation instead of Pearson’s correlation. You can do this by ranking the data (as for Spearman’s correlation). Then create a correlation matrix based on the ranked data. E.g. you can use the Real Statistics CORR function for this.
  Charles
  Reply
  - Gema Sanz
    
    October 7, 2016 at 1:12 pm
    
    Thanks!
    I just installed the add-in and I will have a try
    
    Gema
    Reply
  - Gema Sanz
    
    October 7, 2016 at 1:17 pm
    
    PS. Is there any example of what you suggested in your example data files?
    Reply
    - Charles
      
      October 8, 2016 at 7:04 pm
      
      Gema,
      I don’t think so, but it is pretty easy to do.
      Charles
      Reply
vladimir miranda

October 5, 2016 at 3:21 pm

Hi Charles, I am doing a Likert scale analysis using Herzberg’s two-factor theory. Do I use the mean and SD as my measures of central tendency and spread? How about Pearson and ANOVA? I am very new at this. TIA
Reply
- Charles
  
  October 5, 2016 at 7:26 pm
  
  Vladimir,
  It depends on what you are trying to test.
  Charles
  Reply
  - vladimir miranda
    
    October 6, 2016 at 12:41 am
    
    Using the Minnesota Satisfaction Questionnaire i want to find out which factor affects the teacher the most. How do i analyze the data? I have seen some works that uses mean and standard deviation is that enough? Are there better test for this? N=15 only. TIA very much appreciated.
    Reply
    - Charles
      
      October 6, 2016 at 8:53 am
      
      Vladimir,
      I don’t know which test you are referring to.
      The answer depends on how you are doing your evaluation. E.g. if you are using regression, then one way to determine which factors have the largest effect is to use Shapley-Owen.
      Charles
      Reply
      - vladimir miranda
        
        October 6, 2016 at 9:18 am
        
        thanks =)
      - vladimir miranda
        
        October 7, 2016 at 3:42 pm
        
        hi Charles it’s me again. I am going to compare the job satisfaction and job motivation of new and old teachers. Can I use the T Test for this one. My hypothesis would be there is a relationship between tenure and job motivation and satisfaction. TIA
      - Charles
        
        October 8, 2016 at 7:17 pm
        
        Vladimir,
        Yes, for example, you can use the t test to compare the job satisfaction scores for teachers with tenure with the job satisfaction scores for those without tenure.
        Charles
      - vladimir miranda
        
        October 9, 2016 at 2:39 am
        
        Yes that is the hypothesis. I have read https://www.researchgate.net/publication/266212127_Five-Point_Likert_Items_t_test_versus_Mann-Whitney-Wilcoxon
        
        Now that I am going to use the t test, do I take the total score or the mean for each question before doing the test? TIA
      - Charles
        
        October 9, 2016 at 7:30 am
        
        Vladimir,
        It really depends on what you want to test.
        Charles
Zak Griffith

September 28, 2016 at 7:49 pm

I was wondering whether it is known as to how/why Spearman’s Rank formula works, rather than how to use it?
Reply
- Charles
  
  September 29, 2016 at 10:33 am
  
  Zak,
  Spearman’s rank correlation is used because ranks do not depend on the distribution of the original data and are not distorted by outliers, and so have some desirable mathematical properties.
  Charles
  Reply
  - Zak Griffith
    
    September 29, 2016 at 6:33 pm
    
    Sorry, I probably wasn’t very good at asking what I meant. I was wondering exactly why the formula for Spearman’s rank correlation itself actually works: why ρ = 1 − (6 ∑ d^2)/(n(n^2 − 1)) is a formula that gives you a correlation between two sets of data and why this number is between 1 and -1. Thank you, though, for the reply.
    Reply
    - Charles
      
      September 30, 2016 at 10:29 am
      
      Zak,
      That formula was popular when people used to do these calculations by hand. Today with computers you don’t need to use this formula, but simply calculate the correlation coefficient of the ranks.
      As to why the formula works, you need to do a little algebra to show that this formula is equivalent to calculating the correlation coefficient of the ranks (at least when there are no ties). I haven’t included this algebra because I thought that there wouldn’t be much interest in it.
      Charles
      Reply
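The algebra mentioned in the reply above is short when there are no ties. The following is a sketch, using notation that is assumed here rather than taken from the webpage: x_i and y_i are the ranks of the two samples (each a permutation of 1, ..., n), d_i = x_i − y_i, and r is Pearson's correlation coefficient of the ranks.

```latex
% Assumed notation: x_i, y_i are ranks (permutations of 1..n, no ties), d_i = x_i - y_i.
\begin{align*}
\bar{x} = \bar{y} &= \frac{n+1}{2}, \qquad
\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{n(n^2-1)}{12} \\
\sum_{i=1}^{n} d_i^2 &= \sum_{i=1}^{n} \bigl[(x_i - \bar{x}) - (y_i - \bar{y})\bigr]^2
  = \frac{n(n^2-1)}{6} - 2 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
r &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n(n^2-1)/12}
  = \frac{\frac{n(n^2-1)}{12} - \frac{1}{2}\sum_{i=1}^{n} d_i^2}{n(n^2-1)/12}
  = 1 - \frac{6 \sum_{i=1}^{n} d_i^2}{n(n^2-1)}
\end{align*}
```

Since the result is an ordinary Pearson correlation computed on the ranks, it must lie between -1 and 1. When there are ties, the sums of squares in the first line are no longer equal to n(n^2 − 1)/12, which is why the shortcut formula is only exact without ties.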
Revathy

August 15, 2016 at 5:48 am

What if a value in the sample is repeated? How do you rank such repeated values? For example, what if 45 appears twice in Rock?
Reply
- Charles
  
  August 16, 2016 at 10:01 am
  
  Revathy,
  The approach is exactly as explained on the referenced webpage. The RANK.AVG function takes care of ties. If ROCK 45 is repeated then each will have a rank of 9.5.
  Charles
  Reply
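For readers working outside Excel, here is a minimal Python sketch of the tie handling described above: tied values share the average of the ranks they would otherwise occupy. The function name `average_ranks` and the sample scores are hypothetical and chosen only for illustration; this mirrors the documented behavior of RANK.AVG with ascending order but is not its actual implementation.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical scores: 45 appears twice and would otherwise occupy ranks 3 and 4.
scores = [12, 45, 30, 45, 60]
ranks = average_ranks(scores)
print(ranks)
```

Both copies of 45 receive the same rank, halfway between the two positions they share.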
Jem

July 28, 2016 at 6:53 am

Dear Charles,

Thank you for this! I would just like to ask if there is any way this could be done when comparing data across time. Thank you!
Reply
- Charles
  
  July 28, 2016 at 8:06 am
  
  Jem,
  Can you give me an example?
  Charles
  Reply
  - Chisara
    
    July 28, 2016 at 5:39 pm
    
    Sorry to hijack this, but I would like to do the same. You see, I am comparing data in which the correlation would be lagged, e.g. weather data that affect a population further down the line. How would I do that? Also, how does one get a p-value from here?
    Reply
    - Charles
      
      July 28, 2016 at 7:45 pm
      
      Chisara,
      Please explain what you want to do in more detail. I don’t completely understand. Are you looking for a ranked version of autocorrelation?
      Charles
      Reply
      - Chisara
        
        July 29, 2016 at 1:18 am
        
        Does ranked autocorrelation allow you to see where the best correlation would occur? I am trying to see if weather effects on earlier generations affect future ones and I need to match up the variables to see where there is the most correlation.
      - Charles
        
        August 1, 2016 at 1:16 pm
        
        Chisara,
        I still don’t really understand your questions. Why can’t you simply calculate the correlation matrix and see which coefficients are highest?
        Charles
yask

July 8, 2016 at 7:56 pm

Hi
I would like to calculate spearman’s rank order correlation coefficients for 30 variables and then create a matrix to determine which variables are highly correlated with each other. May I have any appropriate example for that?
Reply
- Charles
  
  July 9, 2016 at 8:02 am
  
  Yask,
  Assuming you are interested in pairwise correlations, I suggest that you calculate the ranks of the data for each of the 30 variables. You can do this as described on the referenced webpage using the RANK.AVG function. Once you have done this you can use the technique shown on the webpage https://real-statistics.com/multiple-regression/least-squares-method-multiple-regression/. In particular, you can use the Real Statistics CORR function.
  Charles
  Reply
  - yask
    
    July 11, 2016 at 3:14 pm
    
    Hi
    I was wondering if this will still be the Spearman rank order correlation coefficient?
    Are you suggesting to use this formula: CORR(R1) = MMULT(TRANSPOSE((R1-R2)/R3),(R1-R2)/R3)/(ROWS(R1)-1)?
    Reply
    - Charles
      
      July 12, 2016 at 4:48 pm
      
      Yask,
      Yes, where R1 contains the ranks of the original data and R2 and R3 are derived from the ranked data.
      You can check to see whether this works by using the SCORREL function on any pair of columns in the original data.
      Charles
      Reply
Kartik

June 12, 2016 at 3:03 pm

How to find xi
Reply
- Charles
  
  June 12, 2016 at 3:19 pm
  
  Sorry, but I don’t understand which xi you are referring to.
  Charles
  Reply
Lily

April 17, 2016 at 1:47 am

Dear Charles,
First I want to thank you for your efforts on the website. Now my question: compared to using the formula for calculating Spearman’s rho, using =CORREL(…) is a very easy way to get the same result. Unfortunately, only in some rare cases do I get the same result for Spearman’s rho through both the formula and =CORREL. This happens mostly in a sample of 10 respondents. In all other cases I get two different results. I have checked every step many times, but I am not able to get the same outcome consistently through both ways. I hope my problem is clear.

Thank you.
Lily
Reply
- Charles
  
  April 18, 2016 at 12:01 pm
  
  Dear Lily,
  I don’t completely understand the problem that you are identifying. Can you send me an Excel file with an example? You can send it to my email address, which is listed on the Contact Us webpage.
  Charles
  Reply
- Charles
  
  May 24, 2016 at 9:05 am
  
  Lily,
  
  Thanks for sending me the Excel file with your data.
  
  The formula 1 - 6*sum/(n*(n^2-1)) is only valid when there are no ties. In my example there were no ties; in your example there are ties. Using CORREL on the ranked data will always work. Using SCORREL on the original data will always work.
  
  Charles
  Reply
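The difference described in this thread can be reproduced with a small Python sketch. The helper names (`average_ranks`, `pearson`, `rho_by_ranks`, `rho_by_shortcut`) and both data sets are hypothetical, made up for illustration; `rho_by_ranks` plays the role of CORREL applied to RANK.AVG ranks, and `rho_by_shortcut` is the hand formula.

```python
from math import sqrt


def average_ranks(values):
    """Ascending ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    # First 0-based position of v plus (count + 1) / 2 is the mean 1-based rank.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rho_by_ranks(x, y):
    """Spearman's rho as Pearson's correlation of the ranks (valid with ties)."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_by_shortcut(x, y):
    """Shortcut formula 1 - 6*sum(d^2)/(n*(n^2-1)); exact only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical data without ties: both methods agree.
x1, y1 = [3, 1, 4, 2, 5], [2, 1, 5, 3, 4]
print(rho_by_ranks(x1, y1), rho_by_shortcut(x1, y1))

# Hypothetical data with ties: the shortcut drifts away from the rank correlation.
x2, y2 = [1, 2, 2, 3, 4], [1, 3, 2, 4, 4]
print(rho_by_ranks(x2, y2), rho_by_shortcut(x2, y2))
```

The gap in the tied case is small here, but it grows as the number of ties grows, which is consistent with the two results matching only occasionally on heavily tied data.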
Dan

April 16, 2016 at 4:09 pm

Would you mind changing the example above? I often find that rap music is a stand in for race in many circles. Psychology has such a tainted history of manufacturing differences in IQ according to racial lines. I’m sure you didn’t mean to create a racist example, but I think there are so many other scenarios you could use that don’t reinforce the experience of so many people with stereotypes.
Reply
- Charles
  
  April 18, 2016 at 12:04 pm
  
  Dan,
  No offence was intended. I see your point and will change the example shortly.
  Charles
  Reply
Henry

April 14, 2016 at 2:14 pm

Good afternoon,
Thank you for the page, it is really helpful.
I just have a question: how can I deal with missing observations? I mean, I need to calculate a Spearman correlation matrix, and unfortunately I need to deal with missing observations which are different for each of the variables.
Can you please help?
Thanks!
Reply
- Charles
  
  April 15, 2016 at 3:00 pm
  
  Henry,
  When you say that you have a Spearman correlation matrix, are you referring to a square matrix whose elements are pairwise Spearman’s correlation coefficients based on ranks (and not Pearson’s correlation coefficients)? How many variables are there (i.e. what is the dimension of the correlation matrix)? How many subjects are in the sample?
  Charles
  Reply
  - Henry
    
    April 15, 2016 at 11:13 pm
    
    Yes I have a set of 5 ordinal variables for which I would calculate all the correlations, something like the normal square correlation matrix but using Spearman rho instead.
    My problem is that each variable has different missing observations (for different units, so the size of the sample is different for each variable, going from 87 to 98). I am not sure on how to deal with this in excel.
    Thanks for your help!
    Henry
    Reply
    - Charles
      
      April 16, 2016 at 6:49 am
      
      Henry,
      You can calculate the correlation coefficient for each pair (with pairwise deletion of missing data, i.e. for each pair, drop only the subjects that are missing a value on one of the two variables). This will give you C(5,2) = 10 correlation coefficients plus 5 ones on the diagonal. Put these into a 5 x 5 matrix (here 10 x 2 + 5 = 25) and you have a pseudo correlation matrix. This matrix won’t have all the properties of a real correlation matrix, but if you don’t have too much missing data it will be pretty close.
      Charles
      Reply
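The pair-by-pair approach suggested above can be sketched in Python as follows. This is only an illustration under stated assumptions: missing observations are marked with `None`, the function names and the three small columns of ordinal data are hypothetical, and the ranks are recomputed for each pair after the incomplete subjects are dropped.

```python
from math import sqrt


def average_ranks(values):
    """Ascending ranks, with tied values sharing the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_matrix(columns):
    """Pseudo correlation matrix of pairwise Spearman coefficients.

    Each column holds one variable; None marks a missing observation.
    """
    k = len(columns)
    matrix = [[1.0] * k for _ in range(k)]  # diagonal entries stay at 1
    for i in range(k):
        for j in range(i + 1, k):
            # Keep only the subjects observed on both variables of this pair.
            pairs = [(a, b) for a, b in zip(columns[i], columns[j])
                     if a is not None and b is not None]
            xs = [a for a, _ in pairs]
            ys = [b for _, b in pairs]
            rho = pearson(average_ranks(xs), average_ranks(ys))
            matrix[i][j] = matrix[j][i] = rho
    return matrix


# Hypothetical ordinal data for 6 subjects; each variable misses a different subject.
a = [1, 2, 3, 4, 5, None]
b = [2, 1, None, 4, 5, 3]
c = [5, 4, 3, None, 1, 2]
m = spearman_matrix([a, b, c])
for row in m:
    print(row)
```

Because each coefficient is based on a different subset of subjects, the resulting matrix is not guaranteed to be positive semidefinite, which is the caveat raised in the reply above.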
Faith Modie

March 11, 2016 at 8:49 am

Good morning, please give me an example of calculating spearman’s rho when having a tie.
Reply
- Charles
  
  March 12, 2016 at 11:06 pm
  
  Hello Faith,
  Just change the first IQ value in Figure 1 of the referenced webpage from 99 to 90 and you will get a tied value. You can do this on the Excel worksheet that you can download from Examples Workbooks.
  Charles
  Reply
Allan Castillo

March 7, 2016 at 4:13 am

Hi, Charles.

Using the Resource Pack, the value I’ve got for Spearman for the above unranked data is 0.1244 and NOT -0.115. However, the Spearman value is the same for unranked and ranked dataset in Example 1 of Spearman’s Rank Correlation Hypothesis Testing page (https://real-statistics.com/correlation/spearmans-rank-correlation/spearmans-rank-correlation-detailed/). As I understand it, the Resource Pack correlation option does the ranking work for us such that we do not have to rank the dataset by ourselves and we just put the range of the unranked datasets in the correlation menu of the Resource Pack to get the correlation coefficient, is this so? Thanks very much.
Reply
- Allan Castillo
  
  March 7, 2016 at 4:21 am
  
  by “above unranked data” I mean Figure 1 of example 1. Thanks.
  Reply
- Charles
  
  March 21, 2016 at 5:25 pm
  
  Allan,
  
  Unfortunately, there is an error in the SCORREL function. The correct Spearman’s correlation value is -0.115 and not 0.1244. This error occurs when one of the data elements is zero. I will correct this in the next release, which I will issue in the next day or so.
  
  The Resource Pack does the ranking for you. You don’t need to do it yourself.
  
  Thanks for your comment. By following up on it, I was able to discover that the SCORREL function has a bug.
  
  Charles
  Reply
lukku

March 2, 2016 at 11:43 am

how to find R1 & R2..?
Reply
- Charles
  
  March 2, 2016 at 3:52 pm
  
  Lukku,
  R1 and R2 represent two data ranges. If you want the correlation coefficient corresponding to the data in A1:A10 and B1:B10, you can think of R1 as representing A1:A10 and R2 as representing B1:B10.
  Charles
  Reply
Irene

February 24, 2016 at 10:22 am

I want to know the correlation of the weighted means of my two sets of respondents, the parents and students… I used a five-point Likert scale. I ranked the responses of the two groups of respondents to the given statements according to the weighted means. Now, my question is how am I going to correlate them? Thank you.
Reply
- Charles
  
  February 24, 2016 at 8:06 pm
  
  Irene,
  You can simply calculate the correlation coefficient using the CORREL function. Now the real question is how meaningful this correlation is. This depends on what you are trying to accomplish.
  Charles
  Reply
  - rachel
    
    March 6, 2016 at 12:34 pm
    
    witty
    Reply
Madeleine

January 26, 2016 at 5:17 pm

Hello,

I have tried using =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1)) but Excel really doesn’t like the (R1,R1,1) or (R2,R2,1) bits.

I’ve substituted them for (A2,A12,1) and (B2,B12,1) because my data is in the first two columns, 11 pieces each from rows 2-12. It doesn’t help. I don’t know why there are three terms in the brackets anyway – what is the “1” for at the end?

I’ve also tried to do it the long way round – do a RANK.AVG on each column and then do another column for d and one for d². But I can’t work out what to type after RANK.AVG. The Help function doesn’t really help.

(I know it would be easier to rank by hand, but ultimately I want to enter 120 pairs of data).

Thanks to anyone who can help me…
Reply
- Charles
  
  January 27, 2016 at 9:29 am
  
  Madeleine,
  
  R1 and R2 are just abbreviations for range 1 and range 2. You can’t enter these letters in the formulas unless you are actually referring to cells R1 and R2.
  
  More importantly, when you enter a range into any Excel formula you need to specify the range by its end points separated by a colon (not a comma). Thus your two ranges are probably A2:A12 and B2:B12. The formula you are probably trying to use is therefore
  
  =CORREL(RANK.AVG(A2:A12,A2:A12,1),RANK.AVG(B2:B12,B2:B12,1))
  
  Charles
  Reply
Phoenix

January 6, 2016 at 8:00 pm

Is there any way I can solve two weighted means that do not have the same number of variables using spearman’s?
Reply
- Charles
  
  January 7, 2016 at 11:06 am
  
  Phoenix,
  Sorry, but I don’t understand what you mean by “solve two weighted means” nor why you would want to use spearman’s to do this.
  Charles
  Reply
Diswat

January 6, 2016 at 4:57 pm

Dear Sir
What can I do if I have 3 repetitions? According to your table, if students have the same IQ score, how can you rank them and how can you calculate the correlation?
Best Regards
Reply
- Charles
  
  January 7, 2016 at 11:21 am
  
  As described on the referenced webpage, you can calculate Spearman’s rank correlation by using the formula
  =CORREL(RANK.AVG(R1,R1,1),RANK.AVG(R2,R2,1))
  
  The RANK.AVG function takes care of any ties.
  
  You can also use the Real Statistics SCORREL(R1,R2) function.
  
  Charles
  Reply
Stacey Provan

January 4, 2016 at 8:59 pm

Hi Charles,

I am completing an elective project and am trying to establish if there is a correlation between being a member of a certain organisation and whether or not someone thinks a subject matter should be included in training and development (answer options to this are simply yes and no).

Firstly, is this possible to do? And if so how do I do it?

Stacey
Reply
- Charles
  
  January 5, 2016 at 9:03 am
  
  Stacey,
  You can calculate the correlation using the CORREL function. It is hard to tell from the summary you have given, but a t test may be sufficient for your purposes.
  Charles
  Reply
Ankit Gandhi

October 26, 2015 at 5:15 am

Is it possible to use Spearman’s rank correlation between 3 and 5 point likert scale questions?
In my survey I am asking question on attitude and purchase behavior, but both of them have questions in different likert scale. Attitude questions are in 5 point likert scale and purchase behavior question are in 3 point likert scale. So is it possible to use Spearman Rank correlation between these two different likert scale question in testing the Hypothesis?
Reply
- Charles
  
  November 3, 2015 at 4:33 pm
  
  Yes, you can calculate a Spearman’s rank correlation coefficient from different Likert scales. What hypothesis do you want to test?
  Charles
  Reply
Matthias

September 22, 2015 at 9:34 pm

Hi Charles,

what would be an example for the independence case, i.e., r = 0, with ranked values?
Reply
- Charles
  
  September 23, 2015 at 8:04 am
  
  I am not sure that I completely understand your question, but you may find the answer on the following webpage:
  Hypothesis Testing of Spearman’s Rank Correlation.
  Charles
  Reply
malik

September 1, 2015 at 8:48 am

Dear Charles!
Is there any way to find Spearman’s rank correlation when the two variables do not have the same number of values?
For example, X has 35 values and Y has 30 values.
Reply
- Charles
  
  September 2, 2015 at 6:40 am
  
  Just like Pearson’s correlation, this statistic performs a calculation based on pairs of data elements from X and Y. Thus the X and Y counts need to be the same.
  Charles
  Reply
Hoplite

June 26, 2015 at 5:09 am

Charles,

Once again – great website!

How do you conclude with a spearman rho of -0.115 and a Pearson of -0.036 that ‘the Spearman’s rho is closer to zero (indicating independent samples) than the Pearson’s.’? Isn’t Spearman indicating they are more negatively correlated than Pearson?
Reply
- Charles
  
  June 26, 2015 at 8:30 am
  
  Thanks for identifying this mistake. I have removed this phrase, which is clearly in error, from the webpage.
  Charles
  Reply
Tina

April 30, 2015 at 1:34 pm

Hello,
This is a very helpful explanation, thank you.
You mention that a different approach is needed with a set of data with many ties, and I hope you can help me out, as I am no mathematician, but trying to analyse data from some language tests. I have 46 participants with three test results each (they have scored between 1-6 in each test). I would like to compare the results in two sets of test results to see if they correlate, but there is a large number of ties – what would you suggest that I do?
Thanks in advance, Tina
Reply
- Charles
  
  April 30, 2015 at 7:45 pm
  
  Tina,
  Spearman’s handles ties by averaging the ranking values; this is the equivalent of RANK.AVG. I haven’t been able to find a ties correction factor for Spearman’s. In any case, Kendall’s tau is usually recommended over Spearman’s and Kendall’s tau does have a ties correction factor as explained on the webpage Kendall’s Tau with Ties.
  Charles
  Reply
Hammad Khan

April 20, 2015 at 1:01 pm

Would you please explain:
1. In which order do we rank the data, ascending or descending? Which value is labelled as no. 1, the highest or the lowest?

2. Why is it multiplied by 6?

Thanks.
Reply
- Charles
  
  April 20, 2015 at 9:04 pm
  
  1. As you can see from Figure 1, the largest data gets the highest ranking.
  2. I don’t know why the formula has a multiplier of 6. I guess the mathematics just worked out that way.
  Charles
  Reply
  - Kevin Bluxome
    
    September 4, 2016 at 6:13 pm
    
    Hi Charles,
    
    Here’s the reasoning behind why there is a 6 in the Spearman’s rank formula…it’s actually pretty insightful. If the two rankings are identical in every place, obviously the sum of their squared differences will be 0, the minimum. If the rankings are in reverse order, the sum of squared differences is a maximum, and it can be shown to equal (n*(n^2-1))/3, where n equals the number of pairs. We want a “common” scale to judge these differences by; that is, we do not want this scale to depend on the number of pairs. In addition, we want the scale to range from -1 (perfect negative correlation) to 1 (perfect positive correlation). Therefore, we are looking for a transformation that takes 0 (minimum sum of squares) to 1 and that takes (n*(n^2-1)/3) to -1. If you find the slope of the line connecting the two points (0,1) and (n(n^2-1)/3, -1), you will find that it equals -6/(n(n^2-1)). There’s your 6!
    Reply
    - Charles
      
      September 5, 2016 at 6:53 pm
      
      Kevin,
      Thanks for your explanation.
      Charles
      Reply
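The argument above can be checked numerically with a short Python sketch. The function names and the choice of n = 10 are hypothetical; the code only verifies that fully reversed rankings give a sum of squared differences of n(n^2-1)/3 and that the shortcut formula maps 0 to 1 and that maximum to -1.

```python
def reversed_sum_sq_diff(n):
    """Sum of squared rank differences when one ranking is the exact reverse of the other."""
    return sum((i - (n + 1 - i)) ** 2 for i in range(1, n + 1))


def shortcut_rho(sum_sq_diff, n):
    """Spearman's shortcut formula, exact when there are no ties."""
    return 1 - 6 * sum_sq_diff / (n * (n * n - 1))


n = 10  # hypothetical number of pairs
worst = reversed_sum_sq_diff(n)

# The maximum sum of squared differences equals n(n^2 - 1)/3.
print(worst, n * (n * n - 1) // 3)

# Identical rankings map to 1, fully reversed rankings map to -1.
print(shortcut_rho(0, n), shortcut_rho(worst, n))
```

The factor of 6 is therefore just 2 divided by the 1/3 in the maximum, i.e. the slope needed to stretch the range [0, n(n^2-1)/3] onto [1, -1].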
Kevin

March 22, 2015 at 10:36 am

Thanks a lot. I see in method 2 above you have ranked the values with the smallest being assigned position 1. I tried the opposite so that the biggest value is number 1 and was getting the same answer. is that right? I have tried with other sets of data and the answer is the same whatever the ranking used.
Reply
- Charles
  
  March 22, 2015 at 10:28 pm
  
  Kevin,
  I believe that you are correct.
  Charles
  Reply
  - Kevin Bluxome
    
    September 4, 2016 at 6:20 pm
    
    Yes, this is correct. As long as you are consistent in your rankings of both groups, it doesn’t matter whether 1 corresponds to the highest or lowest value in the group. But if 1 corresponds to the highest value in group 1, it must also correspond to the highest value in group 2, and vice versa.
    Reply
mohsen

February 22, 2015 at 11:00 pm

Thank you, your information was very helpful. I am a civil engineer and my knowledge of statistics is limited. I want to use the first paragraph of this page in my thesis or article. I would appreciate it if you could give me a reference that covers the content of the first paragraph.
Reply
- Charles
  
  February 23, 2015 at 7:43 am
  
  If you want to cite the webpage use the citation at https://real-statistics.com/appendix/citation-real-statistics-software-website/
  Charles
  Reply
Adipati Camma

December 30, 2014 at 11:32 am

Many thanks Charles. Your explanation is very helpful.
Reply
sumarno

December 5, 2014 at 9:03 am

Your explanation is very helpful. It would be even better if an Excel file were available to download. Thank you.
Reply

146 thoughts on “Spearman’s Rank Correlation”
