Tetrachoric Correlation Estimation

We now describe how to estimate the tetrachoric correlation coefficient, i.e. the polychoric correlation coefficient for 2 × 2 contingency tables. In what follows we assume that the contingency table has 2 rows and 2 columns where the element in the ith row and jth column is aij.

Method 1

An estimate for the tetrachoric correlation is given by (method 1):

[Equation image not recovered: the method 1 estimate of ρ, followed by the definition of γ]

when aij > 0 for all i, j. If a21 = 0 or a12 = 0, then ρ = 1. If a11 = 0 or a22 = 0, then ρ = -1.
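The boundary rules above can be sketched in Python as follows. The formula images on this page did not survive, so the interior estimate used here is the classical "cosine-pi" approximation, cos(π / (1 + √ω)) with ω = a11·a22 / (a12·a21). This is a stand-in chosen for illustration and is not necessarily the method 1 or method 2 formula described on this page. The function name is hypothetical.

```python
import math

def tetrachoric_cos_pi(a11, a12, a21, a22):
    """Rough tetrachoric estimate for a 2 x 2 table.

    Assumption: the interior formula is the cosine-pi approximation,
    used as a stand-in; it may differ from this page's method 1.
    """
    # Boundary rules quoted from the text: a zero off-diagonal cell gives 1,
    # a zero diagonal cell gives -1. If both occur, the first rule wins here
    # (an arbitrary choice; the text does not cover that case).
    if a12 == 0 or a21 == 0:
        return 1.0
    if a11 == 0 or a22 == 0:
        return -1.0
    odds_ratio = (a11 * a22) / (a12 * a21)
    return math.cos(math.pi / (1 + math.sqrt(odds_ratio)))
```

With all four cells equal, the odds ratio is 1 and the stand-in estimate is zero (up to rounding), which is consistent with the boundary rules at the two extremes.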

When aij > 0 for all i, j, the asymptotic variance is estimated to be

[Equation image not recovered: estimated asymptotic variance of the method 1 estimate]

Method 2

Another commonly used estimate for the tetrachoric correlation is given by (method 2):

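[Equation image not recovered: definition of δ]

[Equation image not recovered: the method 2 estimate of ρ]

Also

[Equation image not recovered: variance of the method 2 estimate]

where

[Equation images not recovered: definitions of n, p1 and p2]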

hi = NORM.S.DIST(NORM.S.INV(pi),FALSE)
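For readers working outside Excel, the quantity hi is the standard normal density evaluated at the standard normal quantile of pi. A minimal Python sketch using only the standard library is shown below. The function name is hypothetical, and the sketch assumes each pi is a proportion strictly between 0 and 1.

```python
from statistics import NormalDist

def normal_ordinate(p):
    """Standard normal density at the standard normal quantile of p.

    Assumes 0 < p < 1; inv_cdf raises StatisticsError otherwise.
    """
    nd = NormalDist()  # mean 0, standard deviation 1
    return nd.pdf(nd.inv_cdf(p))
```

Because the standard normal density is symmetric, `normal_ordinate(p)` and `normal_ordinate(1 - p)` agree, so it does not matter which of the two marginal proportions is labelled p.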

Method 1 Example

Example 1: Calculate the tetrachoric correlation coefficient for the data in the 2 × 2 contingency table (range B4:C5) of Figure 1.

Figure 1 – Tetrachoric correlation using Solver

Using Solver, as we did for Example 1 of Polychoric Correlation using Solver, we calculate the tetrachoric correlation coefficient ρ = .364.

In Figure 2, we calculate an estimate of the tetrachoric correlation coefficient using method 1. This time we also calculate an estimate for the standard error, the 95% confidence interval for the tetrachoric correlation coefficient, and test the null hypothesis that the tetrachoric correlation coefficient is equal to zero.

Figure 2 – Tetrachoric correlation using method 1

We see that ρ = .386 and that the null hypothesis is rejected.
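The worksheet formulas behind the standard error, confidence interval and test are only visible in the figure. A plausible sketch, assuming a normal (Wald-type) approximation, is given below. The function name is hypothetical, and the approach may not match the worksheet cell for cell.

```python
from statistics import NormalDist

def wald_summary(rho, se, alpha=0.05):
    """z statistic, two-sided p-value and (1 - alpha) confidence interval.

    Assumption: the estimate is approximately normal with standard error se.
    """
    nd = NormalDist()
    z = rho / se                            # test of H0: rho = 0
    p_value = 2 * (1 - nd.cdf(abs(z)))      # two-sided p-value
    z_crit = nd.inv_cdf(1 - alpha / 2)      # about 1.96 when alpha = 0.05
    return z, p_value, (rho - z_crit * se, rho + z_crit * se)

# Illustrative call: rho is the method 1 estimate quoted in the text;
# the standard error 0.1 is a made-up placeholder, not the value in Figure 2.
z, p_value, ci = wald_summary(0.386, 0.1)
```

The null hypothesis is rejected at level alpha when `p_value < alpha`, or equivalently when the confidence interval excludes zero.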

Method 2 Example

In Figure 3, we repeat the same analysis using method 2.

Figure 3 – Tetrachoric correlation using method 2

8 thoughts on “Tetrachoric Correlation Estimation”

  1. Hello Mr. Zaiontz,

    I was wondering what I should do if any of my contingency tables contains the value of 0. Does that render it impossible to calculate?

    Kind regards,
    Thierry

    • Hello Thierry,
      I just used the TCORREL function to calculate the tetrachoric correlation for the contingency table in Example 1 where I changed one of the values to 0. The result was 1 or -1. For method 1, it is not possible to calculate the standard error.
      Charles

  2. Dear Charles,

    Thank you for your explanation. I started by calculating a tetrachoric correlation matrix in SPSS using the MACRO provided by Lorenzo-Seva and Ferrando (2012) (“TETRA-COM: A comprehensive SPSS program for estimating the tetrachoric correlation.”). I noticed that all negative correlations were not significant, despite some of them being rather high, and that all positive correlations are significant, despite some of them being close to zero. This seemed strange to me, so I was looking for another way to calculate the tetrachoric correlations and I stumbled upon your page.
    I have tried to implement your method, but I have a few questions: (1) there are quite large differences in the p-value between Figure 2 and Figure 3. Why is that and which method should I use? (2) When I try to put the NORM.S.INV formula into SPSS, I am not allowed to do it your way, it suggests to use a ; instead of , before FALSE/TRUE. What should I do? (3) do you have any experience with the macro for SPSS? I want to do it correctly but I am not a statistician and I am getting confused. Thank you very much in advance! Amber

    • Hi Amber,
      I don’t use SPSS and so I can’t give any advice about how to calculate the tetrachoric correlation using SPSS (issue 3). Regarding issue 2, some versions of Excel use commas and others use semi-colons. This depends on whether a comma is used as the decimal symbol in your country (in which case a semi-colon is used to separate arguments in formulas). I don’t know whether SPSS adheres to this approach, but perhaps it does.
      Regarding issue 1, the p-values don’t seem that different, but I can’t say that this is not the case when using other data. I don’t know which method is more widely used nor which version SPSS uses. I suggest that you find an example in a paper from your field and use Real Statistics to calculate the value using both methods; hopefully the resulting value of one of these methods will match the results in the paper. I realize that this isn’t always so easy since it is often difficult to get the raw data.
      Charles

  3. Hi,

    When I put the same values and formulae from Figure 2 into Excel, the p-value that I get (cell R9) is .870. Are you sure that your image is showing the correct Excel formula?

    Cheers

    • Tim,
      The formulas in the cells were and are correct, but the display of the formulas was incorrect. I have now fixed this on the webpage, and so you should be able to get the same results.
      Thanks very much for catching this error. I really appreciate your help in improving the website.
      Charles

