Introduction
We now review the support that Real Statistics provides for carrying out propensity score matching (PSM). In particular, we explain Real Statistics support for steps 2, 3, and 4 described in Propensity Score Matching. This support is provided via the Propensity Score Matching data analysis tool, which will be available shortly as part of Rel 9.4.
Reference will be made to various worksheet functions. LogitPred is already available; the other worksheet functions referenced here will become available with Rel 9.4.
Calculate propensity scores
The Propensity Score Matching data analysis tool first performs logistic regression to estimate the probability that each sample element belongs to the treatment group. This is done by using the formula =LogitPred(R1, R2, TRUE).
Here R1 is an array where each column contains sample data for one confounding variable. The data should be sorted so that rows corresponding to the control group all occur before rows containing data from the treatment group. R2 is a column array containing 1 if the corresponding row in R1 comes from the treatment group and 0 if the corresponding row in R1 comes from the control group.
The output from this formula is a column array with the same number of rows as R1 (or R2). It contains, for each row in R1, the probability that the row came from the treatment group according to the logistic regression model (whether or not it actually came from the treatment group).
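To make the idea concrete outside of Excel, here is a minimal Python sketch of estimating propensity scores with logistic regression. It is not the LogitPred implementation: the function name propensity_scores, the gradient-ascent fitting method, the step count, the learning rate, and the sample data are all assumptions chosen for illustration.

```python
import math

def propensity_scores(rows, treated, steps=5000, lr=0.1):
    """Fit a logistic regression by gradient ascent (an assumed method) and
    return the fitted probability of treatment for each row."""
    n, k = len(rows), len(rows[0])
    # Standardize each column so one learning rate suits every variable;
    # this rescales the coefficients but not the fitted probabilities.
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(k)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]
    w = [0.0] * (k + 1)  # w[0] is the intercept

    def predict(x):
        t = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
        return 1.0 / (1.0 + math.exp(-t))

    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(z, treated):
            err = y - predict(x)  # gradient of the log-likelihood
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * x[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return [predict(x) for x in z]

# Hypothetical data: one confounding variable (age), control rows first.
ages = [[25], [30], [35], [40], [45], [50], [35], [45], [55], [60]]
treated = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
scores = propensity_scores(ages, treated)
```

As in the worksheet layout, the control rows come before the treatment rows, and the result has one probability per input row.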
Match records between the groups
Records are matched by using the following worksheet function:
PSM_Matching(R0, tindex, cutoff)
R0 is an array of probabilities (i.e. the output from the logistic regression formula described above). tindex is the number of the first row in R0 that comes from data in the treatment group.
PSM_Matching matches each element in the treatment group part of R0 (i.e. rows starting with tindex) with the nearest element in the control group part of R0 (i.e. rows before tindex), provided that the distance between them is not greater than cutoff.
PSM_Matching then returns a column array of zeros and ones with the same number of rows as R0 where 1 = match and 0 = no match.
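The matching step can be sketched in Python as follows. This is an illustration only, not the PSM_Matching source: the name psm_match is hypothetical, and the sketch assumes greedy one-to-one matching without replacement, processed in row order, with tindex counted from 1 as in a worksheet. The text above does not state how ties or reused controls are handled, so those choices are assumptions.

```python
def psm_match(scores, tindex, cutoff):
    """Return a 0/1 flag per row; tindex is the 1-based first treatment row."""
    first = tindex - 1                 # convert to a 0-based index
    flags = [0] * len(scores)
    free = set(range(first))           # control rows not yet matched
    for t in range(first, len(scores)):
        if not free:
            break                      # no controls left to match
        # Nearest unused control; ties go to the earlier row (assumption).
        c = min(free, key=lambda i: (abs(scores[i] - scores[t]), i))
        if abs(scores[c] - scores[t]) <= cutoff:
            flags[t] = flags[c] = 1
            free.remove(c)
    return flags

# Hypothetical scores: rows 1-4 are controls, rows 5-7 are treated.
scores = [0.10, 0.32, 0.48, 0.80, 0.30, 0.50, 0.95]
flags = psm_match(scores, tindex=5, cutoff=0.05)
print(flags)  # [0, 1, 1, 0, 1, 1, 0]
```

The last treated row (0.95) stays unmatched because its nearest free control (0.80) lies farther away than the cutoff.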
Pruning
The output from PSM_Matching is now used to eliminate non-matched rows from the original data. This is done by using the following new worksheet function:
Pruning(R1, R2, b)
R1 is an array containing data (for confounding variables as well as the outcome variable) and R2 is the output from PSM_Matching (a column array consisting of zeros and ones).
The output from the pruning function consists of the data in R1 where any row with a corresponding value of zero in R2 is deleted. If b = TRUE (default FALSE) then the non-zero values in R2 are appended to the output. For PSM, the default of b = FALSE is used.
Note that after pruning, the number of retained treatment samples is equal to the number of control samples.
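The pruning rule is simple enough to sketch in a few lines of Python. The name prune and the sample data are hypothetical; the behavior follows the description above (drop rows whose flag is zero, optionally append the flag).

```python
def prune(rows, flags, append_flags=False):
    """Keep rows whose flag is non-zero; optionally append the flag value."""
    kept = []
    for row, flag in zip(rows, flags):
        if flag != 0:
            kept.append(list(row) + [flag] if append_flags else list(row))
    return kept

# Hypothetical data: columns are treatment indicator, age, outcome.
rows = [
    [0, 25, 3.1], [0, 31, 4.0], [0, 38, 4.4], [0, 52, 6.0],
    [1, 30, 4.9], [1, 39, 5.2], [1, 60, 7.5],
]
flags = [0, 1, 1, 0, 1, 1, 0]  # e.g. the output of the matching step
pruned = prune(rows, flags)
n_control = sum(1 for r in pruned if r[0] == 0)
n_treated = sum(1 for r in pruned if r[0] == 1)
print(pruned, n_control, n_treated)
```

With these flags, two control rows and two treatment rows survive, illustrating the equal group sizes noted above.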
Evaluate quality of the matching
The following new function is used to assess the quality of the matching:
MatchQuality(R1, b)
R1 is an array containing data for the treatment variable (with values 0 or 1), any confounding variables and the outcome variable. The first column of R1 contains the data for the treatment variable.
The output of MatchQuality consists of statistics for the treatment and control groups, one column for each of the variables in R1 except for the treatment variable. The statistics reported are the mean, variance, min, 25th percentile, median, 75th percentile, and max.
In addition, if b = TRUE (default), the following four statistics are reported for the residuals comparing the treatment values with control values for each of the variables in R1 (except for the treatment variable): p-value of Shapiro-Wilk test for normality, p-value of a two independent sample t-test (assuming unequal variances), p-value of a Mann-Whitney test, and p-value of a Brunner-Munzel test.
If b = FALSE, then the last two rows are not output. For PSM, the default of b = TRUE is used.
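The descriptive part of this output can be approximated in Python with the standard library. This sketch is not the MatchQuality implementation: the names group_summary and match_summary are hypothetical, the quartiles use the inclusive interpolation method on the assumption that it mirrors Excel's PERCENTILE function, and the four hypothesis tests are not reproduced.

```python
import statistics

def group_summary(values):
    """Mean, sample variance, and five-number summary for one group."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),
        "min": min(values), "q1": q1, "median": med, "q3": q3,
        "max": max(values),
    }

def match_summary(rows):
    """rows: the first column is the treatment flag (0/1); every other
    column is summarized separately for the control and treatment groups."""
    out = {}
    for j in range(1, len(rows[0])):
        out[j] = {
            "control": group_summary([r[j] for r in rows if r[0] == 0]),
            "treatment": group_summary([r[j] for r in rows if r[0] == 1]),
        }
    return out

# Hypothetical matched data: treatment flag, age.
matched = [[0, 30], [0, 35], [0, 40], [0, 45],
           [1, 32], [1, 36], [1, 41], [1, 47]]
summary = match_summary(matched)
print(summary[1]["control"])
```

Similar summaries in the two groups for each confounding variable suggest that the matching has balanced that variable.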
Example
Click here for an example that shows how all the steps fit together.