Basic Concepts
To determine whether there is a significant difference between the effect on two groups (e.g. a treatment group and a control group), we normally randomly select a sample from the population being studied and then randomly assign subjects in the sample to the two groups. This is called an experimental study. The goal of experimental design is to reduce the chances that confounding variables will influence the outcome of the study. This is achieved because random assignment tends to balance the confounding variables across the groups, and so tends to eliminate their effects.
Sometimes it is impossible or unethical to randomly assign subjects to the two groups. For example, when studying the effects of smoking, we can’t assign non-smokers to the smoking group, and so there is no random assignment.
When there is no random assignment of the sample into groups, then we have an observational study. We now need to take confounding variables explicitly into account. One approach for doing this is called propensity score matching (PSM), in which we identify all important confounding variables, and then determine matching subjects from the sample based on similar confounding variable values, pruning non-matching pairs.
Objective
By matching subjects on the confounding variables, we aim to remove the effects of these variables explicitly (instead of counting on random assignment to accomplish this).
Propensity score matching is a quasi-experimental technique that attempts to balance the impact of confounding factors on the two groups so that we can draw conclusions about the impact of a treatment on the outcome using observational data.
Steps
To use propensity score matching, follow these steps:
- Collect data.
- Calculate the propensity scores.
- Match records between the groups.
- Evaluate quality of the matching.
- Determine the treatment effect on the outcome.
Collect data
You need to collect sample data as for an experimental design, but now you need to make sure that you include all relevant confounding variables that could affect the outcome. At the same time, you don’t want to include too many confounding variables since this will make the matching process more difficult. The smaller the sample, the more the tradeoff favors fewer confounding variables. In this case, variables that are weakly associated with the outcome may need to be excluded.
You also want to make sure that the variables that determine which subjects are in which group are also included. These variables must be observable (i.e. variables for which we have or can obtain data). If you have unobservable variables that might determine which subjects are in which group, you need to make sure that such associations are weak (since you don’t have any data for them).
Calculate propensity scores
This is usually done using logistic (or probit) regression where the dependent variable is group membership (e.g. 0 for the control group and 1 for the treatment group) and the independent variables are the confounding variables. The propensity score for sample record i (i.e. row of data) is the estimated (predicted) probability pi = P(yi = 1|xi) that the record belongs to the treatment group, given its values xi for the confounding variables. The real outcome variable under study is not included in this regression.
The propensity score now serves as a summary of all the confounding variables and provides the vehicle for matching data from the treatment group with data from the control group. The strength of this approach is that it maps multivariate data (from the various variables, including confounding variables) to one univariate variable, namely the propensity score. This is also the weakness of this approach since there is no guarantee that subjects with similar propensity scores have similar values for the individual variables.
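As a rough illustration of this step, here is a dependency-free Python sketch that fits a logistic regression by gradient ascent and returns the predicted probabilities as propensity scores. The function names (predict_one, fit_logistic, propensity_scores), the learning rate, the number of iterations, and the small data set are all assumptions made for this example; in practice you would normally rely on a statistics package rather than a hand-written fit.

```python
import math

def predict_one(beta, xi):
    """Predicted probability of being in the treatment group for one record."""
    z = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    Returns [intercept, b1, ..., bk]. lr and epochs are illustrative choices.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_one(beta, xi)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, group):
    """Propensity score = predicted probability that group == 1."""
    beta = fit_logistic(X, group)
    return [predict_one(beta, xi) for xi in X]

# Hypothetical data: each row is (age in decades, smoker flag);
# group is 1 for treatment and 0 for control. No outcome variable is used.
X = [[2.5, 0], [3.1, 0], [4.0, 1], [4.5, 0],
     [5.2, 1], [5.8, 1], [4.2, 1], [6.1, 0]]
group = [0, 0, 1, 0, 1, 1, 0, 1]

scores = propensity_scores(X, group)
for g, p in zip(group, scores):
    print(g, round(p, 3))
```

Note that the regression only sees the group indicator and the confounding variables, which mirrors the requirement that the outcome under study is left out of this step.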
Match records between the groups
The goal here is to identify pairs of records whose propensity scores match. Ideally, we want to match one or more subjects in the treatment group with propensity score p with one or more subjects in the control group with the same propensity score (exact match case).
Since not all subjects in the treatment group will have an exact propensity score match in the control group, usually this ideal matching approach isn’t practical, and can result in few or no matches with a great loss in data.
There are various approaches for creating matches without pruning away too much data. We will focus on a simple approach whereby each subject in the treatment group is matched with its nearest neighbor. Thus for each subject i in the treatment group, we find the subject j in the control group that minimizes
|pi – pj|
where pi and pj are the corresponding propensity scores.
Also, we will only use 1:1 matches. Thus, if a subject in the treatment group is equally close to two or more subjects in the control group, we will use only one of these matches, leaving the other control subjects available for a match with a different subject in the treatment group. Similarly, each subject in the control group can only be matched with one subject in the treatment group.
Since a control subject that has already been matched is no longer available, changing the order of the subjects in the treatment group can impact the matches that result. A better approach would be to find the set of matches that minimizes the sum, over all matched pairs, of the absolute differences between the treatment and control propensity scores. We won’t use this approach, however.
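The following Python sketch illustrates 1:1 nearest-neighbor matching as described above and shows how the order of the treatment subjects can change the result. The function name greedy_match, the subject labels, and the propensity scores are made up for this illustration.

```python
def greedy_match(treated, control):
    """Greedy 1:1 nearest-neighbor matching on propensity scores.

    treated, control: dicts mapping a subject id to its propensity score.
    Treatment subjects are processed in dict order; each control subject
    can be used at most once. Returns a list of (treated_id, control_id).
    """
    available = dict(control)  # control subjects not yet matched
    pairs = []
    for t_id, p_t in treated.items():
        if not available:
            break  # no control subjects left to match
        c_id = min(available, key=lambda c: abs(available[c] - p_t))
        pairs.append((t_id, c_id))
        del available[c_id]
    return pairs

def total_distance(pairs, treated, control):
    """Sum of |pi - pj| over the matched pairs."""
    return sum(abs(treated[t] - control[c]) for t, c in pairs)

# Hypothetical propensity scores
treated = {"T1": 0.50, "T2": 0.42}
control = {"C1": 0.45, "C2": 0.60}

pairs_a = greedy_match(treated, control)
# Same subjects, but with the treatment group processed in reverse order
reversed_treated = dict(reversed(list(treated.items())))
pairs_b = greedy_match(reversed_treated, control)

print(pairs_a, round(total_distance(pairs_a, treated, control), 2))
print(pairs_b, round(total_distance(pairs_b, treated, control), 2))
```

In this example the two orderings produce different pairs with different total distances, which is the kind of order dependence that a matching based on the total distance would avoid.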
Pruned subjects
After going through this process for all the subjects in the treatment group, any subject that remains unmatched in the control group is not used, i.e. is pruned.
We also set a maximum acceptable distance for a nearest neighbor of subject i in the treatment group, called a cutoff (often called a caliper). If the distance to the nearest match in the control group exceeds this cutoff, i.e. min |pi – pj| > cutoff, then this match is invalid and subject i from the treatment group is also pruned.
Here, we assume that the control group starts with at least as many subjects as the treatment group. If not, you can reverse the roles of the treatment and control groups if you like.
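To make the pruning rules concrete, here is a Python sketch of 1:1 nearest-neighbor matching with a cutoff. It returns the matched pairs together with the pruned treatment and control subjects. The function name match_with_cutoff, the subject labels, the scores, and the cutoff of 0.05 are assumptions chosen for the example, not recommended values.

```python
def match_with_cutoff(treated, control, cutoff):
    """Greedy 1:1 nearest-neighbor matching with a maximum distance (cutoff).

    treated, control: dicts mapping a subject id to its propensity score.
    Returns (pairs, pruned_treated, pruned_control).
    """
    available = dict(control)  # control subjects not yet matched
    pairs, pruned_treated = [], []
    for t_id, p_t in treated.items():
        c_id = min(available, key=lambda c: abs(available[c] - p_t),
                   default=None)
        # Prune the treatment subject if no control subject is left or
        # the nearest one is farther away than the cutoff.
        if c_id is None or abs(available[c_id] - p_t) > cutoff:
            pruned_treated.append(t_id)
            continue
        pairs.append((t_id, c_id))
        del available[c_id]
    pruned_control = list(available)  # control subjects never matched
    return pairs, pruned_treated, pruned_control

# Hypothetical propensity scores
treated = {"T1": 0.30, "T2": 0.55, "T3": 0.90}
control = {"C1": 0.28, "C2": 0.50, "C3": 0.52, "C4": 0.10}

pairs, pruned_treated, pruned_control = match_with_cutoff(
    treated, control, cutoff=0.05)
print(pairs)
print(pruned_treated)
print(pruned_control)
```

Here T3 has no control subject within the cutoff and so is pruned, while C2 and C4 are pruned because they remain unmatched.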
Evaluate quality of the matching
You now want to make sure that the resulting matches yield treatment and control groups that don’t have significant differences for any of the observed confounding variables.
We therefore provide statistics (mean, variance, min, 25th percentile, median, 75th percentile, and max) for the resulting treatment and control groups so that you can evaluate whether there is a problem. We also provide statistical tests (t-test, Mann-Whitney test, and Brunner-Munzel test) for each of the confounding variables as well as the Shapiro-Wilk test for normality.
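As a sketch of what such a check might look like outside of Excel, the following Python code computes the descriptive statistics listed above for one confounding variable in each matched group, along with a Welch t statistic for the difference in means. The function names and the age values are hypothetical, the quartiles use the inclusive method of Python's statistics module, and only the test statistic (not a p-value) is computed.

```python
import math
import statistics

def summarize(values):
    """Descriptive statistics for one confounding variable in one group."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "variance": statistics.variance(values),  # sample variance
        "min": min(values),
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": max(values),
    }

def welch_t(a, b):
    """Welch t statistic for the difference in means (unequal variances)."""
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Hypothetical ages of the matched treatment and control subjects
age_treated = [40, 52, 58, 61, 45]
age_control = [42, 50, 57, 63, 47]

print(summarize(age_treated))
print(summarize(age_control))
t_age = welch_t(age_treated, age_control)
print(round(t_age, 3))
```

A t statistic close to zero, together with similar summary statistics, suggests that the matched groups are balanced on that variable. The same check would be repeated for each confounding variable.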
Determine the treatment effect on the outcome
You can now use the matched data to perform any of the usual statistical tests, including a t-test, regression, Mann-Whitney test, etc. This time you focus on the outcome variable that was not used in calculating the propensity score.
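For example, one simple option (an assumption here, not the only valid analysis) is to treat the matched pairs as paired data and estimate the treatment effect as the mean difference in the outcome within pairs. The Python sketch below does this with made-up outcome values; the function name paired_effect and the subject labels are hypothetical.

```python
import math
import statistics

def paired_effect(pairs, outcome):
    """Mean within-pair outcome difference and the paired t statistic.

    pairs: list of (treated_id, control_id) from the matching step.
    outcome: dict mapping a subject id to its outcome value.
    """
    diffs = [outcome[t] - outcome[c] for t, c in pairs]
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, mean_diff / se

# Hypothetical matched pairs and outcome values
pairs = [("T1", "C1"), ("T2", "C3"), ("T3", "C2")]
outcome = {"T1": 12.0, "C1": 10.0, "T2": 15.0,
           "C3": 14.0, "T3": 11.0, "C2": 8.0}

effect, t_stat = paired_effect(pairs, outcome)
print(effect, round(t_stat, 3))
```

The t statistic would then be compared with a t distribution with one fewer degrees of freedom than the number of pairs.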
Real Statistics Support
Click here for a description of the Real Statistics worksheet functions and data analysis tool that support PSM.
Click here for an example of how to use the Real Statistics Propensity Score Matching data analysis tool in Excel.
References
Garrido, M. M., Kelley, A. S., Paris, J., Roza, K., Meier, D. E., Morrison, R. S., and Aldridge, M. D. (2014) Methods for Constructing and Assessing Propensity Scores
https://pmc.ncbi.nlm.nih.gov/articles/PMC4213057/pdf/hesr0049-1701.pdf
Luvsandorj, Z. (2023) A Beginner’s Guide to Propensity Score Matching
https://builtin.com/data-science/propensity-score-matching
Wikipedia (2024) Propensity score matching
https://en.wikipedia.org/wiki/Propensity_score_matching
McKee, D. (2015) An intuitive introduction to propensity score matching
https://www.youtube.com/watch?v=ACVyPp1Fy6Y
King, G. (2015) Why propensity scores should not be used for matching
https://www.youtube.com/watch?v=rBv39pK1iEs