FCS for Binary Categorical Data

We start by creating a logistic regression model for Y based on the data in X using only the complete rows of X and Y (i.e. using listwise deletion). The result is a column vector of coefficients B and a covariance matrix S for the coefficients.
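As a rough illustration of this complete-case fit, here is a Python sketch using statsmodels; the names X, y, and complete_case_fit are illustrative assumptions, not part of the Real Statistics implementation, and missing values are assumed to be coded as NaN.

```python
import numpy as np
import statsmodels.api as sm

def complete_case_fit(X, y):
    """Fit a logistic model for y on X using only the complete rows."""
    # Listwise deletion: keep rows where y and every column of X are observed
    complete = ~np.isnan(y) & ~np.isnan(X).any(axis=1)
    Xc = sm.add_constant(X[complete])       # leading column of 1s for the intercept
    fit = sm.Logit(y[complete], Xc).fit(disp=0)
    B = fit.params                          # (k+1)-vector of coefficients
    S = fit.cov_params()                    # (k+1) x (k+1) coefficient covariance matrix
    return B, S
```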

Let $LL^T$ be the Cholesky decomposition of S and create the revised version of the coefficient vector

$$B^* = B + LV$$

where $V = [v_i]$ is a $(k+1) \times 1$ column vector whose entries $v_i$ are independent random values from the standard normal distribution; i.e. each $v_i$ = NORM.S.INV(RAND()).
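In code, this draw of $B^*$ might look like the following sketch (draw_coefficients is a hypothetical helper; NumPy's standard_normal plays the role of NORM.S.INV(RAND())):

```python
import numpy as np

def draw_coefficients(B, S, rng=None):
    """Return B* = B + LV, a random perturbation of the fitted coefficients."""
    rng = rng if rng is not None else np.random.default_rng()
    L = np.linalg.cholesky(S)           # S = L L^T (lower-triangular factor)
    V = rng.standard_normal(len(B))     # independent draws from N(0, 1)
    return B + L @ V
```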

For each missing data value $y_i$, calculate the probability $p_i$ that $y_i = 1$ by

$$p_i = \frac{1}{1+e^{-x_i B^*}}$$

where $x_i$ is the row of X corresponding to $y_i$, augmented with a leading 1 for the intercept term.

Now impute the value of $y_i$ as follows

$$y_i = \begin{cases} 1 & \text{if } u \le p_i \\ 0 & \text{if } u > p_i \end{cases}$$

where u is a random value from the uniform distribution between 0 and 1; i.e.

u = RAND()
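Putting the last two formulas together, a sketch of the imputation step (again with illustrative names, and with missing values coded as NaN):

```python
import numpy as np

def impute_binary(X, y, B_star, rng=None):
    """Fill the missing entries of y with Bernoulli draws based on B*."""
    rng = rng if rng is not None else np.random.default_rng()
    missing = np.isnan(y)
    # Rows x_i of X for the missing y_i, with a leading 1 for the intercept
    Xm = np.column_stack([np.ones(missing.sum()), X[missing]])
    p = 1.0 / (1.0 + np.exp(-Xm @ B_star))   # p_i = probability that y_i = 1
    u = rng.uniform(size=p.shape)            # u = RAND()
    out = y.copy()
    out[missing] = np.where(u <= p, 1.0, 0.0)
    return out
```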

This case has not yet been implemented in the Real Statistics Resource Pack, and so we won't discuss it further here. In fact, for now, all categorical variables are handled as if they were continuous, but with an appropriate constraint. E.g., a binary categorical variable uses the constraint: min = 0, max = 1, round off = TRUE, as sketched below.
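For example, under this constraint a continuous imputed value is rounded and then clipped back into {0, 1}; a minimal sketch (constrain_binary is a hypothetical name):

```python
import numpy as np

def constrain_binary(y_cont):
    """Apply min = 0, max = 1, round off = TRUE to continuous imputed values."""
    return np.clip(np.round(y_cont), 0, 1)

# constrain_binary(np.array([0.3, 1.7, -0.2]))  ->  array([0., 1., 0.])
```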

References

Mitani, A. A. (2013) Multiple imputation in practice: approaches for handling categorical and interaction variables
https://ayamitani.github.io/files/mitani_qsuseminar_v2.pdf

Carpenter, J., Kenward, M. (2013) Multiple imputation and its application. Wiley
https://books.google.it/books?id=mZMlnTenpx4C&pg=PA122&lpg=PA122&dq=FCS+for+Binary+Categorical+Data&source=bl&ots=bi0hI0j6Ic&sig=ACfU3U2Ze4eT39ZfIEqyL-jOlSOEky0YHQ&hl=en&sa=X&ved=2ahUKEwiv7_L606v8AhX7RvEDHT8UAX84FBDoAXoECBEQAw#v=onepage&q=FCS%20for%20Binary%20Categorical%20Data&f=false
