Real Statistics Advanced Missing Data Functions

The following is a summary of all the missing data functions provided in the Real Statistics Resource Pack based on the Multiple Imputation (MI), Full Information Maximum Likelihood (FIML) and Expectation-Maximization (EM) approaches.

Group 1

The following are array functions (with the exception of CountPatterns, which is not an array function) where R1 is a range where each column represents sample data for a random variable. Blank or non-numeric elements are assumed to represent missing data.

MissingFreq(R1, head) – generates a summary with the frequency of non-missing data in R1

MissingPatterns(R1, head, s) – generates a summary with missing patterns of data in R1. The argument s is used to fill up any extra rows in the output which do not contain data. If s is not specified then it defaults to the error value #N/A.

CountPatterns(R1) = the number of different missing data patterns there are for the data in R1

ImputeSimple(R1, head, R2, iter) – generates a range with all the missing data in R1 filled in using the simple imputation approach

ImputeReg(R1, j, head, R2, iter) – generates a range where all the missing data in column j is filled in using one step of the FCS algorithm

ImputeFCS(R1, head, iterf, R2, iter) – generates one imputation of the missing data in R1 using FCS; iterf is the number of iterations of the FCS algorithm (default = 20)

If head is TRUE (default) then it is assumed that the data range R1, as well as the output, contains column headings, while if head = FALSE then the R1 should not contain column headings and the output will not contain column headings either.

R2 is a range containing constraints (if R2 is omitted then no constraints are used) and iter is the maximum number of iterations used to obtain a value within the min/max constraints (default = 25). ImputeReg can contain only one constraint, namely for the variable corresponding to column j.

Group 2

For the following array functions, R1 and head are as in Group 1, except that the default for head is FALSE. If lab = TRUE then an extra column is inserted in the output which contains labels (default = FALSE).

MISummary(R1, head) – generates a compact summary of the regression model for R1 where the last column in R1 is assumed to be the data for the y variable and the other columns are assumed to contain the data for the x variables.

ImputedData(R0, R1, head) – generates a range with all the elements in R1 where there is missing data in range R0.

DescStats(R1, lab, head) – generates a mini descriptive statistics report for the data in range R1.

DescStats(R1, lab, head, R0) – generates a mini descriptive statistics report for the data in range R1 which corresponds to the missing data in range R0.

MissingPairwise(R1, head) – generates a summary of the percentage of non-missing data for each pair of variables in R1. If head = TRUE then the output includes row headings (as well as column headings).

Group 3

The following is an array function:

MICombine(R1, nimp, ncols, head, raw) – generates a combined compact regression summary derived from the compact summaries of nimp imputations if raw = FALSE or derived directly from the nimp imputations if raw = TRUE (default).

If raw = FALSE then R1 is the range containing the first of the nimp compact regression summaries, while if raw = TRUE then R1 is the range containing the first of the nimp imputations. The nimp imputations or compact regression summaries are separated by ncols blank columns (default = 1). head is as in Group 2.

Group 4

For the following function, R1 is a 1 × k row vector, R2 is a k × k covariance matrix and R3 is a 1 × k mean vector.

LLReg(R1, R2, R3) = the value of -2LLi for the row represented by R1, possibly containing missing data.

Group 5

For the following array functions, R1 is a 2m × n range where the first m rows represent the values of m imputed population parameters and the second m rows represent the corresponding standard errors for these parameters. The columns represent separate imputations. The argument size is the number of elements in the original sample (including missing data) and head and lab are as in Group 2.

ImputeVar(R1, size, lab, head) – outputs an array that summarizes the combination rules for the variance.

ImputeParam(R1, size, lab, head, alpha) – outputs a range based on the combination rules and the usual t-test using the stated value of alpha (default = .05).

Group 6

The following array functions relate to the EM algorithm. As in Group 1 and 2, R1 is a range where each column represents sample data for a random variable. Blank or non-numeric elements are assumed to represent missing data. If lab = TRUE then an extra column is inserted in the output which contains labels (default FALSE). iter = the maximum number of iterations (default 200). If none of the imputed values change by more than prec (default 0.00000001) then the iteration terminates.

EM_CHISQ(R1, iter, prec): outputs an array with the estimated multinomial p (i.e. probability) parameters

EM_CHISQ_IMPUTE(R1, iter, prec): outputs an array with imputed data values

EM_CHISQ_EXP(R1, iter, prec): outputs an array with the estimated multinomial p parameters assuming independence

EM_CHISQ_EXP_IMPUTE(R1, iter, prec): outputs an array with imputed data values assuming independence

EM_CHISQ_TEST(R1, lab, iter, prec): outputs an array containing the chi-square statistic, df and p-value of the test for independence

EM_MNORM_IMPUTE(R1, iter): outputs an array containing the data in R1, which is assumed to be multivariate normally distributed, but with any missing data elements imputed.

EM_MNORM(R1, iter): outputs an array containing the covariance matrix and mean vector for the data in R1, which is assumed to be multivariate normally distributed, where any missing data elements have been imputed.

6 thoughts on “Real Statistics Advanced Missing Data Functions”

  1. Hi – just a thought about multiple imputation, that I don’t recall seeing addressed.

    If I have a variable that can ordinal or nominal categories but one of the possible values in my data set is not observed, then how can I best create an imputed data set that does occasionally include such an outcome? One ad hoc partial solution I’ve tried for ordinal outcomes was to pretend it was continuous, impute on the assumption of Normality, say (perhaps after a transformation) and recode the results into categories after imputation. This fails for nominal scales of course.

    I’m now retired and I last did statistical analyses with Stata. I’m fascinated by the way Excel has continued to develop over the years.

    Reply
  2. Hi Mr Charles Zaiontz
    I have a question and am grateful if you respond and guide me:
    for discrete data in spss:
    when we use transform/replace Missing Values, data transform in continious type
    this situation isnot desirable. For example my variable is sex as male(value=1) and female(value=2) Now when we have missing value and use in spss transform/replace Missing Values data transform missing values trasforms 1.4.
    I donot know what use I from 1.4 in drawing graphs and report, because value=1=male and value=2=female but value=1.4=?
    in continious data we donot have such problem because type of tranformed data is contionious.
    I am grateful if you respond and guide me.

    Reply

Leave a Comment