Release 2.13 of the Real Statistics Resource Pack

I am pleased to announce Release 2.13 of the Real Statistics Resource Pack. The new release is available for free download (Download Resource Pack) and is compatible with all Windows versions of Excel. The Excel 2010/2013 versions are available now. The Excel 2007 version will be available later today, and the other Windows version will be available for download shortly.

The Real Statistics Examples Workbook and the Multivariate Examples Workbook have been updated with some new examples. You can also download these files for free (Download Examples). The website is being updated to reflect the new release; this should be completed within the next several days.

This release focuses on advanced ways of handling missing data. The website will explain some of the common approaches to handling missing data (single imputation and listwise deletion) as well as the limitations of these approaches. In addition, the following advanced approaches are presented:

  • Multiple Imputation (MI)
  • Full Information Maximum Likelihood (FIML)

The website will describe how to use these approaches to perform multiple regression even when there is missing data. Additional capabilities will be provided in future releases.
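
As a rough illustration of the two common approaches mentioned above, the following Python sketch (not part of the Resource Pack) applies listwise deletion and single imputation by the column mean to a small made-up data set. The data and the function names `listwise_delete` and `mean_impute` are hypothetical, and mean imputation is only one form of single imputation.

```python
import numpy as np

# Hypothetical toy data: two variables, NaN marks a missing value.
data = np.array([
    [1.0, 2.0],
    [2.0, np.nan],
    [3.0, 6.0],
    [np.nan, 8.0],
    [5.0, 10.0],
])

def listwise_delete(x):
    """Keep only the rows that have no missing values."""
    return x[~np.isnan(x).any(axis=1)]

def mean_impute(x):
    """Replace each missing value with the mean of the observed values in its column."""
    filled = x.copy()
    col_means = np.nanmean(x, axis=0)
    rows, cols = np.where(np.isnan(x))
    filled[rows, cols] = col_means[cols]
    return filled

complete = listwise_delete(data)   # drops every row with a missing value
imputed = mean_impute(data)        # keeps all rows, fills gaps with column means

# Sample variance of the first variable, before and after mean imputation.
var_observed = np.nanvar(data[:, 0], ddof=1)
var_imputed = np.var(imputed[:, 0], ddof=1)
```

Listwise deletion discards two of the five rows here, while mean imputation keeps every row but gives a smaller sample variance than the observed values alone. These are the kinds of limitations that motivate the advanced approaches.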

The new release of the Real Statistics Resource Pack contains the following new capabilities, which implement the missing data techniques described on the website. For now I consider these new capabilities to be in beta release. I hope to get feedback from you so that I can finalize them.

Multiple Imputation Data Analysis Tool: We implement the fully conditional specification (FCS), also called multivariate imputation by chained equations (MICE), to impute the values of the missing data based on existing data. We also show how to perform multiple regression using this approach when data is missing.
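
The idea behind chained equations can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm used by the data analysis tool: the function name `fcs_impute` is hypothetical, each variable is modeled by an ordinary linear regression on the others, and only residual noise is added to the predictions (a full implementation would usually also draw the regression parameters at random).

```python
import numpy as np

def fcs_impute(x, n_iter=10, seed=0):
    """Return one imputed copy of x, where NaN marks a missing value."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    n, k = x.shape

    # Step 1: start from a simple fill (the column means of the observed values).
    filled = x.copy()
    col_means = np.nanmean(x, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    # Step 2: cycle through the variables, re-imputing each one from the others.
    for _ in range(n_iter):
        for j in range(k):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs_j = ~miss_j
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(n), others])

            # Fit the regression on the rows where variable j was observed.
            coef, *_ = np.linalg.lstsq(design[obs_j], filled[obs_j, j], rcond=None)
            resid = filled[obs_j, j] - design[obs_j] @ coef
            dof = max(int(obs_j.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)

            # Predict the missing entries and add random residual noise.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = design[miss_j] @ coef + noise
    return filled
```

Calling `fcs_impute` several times with different `seed` values gives several completed data sets. In multiple imputation, the analysis (for example a regression) is run on each completed data set and the results are then combined.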

Full Information Maximum Likelihood (FIML) Data Analysis Tool: We implement FIML to find the covariance matrix which best fits the sample data by maximizing the log-likelihood statistic. From this matrix we also create a multiple regression model even when there is missing data.
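
To show what is being maximized, here is a minimal Python sketch of the log-likelihood for data with missing values, assuming a multivariate normal model. The function name `fiml_loglik` is hypothetical and the code is an illustration only, not the Resource Pack's implementation. Each row contributes the normal log-density of its observed values, using the matching part of the mean vector and covariance matrix.

```python
import numpy as np

def fiml_loglik(x, mu, sigma):
    """Log-likelihood of the observed values in x (NaN = missing) under N(mu, sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    total = 0.0
    for row in x:
        obs = ~np.isnan(row)
        k = int(obs.sum())
        if k == 0:
            continue  # a row with no observed values carries no information
        diff = row[obs] - mu[obs]
        sub = sigma[np.ix_(obs, obs)]          # covariance of the observed variables
        _, logdet = np.linalg.slogdet(sub)
        quad = diff @ np.linalg.solve(sub, diff)
        total += -0.5 * (k * np.log(2 * np.pi) + logdet + quad)
    return total
```

A FIML fit would search for the `mu` and `sigma` that make this value as large as possible, for example with a numerical optimizer. That search is not shown here.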

New worksheet functions: Complementing these new data analysis tools are the following new worksheet functions: ImputeVar, ImputeParam, ImputeSimple, ImputeReg, ImputeFCS, ImputedData, CountPatterns, DescStats, MissingFreq, MissingPatterns, MissingPairwise, MISummary, MICombine, RegImpute, QuadForm and LLReg.

Cholesky decomposition: The new release also provides a function CHOL which calculates the Cholesky decomposition of a positive definite matrix (such as a covariance or correlation matrix).
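
For readers unfamiliar with the decomposition, the following Python sketch uses NumPy's `numpy.linalg.cholesky` on a small made-up covariance matrix. It illustrates the mathematics only and makes no claim about the output layout of CHOL. NumPy returns a lower triangular matrix L such that the original matrix equals L times its transpose.

```python
import numpy as np

# Hypothetical 3 x 3 covariance matrix (symmetric and positive definite).
cov = np.array([
    [4.0, 2.0, 0.6],
    [2.0, 2.0, 0.5],
    [0.6, 0.5, 1.0],
])

# Lower triangular factor L with cov = L @ L.T
lower = np.linalg.cholesky(cov)

# Multiplying the factor by its transpose recovers the original matrix.
reconstructed = lower @ lower.T
```

One common use of the factor is to turn independent standard normal draws into draws with the given covariance matrix, by multiplying them by L.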

Very large data set support: Some people are starting to use the Real Statistics Resource Pack to analyze large samples. In many cases, the ranges supported by the software until now have been limited to 65,500 cases. I am starting to introduce some new functions which enable analyses with up to 1 million cases.

In the previous release, the functions that calculate the correlation and covariance matrices, COV(R1, False), COVP(R1, False) and CORR(R1, False), were revised to support data sets with up to 1 million rows.

This release introduces the LogitMatches function, which supports logistic regression with raw data sets of up to 1 million rows.