Objective
Our objective is to show how to use bootstrapping in regression modelling. In particular, we describe how to estimate standard errors and confidence intervals for regression coefficients, and prediction intervals for modeled data. Bootstrapping is especially useful when the normality and/or homogeneity of variance assumptions are violated.
Bootstrapping for regression coefficient covariance matrix
We provide two approaches for calculating the covariance matrix of the regression coefficients.
Approach 1
We assume that X is fixed. Using the original X, y data, estimate the regression coefficients via
$$B = (X^T X)^{-1} X^T y$$
Next calculate
e = y – XB
We now create N bootstrap iterations. For each iteration, create an e* by randomly selecting n rows from e with replacement. Next calculate
y* = XB + e*
and use regression to calculate the regression coefficients B* based on the data in X, y*; i.e.
$$B^* = (X^T X)^{-1} X^T y^*$$
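We now calculate the average of these N bootstrap B* values, B*1, …, B*N:
$$\bar{B}^* = \frac{1}{N} \sum_{r=1}^{N} B^*_r$$
Next, we calculate the k × k bootstrap covariance matrix as follows:
$$S^* = \frac{1}{N-1} \sum_{r=1}^{N} \left(B^*_r - \bar{B}^*\right)\left(B^*_r - \bar{B}^*\right)^T$$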
The square roots of the values on the main diagonal of S* serve as the standard errors of the regression coefficients in B.
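Approach 1 can be sketched in Python with NumPy as follows. This is a minimal illustration rather than a reference implementation: the function name `residual_bootstrap`, the synthetic data and the seeds are assumptions made for the example, and the covariance matrix is computed with the usual N – 1 divisor.

```python
import numpy as np

def residual_bootstrap(X, y, n_boot=1000, seed=0):
    """Approach 1: keep X fixed and resample the residuals with replacement."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    # Least-squares fit on the original data; equivalent to B = (X^T X)^{-1} X^T y
    B = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ B
    e = y - fitted                                    # residuals e = y - XB
    B_star = np.empty((n_boot, k))
    for r in range(n_boot):
        e_star = rng.choice(e, size=n, replace=True)  # n residuals drawn with replacement
        y_star = fitted + e_star                      # y* = XB + e*
        B_star[r] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    B_bar = B_star.mean(axis=0)                       # average of the N bootstrap B* values
    S_star = np.cov(B_star, rowvar=False)             # k x k covariance, divisor N - 1
    se = np.sqrt(np.diag(S_star))                     # standard errors of the coefficients
    return B, B_bar, S_star, se

# Synthetic data (an assumption for illustration): y = 2 + 3x + noise
data_rng = np.random.default_rng(1)
n = 50
x = data_rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])                  # first column of ones for the intercept
y = 2.0 + 3.0 * x + data_rng.normal(0, 1, n)

B, B_bar, S_star, se = residual_bootstrap(X, y)
print("coefficients:", B)
print("bootstrap standard errors:", se)
```

The sketch uses `np.linalg.lstsq` instead of forming the inverse of XTX explicitly; both give the same coefficients when X has full column rank, and the least-squares routine is numerically more stable.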
Approach 2
This time we do not assume that X is fixed; instead we resample whole rows of the data. We create N bootstrap iterations as follows. For each iteration, randomly select n numbers, with replacement, from the set 1, …, n. We now define X* and y* as follows. For each such random number i, assign the ith row of X to the next row in X* and the ith row of y to the next row in y*. For this X*, y*, perform regression to estimate the regression coefficient matrix B*. From the N resulting B* matrices, calculate S* as described in Approach 1 to obtain the covariance matrix and standard errors of the regression coefficients.
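A minimal Python sketch of Approach 2 is shown next, again using NumPy. The function name `pairs_bootstrap`, the synthetic data and the seeds are assumptions made for the example; the row indices run from 0 to n – 1 rather than 1 to n because Python arrays are zero-based.

```python
import numpy as np

def pairs_bootstrap(X, y, n_boot=500, seed=0):
    """Approach 2: resample rows of (X, y) together, with replacement."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    B_star = np.empty((n_boot, k))
    for r in range(n_boot):
        idx = rng.integers(0, n, size=n)   # n row indices drawn with replacement
        X_star, y_star = X[idx], y[idx]    # the same rows are taken from X and y
        B_star[r] = np.linalg.lstsq(X_star, y_star, rcond=None)[0]
    return B_star

# Synthetic data (an assumption for illustration): y = 2 + 3x + noise
data_rng = np.random.default_rng(1)
n = 50
x = data_rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + data_rng.normal(0, 1, n)

B_star = pairs_bootstrap(X, y)
S_star = np.cov(B_star, rowvar=False)      # bootstrap covariance matrix, divisor N - 1
se = np.sqrt(np.diag(S_star))              # bootstrap standard errors
print("bootstrap standard errors:", se)
```

Because whole rows are resampled, this version does not rely on the residuals having the same variance for every row, which is one reason it is often preferred when homogeneity of variance is in doubt.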
Bootstrapping regression coefficient confidence intervals
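We can use the same two approaches to create confidence intervals for the individual regression coefficients. Using either approach, calculate the N bootstrap k × 1 coefficient matrices B*r = [b*rj] for r = 1, …, N. The bootstrap standard error for each element βj in β can be estimated by
$$s.e.(b_j) = \sqrt{\frac{1}{N-1} \sum_{r=1}^{N} \left(b^*_{rj} - \bar{b}^*_j\right)^2}$$
where
$$\bar{b}^*_j = \frac{1}{N} \sum_{r=1}^{N} b^*_{rj}$$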
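We can also calculate a 1 – α confidence interval for each βj by first arranging the jth coefficient of each bootstrap coefficient matrix in ascending order
$$b^*_{(1)j} \le b^*_{(2)j} \le \cdots \le b^*_{(N)j}$$
The 1 – α confidence interval is then
$$\left(b^*_{(\text{lower})j},\ b^*_{(\text{upper})j}\right)$$
where lower = Nα/2 and upper = N(1 – α/2), rounded to integers when necessary.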
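The standard error and percentile interval for a single coefficient can be sketched in plain Python as follows. The function names, the simulated bootstrap values and the rounding of the two positions to the nearest integer are assumptions made for the example; other rounding conventions are also in use.

```python
import random
import statistics

def bootstrap_se(values):
    """Sample standard deviation (divisor N - 1) of the bootstrap values."""
    return statistics.stdev(values)

def percentile_ci(values, alpha=0.05):
    """1 - alpha percentile interval taken from the ordered bootstrap values."""
    ordered = sorted(values)
    N = len(ordered)
    # 1-based positions N*alpha/2 and N*(1 - alpha/2), rounded to the nearest integer
    lower = max(1, round(N * alpha / 2))
    upper = min(N, round(N * (1 - alpha / 2)))
    return ordered[lower - 1], ordered[upper - 1]

# Stand-in for the jth coefficient from N = 1000 bootstrap fits (simulated here)
random.seed(0)
b_star_j = [random.gauss(3.0, 0.05) for _ in range(1000)]

print("standard error:", bootstrap_se(b_star_j))
print("95% confidence interval:", percentile_ci(b_star_j, alpha=0.05))
```

With N = 1000 and α = 0.05, the interval runs from the 25th to the 975th ordered value.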
Bootstrapping confidence intervals for data
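The standard error and confidence interval for ŷ = X0β, where X0 is a 1 × k row vector (with initial element 1, so that it conforms with the k × 1 coefficient matrix), are produced by first generating N bootstrap values B*1, …, B*N for the coefficient matrix as described above. For each B*r we calculate the predicted value of y for X0, namely
$$\hat{y}^*_r = X_0 B^*_r$$
We next take the average of these values
$$\bar{y}^* = \frac{1}{N} \sum_{r=1}^{N} \hat{y}^*_r$$
The standard error is then
$$s.e.(\hat{y}) = \sqrt{\frac{1}{N-1} \sum_{r=1}^{N} \left(\hat{y}^*_r - \bar{y}^*\right)^2}$$
Arranging the N bootstrapped predicted y values in ascending order
$$\hat{y}^*_{(1)} \le \hat{y}^*_{(2)} \le \cdots \le \hat{y}^*_{(N)}$$
we obtain a 1 – α confidence interval as follows:
$$CI = \left(\hat{y}^*_{(\text{lower})},\ \hat{y}^*_{(\text{upper})}\right)$$
where lower = Nα/2 and upper = N(1 – α/2), rounded to integers when necessary.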
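The calculation of the standard error and confidence interval for a predicted value can be sketched in Python as follows. The synthetic data, the choice of X0, the use of row resampling (Approach 2) to generate the B* values, and the rounding of the two positions to the nearest integer are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, alpha = 60, 2000, 0.05

# Synthetic data (an assumption for illustration): y = 1 + 0.5x + noise
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(0, 1, n)

# Generate N bootstrap coefficient vectors by resampling rows
B_star = np.empty((N, X.shape[1]))
for r in range(N):
    idx = rng.integers(0, n, size=n)
    B_star[r] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

X0 = np.array([1.0, 4.0])        # initial element 1, then x = 4
y_star = B_star @ X0             # predicted value X0 B*_r for each bootstrap fit
y_bar = y_star.mean()            # average of the N predicted values
se = y_star.std(ddof=1)          # standard error, divisor N - 1

# Percentile interval from the ordered predictions (1-based positions)
y_sorted = np.sort(y_star)
lower = max(1, round(N * alpha / 2))
upper = min(N, round(N * (1 - alpha / 2)))
ci = (y_sorted[lower - 1], y_sorted[upper - 1])

print("mean prediction:", y_bar)
print("standard error:", se)
print("95% confidence interval:", ci)
```

This interval describes the uncertainty in the fitted value at X0. It does not include the extra variability of an individual new observation around that fitted value, which a prediction interval would need to account for.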
References
Eck, D. J. (2017) Bootstrapping for multivariate linear regression models
https://arxiv.org/abs/1704.07040
Stine, R. A. (1985) Bootstrap prediction intervals for regression
https://www.jstor.org/stable/2288570?seq=1
Fox, J. (2015) Bootstrapping regression models.
Applied Regression Analysis and Generalized Linear Models, 3rd ed. Sage Publishing
https://us.sagepub.com/sites/default/files/upm-binaries/21122_Chapter_21.pdf






