Multivariate Regression Analysis Basic Concepts

Overview

Multiple Linear Regression models a single dependent variable in terms of one or more independent variables. A Multivariate Linear Regression model extends this approach by allowing more than one dependent variable.

Definition

Univariate model

We look at the univariate model first. If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of the response variable from the predictor variables of the form

y = β0 + β1x1 + ⋅⋅⋅ + βkxk + ε

where the βj are the regression coefficients. This can be expressed as

y = XTβ + ε

where β is the (k+1) × 1 column vector of coefficients and X is the (k+1) × 1 column vector with entries 1, x1, …, xk.

In practice the multiple regression model is built from sample data consisting of n observations whose ith observation is xi1, xi2, …, xik, yi. For i = 1, …, n

yi = β0 + β1xi1 + ⋅⋅⋅ + βkxik + εi

If Y is the n × 1 column vector with the entries y1, …, yn, ε is the n × 1 column vector with the entries ε1, …, εn, β is the (k+1) × 1 column vector with the entries β0, β1, …, βk and X is the n × (k+1) matrix (called the design matrix) whose ith row has entries 1, xi1, …, xik, then the above n equations can be expressed as the single matrix equation

Y = Xβ + ε
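As a concrete illustration, here is a minimal sketch in Python (the simulated data, sizes, and variable names are our own, not part of the model) that builds the design matrix X with a leading column of ones and computes the least squares estimate of β:

import numpy as np
rng = np.random.default_rng(0)
n, k = 50, 3                                       # n observations, k predictors
x_data = rng.normal(size=(n, k))                   # the values x_i1, ..., x_ik
beta_true = np.array([2.0, 0.5, -1.0, 3.0])        # beta_0, beta_1, ..., beta_k
X = np.column_stack([np.ones(n), x_data])          # design matrix: ith row is (1, x_i1, ..., x_ik)
eps = rng.normal(scale=0.2, size=n)                # error terms
Y = X @ beta_true + eps                            # Y = X beta + eps
beta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least squares estimate of beta
print(beta_hat)                                    # approximately beta_true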

Multivariate model

Definition 1: For multivariate regression, we instead start with m equations for p = 1, …, m of the same form as for univariate regression, namely

yp = β0p + β1px1 + ⋅⋅⋅ + βkpxk + εp

In practice the multivariate regression model is built from sample data consisting of n observations, each containing values for the k predictor variables and the m response variables. For i = 1, …, n

yi1 = β01 + β11xi1 + ⋅⋅⋅ + βk1xik + εi1

⋅⋅⋅

yim = β0m + β1mxi1 + ⋅⋅⋅ + βkmxik + εim

which can be expressed as in the univariate case by the matrix equation

Y = Xβ + ε

This is the same form as for multiple regression. The difference is that this time Y = [yip] and ε = [εip] are n × m matrices, X = [xij] is an n × (k+1) matrix (exactly as for univariate multiple regression), and β = [βjp] is a (k+1) × m matrix of coefficients.

We also use the notation Yp, βp, and εp to represent the pth column of Y, β, and ε respectively. Thus, we note that for all p

Yp = Xβp + εp
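To make the dimensions concrete, the following sketch (simulated data; all names are illustrative) builds Y and ε as n × m matrices, X as an n × (k+1) matrix, and β as a (k+1) × m matrix, and checks that each column of Y satisfies its own univariate regression equation:

import numpy as np
rng = np.random.default_rng(1)
n, k, m = 100, 2, 3                                          # observations, predictors, responses
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix
beta = rng.normal(size=(k + 1, m))                           # (k+1) x m coefficient matrix
eps = rng.normal(scale=0.1, size=(n, m))                     # n x m error matrix
Y = X @ beta + eps                                           # n x m response matrix
p = 1                                                        # any column p satisfies Y_p = X beta_p + eps_p
assert np.allclose(Y[:, p], X @ beta[:, p] + eps[:, p])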

Assumptions

The assumptions for the multivariate regression model are similar to those for the univariate case, namely

E[ε] = O

cov(ε(i), ε(i)) = Σ

cov(ε(i), ε(h)) = O for i ≠ h

for some m × m matrix Σ, where ε(i) is the ith row of ε. O is the null matrix (n × m for the first O and m × m for the second O). Note that there can be non-zero correlation between the columns of ε, namely cov(εp, εq) ≠ O for p ≠ q.

The equivalent set of assumptions is

E[Y] = Xβ

cov(Y(i), Y(i)) = Σ

cov(Y(i), Y(h)) = O for i ≠ h

where Y(i) is the ith row of Y. There can be non-zero correlation between the columns of Y, namely cov(Yp, Yq) ≠ O for p ≠ q.

Thus, each column version of the regression

Yp = Xβp + εp

satisfies all the assumptions of the univariate regression model. If we add the normality assumption, then each εp ∼ N(0, σp2I).
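Under the normality assumption the rows ε(i) are independent draws from an m-dimensional normal distribution with mean 0 and covariance matrix Σ, while the columns of ε may be correlated. A short simulation sketch (the particular Σ and the names are our own choices):

import numpy as np
rng = np.random.default_rng(2)
n, m = 500, 2
Sigma = np.array([[1.0, 0.6],          # an assumed m x m error covariance matrix
                  [0.6, 2.0]])         # sigma_12 != 0, so the columns are correlated
eps = rng.multivariate_normal(mean=np.zeros(m), cov=Sigma, size=n)   # rows are i.i.d. N(0, Sigma)
print(np.cov(eps, rowvar=False))       # sample covariance of the columns, close to Sigma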

Observations

As in the case where m = 1, the least squares estimate B of β minimizes

SSE = (Y – XB)T(Y – XB)

and can be calculated by

B = (XTX)-1XTY

In particular, the columns of B can be calculated by separate univariate linear regressions of the corresponding columns of Y on X. Since SSE is not a scalar, we won’t get into the details as to what constitutes a minimum result, but suffice it to say that (Yp – XBp)T(Yp – XBp) is minimized for each column p (see also Property 2).
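The sketch below (simulated data; the names are ours) computes B from the formula above and verifies that each column of B equals the coefficient vector from a separate univariate regression of the corresponding column of Y on X:

import numpy as np
rng = np.random.default_rng(3)
n, k, m = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix
beta = rng.normal(size=(k + 1, m))
Y = X @ beta + rng.normal(scale=0.5, size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)                        # B = (X^T X)^{-1} X^T Y, shape (k+1) x m
for p in range(m):
    Bp, *_ = np.linalg.lstsq(X, Y[:, p], rcond=None)         # univariate regression of Y_p on X
    assert np.allclose(B[:, p], Bp)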

We define the covariance σpq = cov(εp, εq) and define the m × m covariance matrix Σ = [σpq]. As usual, σp2 = σpp = var(εp).

If we apply the least squares method to Yp = Xβp + εp, we generate the (k+1) × 1 coefficient vectors Bp = [bjp] where Bp = (XTX)-1XTYp

From these Bp we can define the (k + 1) × m matrix B = [bjp].

We can also define the n × m matrices Ŷ = [ŷip] and E = [eip] such that

Ŷ = XB          E = Y – Ŷ

Thus

Y = XB + E

In fact, for any values x1, …, xk (here the xj are not variables, nor are they necessarily values in our original sample), where X = [xj] is a 1 × (k+1) row vector with x0 = 1, we can define the 1 × m row vectors Ŷ = [ŷp] and E = [ep] such that

Ŷ = XB          E = Y – Ŷ
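In code, the fitted values, residuals, and a prediction at new x values look like this (a sketch with simulated data; the new x values are arbitrary illustrative numbers):

import numpy as np
rng = np.random.default_rng(4)
n, k, m = 200, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.3, size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)   # (k+1) x m coefficient matrix
Y_hat = X @ B                           # n x m fitted values
E = Y - Y_hat                           # n x m residuals, so Y = XB + E
x_new = np.array([1.0, 0.4, -1.2])      # new point (x_0 = 1, x_1, ..., x_k)
y_hat_new = x_new @ B                   # 1 x m predicted responses
print(y_hat_new)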

Properties

Click here for proofs of the following properties.

Property 1:

B = (XTX)-1XTY

Property 2: B minimizes the trace

Tr((Y – XB)T(Y – XB))
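As a numerical illustration of Property 2 (not a proof), perturbing the least squares B never decreases this trace on simulated data:

import numpy as np
rng = np.random.default_rng(5)
n, k, m = 100, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)
def trace_sse(B_):                       # Tr((Y - XB)^T (Y - XB))
    R = Y - X @ B_
    return np.trace(R.T @ R)
for _ in range(1000):                    # random perturbations only increase the trace
    assert trace_sse(B + 0.01 * rng.normal(size=B.shape)) >= trace_sse(B)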

Property 3:

E[ε] = 0

Property 4: B is an unbiased estimator of β; i.e. E[B] = β

Property 5:

cov(Bp, Bq) = σpq(XTX)-1

Property 6:

E[Ep] = 0

Property 7:

E[EpTEq] = σpq ⋅ dfRes

Property 8: SSE/dfRes is an unbiased estimate of Σ; i.e. E[SSE] = E[ETE] = dfRes ⋅ Σ

Observation: This means that spq/dfRes is an unbiased estimator of σpq, where spq is the (p, q)th element of SSE.

Note too that

SSE = (Y – XB)T(Y – XB) = YTY – BTXTY = YTY – (XB)TY
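Here dfRes denotes the residual degrees of freedom, n − k − 1 for this model. The following sketch (simulated data; Σ and the names are our own) estimates Σ by SSE/dfRes and checks the alternative expression for SSE:

import numpy as np
rng = np.random.default_rng(6)
n, k, m = 2000, 3, 2
Sigma = np.array([[1.0, 0.4],
                  [0.4, 0.5]])                               # error covariance used to simulate
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = rng.normal(size=(k + 1, m))
Y = X @ beta + rng.multivariate_normal(np.zeros(m), Sigma, size=n)
B = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ B
SSE = E.T @ E                                                # (Y - XB)^T (Y - XB)
df_res = n - k - 1                                           # residual degrees of freedom
print(SSE / df_res)                                          # unbiased estimate of Sigma (Property 8)
assert np.allclose(SSE, Y.T @ Y - (X @ B).T @ Y)             # alternative form of SSE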

Property 9:

cov(Bp, Eq) = 0          cov(B, E) = 0

