Overview
Multiple Linear Regression models one dependent variable in terms of one or more independent variables. A Multivariate Linear Regression model extends this approach by allowing more than one dependent variable.
Definition
Univariate model
We look at the univariate model first. If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of the response variable from the predictor variables of the form
y = β0 + β1x1 + ⋅⋅⋅ + βkxk + ε
where the βj are the regression coefficients. This can be expressed as
y = XTβ + ε
where β is the (k+1) × 1 column vector with entries β0, β1, …, βk and X is the (k+1) × 1 column vector with entries 1, x1, …, xk.
In practice the multiple regression model is built from sample data consisting of n observations whose ith observation is xi1, xi2, …, xik, yi. For i = 1, …, n
yi = β0 + β1xi1 + ⋅⋅⋅ + βkxik + εi
If Y is the n × 1 column vector with the entries y1, …, yn, ε is the n × 1 column vector with the entries ε1, …, εn, β is the (k+1) × 1 column vector with the entries β0, β1, …, βk and X is the n × (k+1) matrix (called the design matrix) whose ith row has entries 1, xi1, …, xik, then the above n equations can be expressed as the single matrix equation
Y = Xβ + ε
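To make the matrix form concrete, here is a minimal sketch in Python with NumPy (the sample size, coefficient values, and noise level are made up for illustration) that builds the design matrix X by prepending a column of ones and fits the univariate model by least squares.

import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # n observations, k predictors

x = rng.normal(size=(n, k))                    # predictor values x_i1, ..., x_ik
X = np.column_stack([np.ones(n), x])           # design matrix: n x (k+1), first column all 1s
beta = np.array([2.0, 0.5, -1.0, 3.0])         # illustrative coefficients beta_0, ..., beta_k
eps = rng.normal(scale=0.3, size=n)            # error term
y = X @ beta + eps                             # Y = X beta + eps

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least squares estimate of beta
print(b)                                       # close to the coefficients used above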
Multivariate model
Definition 1: For multivariate regression, we instead start with m equations for p = 1, …, m of the same form as for univariate regression, namely
yp = β0p + β1px1 + ⋅⋅⋅ + βkpxk + εp
In practice the multivariate regression model is built from sample data consisting of n observations, each of which contains the predictor values xi1, …, xik and the m response values yi1, …, yim. For i = 1, …, n
yi1 = β01 + β11xi1 + ⋅⋅⋅ + βk1xik + εi1
⋅⋅⋅
yim = β0m + β1mxi1 + ⋅⋅⋅ + βkmxik + εim
which can be expressed as in the univariate case by the matrix equation
Y = Xβ + ε
This is the same form as for multiple regression. The difference is that this time Y = [yip] and ε = [εip] are n × m matrices, X = [xij] is an n × (k+1) matrix (exactly as for univariate multiple regression), and β = [βjp] is a (k+1) × m matrix of coefficients.
We also use the notation Yp, βp, and εp to represent the pth columns of Y, β, and ε, respectively. Thus, we note that for all p
Yp = Xβp + εp
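The following sketch (again NumPy, with dimensions and values chosen only for illustration) sets up the multivariate model and checks that each column Yp obeys its own univariate regression Yp = Xβp + εp on the shared design matrix X.

import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 3, 2                             # n observations, k predictors, m responses

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix
beta = rng.normal(size=(k + 1, m))             # (k+1) x m coefficient matrix [beta_jp]
eps = rng.normal(scale=0.2, size=(n, m))       # n x m error matrix
Y = X @ beta + eps                             # n x m response matrix: Y = X beta + eps

for p in range(m):                             # column p is a univariate regression
    assert np.allclose(Y[:, p], X @ beta[:, p] + eps[:, p])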
Assumptions
The assumptions for the multivariate regression model are similar to those for the univariate case, namely
E[ε] = O
cov(ε(i), ε(i)) = Σ
cov(ε(i), ε(h)) = O for i ≠ h
for some m × m matrix Σ, where ε(i) is the ith row of ε and O is the null matrix (n × m for the first O and m × m for the second O). Note that there can be non-zero correlation between columns of ε, namely cov(εp, εq) ≠ O for p ≠ q.
The equivalent set of assumptions is
E[Y] = Xβ
cov(Y(i), Y(i)) = Σ
cov(Y(i), Y(h)) = O for i ≠ h
where Y(i) is the ith row of Y. There can be non-zero correlation for columns of Y, namely cov(Yp, Yq) ≠ O for p ≠ q.
Thus, each column version of the regression
Yp = Xβp + εp
satisfies all the assumptions of the univariate regression model. If we add the normality assumption, then each εp ∼ N(0, σp2I).
Observations
As for the case where m = 1, the least squares estimate B for β minimizes
SSE = (Y – XB)T(Y – XB)
and can be calculated by
B = (XTX)-1XTY
In particular, each column of B can be calculated by a separate univariate linear regression of the corresponding column of Y on X. Since SSE is not a scalar, we won’t get into the details as to what constitutes a minimum result, but suffice it to say that (Yp – XBp)T(Yp – XBp) is minimized for each column p (see also Property 2).
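As a quick check of this observation, the sketch below (simulated data, not from the source) computes B jointly from the normal equations and confirms that its columns match m separate univariate least squares fits.

import numpy as np

rng = np.random.default_rng(2)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))

B = np.linalg.solve(X.T @ X, X.T @ Y)          # B = (X^T X)^{-1} X^T Y, shape (k+1) x m

for p in range(m):                             # column p equals the univariate fit of Y_p on X
    b_p, *_ = np.linalg.lstsq(X, Y[:, p], rcond=None)
    assert np.allclose(B[:, p], b_p)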
We define the covariance σpq = cov(εp, εq) and define the m × m covariance matrix Σ = [σpq]. As usual, σp2 = σpp = var(εp).
If we apply the least squares method to Yp = Xβp + εp, we generate the (k+1) × 1 coefficient vectors Bp = [bjp] where Bp = (XTX)-1XTYp.
From these Bp we can define the (k + 1) × m matrix B = [bjp].
We can also define the n × m matrices Ŷ = [ŷip] and E = [eip] such that
Ŷ = XB and E = Y – Ŷ
Thus
Y = XB + E
In fact, for any values x1, …, xk (here the xj are specific values, not necessarily values from our original sample), if X = [xj] is the 1 × (k+1) row vector with x0 = 1, we can define the 1 × m row vectors Ŷ = [ŷp] and E = [ep] such that
Ŷ = XB and E = Y – Ŷ
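Continuing in the same hedged spirit, the fitted values, the residual matrix, and a prediction at a new row of predictor values can be computed as follows (all names and numerical values are illustrative; for a new row only the prediction Ŷ is formed, since E would require an observed response there).

import numpy as np

rng = np.random.default_rng(3)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))

B = np.linalg.solve(X.T @ X, X.T @ Y)          # least squares coefficient matrix

Y_hat = X @ B                                  # n x m fitted values
E = Y - Y_hat                                  # n x m residual matrix, so Y = XB + E

x_new = np.array([1.0, 0.3, -1.2, 0.7])        # new 1 x (k+1) row with x_0 = 1 (hypothetical values)
y_pred = x_new @ B                             # 1 x m predicted responses
print(y_pred)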
Properties
Proofs of the following properties are given separately.
Property 1:
B = (XTX)-1XTY
Property 2: B minimizes the trace
Tr((Y – XB)T(Y – XB))
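One way to see Property 2 informally is a small numerical sketch (simulated data, arbitrary perturbation size): perturbing B in random directions never decreases the trace of (Y – XB)T(Y – XB).

import numpy as np

rng = np.random.default_rng(6)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)          # least squares estimate

def trace_sse(C):
    R = Y - X @ C
    return np.trace(R.T @ R)                   # Tr((Y - XC)^T (Y - XC))

for _ in range(100):                           # random perturbations never decrease the trace
    C = B + 0.01 * rng.normal(size=B.shape)
    assert trace_sse(C) >= trace_sse(B)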
Property 3:
E[ε] = 0
Property 4: B is an unbiased estimator of β; i.e. E[B] = β
Property 5:
cov(Bp, Bq) = σpq(XTX)-1
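Property 5 can be checked numerically with a small Monte Carlo experiment (a sketch, not part of the source; the design matrix, Σ, and replication count are arbitrary choices): hold X fixed, repeatedly draw the rows of ε from N(0, Σ), and compare the empirical covariance between the columns Bp and Bq with σpq(XTX)-1.

import numpy as np

rng = np.random.default_rng(4)
n, k, m = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design matrix
beta = np.zeros((k + 1, m))                                   # true coefficients (any values work)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])                    # error covariance, sigma_12 = 0.6
L = np.linalg.cholesky(Sigma)
H = np.linalg.solve(X.T @ X, X.T)                             # (X^T X)^{-1} X^T

reps = 20000
Bs = np.empty((reps, k + 1, m))
for r in range(reps):
    eps = rng.normal(size=(n, m)) @ L.T        # rows are iid N(0, Sigma)
    Bs[r] = H @ (X @ beta + eps)               # least squares estimate for this sample

B1, B2 = Bs[:, :, 0], Bs[:, :, 1]              # reps x (k+1) draws of B_1 and B_2
emp = (B1 - B1.mean(0)).T @ (B2 - B2.mean(0)) / (reps - 1)
print(np.round(emp, 3))                        # empirical cov(B_1, B_2)
print(np.round(Sigma[0, 1] * np.linalg.inv(X.T @ X), 3))      # sigma_12 (X^T X)^{-1}, should be close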
Property 6:
E[Ep] = 0
Property 7:
E[EpTEq] = σpq dfRes
Property 8: SSE/dfRes is an unbiased estimate for Σ; i.e. E[SSE] = E[ETE] = dfRes Σ
Observation: This means that spq/dfRes is an unbiased estimator for σpq, where spq is the (p, q) entry of SSE.
Note too that
SSE = (Y – XB)T(Y – XB) = YTY – BTXTY = YTY – (XB)TY
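The next sketch (same kind of simulated setup, values purely illustrative) computes SSE = ETE, forms the estimate SSE/dfRes of Σ with dfRes = n – k – 1 as in the univariate case, and verifies the algebraic identity above numerically.

import numpy as np

rng = np.random.default_rng(5)
n, k, m = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Sigma = np.array([[1.0, 0.4], [0.4, 0.5]])     # error covariance used to simulate
eps = rng.normal(size=(n, m)) @ np.linalg.cholesky(Sigma).T
Y = X @ rng.normal(size=(k + 1, m)) + eps

B = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ B

SSE = E.T @ E                                  # m x m residual sum of squares and cross-products
df_res = n - k - 1                             # residual degrees of freedom
Sigma_hat = SSE / df_res                       # estimate of Sigma (Property 8)
print(np.round(Sigma_hat, 3))                  # roughly the Sigma used to simulate

assert np.allclose(SSE, Y.T @ Y - (X @ B).T @ Y)   # numerical check of SSE = Y^T Y - (XB)^T Y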
Property 9:
cov(Bp, Eq) = 0 and cov(B, E) = 0