Overview
Multiple Linear Regression models one dependent variable in terms of one or more independent variables. A Multivariate Linear Regression model extends this approach by allowing more than one dependent variable.
Definition
Univariate model
We look at the univariate model first. If y is a dependent variable (aka the response variable) and x1, …, xk are independent variables (aka predictor variables), then the multiple regression model provides a prediction of the response variable from the predictor variables of the form
y = β0 + β1x1 + ⋅⋅⋅ + βkxk + ε
where the βj are the regression coefficients. This can be expressed as
y = XTβ + ε
where β is the (k+1) × 1 column vector with entries β0, β1, …, βk and X is the (k+1) × 1 column vector with entries 1, x1, …, xk.
In practice the multiple regression model is built from sample data consisting of n observations whose ith observation is xi1, xi2, …, xik, yi. For i = 1, …, n
yi = β0 + β1xi1 + ⋅⋅⋅ + βkxik + εi
If Y is the n × 1 column vector with the entries y1, …, yn, ε is the n × 1 column vector with the entries ε1, …, εn, β is the (k+1) × 1 column vector with the entries β0, β1, …, βk and X is the n × (k+1) matrix (called the design matrix) whose ith row has entries 1, xi1, …, xik, then the above n equations can be expressed as the single matrix equation
Y = Xβ + ε
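To make the matrix form concrete, here is a minimal sketch in Python with NumPy (the sample size, coefficient values, and noise level are made up for illustration) that builds the design matrix X by prepending a column of ones and fits the univariate model by least squares.

import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3                                   # n observations, k predictors

x = rng.normal(size=(n, k))                    # predictor values x_i1, ..., x_ik
X = np.column_stack([np.ones(n), x])           # design matrix: n x (k+1), first column all 1s
beta = np.array([2.0, 0.5, -1.0, 3.0])         # illustrative coefficients beta_0, ..., beta_k
eps = rng.normal(scale=0.3, size=n)            # error term
y = X @ beta + eps                             # Y = X beta + eps

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least squares estimate of beta
print(b)                                       # close to the coefficients used above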
Multivariate model
Definition 1: For multivariate regression, we instead start with m equations for p = 1, …, m of the same form as for univariate regression, namely
yp = β0p + β1px1 + ⋅⋅⋅ + βkpxk + εp
In practice the multivariate regression model is built from sample data consisting of n observations, each of which contains the predictor values xi1, …, xik and the m response values yi1, …, yim. For i = 1, …, n
yi1 = β01 + β11xi1 + ⋅⋅⋅ + βk1xik + εi1
⋅⋅⋅
yim = β0m + β1mxi1 + ⋅⋅⋅ + βkmxik + εim
which can be expressed as in the univariate case by the matrix equation
Y = Xβ + ε
This is the same form as for multiple regression. The difference is that this time Y = [yip] and ε = [εip] are n × m matrices, X = [xij] is an n × (k+1) matrix (exactly as for univariate multiple regression), and β = [βjp] is a (k+1) × m matrix of coefficients.
We also use the notation Yp, βp, and εp to represent the pth columns of Y, β, and ε, respectively. Thus, we note that for all p
Yp = Xβp + εp
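The following sketch (again NumPy, with dimensions and values chosen only for illustration) sets up the multivariate model and checks that each column Yp obeys its own univariate regression Yp = Xβp + εp on the shared design matrix X.

import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 3, 2                             # n observations, k predictors, m responses

X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x (k+1) design matrix
beta = rng.normal(size=(k + 1, m))             # (k+1) x m coefficient matrix [beta_jp]
eps = rng.normal(scale=0.2, size=(n, m))       # n x m error matrix
Y = X @ beta + eps                             # n x m response matrix: Y = X beta + eps

for p in range(m):                             # column p is a univariate regression
    assert np.allclose(Y[:, p], X @ beta[:, p] + eps[:, p])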
Assumptions
The assumptions for the multivariate regression model are similar to those for the univariate case, namely
E[ε] = O
cov(ε(i), ε(i)) = Σ
cov(ε(i), ε(h)) = O for i ≠ h
for some m × m matrix Σ, where ε(i) is the ith row of ε and O is the null matrix (n × m for the first O and m × m for the second O). Note that there can be non-zero correlation between columns of ε, namely cov(εp, εq) ≠ O for p ≠ q.
The equivalent set of assumptions is
E[Y] = Xβ
cov(Y(i), Y(i)) = Σ
cov(Y(i), Y(h)) = O for i ≠ h
where Y(i) is the ith row of Y. There can be non-zero correlation for columns of Y, namely cov(Yp, Yq) ≠ O for p ≠ q.
Thus, each column version of the regression
Yp = Xβp + εp
satisfies all the assumptions of the univariate regression model. If we add the normality assumption, then each εp ∼ N(0, σp2I).
Observations
As for the case where m = 1, the least squares estimate B for β minimizes
SSE = (Y – XB)T(Y – XB)
and can be calculated by
B = (XTX)-1XTY
In particular, each column of B can be calculated by a separate univariate linear regression of the corresponding column of Y on X. Since SSE is not a scalar, we won’t get into the details as to what constitutes a minimum result, but suffice it to say that (Yp – XBp)T(Yp – XBp) is minimized for each column p (see also Property 2).
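As a quick check of this observation, the sketch below (simulated data, not from the source) computes B jointly from the normal equations and confirms that its columns match m separate univariate least squares fits.

import numpy as np

rng = np.random.default_rng(2)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))

B = np.linalg.solve(X.T @ X, X.T @ Y)          # B = (X^T X)^{-1} X^T Y, shape (k+1) x m

for p in range(m):                             # column p equals the univariate fit of Y_p on X
    b_p, *_ = np.linalg.lstsq(X, Y[:, p], rcond=None)
    assert np.allclose(B[:, p], b_p)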
We define the covariance σpq = cov(εp, εq) and define the m × m covariance matrix Σ = [σpq]. As usual, σp2 = σpp = var(εp).
If we apply the least squares method to Yp = Xβp + εp, we generate the (k+1) × 1 coefficient vectors Bp = [bjp] where Bp = (XTX)-1XTYp.
From these Bp we can define the (k + 1) × m matrix B = [bjp].
We can also define the n × m matrices Ŷ = [ŷip] and E = [eip] such that
Ŷ = XB and E = Y – Ŷ
Thus
Y = XB + E
In fact, for any values x1, …, xk (here the xj are specific values, not necessarily values from our original sample), if X = [xj] is the 1 × (k+1) row vector with x0 = 1, we can define the 1 × m row vectors Ŷ = [ŷp] and E = [ep] such that
Ŷ = XB and E = Y – Ŷ
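Continuing in the same hedged spirit, the fitted values, the residual matrix, and a prediction at a new row of predictor values can be computed as follows (all names and numerical values are illustrative; for a new row only the prediction Ŷ is formed, since E would require an observed response there).

import numpy as np

rng = np.random.default_rng(3)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))

B = np.linalg.solve(X.T @ X, X.T @ Y)          # least squares coefficient matrix

Y_hat = X @ B                                  # n x m fitted values
E = Y - Y_hat                                  # n x m residual matrix, so Y = XB + E

x_new = np.array([1.0, 0.3, -1.2, 0.7])        # new 1 x (k+1) row with x_0 = 1 (hypothetical values)
y_pred = x_new @ B                             # 1 x m predicted responses
print(y_pred)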
Properties
Proofs of the following properties are given separately.
Property 1:
B = (XTX)-1XTY
Property 2: B minimizes the trace
Tr((Y – XB)T(Y – XB))
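One way to see Property 2 informally is a small numerical sketch (simulated data, arbitrary perturbation size): perturbing B in random directions never decreases the trace of (Y – XB)T(Y – XB).

import numpy as np

rng = np.random.default_rng(6)
n, k, m = 60, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Y = X @ rng.normal(size=(k + 1, m)) + rng.normal(scale=0.2, size=(n, m))
B = np.linalg.solve(X.T @ X, X.T @ Y)          # least squares estimate

def trace_sse(C):
    R = Y - X @ C
    return np.trace(R.T @ R)                   # Tr((Y - XC)^T (Y - XC))

for _ in range(100):                           # random perturbations never decrease the trace
    C = B + 0.01 * rng.normal(size=B.shape)
    assert trace_sse(C) >= trace_sse(B)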
Property 3:
E[ε] = 0
Property 4: B is an unbiased estimator of β; i.e. E[B] = β
Property 5:
cov(Bp, Bq) = σpq(XTX)-1
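Property 5 can be checked numerically with a small Monte Carlo experiment (a sketch, not part of the source; the design matrix, Σ, and replication count are arbitrary choices): hold X fixed, repeatedly draw the rows of ε from N(0, Σ), and compare the empirical covariance between the columns Bp and Bq with σpq(XTX)-1.

import numpy as np

rng = np.random.default_rng(4)
n, k, m = 40, 2, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # fixed design matrix
beta = np.zeros((k + 1, m))                                   # true coefficients (any values work)
Sigma = np.array([[1.0, 0.6], [0.6, 2.0]])                    # error covariance, sigma_12 = 0.6
L = np.linalg.cholesky(Sigma)
H = np.linalg.solve(X.T @ X, X.T)                             # (X^T X)^{-1} X^T

reps = 20000
Bs = np.empty((reps, k + 1, m))
for r in range(reps):
    eps = rng.normal(size=(n, m)) @ L.T        # rows are iid N(0, Sigma)
    Bs[r] = H @ (X @ beta + eps)               # least squares estimate for this sample

B1, B2 = Bs[:, :, 0], Bs[:, :, 1]              # reps x (k+1) draws of B_1 and B_2
emp = (B1 - B1.mean(0)).T @ (B2 - B2.mean(0)) / (reps - 1)
print(np.round(emp, 3))                        # empirical cov(B_1, B_2)
print(np.round(Sigma[0, 1] * np.linalg.inv(X.T @ X), 3))      # sigma_12 (X^T X)^{-1}, should be close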
Property 6:
E[Ep] = 0
Property 7:
E[EpTEq] = σpq dfRes
Property 8: SSE/dfRes is an unbiased estimate for Σ; i.e. E[SSE] = E[ETE] = dfRes Σ
Observation: This means that spq/dfRes is an unbiased estimator for σpq, where spq is the (p, q) entry of SSE.
Note too that
SSE = (Y – XB)T(Y – XB) = YTY – BTXTY = YTY – (XB)TY
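The next sketch (same kind of simulated setup, values purely illustrative) computes SSE = ETE, forms the estimate SSE/dfRes of Σ with dfRes = n – k – 1 as in the univariate case, and verifies the algebraic identity above numerically.

import numpy as np

rng = np.random.default_rng(5)
n, k, m = 200, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
Sigma = np.array([[1.0, 0.4], [0.4, 0.5]])     # error covariance used to simulate
eps = rng.normal(size=(n, m)) @ np.linalg.cholesky(Sigma).T
Y = X @ rng.normal(size=(k + 1, m)) + eps

B = np.linalg.solve(X.T @ X, X.T @ Y)
E = Y - X @ B

SSE = E.T @ E                                  # m x m residual sum of squares and cross-products
df_res = n - k - 1                             # residual degrees of freedom
Sigma_hat = SSE / df_res                       # estimate of Sigma (Property 8)
print(np.round(Sigma_hat, 3))                  # roughly the Sigma used to simulate

assert np.allclose(SSE, Y.T @ Y - (X @ B).T @ Y)   # numerical check of SSE = Y^T Y - (XB)^T Y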
Property 9:
cov(Bp, Eq) = 0 and cov(B, E) = 0