Multiple Regression without Intercept

In some problems, theory tells us that the appropriate model is a multiple regression without a constant term, i.e.

$$y = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

which is the usual multiple regression model where β0 = 0. In the case with just one independent variable, this is equivalent to finding the line through the origin which best fits the data.

On this webpage, we explain the theory of this model. See Regression w/o Constant in Excel for a description of various Excel and Real Statistics capabilities for such models.

For the sample {y1, y2, …, yn} of size n for the dependent variable y and samples {x1j, x2j, …, xnj} for each of the independent variables xj for j = 1, …, k, we let Y be the n × 1 column vector with the entries y1, …, yn and X be the n × k matrix [xij].

Most of the properties of ordinary multiple regression still hold when the design matrix is replaced by the matrix X = [xij], i.e. we don’t add a column of ones. In particular, the regression coefficients for the least-squares model can be expressed by

image021x

It also turns out that B = [bj] is the k × 1 column vector such that

image022x
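As a rough illustration, the standard least-squares solution without a constant term satisfies the normal equations $X^T X B = X^T Y$, where no column of ones has been added to X. The NumPy sketch below solves these equations for a small made-up data set. The data values and the helper name `fit_no_intercept` are assumptions introduced only for this example.

```python
import numpy as np

def fit_no_intercept(X, Y):
    """Least-squares coefficients for y = b1*x1 + ... + bk*xk (no constant term).

    Solves the normal equations (X^T X) B = X^T Y. X is used as given,
    i.e. no column of ones is added.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Made-up data: n = 5 observations, k = 2 independent variables
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
Y = np.array([5.1, 3.9, 11.2, 9.8, 17.1])

B = fit_no_intercept(X, Y)

# Cross-check against NumPy's general least-squares solver
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("B from normal equations:", B)
print("B from lstsq:           ", B_lstsq)
```

With a single independent variable the same calculation reduces to b = Σxᵢyᵢ / Σxᵢ², the slope of the best-fitting line through the origin.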

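We can use the n × 1 column vector Y-hat with entries

$$\hat{y}_i = b_1 x_{i1} + b_2 x_{i2} + \cdots + b_k x_{ik}, \quad i = 1, \ldots, n$$

to express the least-squares model as

$$\hat{Y} = XB = HY$$

where H is the n × n hat matrix

$$H = X (X^T X)^{-1} X^T$$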
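The following NumPy sketch builds the hat matrix for a no-intercept design and checks a few of its standard properties: it is symmetric, it is idempotent, and its trace equals k when X has full column rank. The data values are made up for illustration, and the explicit matrix inverse is used only to mirror the formula; it is not a recommendation for large problems.

```python
import numpy as np

# Made-up data: n = 5 observations, k = 2 independent variables (no column of ones)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
Y = np.array([5.1, 3.9, 11.2, 9.8, 17.1])

n, k = X.shape

# Hat matrix H = X (X^T X)^{-1} X^T, which maps Y to the fitted values
H = X @ np.linalg.inv(X.T @ X) @ X.T
Y_hat = H @ Y

# The same fitted values obtained from the coefficient vector B
B = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat_from_B = X @ B

print("H is symmetric:   ", np.allclose(H, H.T))
print("H is idempotent:  ", np.allclose(H @ H, H))
print("trace(H) equals k:", np.isclose(np.trace(H), k))
```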

For ordinary multiple regression (including an intercept), we have

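$$SS_T = \sum_{i=1}^n (y_i - \bar{y})^2 \qquad SS_{Reg} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \qquad SS_{Res} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$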

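As we saw in Multiple Regression in Excel, SST = SSReg + SSRes. To see this, first note that

$$y_i - \bar{y} = (\hat{y}_i - \bar{y}) + (y_i - \hat{y}_i)$$

By squaring both sides of the equation and then summing over all values of i, we get

$$\sum_{i=1}^n (y_i - \bar{y})^2 = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2 \sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i)$$

The desired result follows since

$$\sum_{i=1}^n (\hat{y}_i - \bar{y})(y_i - \hat{y}_i) = 0$$

which follows by substituting b0 + b1xi1 + … + bkxik for ŷi and simplifying.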
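The simplification can be sketched as follows. The notation $e_i = y_i - \hat{y}_i$ for the residuals and $b_0, b_1, \ldots, b_k$ for the fitted coefficients is introduced here for convenience. Because the least-squares coefficients of the model with an intercept minimize $\sum e_i^2$, setting the partial derivatives with respect to $b_0$ and each $b_j$ to zero gives the first line; the second line then follows by substitution.

```latex
\begin{align*}
\sum_{i=1}^n e_i &= 0, \qquad \sum_{i=1}^n x_{ij}\, e_i = 0 \quad (j = 1, \ldots, k) \\
\sum_{i=1}^n (\hat{y}_i - \bar{y})\, e_i
  &= b_0 \sum_{i=1}^n e_i + \sum_{j=1}^k b_j \sum_{i=1}^n x_{ij}\, e_i - \bar{y} \sum_{i=1}^n e_i = 0
\end{align*}
```

Note that the condition $\sum e_i = 0$ comes from the constant term, which is why this argument depends on the intercept being in the model.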

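With the definitions above, these results don’t hold when there is no constant term in the regression equation. Instead of fitting a line through the mean values, we need to fit the line through the origin. Since

$$y_i = \hat{y}_i + (y_i - \hat{y}_i)$$

it follows that

$$\sum_{i=1}^n y_i^2 = \sum_{i=1}^n \hat{y}_i^2 + \sum_{i=1}^n (y_i - \hat{y}_i)^2 + 2 \sum_{i=1}^n \hat{y}_i (y_i - \hat{y}_i)$$

But

$$\sum_{i=1}^n \hat{y}_i (y_i - \hat{y}_i) = \hat{Y}^T (Y - \hat{Y}) = B^T X^T (Y - XB)$$

$$= B^T (X^T Y - X^T X B) = 0$$

Thus, for regression without a constant term, we still have SST = SSReg + SSRes and dfT = dfReg + dfRes where

$$SS_T = \sum_{i=1}^n y_i^2 \qquad SS_{Reg} = \sum_{i=1}^n \hat{y}_i^2 \qquad SS_{Res} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$

$$df_T = n \qquad df_{Reg} = k \qquad df_{Res} = n - k$$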
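A quick numerical check of this decomposition, as a sketch on made-up data: the sums of squares are taken about zero (uncentered) for the no-constant fit and compared with the sums taken about the mean, for which the identity generally breaks down. The data values and variable names are assumptions for this example only.

```python
import numpy as np

# Made-up data: n = 5 observations, k = 2 independent variables
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
Y = np.array([9.1, 7.9, 15.2, 13.8, 21.1])

n, k = X.shape

# Least-squares fit without a constant term
B = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ B
resid = Y - Y_hat

# Uncentered sums of squares (taken about zero)
ss_t = np.sum(Y ** 2)
ss_reg = np.sum(Y_hat ** 2)
ss_res = np.sum(resid ** 2)

# Cross term, which vanishes because X^T (Y - XB) = 0
cross = Y_hat @ resid

# Centered sums of squares (taken about the mean of y)
y_bar = Y.mean()
ss_t_centered = np.sum((Y - y_bar) ** 2)
ss_reg_centered = np.sum((Y_hat - y_bar) ** 2)

# Degrees of freedom for the no-constant model
df_t, df_reg, df_res = n, k, n - k

print("uncentered: SST - (SSReg + SSRes) =", ss_t - (ss_reg + ss_res))
print("centered:   SST - (SSReg + SSRes) =", ss_t_centered - (ss_reg_centered + ss_res))
```

The first difference is zero up to rounding, while the second is not, since the residuals of a no-constant fit need not sum to zero.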

Using these new definitions, we define

$$R^2 = \frac{SS_{Reg}}{SS_T} = 1 - \frac{SS_{Res}}{SS_T}$$

Another version of R2 is

image037

For multiple regression including the constant term, these definitions are equivalent. This is not necessarily the case when the intercept is not included in the model. Nor is it necessarily the case that the residuals sum to zero (as they do for regression that includes a constant).

The adjusted version of R2 is

$$R^2_{adj} = 1 - \frac{SS_{Res} / df_{Res}}{SS_T / df_T} = 1 - (1 - R^2) \frac{n}{n - k}$$
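To make these quantities concrete, here is a small NumPy sketch on made-up data. It assumes the uncentered sums of squares (SST = Σy², SSReg = Σŷ²) and the degrees of freedom dfT = n and dfRes = n − k for the no-constant model. The squared correlation between y and ŷ is included as one possible alternative version of R²; that choice is an assumption made for this example and may not be the second version referred to above.

```python
import numpy as np

# Made-up data: n = 5 observations, k = 2 independent variables
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
Y = np.array([9.1, 7.9, 15.2, 13.8, 21.1])

n, k = X.shape

# Least-squares fit without a constant term
B = np.linalg.solve(X.T @ X, X.T @ Y)
Y_hat = X @ B
resid = Y - Y_hat

# Uncentered sums of squares
ss_t = np.sum(Y ** 2)
ss_reg = np.sum(Y_hat ** 2)
ss_res = np.sum(resid ** 2)

# R^2 based on the uncentered sums of squares
r2 = ss_reg / ss_t

# Adjusted R^2, assuming dfT = n and dfRes = n - k
r2_adj = 1 - (ss_res / (n - k)) / (ss_t / n)

# One possible alternative (an assumption): squared correlation of y and y-hat
r2_corr = np.corrcoef(Y, Y_hat)[0, 1] ** 2

# Without a constant term the residuals need not sum to zero
resid_sum = resid.sum()

print("R^2 (uncentered):     ", r2)
print("adjusted R^2:         ", r2_adj)
print("squared correlation:  ", r2_corr)
print("sum of residuals:     ", resid_sum)
```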

Observation: It is important to note that the R2 value for regression with an intercept is not comparable with the R2 value for regression without an intercept (i.e. with an intercept whose value is zero). Thus if R2 = .95 for regression without an intercept and R2 = .80 for regression with an intercept, it doesn’t follow that the model without an intercept is a better fit for the data.
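The point can be illustrated with a small made-up example in plain Python. It assumes that R² is computed as 1 − SSRes/SST, using the uncentered total Σy² for the model without an intercept and the usual total about the mean for the model with an intercept. The data are invented so that y is nearly constant at about 10, which a line through the origin fits poorly.

```python
# Made-up data: y hovers around 10 and barely depends on x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.2, 9.8, 10.5, 9.9, 10.6]
n = len(x)

# Model without intercept: y = b * x
b = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
ss_res_no = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y))
ss_t_no = sum(yi ** 2 for yi in y)                # uncentered total
r2_no = 1 - ss_res_no / ss_t_no

# Model with intercept: y = b0 + b1 * x
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar
ss_res_with = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_t_with = sum((yi - y_bar) ** 2 for yi in y)    # total about the mean
r2_with = 1 - ss_res_with / ss_t_with

print("no intercept:   R^2 =", round(r2_no, 3), " SSRes =", round(ss_res_no, 3))
print("with intercept: R^2 =", round(r2_with, 3), " SSRes =", round(ss_res_with, 3))
```

For this data the model without an intercept has the much larger R² (roughly 0.83 against roughly 0.16), even though its residual sum of squares is far larger, so the two R² values say nothing about which model fits better.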

Observation: In general, it is better not to assume that the intercept is zero. In fact, as mentioned earlier, the only time you should use this type of model is when on theoretical grounds you expect that the intercept is zero. Some examples of this are:

  • Hubble model for the expansion of the universe: galaxy speed = β ∙ distance from the earth
  • Examples from finance and economics, such as the capital asset pricing model (CAPM) and the Cobb-Douglas production function

In the Hubble model example, there is no constant term since at time zero, according to the Big Bang Theory, all the matter in the universe was concentrated at a single point in space.

Of course, you can always use a regression model that includes the constant term, and check whether this term is significantly different from zero.
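One way to carry out such a check is a t-test on the estimated intercept. The NumPy sketch below is a minimal version on made-up data: it fits the model with a constant term, estimates the standard error of the intercept from MSE · (XᵀX)⁻¹, and compares the t statistic with a tabulated critical value. The data, the variable names and the hard-coded critical value (two-sided 5% level, 3 degrees of freedom) are assumptions for this example.

```python
import numpy as np

# Made-up data with one independent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([10.2, 9.8, 10.5, 9.9, 10.6])
n = len(y)

# Design matrix with a column of ones for the constant term
X = np.column_stack([np.ones(n), x])
p = X.shape[1]                      # number of estimated coefficients

XtX_inv = np.linalg.inv(X.T @ X)
coef = XtX_inv @ X.T @ y            # coef[0] is the intercept, coef[1] the slope
resid = y - X @ coef

df_res = n - p                      # residual degrees of freedom
mse = resid @ resid / df_res

# Standard errors from the diagonal of MSE * (X^T X)^{-1}
se = np.sqrt(mse * np.diag(XtX_inv))
t_intercept = coef[0] / se[0]

# Two-sided 5% critical value of the t distribution with 3 degrees of
# freedom, taken from standard tables (valid only for this data size)
T_CRIT_3DF = 3.182
intercept_significant = abs(t_intercept) > T_CRIT_3DF

print("intercept:", coef[0], " t statistic:", t_intercept)
print("significantly different from zero at the 5% level:", intercept_significant)
```

Here the intercept is clearly different from zero, so forcing the line through the origin would not be appropriate for this data.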

See Regression w/o Constant in Excel for a description of various Excel and Real Statistics functions and data analysis tools for creating multiple regression models without an intercept.

6 thoughts on “Multiple Regression without Intercept”

  1. I followed your formula, but the result is different from the one R gives.

    Is there something wrong with the formula for b(subj) above?

    It seems to differ from B = inv(x’x) x’ y.

  2. Do you know which formula Excel uses to compute the adjusted R^2 in a multiple regression without a constant term? I have tried your formula, but the result is different.

    • Hello Hector,
      I don’t know how Excel calculates the adjusted R-square in this case. It seems that the value is lower than I have seen from other software tools.
      Charles

    • Ujang,
      It depends on what sort of chart you are referring to, since many are possible. If you want a scatter plot of each independent variable against the dependent variable see
      Scatter Plot
      Keep in mind that if you have k independent variables you will need to plot each separately since charts tend to be two-dimensional and not k+1 dimensional.
      Charles
