Method of Least Squares Detailed

Theorem 1: The best fit line for the points (x_1, y_1), …, (x_n, y_n) is given by

\hat{y} = b(x-\bar{x}) + \bar{y}

where

b = r\,\frac{s_y}{s_x} = \frac{\sum\nolimits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum\nolimits_{i=1}^n (x_i-\bar{x})^2}

Proof: Our objective is to minimize

\sum\nolimits_{i=1}^n [y_i - b(x_i-\bar{x}) - c]^2

For any given values of (x_1, y_1), …, (x_n, y_n), this expression can be viewed as a function of b and c. Calling this function g(b, c), by calculus the minimum value occurs where both partial derivatives are zero.

\frac{\partial g}{\partial b} = -2\sum\nolimits_{i=1}^n (x_i-\bar{x})[y_i - b(x_i-\bar{x}) - c] = 0

\frac{\partial g}{\partial c} = -2\sum\nolimits_{i=1}^n [y_i - b(x_i-\bar{x}) - c] = 0

Transposing terms and simplifying,

b\sum\nolimits_{i=1}^n (x_i-\bar{x})^2 + c\sum\nolimits_{i=1}^n (x_i-\bar{x}) = \sum\nolimits_{i=1}^n (x_i-\bar{x})y_i

b\sum\nolimits_{i=1}^n (x_i-\bar{x}) + nc = \sum\nolimits_{i=1}^n y_i

Since \sum\nolimits_{i=1}^n (x_i-\bar{x}) = \sum\nolimits_{i=1}^n x_i - n\bar{x} = 0, from the second equation we have c = ȳ, and from the first equation we have

b = \frac{\sum\nolimits_{i=1}^n (x_i-\bar{x})y_i}{\sum\nolimits_{i=1}^n (x_i-\bar{x})^2}

The result follows since

\sum\nolimits_{i=1}^n (x_i-\bar{x})y_i = \sum\nolimits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y}) + \bar{y}\sum\nolimits_{i=1}^n (x_i-\bar{x}) = \sum\nolimits_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})
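As a quick numerical check of Theorem 1 (not part of the original proof), the sketch below compares the closed-form slope and intercept against NumPy's least-squares fit on a small made-up data set:

```python
import numpy as np

# Made-up sample data (not from the article), used only to check the formulas
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.5, 5.0, 7.5, 9.0])

xbar, ybar = x.mean(), y.mean()

# Theorem 1: b = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2),
# and the line y-hat = b(x - xbar) + ybar has intercept ybar - b*xbar
b = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Compare with NumPy's degree-1 least-squares polynomial fit y = slope*x + intercept
slope, intercept = np.polyfit(x, y, 1)
print(np.isclose(b, slope), np.isclose(ybar - b * xbar, intercept))  # True True
```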

Alternative Proof: This proof doesn’t require any calculus. We first prove the theorem for the case where both x and y have mean 0 and standard deviation 1. Assume the best fit line is y = bx + a, and so

\hat{y}_i = bx_i + a

for all i. Our goal is to minimize the following quantity

z = \sum\nolimits_{i=1}^n (y_i - \hat{y}_i)^2 = \sum\nolimits_{i=1}^n (y_i - bx_i - a)^2

Now minimizing z is equivalent to minimizing z/n, which is

\frac{z}{n} = \frac{1}{n}\sum\nolimits_{i=1}^n (y_i - bx_i - a)^2 = \frac{1}{n}\sum\nolimits_{i=1}^n (y_i - bx_i)^2 - \frac{2a}{n}\sum\nolimits_{i=1}^n (y_i - bx_i) + a^2 = \frac{1}{n}\sum\nolimits_{i=1}^n (y_i - bx_i)^2 + a^2

since x̄ = ȳ = 0. Now since a^2 is non-negative, the minimum value is achieved when a = 0. Since we are considering the case where x and y have a standard deviation of 1, s_x^2 = s_y^2 = 1, and so expanding the above expression further we get

\frac{1}{n}\sum\nolimits_{i=1}^n (y_i - bx_i)^2 = \frac{1}{n}\sum\nolimits_{i=1}^n y_i^2 - \frac{2b}{n}\sum\nolimits_{i=1}^n x_i y_i + \frac{b^2}{n}\sum\nolimits_{i=1}^n x_i^2 = 1 - 2br + b^2

since, for data with mean 0 and standard deviation 1,

r = \frac{s_{xy}}{s_x s_y} = \frac{1}{n}\sum\nolimits_{i=1}^n x_i y_i

Now suppose b = r – e, then the above expression becomes

1 - 2(r-e)r + (r-e)^2 = 1 - 2r^2 + 2er + r^2 - 2er + e^2 = 1 - r^2 + e^2

Now since e^2 is non-negative, the minimum value is achieved when e = 0. Thus b = r – e = r. This proves that the best fit line has the form y = bx + a where b = r and a = 0, i.e. y = rx.
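The identity z/n = 1 – 2br + b^2, and its minimum value 1 – r^2 at b = r, can be verified numerically for standardized data. The data below are made up for illustration (not from the article), standardized with the population standard deviation to match the 1/n convention used above:

```python
import numpy as np

# Made-up data, then standardized to mean 0 and population standard deviation 1
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(size=200)
x = (x - x.mean()) / x.std()   # np.std defaults to the population sd (ddof=0)
y = (y - y.mean()) / y.std()

n = len(x)
r = np.sum(x * y) / n          # for standardized data, r = (1/n) * sum(x_i * y_i)

# z/n = (1/n) * sum((y_i - b*x_i)^2) equals 1 - 2*b*r + b^2 for any b
b = 0.3
z_over_n = np.sum((y - b * x) ** 2) / n
print(np.isclose(z_over_n, 1 - 2 * b * r + b ** 2))           # True

# and it is minimized at b = r, where z/n = 1 - r^2
print(np.isclose(np.sum((y - r * x) ** 2) / n, 1 - r ** 2))   # True
```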

We now consider the general case where x and y don’t necessarily have a mean of 0 and a standard deviation of 1, and set

x′ = (x – x̄)/s_x and y′ = (y – ȳ)/s_y

Now x′ and y′ do have a mean of 0 and a standard deviation of 1, and so the line that best fits the data is y′ = rx′, where r is the correlation coefficient between x′ and y′. Thus the best fit line has form

\frac{\hat{y}-\bar{y}}{s_y} = r\,\frac{x-\bar{x}}{s_x}

or equivalently

\hat{y} = b(x-\bar{x}) + \bar{y}

where b = r s_y/s_x. Now note that by Property B of Correlation, the correlation coefficient for x and y is the same as that for x′ and y′, namely r.

The result now follows by Property 1. If there is a better fit line for x and y, it would produce a better fit line for x′ and y′, which would be a contradiction.
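As a numerical sanity check of the general case (again with made-up data, not from the article), the slope r·s_y/s_x obtained via standardization should match the slope derived in the calculus proof:

```python
import numpy as np

# Made-up data with arbitrary mean and spread
rng = np.random.default_rng(1)
x = 10 + 3 * rng.normal(size=100)
y = 50 - 2 * x + rng.normal(size=100)

r = np.corrcoef(x, y)[0, 1]

# Slope from the standardized argument: b = r * s_y / s_x
# (the ddof convention cancels in the ratio of standard deviations)
b_via_r = r * y.std() / x.std()

# Slope from the calculus proof: b = sum((x_i - xbar)(y_i - ybar)) / sum((x_i - xbar)^2)
b_direct = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
print(np.isclose(b_via_r, b_direct))   # True
```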
