Reference: http://en.wikipedia.org/wiki/File:Linear_regression.svg
In the graph above, the blue dots are the $(X, Y)$ values from a given data set, and our goal is to find the red line, which is linear. To determine this line, we must figure out the slope and intercept values that minimize the error.
Wait!! How can we predict the intercept and slope when they are constants? Because they are unknown constants, we estimate them from the data.
[1] Find the line that minimizes the overall sum of squared errors
Sum of Squared Errors $\mathsf {Q= \sum_{i=1}^n \epsilon _{i}^2 = \sum (Y_{i}-\hat{Y_{i}})^2 = \sum (Y_{i}-\beta_{0}-\beta_{1}X_{i})^2}$
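Before the derivation, it may help to see $Q$ as code: a minimal sketch in plain Python on made-up data (the names and numbers here are illustrative assumptions, not from the post). The proof below finds the $(b_{0}, b_{1})$ that minimize this quantity.

```python
# Minimal sketch: Q(b0, b1) = sum of squared vertical errors for a candidate line.
# x, y are hypothetical data; b0, b1 are candidate intercept/slope values.
def sum_squared_errors(x, y, b0, b1):
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0]          # made-up example data
y = [2.1, 3.9, 6.2, 8.1]
print(sum_squared_errors(x, y, 0.0, 2.0))  # Q for the candidate line Y = 2X
```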
Proof
$ \frac{\partial Q}{\partial b_{0}} = 2 \cdot \sum_{i=1}^{n}(Y_{i}-b_{0}-b_{1}X_{i})(-1)= 0$ $\Rightarrow \sum Y_{i}-n\cdot b_{0}-b_{1} \cdot \sum X_{i}=0 \rightarrow \frac{\sum Y_{i}}{n}-b_{1} \cdot \frac{\sum X_{i}}{n}=b_{0}$
$ \therefore \bar{Y}-b_{1}\bar{X}=b_{0}$, where $ b_{0},b_{1}$ are unknown!
$ \frac{\partial Q}{\partial b_{1}}= 2\cdot \sum(Y_{i}-b_{0}-b_{1}X_{i})(-X_{i})=0 $
$=\sum X_{i}Y_{i}-b_{0}\cdot \sum X_{i}- b_{1} \cdot \sum X_{i}^2 =0 $
$= \sum X_{i}Y_{i}-(\bar{Y}-b_{1}\bar{X})\sum X_{i}-b_{1}\cdot \sum X_{i}^2=0$ $(\because b_{0}=\bar{Y}-b_{1}\bar{X})$
$\Rightarrow \sum X_{i}Y_{i}-\bar{Y} \cdot \sum X_{i}=b_{1}(\sum X_{i}^2-\bar{X}\cdot \sum X_{i})$
$\therefore b_{1}=\frac{\sum X_{i}Y_{i}-\bar{Y} \sum X_{i}}{\sum X_{i}^2 - \bar{X} \sum X_{i}}= \frac{\sum X_{i}(Y_{i}-\bar{Y})}{\sum X_{i}(X_{i}-\bar{X})}$
Remark) The slope is simply a linear combination of the $Y_{i}$, with weights determined by the $X_{i}$.
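Putting the result to work, here is a minimal sketch of the closed-form fit in plain Python (again on made-up data; the function name is an illustrative assumption). It uses the centered form $b_{1}=S_{XY}/S_{XX}$, which equals the ratio above because $\sum\bar{X}(Y_{i}-\bar{Y})=0$ and $\sum\bar{X}(X_{i}-\bar{X})=0$.

```python
# Minimal sketch of the closed-form least-squares fit derived in [1].
def least_squares_fit(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # S_XY
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # S_XX
    b1 = s_xy / s_xx              # slope: b1 = S_XY / S_XX
    b0 = y_bar - b1 * x_bar       # intercept: b0 = Ybar - b1 * Xbar
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0]          # made-up example data
y = [2.1, 3.9, 6.2, 8.1]
print(least_squares_fit(x, y))    # estimated (b0, b1)
```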
[2] Properties of the Least Squares Fitted Line
(1) $\sum_{i=1}^n e_{i}=0$
Proof $\Rightarrow \sum (Y_{i}-\hat{Y}_{i})= \sum (Y_{i}-b_{0}-b_{1}X_{i})$
$=\sum Y_{i}-n(\bar{Y}-b_{1}\bar{X}) - b_{1} \sum X_{i}$
$=n \bar{Y} - n \bar{Y} + n b_{1}\bar{X}-b_{1}n \bar{X}=0$
(2) $\sum_{i=1}^n \hat{Y_{i}}= \sum_{i=1}^n Y_{i}$
Proof $\Rightarrow \sum_{i=1}^n \hat{Y_{i}}= \sum(Y_{i}-e_{i})= \sum Y_{i} - \sum e_{i} = \sum_{i=1}^n Y_{i}$ $(\because \sum e_{i}=0)$
(3) $\sum_{i=1}^n X_{i}e_{i}=0$
Proof $\Rightarrow \sum X_{i}e_{i}= \sum(X_{i}-\bar{X})e_{i}$ $(\because \bar{X}\sum e_{i}=0)$ $= \sum (X_{i}-\bar{X})(Y_{i}-b_{0}-b_{1}X_{i})$
$= \sum(X_{i}-\bar{X})(Y_{i}-\bar{Y})-b_{0}\sum(X_{i}-\bar{X})-b_{1} \sum(X_{i}-\bar{X})^2$ $(\because \sum(X_{i}-\bar{X})=0$, so centering $Y_{i}$ and $X_{i}$ changes nothing$)$
$= S_{XY}-b_{1}S_{XX}$, where $S_{XY}=\sum(X_{i}-\bar{X})(Y_{i}-\bar{Y})$ and $S_{XX}=\sum(X_{i}-\bar{X})^2$. $\because b_{1}= \frac{S_{XY}}{S_{XX}}$ (equivalent to the form in [1]), $\therefore S_{XY}-\frac{S_{XY}}{S_{XX}}\cdot S_{XX}=0$
(4) $\sum_{i=1}^n \hat{Y}_{i}e_{i}=0$
Proof $\Rightarrow \sum(b_{0}+b_{1}X_{i})e_{i}=b_{0}\sum e_{i}+ b_{1}\sum X_{i}e_{i}=0$ $\because \sum e_{i}=0,\ \sum X_{i}e_{i}=0$
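As a sanity check, all four properties can be verified numerically. Below is a self-contained sketch on made-up data (every name here is illustrative); each printed value should be zero up to floating-point rounding.

```python
# Numerical check of properties (1)-(4), fitting with b1 = S_XY/S_XX, b0 = Ybar - b1*Xbar.
x = [1.0, 2.0, 3.0, 4.0]          # made-up example data
y = [2.1, 3.9, 6.2, 8.1]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]              # fitted values Yhat_i
e = [yi - yh for yi, yh in zip(y, y_hat)]       # residuals e_i

print(sum(e))                                        # (1) sum e_i        ~ 0
print(sum(y_hat) - sum(y))                           # (2) sums are equal ~ 0
print(sum(xi * ei for xi, ei in zip(x, e)))          # (3) sum X_i e_i    ~ 0
print(sum(yh * ei for yh, ei in zip(y_hat, e)))      # (4) sum Yhat_i e_i ~ 0
```

Note that (1) and (3) are exactly the two normal equations from [1], which is why the residuals satisfy them by construction.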