1. Simple Linear Regression (SLR): Equation and Assumptions


[1] What Is a Regression Model?
A regression model describes a *statistical relationship* between X and Y, where X is called the explanatory, independent, or predictor variable, and Y is called the response or dependent variable.

 *Wait!! What's a statistical relationship?
- In mathematics, y = f(x) is called a functional relationship, where f(x) is some EXACT function. You can draw its graph, and you don't need to predict the Y values as long as you can figure out the equation of the function.
- In statistics, however, there is also an error term (ε), such as measurement error, whose values we don't know. Therefore, a statistical relationship is written as y = f(x) + ε.
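
As a quick illustration, here is a minimal sketch in Python of the difference between the two kinds of relationship (the choice f(x) = 2x + 1 and the noise level are made up for illustration, not from any real data):

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed for reproducibility
x = np.linspace(0, 10, 6)

# Functional relationship: y is determined EXACTLY by x.
y_exact = 2.0 * x + 1.0                               # y = f(x)

# Statistical relationship: the same f(x) plus an unknown error term.
eps = rng.normal(loc=0.0, scale=1.0, size=x.size)     # ε
y_stat = 2.0 * x + 1.0 + eps                          # y = f(x) + ε

print(y_exact)   # points lie exactly on the line
print(y_stat)    # points scatter around the line
```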


[2] Simple Linear Regression (SLR) Equation
Here, ‘simple’ means there is only one predictor!

In an observational data set, suppose we want to predict the Y values based on X, the explanatory variable. Then we need to build a statistical relationship equation using these variables. The equation should include a slope, an intercept, and also an error term, since this is a statistical relationship.
Therefore, the equation is $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

In this equation, we know the Y and X values from the data set, but we don't know what β0 (the intercept), β1 (the slope), and the εi's are. We can figure out the intercept and slope from the analysis, since they are constants. However, Y and ε are random variables, which means that if we know their mean and variance, we will also know their distribution!!
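
For example, here is a minimal sketch of how the intercept and slope could be estimated by least squares (the data are made up for illustration, and np.polyfit is just one of several ways to fit the line):

```python
import numpy as np

# Hypothetical data set: X is the predictor, Y is the response.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares estimates of the slope (b1) and intercept (b0).
# np.polyfit returns coefficients from the highest degree down.
b1, b0 = np.polyfit(X, Y, deg=1)

# Fitted values and residuals (the observable stand-ins for the εi's).
Y_hat = b0 + b1 * X
residuals = Y - Y_hat

print(f"intercept b0 = {b0:.3f}, slope b1 = {b1:.3f}")
print("residuals:", residuals)
```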
 
Let's find out the error term's distribution first!
[3] The SLR Assumptions: The Error Terms' Mean and Variance
There are three SLR assumptions regarding the error term.
(1) E[εi] = 0
(2) Var[εi] = σ²
(3) Cov[εi, εj] = 0 for all i ≠ j, i.e., the error terms are uncorrelated.
Therefore, assuming in addition that the errors are normally distributed (the normal error model), the error terms' distribution is εi ~ N(0, σ²).
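
Here is a minimal simulation sketch of what these assumptions say (σ = 2 is an arbitrary made-up value, and drawing the errors as independent normals builds all three assumptions in by construction):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = 2.0          # assumed true error standard deviation (made up)
n = 100_000          # number of simulated error terms

eps_i = rng.normal(loc=0.0, scale=sigma, size=n)
eps_j = rng.normal(loc=0.0, scale=sigma, size=n)   # a second, independent draw

print(eps_i.mean())                     # ≈ 0        (assumption 1: E[εi] = 0)
print(eps_i.var())                      # ≈ sigma**2 (assumption 2: Var[εi] = σ²)
print(np.corrcoef(eps_i, eps_j)[0, 1])  # ≈ 0        (assumption 3: uncorrelated errors)
```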


Since the error term is random, Y is also a random variable. Therefore, we can find Y's mean and variance as well.

[4] The Y’s Mean and Variance 
(1) E[Yi] = β0 + β1Xi
Proof: E[Yi] = E[β0 + β1Xi + εi] = E[β0] + E[β1Xi] + E[εi] = β0 + β1Xi,
         by the assumption E[εi] = 0 above, and because β0, β1, and Xi are constants.

(2) Var[Yi] = σ²
Proof: Var[Yi] = Var[β0 + β1Xi + εi] = Var[εi] = σ²,
         by the assumption Var[εi] = σ² above, and because β0 + β1Xi is a constant.

(3) Cov[Yi, Yj] = 0 for i ≠ j
Proof: Cov[Yi, Yj] = Cov[β0 + β1Xi + εi, β0 + β1Xj + εj] = Cov[β0, β0] + Cov[β0, β1Xj] + …
         (expanding each term) … + Cov[εi, εj].
         Since Cov[constant, anything] = 0, only Cov[εi, εj] is left, and its value is 0 by assumption (3).
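
These results can be checked numerically with a minimal Monte Carlo sketch (β0, β1, σ, and Xi below are arbitrary made-up values):

```python
import numpy as np

rng = np.random.default_rng(2)

beta0, beta1, sigma = 1.0, 2.0, 0.5   # assumed true parameter values (made up)
x_i = 3.0                             # one fixed value of the predictor
reps = 200_000                        # number of simulated Yi's at this Xi

# Simulate Yi = β0 + β1*Xi + εi many times at the same Xi.
eps = rng.normal(loc=0.0, scale=sigma, size=reps)
y_i = beta0 + beta1 * x_i + eps

print(y_i.mean())   # ≈ β0 + β1*Xi = 7.0    (matches E[Yi])
print(y_i.var())    # ≈ σ² = 0.25           (matches Var[Yi])
```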


Remark!! A statistical relationship between X and Y does NOT necessarily mean that X causes Y, since X and Y here are observational data.




 
