[1]
What’s a Regression Model?
It describes a *statistical relationship* between X and Y, where X is called an explanatory, independent, or predictor variable, and Y is called a response or dependent variable.
Wait!! What’s a statistical relationship?
- In Mathematics, y = f(x) is called a functional relationship, where f(x) is some EXACT function. Therefore, you can draw its graph, and you don't need to predict the Y values as long as you can figure out the equation of the function.
- In Statistics, however, there is also an error term (ε), such as measurement error, whose values we don't know. Therefore, the statistical relationship is described by y = f(x) + ε, as sketched below.
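To make the contrast concrete, here is a minimal sketch (my own illustrative example; the function f, its coefficients, and σ are assumed, not taken from this post). It evaluates an exact functional relationship and then the same relationship with a random error term added:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.linspace(0, 10, 50)

# Functional relationship: y is EXACTLY determined by x.
def f(x):
    return 2.0 + 0.5 * x          # assumed example function

y_exact = f(x)                    # no prediction needed; just evaluate f

# Statistical relationship: the same f(x) plus an unknown error term.
sigma = 1.0                       # assumed error standard deviation
eps = rng.normal(0.0, sigma, size=x.shape)
y_stat = f(x) + eps               # y = f(x) + ε

print(y_exact[:3])  # identical on every run
print(y_stat[:3])   # varies from run to run because of ε
```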
[2]
Simple Linear Regression (SLR) Equation
Here, ‘simple’ means there is one predictor only!
In an observational data set, suppose we want to predict the Y values based on X, the explanatory variable. Then we need to build a statistical relationship equation using these variables. The equation should have a slope, an intercept, and also an error term, since this is a statistical relationship.
Therefore, the equation is $Y_{i}= \beta_{0}+\beta_{1}X_{i}+ \varepsilon_{i}$
In this equation, we know the Y and X values from the data set, but we don’t know what $\beta_0$ (the intercept), $\beta_1$ (the slope), and the $\varepsilon_i$’s are. We can figure out the intercept and slope from the analysis, as they are constants. However, Y and ε are random variables, which means that if we know their means and variances (and assume they are normally distributed), we will also know their distributions!!
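As a hedged sketch of "figuring out the intercept and slope" (the least-squares method and every number below are assumptions for illustration, not something stated in this post), one way to estimate $\beta_0$ and $\beta_1$ from data is:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data from an assumed "true" model Y = 2 + 0.5 X + ε, with σ = 1.
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)

# Ordinary least squares estimates of the slope and intercept.
b1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope
b0_hat = y.mean() - b1_hat * x.mean()                      # intercept

print(f"estimated intercept b0: {b0_hat:.3f}")
print(f"estimated slope     b1: {b1_hat:.3f}")
```

The same estimates can also be obtained with np.polyfit(x, y, 1) or a regression library such as statsmodels.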
Let's find out the error term's distribution first!
[3]
The SLR Assumptions: The Error Terms’ Mean and Variance
There are three SLR assumptions regarding the error terms:
(1) $E[\varepsilon_i] = 0$
(2) $\mathrm{Var}[\varepsilon_i] = \sigma^2$
(3) $\mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0$ for $i \neq j$, i.e., the error terms are uncorrelated.
Therefore, under the usual additional assumption that the errors are normally distributed, the error terms’ distribution is $\varepsilon_i \sim N(0, \sigma^2)$.
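A small simulation can make these assumptions concrete (an illustrative sketch only; the value of σ and the number of replications are arbitrary): errors drawn from N(0, σ²) show a sample mean near 0, a sample variance near σ², and a near-zero covariance between different error terms.

```python
import numpy as np

rng = np.random.default_rng(2)

sigma = 1.5                       # assumed error standard deviation
n, reps = 2, 100_000              # two error terms, many replications

# Each row is one replication of (ε_1, ε_2).
eps = rng.normal(0.0, sigma, size=(reps, n))

print("mean of ε_1   :", eps[:, 0].mean())        # ≈ 0   (assumption 1)
print("var of ε_1    :", eps[:, 0].var(ddof=1))   # ≈ σ²  (assumption 2)
print("cov(ε_1, ε_2) :", np.cov(eps[:, 0], eps[:, 1], ddof=1)[0, 1])  # ≈ 0 (assumption 3)
```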
Y is also a random variable, since the error term is random. Therefore, we can find Y's mean and variance as well.
[4] The Y’s Mean and Variance
(1) $E[Y_i] = \beta_0 + \beta_1 X_i$
Proof: $E[Y_i] = E[\beta_0 + \beta_1 X_i + \varepsilon_i] = E[\beta_0] + E[\beta_1 X_i] + E[\varepsilon_i] = \beta_0 + \beta_1 X_i$,
by the assumption above that $E[\varepsilon_i] = 0$, and because $\beta_0$, $\beta_1$, and $X_i$ are constants.
(2) $\mathrm{Var}[Y_i] = \sigma^2$
Proof: $\mathrm{Var}[Y_i] = \mathrm{Var}[\beta_0 + \beta_1 X_i + \varepsilon_i] = \mathrm{Var}[\varepsilon_i] = \sigma^2$,
by the assumption above that $\mathrm{Var}[\varepsilon_i] = \sigma^2$, and because $\beta_0 + \beta_1 X_i$ is a constant.
(3) $\mathrm{Cov}[Y_i, Y_j] = 0$ for $i \neq j$
Proof: $\mathrm{Cov}[Y_i, Y_j] = \mathrm{Cov}[\beta_0 + \beta_1 X_i + \varepsilon_i,\; \beta_0 + \beta_1 X_j + \varepsilon_j] = \mathrm{Cov}[\beta_0, \beta_0] + \mathrm{Cov}[\beta_0, \beta_1 X_j] + \dots$ (expanding each term) $\dots + \mathrm{Cov}[\varepsilon_i, \varepsilon_j] = 0$.
Since the covariance of any term involving a constant ($\beta_0$ or $\beta_1 X$) is 0, only $\mathrm{Cov}[\varepsilon_i, \varepsilon_j]$ is left, and its value is 0 by assumption.
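To check these three results numerically (again an illustrative sketch; the values of β0, β1, Xi, Xj, and σ below are made up, not from this post), we can simulate many copies of Yi and Yj and compare the sample mean, variance, and covariance with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed example parameters (not from the post).
b0, b1, sigma = 2.0, 0.5, 1.0
x_i, x_j = 3.0, 7.0
reps = 200_000

# Simulate Y_i and Y_j with independent error terms, many times.
y_i = b0 + b1 * x_i + rng.normal(0.0, sigma, size=reps)
y_j = b0 + b1 * x_j + rng.normal(0.0, sigma, size=reps)

print("E[Y_i]   theory:", b0 + b1 * x_i, " simulated:", y_i.mean())
print("Var[Y_i] theory:", sigma**2,      " simulated:", y_i.var(ddof=1))
print("Cov[Y_i, Y_j] theory: 0  simulated:", np.cov(y_i, y_j, ddof=1)[0, 1])
```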
Remark!!
A statistical relationship between X and Y does NOT necessarily mean that X causes Y, since X and Y here are observational data.