[The Variance/Bias Tradeoff] The Hill Estimator, Kernel Density Estimation, Non-Parametric Regression

[1] The Hill Estimator
The Hill estimator is one way of estimating the tail index $\alpha$.
Suppose $X_{1}, X_{2},...,X_{n}$ are independent non-negative random variables.
* Heavy-tailed data: $1-F(x) = P(X_{i}>x)=x^{-\alpha}L(x)$,
where $\alpha(=\frac{1}{\gamma})>0$ is an unknown parameter (called the tail index) that describes the heaviness of the right tail, and $L(x)$ is a slowly varying function satisfying $\lim_{x\rightarrow \infty}\frac{L(tx)}{L(x)}=1$ for all $t>0$.
* The Hill estimator based on the $k$ largest order statistics $X_{(n)}\geq X_{(n-1)}\geq \cdots \geq X_{(1)}$ is $\hat{\gamma}_{k}=\frac{1}{k}\sum_{i=1}^{k}\log\frac{X_{(n-i+1)}}{X_{(n-k)}}$, giving $\hat{\alpha}_{k}=1/\hat{\gamma}_{k}$.
* Bias increases with k, variance decreases with k. The choice of k can be guided by a Hill plot, as in the sketch below.
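As a concrete illustration, here is a minimal Python sketch (the function name hill_estimator, the simulated Pareto sample with true $\alpha=2$, and the grid of k values are all my own choices for the example):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of alpha from the k largest order statistics."""
    x = np.sort(np.asarray(x, dtype=float))        # ascending order
    # gamma_hat = (1/k) * sum_{i=1}^{k} log( X_{(n-i+1)} / X_{(n-k)} )
    gamma_hat = np.mean(np.log(x[-k:] / x[-k - 1]))
    return 1.0 / gamma_hat                          # alpha_hat = 1 / gamma_hat

# Pareto-type data with true alpha = 2, i.e. 1 - F(x) = x^{-2} for x >= 1
rng = np.random.default_rng(0)
sample = (1.0 - rng.uniform(size=10_000)) ** (-1.0 / 2.0)

# A crude Hill plot: alpha_hat as a function of k
for k in (50, 100, 500, 2000):
    print(k, hill_estimator(sample, k))
```

Plotting $\hat{\alpha}_{k}$ against k and reading off the region where the estimates stabilize near a common value is exactly the Hill plot idea.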

[2] Kernel Density Estimation 
The density estimator: $\hat{f}_{h}(x)= \frac{1}{nh}\sum_{i=1}^n w\left(\frac{x-x_{i}}{h}\right)$, where $w$ is a symmetric probability density. 
* The bandwidth h: If h is too large, the estimator $\hat{f}$ is too smooth, whereas if h is too small, the estimator $\hat{f}$ is too noisy. Therefore, the bias and variance depend on the bandwidth h! 
* Remark) Since the bias depends on the unknown density $f$, the choice of the bandwidth h is complicated.
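A minimal sketch of this estimator in Python, assuming a Gaussian kernel for $w$ (the simulated data and the bandwidth grid are arbitrary choices for illustration):

```python
import numpy as np

def kde(x_grid, data, h):
    """f_hat_h(x) = (1/(n h)) * sum_i w((x - x_i)/h), Gaussian kernel w."""
    u = (np.asarray(x_grid)[:, None] - np.asarray(data)[None, :]) / h
    w = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # standard normal density
    return w.mean(axis=1) / h

rng = np.random.default_rng(1)
data = rng.normal(size=500)                # true density: standard normal
grid = np.linspace(-4.0, 4.0, 201)

for h in (0.05, 0.3, 1.5):                 # too small -> noisy; too large -> oversmoothed
    print(h, kde(grid, data, h).max())     # true peak is 1/sqrt(2*pi) ~ 0.399
```

Running it with the three bandwidths makes the tradeoff visible: the smallest h gives a jagged, high-variance estimate, while the largest flattens the peak (oversmoothing bias).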

 
[3] Non-Parametric Regression using Kernel Smoothing 
In non-parametric regression, our model is $y_{i}=g(x_{i})+ \varepsilon_{i}$.
We need inference about $g(x)$, a smooth function of $x$. We can estimate $g(x)$ by $\hat{g}(x)=\sum_{i\in S(x)} w_{i}(x)y_{i}$, where $S(x)= \{ i : |x_{i} -x |\leq h\}$ for a bandwidth h. Our $\hat{g}(x)$ will be a loess smoother, which uses weighted linear regression within each neighborhood. With a large bandwidth the estimated function is smoother but more severely biased; with a small bandwidth it is less smooth, indicating smaller bias but larger variance.
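A minimal sketch of this idea, using uniform weights $w_{i}(x)=1/|S(x)|$ on the window instead of the full weighted-linear loess fit (the sine example and the bandwidth grid are my own choices):

```python
import numpy as np

def local_average(x_grid, x, y, h):
    """g_hat(x0) = sum_{i in S(x0)} w_i(x0) * y_i, with S(x0) = {i : |x_i - x0| <= h}
    and uniform weights w_i(x0) = 1/|S(x0)| (a simplification of loess)."""
    g_hat = np.full(len(x_grid), np.nan)
    for j, x0 in enumerate(x_grid):
        in_window = np.abs(x - x0) <= h          # the neighborhood S(x0)
        if in_window.any():
            g_hat[j] = y[in_window].mean()
    return g_hat

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 300))
y = np.sin(x) + rng.normal(scale=0.3, size=300)  # y_i = g(x_i) + eps_i
grid = np.linspace(0.5, 2.0 * np.pi - 0.5, 5)

for h in (0.1, 0.5, 2.0):   # small h: low bias, high variance; large h: smooth but biased
    print(h, np.round(local_average(grid, x, y, h), 3))
```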

 

Expected Value & Variance & Covariance



| Expected Value

  • Discrete Case: $\mathsf {E(X)= \sum_{i} x_{i} \cdot P(X=x_{i})}$        
  • Continuous Case: $\mathsf {E(X)= \int_{-\infty}^{\infty} x \cdot f(x)dx }$ 
  • Properties   (a, b $ \in \mathbb{R}$)  
       If X$\geq$ 0, then E(X)$\geq$ 0         
       Proof 
       $\mathsf { E(X)=\sum_{x}x\cdot P(X=x)=\sum_{x>0} x\cdot P(X=x)\geq \sum_{x>0} 0 \cdot P(X=x)=0}$               
 
       E(aX) = a E(X) 
       Proof
       $\mathsf {E(aX)=\sum _{x}a\cdot x \cdot P(X=x)=a \sum_{x} x \cdot P(X=x)=a \cdot E(X) }$ 

        E(X+Y) = E(X) + E(Y)
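
A quick Monte Carlo check of these three properties (a sketch; the exponential and normal samples are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.exponential(scale=2.0, size=1_000_000)   # X >= 0, true E(X) = 2
Y = rng.normal(loc=1.0, size=1_000_000)          # true E(Y) = 1
a = 3.0

print(X.mean())                                  # E(X) >= 0, approx. 2
print((a * X).mean(), a * X.mean())              # E(aX) = a E(X)
print((X + Y).mean(), X.mean() + Y.mean())       # E(X+Y) = E(X) + E(Y)
```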


| Variance
  • $\mathsf{ Var(X)= E(X^2)- E(X)^2 }$ 
     Proof  (write $\mu = \mathsf{E(X)}$)
      $\mathsf{Var(X)= E[(X- E(X))^2] = E[(X- \mu)^2]= \sum_{x} (x-\mu)^2 \cdot P(X=x)}$ 
            $\mathsf{= \sum_{x}(x^2 - 2\mu x + \mu^2) \cdot P(X=x)} $
            $\mathsf{ = \sum_{x}x^2\cdot P(X=x)-2\mu \cdot \sum_{x}x\cdot P(X=x)+ \mu^2 \sum_{x}P(X=x)}$ 
            $\mathsf{=E(X^2)-2\mu^2+\mu^2 = E(X^2)-\mu^2 = E(X^2)-E(X)^2}$ 

  • Properties   (a, b $ \in \mathbb{R}$)  
        Var(a)=0  (a constant always takes the same value, so there is no variance) 

        Var(aX+b)= $a^2 \cdot$Var(X)
        Proof 
        From $\mathsf {E(aX+b)=aE(X)+b}$, and $\mathsf {E[(aX+b)^2]=a^2E(X^2)+2abE(X)+b^2}$ 
        $\mathsf {Var(aX+b)=E[(aX+b)^2]-[E(aX+b)]^2}$ 
                         $\mathsf {=a^2E(X^2)+2abE(X)+b^2-[aE(X)+b]^2}$ 
                         $\mathsf {=a^2E(X^2)+2abE(X)+b^2-[a^2(E(X))^2+2abE(X)+b^2]}$ 
                         $\mathsf {=a^2[E(X^2)-E(X)^2]=a^2\cdot Var(X)}$   
 
        Var(X+Y)= Var(X)+Var(Y)+ 2Cov(X,Y)
        Proof
        From $\mathsf {E(X+Y)=E(X)+E(Y)}$, and $\mathsf {E[(X+Y)^2]=E(X^2)+2E(XY)+E(Y^2)}$ 
        $\mathsf {Var(X+Y)=E[(X+Y)^2]-[E(X+Y)]^2}$
                          $\mathsf {=E(X^2)+E(Y^2)+2E(XY)-[ (E(X))^2 + (E(Y))^2 +2E(X)E(Y)] }$ 
                          $\mathsf {=E(X^2)-(E(X))^2+E(Y^2)-(E(Y))^2+2 [E(XY)-E(X)E(Y)] }$ 
                          $\mathsf {=Var(X)+Var(Y)+2Cov(X,Y)}$ 

       Var(X+Y)= Var(X)+Var(Y) if and only if X and Y are uncorrelated, i.e. Cov(X,Y)=0
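
These variance identities can be checked numerically the same way (a sketch; the correlated pair below is an arbitrary construction):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=1_000_000)
Y = 0.5 * X + rng.normal(size=1_000_000)         # Y is correlated with X
a, b = 3.0, 7.0

print(np.var(b * np.ones(10)))                   # Var(constant) = 0
print(np.var(a * X + b), a**2 * np.var(X))       # Var(aX+b) = a^2 Var(X)
cov_xy = np.cov(X, Y, ddof=0)[0, 1]
print(np.var(X + Y), np.var(X) + np.var(Y) + 2.0 * cov_xy)   # Var(X+Y) identity
```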



| Covariance
  • A measure of how much two random variables change together!
  • Cov(X,Y)=E{ [X-E(X)][Y-E(Y)] } = E(XY)-E(X)E(Y)
       

|If X and Y are independent, then 
  • P(X=x, Y=y)=P(X=x)P(Y=y)
  • E(XY)=E(X)E(Y)
     Proof
      $\mathsf {E(XY)=\sum_{x,y}xy \cdot P(X=x,Y=y)= \sum_{x}\sum_{y}xy \cdot P(X=x)P(Y=y)}$        
      $\mathsf {=\sum_{x} x\cdot P(X=x) \cdot \sum_{y}y \cdot P(Y=y)=E(X)E(Y)}$ 
      
  •  Cov(X,Y)=0
      Proof
      Cov(X,Y)=E(XY)-E(X)E(Y)=E(X)E(Y)-E(X)E(Y)=0
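
And a matching numerical check for the independent case (a sketch with two independently drawn normal samples):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=1_000_000)
Y = rng.normal(size=1_000_000)                   # drawn independently of X

print((X * Y).mean(), X.mean() * Y.mean())       # E(XY) approx. E(X)E(Y)
print(np.cov(X, Y, ddof=0)[0, 1])                # Cov(X,Y) approx. 0
```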