Intro to Poisson Regression Model
[1] When do we use the Poisson Model?
The Poisson model is an example of a Generalized Linear Model and is useful for modeling counts of rare events.
If the outcomes are counts, and especially small counts, they will not follow a normal distribution, so we cannot apply linear regression. If the outcomes are not binary, or if there is no fixed number of trials, we cannot apply logistic regression either.
[2] Model
Recall) If Y ~ Poisson($\mu$), then the probability mass function is $P(Y=y)=\frac{\mu^y e^{-\mu}}{y!}$, y=0,1,...
And E(Y)=$\mu$, and Var(Y)=$\mu$.
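As a quick numerical check of the mean and variance, here is a minimal Python sketch. The value mu = 2.5 and the truncation of the support at 200 are assumptions chosen only for illustration.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for Y ~ Poisson(mu), computed on the log scale for stability."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

mu = 2.5              # hypothetical mean, for illustration only
support = range(200)  # truncated support; the tail beyond this is negligible here
probs = [poisson_pmf(y, mu) for y in support]

total = sum(probs)
mean = sum(y * p for y, p in zip(support, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

# The probabilities sum to 1, and the mean and variance both match mu.
print(total, mean, variance)
```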
Because the Poisson regression model is a Generalized Linear Model, we need a link function g($\cdot$) to relate the mean of Y linearly to X.
The Poisson link function is $g(\mu)=\log(\mu)$, so the transformed mean is linear in the parameters: $g(E(Y))=\log(\mu)= \beta_0+\beta_1 X_1+...+ \beta_pX_p=\mathbb{X}\beta$
Therefore, the Poisson regression model is also called a "log-linear" model.
So, if one explanatory variable $X_j$ increases by one unit while the other variables are held constant, the mean $\mu$ changes by a factor of exp($\beta_j$).
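To see where the factor comes from, compare the mean at $X_j=x_j+1$ with the mean at $X_j=x_j$, keeping every other variable fixed:

$$\frac{\mu(x_j+1)}{\mu(x_j)}=\frac{\exp(\beta_0+...+\beta_j (x_j+1)+...+\beta_p X_p)}{\exp(\beta_0+...+\beta_j x_j+...+\beta_p X_p)}=\exp(\beta_j)$$

For example, with a made-up coefficient $\beta_j=0.2$, exp(0.2) is about 1.22, so a one-unit increase in $X_j$ multiplies the expected count by about 1.22 (a 22% increase).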
[3] Estimation of Model Parameters
So, how can we estimate model parameters? - We can use Maximum Likelihood Estimation.
Likelihood function: $L=\prod_{i=1}^{n}\frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$
The full log-likelihood function : $\log L=\sum_{i=1}^{n} \left \{ y_i \cdot \log(\mu_i)-\mu_i - \log(y_i!)\right \}$
Note that, in SAS, the log-likelihood is $\sum_{i=1}^{n} \left \{ y_i \cdot \log(\mu_i)-\mu_i \right \}$. The constant term ($- \log(y_i!)$) is dropped, so the SAS log-likelihood is at least as large as the full log-likelihood. Because the dropped term does not involve the parameters, the estimates are the same either way.
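The two versions of the log-likelihood can be compared with a small Python sketch. The counts and means below are made-up numbers, not output from SAS.

```python
import math

def full_loglik(y, mu):
    """Full Poisson log-likelihood, including the -log(y_i!) term."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def kernel_loglik(y, mu):
    """Log-likelihood with the constant -log(y_i!) term dropped."""
    return sum(yi * math.log(mi) - mi for yi, mi in zip(y, mu))

y = [0, 1, 3, 2, 5]             # hypothetical counts
mu = [0.5, 1.2, 2.4, 2.0, 4.1]  # hypothetical means

full = full_loglik(y, mu)
kernel = kernel_loglik(y, mu)
constant = sum(math.lgamma(yi + 1) for yi in y)  # sum of log(y_i!)

# The two values differ exactly by the sum of log(y_i!), which is free of mu.
print(full, kernel, kernel - full, constant)
```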
The log-likelihood has to be maximized numerically, and SAS or R will do this automatically and report the estimated coefficients.
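To make the numerical maximization less of a black box, here is a minimal Newton-Raphson sketch in Python for a model with one predictor, $\log(\mu_i)=\beta_0+\beta_1 x_i$. The data are made up, the function name is hypothetical, and this is only an illustration of the idea, not a description of how SAS or R are implemented.

```python
import math

def fit_poisson_simple(x, y, max_iter=50, tol=1e-10):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i (illustrative sketch)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivatives of the log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix entries (negative second derivatives).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: information^{-1} times score, for the 2x2 case.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

x = [0, 1, 2, 3, 4, 5]  # hypothetical predictor
y = [1, 2, 2, 4, 6, 9]  # hypothetical counts
b0, b1 = fit_poisson_simple(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
print(b0, b1, math.exp(b1))  # exp(b1) is the multiplicative effect of one unit of x
```

At the maximum the score equations are zero, so the fitted means add up to the observed total count.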
[4] Model Assessment
Note that this follows a procedure similar to the one for Binomial Logistic Regression.
4.1 Checking for linear relationship : plot $\log(y_i)$ versus X's
4.2 Correct form (whether the $\beta$'s are zero or not!): Check the Wald test and the LRT
Wald Procedure
Hypothesis : $H_{0}:\beta_{j}=0$, which means $X_{j}$ has no effect on the log of the mean count!!, vs $H_{1}:\beta_{j}\neq 0$
Test Statistic : $Z_{obs}=\frac{\hat{\beta_{j}}}{se(\hat{\beta_{j}})}$
where $\hat{\beta_{j}}$ is the maximum likelihood estimate.
Note that if the sample is large enough, MLE's are approximately normally distributed, so under $H_{0}$ our test statistic, $Z_{obs}$, is an observation from an approximate Normal(0,1) distribution!!
95% Confidence interval : $\hat{\beta_{j}}\pm 1.96 \cdot se(\hat{\beta_{j}})$
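The Wald procedure is easy to carry out by hand once the estimate and its standard error are available. Below is a minimal Python sketch; the estimate 0.42 and standard error 0.15 are made-up numbers, and the function name is hypothetical.

```python
import math

def wald_test(beta_hat, se, z_crit=1.96):
    """Wald z statistic, two-sided normal p-value, and 95% confidence interval."""
    z = beta_hat / se
    # Two-sided p-value P(|Z| >= |z|) for Z ~ Normal(0,1).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    return z, p_value, ci

# Hypothetical estimate and standard error, for illustration only.
z_obs, p_value, (lower, upper) = wald_test(0.42, 0.15)
print(z_obs, p_value, lower, upper)

# Exponentiating the interval gives an interval for the factor exp(beta_j).
print(math.exp(lower), math.exp(upper))
```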
LRT (Likelihood Ratio Test)
Likelihood Ratio : $\frac{L_{R}}{L_{F}}$, where $L_{R}$ is the maximized likelihood of the reduced model and $L_{F}$ is the maximized likelihood of the full model, both fitted to the same data.
Hypothesis :
$H_{0}: \beta_{1}=...=\beta_{k}=0$ (the reduced model is appropriate, so it fits the data as well as the full model).
$H_{1}:$ at least one of $\beta_{1},...,\beta_{k}$ is not 0.
Test Statistic : $G^2=-2 \log L_{R}-(-2\log L_{F})=-2 \log \frac{L_R}{L_F}$
Note that, under the null hypothesis and for large n, $G^2$ is an observation from a chi-square distribution with k degrees of freedom! Here k is the number of parameters dropped from the full model to get the reduced model.
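Given the two maximized log-likelihoods, the LRT takes only a few lines. The Python sketch below uses made-up log-likelihood values and a small hand-written chi-square tail function (valid for integer degrees of freedom) so that it needs only the standard library. Both function names are hypothetical.

```python
import math

def chi2_sf(x, k):
    """P(chi-square with k df >= x), for integer k >= 1 and x >= 0."""
    z = x / 2.0
    # Closed-form starting values: k = 1 (odd case) or k = 2 (even case).
    if k % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence Q(a + 1, z) = Q(a, z) + z^a e^{-z} / Gamma(a + 1) up to a = k/2.
    while a < k / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_reduced, loglik_full, k):
    """G^2 statistic and its large-sample chi-square p-value."""
    g2 = -2.0 * (loglik_reduced - loglik_full)
    return g2, chi2_sf(g2, k)

# Hypothetical log-likelihoods; the reduced model drops k = 2 parameters.
g2, p_value = likelihood_ratio_test(-112.4, -108.1, k=2)
print(g2, p_value)
```

Because the same constant $-\log(y_i!)$ appears in both models, $G^2$ is the same whether the full log-likelihood or the version without the constant is used.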
4.3 Adequate Fit: Deviance GOF test
$H_0$ : The fitted model is adequate vs $H_1$ : The saturated model is required.
A smaller deviance implies that the fitted model is good enough for the data.
Wait! What is the Deviance?
Deviance = $-2 \log \frac{L_F}{L_S}= -2 (\log L_F-\log L_S)=2(\log L_S-\log L_F)$, where
$L_F$ is the maximized likelihood of the fitted model and $L_S$ is the likelihood of the saturated model.
Recall) The likelihood is $L=\prod_{i=1}^{n}\frac{\mu_i^{y_i} e^{-\mu_i}}{y_i!}$ (n=total sample size)
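For the Poisson likelihood from [3], the saturated model fits every observation exactly ($\hat{\mu}_i=y_i$), so the deviance works out to $2\sum_{i=1}^{n}\left\{y_i \log(y_i/\hat{\mu}_i)-(y_i-\hat{\mu}_i)\right\}$. Here is a minimal Python sketch of that formula. The function name is hypothetical, and a count of zero contributes nothing to the log term.

```python
import math

def poisson_deviance(y, mu_hat):
    """Deviance 2 * (log L_S - log L_F) for a Poisson model with fitted means mu_hat."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        # y * log(y / mu) is taken as 0 when y = 0 (its limit as y -> 0).
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# A perfect fit has deviance 0; any misfit makes it positive.
print(poisson_deviance([1, 2, 3], [1.0, 2.0, 3.0]))
print(poisson_deviance([0, 2, 4, 12], [1.1, 2.3, 3.8, 4.0]))
```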
4.4 Outliers: Look at the deviance and Pearson residuals.
If both are less than 2 in absolute value, we consider that there are no outliers.
In SAS, use reschi for Pearson residuals and resdev for deviance residuals (for example, on the OUTPUT statement of PROC GENMOD).
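Outside SAS, both kinds of residuals can be computed directly from the counts and the fitted means. The Python sketch below uses the usual Poisson definitions, Pearson residual $(y_i-\hat{\mu}_i)/\sqrt{\hat{\mu}_i}$ and deviance residual equal to the signed square root of each observation's deviance contribution. The counts and fitted means are made up and the function names are hypothetical.

```python
import math

def pearson_residuals(y, mu_hat):
    """(y_i - mu_i) / sqrt(mu_i), since Var(Y_i) = mu_i under the Poisson model."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu_hat)]

def deviance_residuals(y, mu_hat):
    """Signed square root of each observation's contribution to the deviance."""
    out = []
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d_i = 2.0 * (log_term - (yi - mi))
        out.append(math.copysign(math.sqrt(max(d_i, 0.0)), yi - mi))
    return out

y = [0, 2, 4, 12]              # hypothetical counts
mu_hat = [1.1, 2.3, 3.8, 4.0]  # hypothetical fitted means

r_pearson = pearson_residuals(y, mu_hat)
r_deviance = deviance_residuals(y, mu_hat)

# Flag observations where either residual exceeds 2 in absolute value.
flagged = [i for i, (rp, rd) in enumerate(zip(r_pearson, r_deviance))
           if abs(rp) > 2 or abs(rd) > 2]
print(r_pearson, r_deviance, flagged)
```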
[5] Common problem (Dispersion Parameter)
In the Poisson model, we expect the variance to be equal to the mean, $\mu$. However, sometimes the variance is larger than the mean (overdispersion).
If $\frac{deviance}{df}$ and $\frac{Pearson \chi^2}{df}$ are close to 1, then the Poisson model is appropriate. If not, we need to add an extra (dispersion) parameter so that the variance in the model is no longer forced to equal the mean.
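Both ratios are simple to compute from the counts and fitted means. In the Python sketch below, df is taken to be the number of observations minus the number of fitted coefficients, which is an assumption since the post does not spell it out. The data and function name are made up for illustration.

```python
import math

def dispersion_ratios(y, mu_hat, n_params):
    """Return (deviance / df, Pearson chi-square / df) for a Poisson fit."""
    df = len(y) - n_params  # assumed residual degrees of freedom
    pearson_chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu_hat))
    deviance = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        deviance += 2.0 * (log_term - (yi - mi))
    return deviance / df, pearson_chi2 / df

y = [0, 3, 1, 9, 2, 14]                  # hypothetical counts
mu_hat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical fitted means
dev_ratio, pearson_ratio = dispersion_ratios(y, mu_hat, n_params=2)

# Ratios well above 1 suggest more variability than the Poisson model allows.
print(dev_ratio, pearson_ratio)
```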