Case Study : Poisson Regression Model

Case Study : Mating Success of Elephants


Joyce Poole studied a population of African elephants in Amboseli National Park for 8 years. The data contain the number of successful matings (a count starting from 0) and the age at the beginning of the study (between 27 and 52 years) for 41 male elephants. The main question is how mating success relates to age.


Reference: Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed.), Duxbury.
Data from J.H. Poole, "Mate Guarding, Reproductive Success and Female Choice in African Elephants", Animal Behavior 37 (1989): 842-49.
In R: http://finzi.psych.upenn.edu/library/Sleuth3/html/case2201.html

[1] Data and Model
The predictor variable is AGE (between 27 and 52 years), and the outcome Y is the number of successful matings (0, 1, 2, ...). Note that E(Y) = $\mu$.

Because the outcome is a count with small values, it will not follow a normal distribution at a given age. Nor is the outcome binary or binomial, because there is no fixed number of trials with a definite upper limit. Therefore the Poisson distribution is a natural choice for the distribution of the response.

Model: $g(\mu_i)=\log(\mu_i)= \mathbf{x}_i^{T}\beta = \beta_0+\beta_1 \cdot AGE_i$,
where $\mu_i$ is the mean number of matings for an elephant of age $AGE_i$.
Then $\mu_i=\exp\left \{ \beta_0+\beta_1 \cdot AGE_i \right \}$. If age increases by one year, then $\mu$ is multiplied by a factor of $\exp(\beta_1)$.
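
The multiplicative interpretation follows directly from the log link. Writing $\mu(AGE)$ for the mean at a given age (a notation introduced here only for convenience), the ratio of the means one year apart is

$$\frac{\mu(AGE+1)}{\mu(AGE)} = \frac{\exp\left \{ \beta_0+\beta_1 \cdot (AGE+1) \right \}}{\exp\left \{ \beta_0+\beta_1 \cdot AGE \right \}} = \exp(\beta_1),$$

so the ratio does not depend on the starting age.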

[2] SAS Code
PROC GENMOD can be used to fit any generalized linear model. With dist=poisson, the log link is the default.

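/* No data= option is given, so SAS uses the most recently created data set. */
proc genmod;
  /* Poisson response; link=log is the default for dist=poisson, written out for clarity. */
  model matings = age / dist=poisson link=log;
run;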
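
To make concrete what fitting this model involves, here is a minimal Python sketch of maximum likelihood for a Poisson regression with a log link, using Newton-Raphson steps. It is an illustration only and makes no claim about how SAS computes its estimates. The data are made-up toy values, not the elephant data, and the function name `fit_poisson` is hypothetical.

```python
import math

# Hypothetical toy data (NOT the elephant data set): ages and mating counts.
ages = [27, 30, 33, 36, 39, 42, 45, 48, 52]
matings = [0, 1, 1, 2, 2, 3, 2, 5, 6]

def fit_poisson(x, y, iterations=25):
    """Newton-Raphson for log(mu_i) = b0 + b1 * x_i with a Poisson response."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log-likelihood w.r.t. (b0, b1).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[i00, i01], [i01, i11]].
        i00 = sum(mu)
        i01 = sum(mi * xi for xi, mi in zip(x, mu))
        i11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: beta <- beta + I^{-1} U, using the 2x2 inverse.
        b0 += (i11 * u0 - i01 * u1) / det
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

b0, b1 = fit_poisson(ages, matings)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, rate ratio = {math.exp(b1):.4f}")
```

At the solution, the residuals y - mu sum to zero both unweighted and weighted by age, which is a quick way to check that the iteration has converged.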

[3] SAS Result


3.1 Fitted model
$\log(\hat{\mu})= - 1.5820 + 0.0687 \cdot AGE$
The Wald test gives a small p-value (< 0.001), so there is strong evidence that the mean number of successful matings depends on age. For every 1-year increase in age, the estimated mean number of successful matings increases by a factor of exp(0.0687) = 1.071 (about 7%).
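
As a quick check of this arithmetic, the following Python sketch computes the per-year factor and the fitted mean at the two ends of the age range. It uses only the two coefficients reported in the fitted model; the helper name `fitted_mean` is made up for the example.

```python
import math

# Estimates reported in the fitted model above.
b0, b1 = -1.5820, 0.0687

rate_ratio = math.exp(b1)                # multiplicative change in the mean per extra year
percent_change = 100 * (rate_ratio - 1)

def fitted_mean(age):
    """Estimated mean number of matings at a given age."""
    return math.exp(b0 + b1 * age)

print(f"rate ratio per year: {rate_ratio:.3f} ({percent_change:.1f}% increase)")
for age in (27, 52):                     # youngest and oldest ages in the study
    print(f"age {age}: fitted mean = {fitted_mean(age):.2f}")
```

Under this fit, the estimated mean rises from about 1.3 matings at age 27 to about 7.3 at age 52.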


3.2 Fitted Model vs Saturated Model
$H_0$ : The fitted model fits as well as the saturated model.
$H_1$ : The saturated model (which has a separate mean for each observation) fits better.
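Test statistic (deviance) : 51.0116
Distribution of the test statistic under $H_0$ : approximately $\chi^2$ with n-p = 41-2 = 39 degrees of freedom (n = sample size, p = number of parameters in the fitted model, $\beta_0$ and $\beta_1$)
P-value (computed in R, e.g. with 1 - pchisq(51.0116, 39)) : 0.0943
Therefore, there is only weak evidence of lack of fit: at the 5% level we do not reject $H_0$, so the fitted model is not clearly worse than the saturated model.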
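
For readers without R at hand, the same tail probability can be approximated in plain Python. The sketch below evaluates the upper tail of a chi-square distribution through the standard series for the regularized lower incomplete gamma function; the function name `chi2_sf` is made up for this example, and the inputs are the deviance and degrees of freedom quoted above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), with x > 0.

    Uses P(a, z) = exp(-z) * z**a * sum_{n>=0} z**n / Gamma(a + n + 1),
    with a = df / 2 and z = x / 2, and returns 1 - P(a, z).
    """
    a = df / 2.0
    z = x / 2.0
    # First term of the series, computed on the log scale to avoid overflow.
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)   # ratio of consecutive series terms
        total += term
        n += 1
    return 1.0 - total

p_value = chi2_sf(51.0116, 39)
print(f"p-value = {p_value:.4f}")
```

The series is adequate for moderate values like these; for extreme tails a dedicated statistics library would be a safer choice.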

3.3 Dispersion parameter
To assess whether the variance equals the mean, we can check Deviance/DF = 51.0116/39 = 1.31 and Pearson Chi-Square/DF = 45.1360/39 = 1.16. Both values are reasonably close to 1, so there is no strong sign of overdispersion and the Poisson model appears appropriate.
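
The two ratios are simple to reproduce. A small Python sketch, using only the values quoted from the SAS output (the variable names are chosen here for illustration):

```python
# Values reported in the SAS output above.
deviance, pearson_chi2 = 51.0116, 45.1360
n_obs, n_params = 41, 2
df = n_obs - n_params                  # residual degrees of freedom

deviance_ratio = deviance / df
pearson_ratio = pearson_chi2 / df
print(f"Deviance/DF           = {deviance_ratio:.3f}")
print(f"Pearson Chi-Square/DF = {pearson_ratio:.3f}")
```

Ratios well above 1 would point to overdispersion, meaning more variability than the Poisson model allows.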
