Intro to Log-linear Model - IxJ Contingency Table

The main idea is to compute a score that measures how far the observed counts are from the counts we would expect. If the score is small, the observed and expected counts are close to each other.

Modeling the cell counts of a contingency table is a common use of the log-linear model, which is an example of a generalized linear model.

[1] IxJ Contingency Table
There is a row factor A with I levels and a column factor B with J levels.

 
[2] Notation and Hypothesis
2.1 The Joint distribution of A and B
The probability that an observation falls into row i and column j, for i=1,...,I and j=1,...,J, is $P(A=i, B=j)= \pi_{ij}$

2.2 The Marginal distribution of A and B
The probability that an observation falls into row i is $P(A=i)= \pi_{i \cdot}$
The probability that an observation falls into column j is $P(B=j)= \pi_{ \cdot j}$

Our purpose is to assess the relationship between A and B (whether A and B are independent or not) based on the joint distribution and the marginal distributions.
Recall that two variables A and B are independent if and only if P(A=i, B=j) = P(A=i) x P(B=j) for every pair of levels i and j.

Therefore our hypotheses are:
$H_0$ : $\pi_{ij}=\pi_{i\cdot}\pi_{\cdot j}$ for all i, j. There is NO relationship between A and B.
$H_1$ : $\pi_{ij} \neq \pi_{i\cdot}\pi_{\cdot j}$ for at least one pair (i, j). There is a relationship between A and B.
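The independence condition in $H_0$ can be checked numerically when the joint distribution is known. The sketch below is only an illustration: the two joint tables are made-up numbers, and the function names `marginals` and `is_independent` are hypothetical helpers, not part of any library.

```python
import math

def marginals(joint):
    """Return (row marginals pi_i., column marginals pi_.j) of a joint table."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return row, col

def is_independent(joint, tol=1e-9):
    """True if pi_ij equals pi_i. * pi_.j (up to tol) in every cell."""
    row, col = marginals(joint)
    return all(
        math.isclose(joint[i][j], row[i] * col[j], abs_tol=tol)
        for i in range(len(row))
        for j in range(len(col))
    )

# Made-up joint distributions for a 2x2 table (each sums to 1).
independent = [[0.12, 0.18], [0.28, 0.42]]  # outer product of (0.3, 0.7) and (0.4, 0.6)
dependent = [[0.30, 0.10], [0.10, 0.50]]

print(is_independent(independent))  # True
print(is_independent(dependent))    # False
```

In practice the true $\pi_{ij}$ are unknown, which is why the hypothesis has to be tested from observed counts.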


[3] Assumption
We assume that the overall total (grand total), n, is fixed, and that the data are counts.


[4] Test Statistic
Under the null hypothesis, we estimate the expected count $\mu_{ij}$ for the (i,j)th cell. Here $y_{ij}$ is the observed count in cell (i,j), $y_{i \cdot}$ is the total of row i, and $y_{\cdot j}$ is the total of column j.
$\hat{\mu}_{ij}= n \cdot \hat{\pi}_{i \cdot} \cdot \hat{\pi}_{\cdot j}=n \left ( \frac{y_{i \cdot}}{n} \right )\left ( \frac{y_{\cdot j}}{n} \right )= \frac{y_{i \cdot} y_{\cdot j}}{n}$

The test statistic is  $X^2= \sum_{i=1}^{I}\sum_{j=1}^{J} \frac{\left ( y_{ij}-\hat{\mu}_{ij} \right )^2}{\hat{\mu}_{ij}}$,
where, under $H_0$ with large samples, $X^2 \sim \chi^2_{df}$ approximately, with df=(I-1)(J-1)
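As a rough illustration of the formulas above, the following sketch computes the expected counts, $X^2$, and the degrees of freedom for a small table using only the Python standard library. The 2x3 table of counts is made up for the example, and `chi_square_statistic` is a hypothetical helper name.

```python
def chi_square_statistic(table):
    """Return (X^2, df, expected counts) for an I x J table of observed counts."""
    row_totals = [sum(row) for row in table]        # y_i.
    col_totals = [sum(col) for col in zip(*table)]  # y_.j
    n = sum(row_totals)                             # grand total n
    # Expected count under H_0: mu_ij = y_i. * y_.j / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    x2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)     # (I-1)(J-1)
    return x2, df, expected

# Made-up observed counts for a 2x3 table (I=2, J=3).
observed = [[20, 30, 50], [30, 30, 40]]
x2, df, expected = chi_square_statistic(observed)

print(expected)          # [[25.0, 30.0, 45.0], [25.0, 30.0, 45.0]]
print(round(x2, 4), df)  # 3.1111 2
```

The sketch assumes every expected count is positive; a row or column of zeros would cause a division by zero.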

[5] Conclusion
The chi-square test compares the observed value of $X^2$ with a chi-square critical value.
We reject the null hypothesis if the observed value is greater than the critical value, which can be found in a chi-square table.

To look up a critical value, we need the degrees of freedom, the level of significance, and the tail of the test (this test uses the upper tail).
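Instead of a printed table, the critical value can also be approximated numerically. The sketch below is one possible approach under stated assumptions, not a reference implementation: it uses only the standard library, it assumes an integer df of at least 1, and the names `chi2_sf` and `chi2_critical` are hypothetical. The upper-tail probability is built from the recurrence $Q(a+1, z) = Q(a, z) + z^a e^{-z} / \Gamma(a+1)$ for the regularized upper incomplete gamma function, and the critical value is found by bisection. The statistic 3.11 with df=2 is a made-up value used only to show the decision rule.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X^2 > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # Q(1/2, z)
    # Step a up to df/2 with Q(a+1, z) = Q(a, z) + z^a e^{-z} / Gamma(a+1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1.0
    return q

def chi2_critical(alpha, df):
    """Value c with P(X^2 > c) = alpha, found by bisection on chi2_sf."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # grow the bracket until it contains c
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Made-up example: observed statistic 3.11 with df = 2 at the 5% level.
x2_observed, df, alpha = 3.11, 2, 0.05
critical = chi2_critical(alpha, df)
reject = x2_observed > critical

print(round(critical, 3))  # 5.991
print(reject)              # False
```

Comparing the statistic with the critical value is equivalent to comparing the p-value `chi2_sf(x2_observed, df)` with `alpha`.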
