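- For a multivariate normal vector with independent components X_i ~ N(mu_i, sigma_i^2), i=1..n, the sum Y=Sum( (X_i-mu_i)^2 / sigma_i^2 ) has a Chi-Square(n) distribution.
- By the CLT, the joint distribution of the frequencies of the different groups (cells) of a large sample is treated as approximately multivariate normal.
- The statistic Q_k-1 = Sum [i=1..k] ( Y_i^2 ) = Sum [i=1..k] ( (X_i-E_i)^2/E_i ), where Y_i = (X_i-E_i)/sqrt(E_i), has an approximate Chi-Square(k-1) distribution. One df is lost because the X_i must sum to n.
- With the idea of interval frequency approximation, any distribution can be treated as multinomial.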
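- Partition the set of possible results of the experiment into finitely many mutually disjoint sets A_1, A_2, ..., A_k
- Count the number of results falling in A_i as the frequency X_i
- Assign the df (the way depends on the test; see the table below)
- Assign the probability of a result falling in A_i as p_i (the way depends on the test)
- Evaluate the statistic Q_k-1
- Test: reject H0 when the statistic exceeds the Chi-Square critical value for the assigned df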
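The steps above can be sketched in Python for the goodness-of-fit case. This is a minimal sketch, not a reference implementation: the function name `pearson_q`, the die-roll counts, and the fair-die hypothesis are made up for illustration, and the critical value 11.070 is the 0.95 quantile of Chi-Square(5) taken from standard tables.

```python
def pearson_q(observed, probs):
    """Return (Q, df) for Pearson's statistic Sum((X_i - E_i)^2 / E_i).

    observed: frequencies X_i for the cells A_1..A_k
    probs:    hypothesized cell probabilities p_0i (must sum to 1)
    """
    if len(observed) != len(probs):
        raise ValueError("observed and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must sum to 1")
    n = sum(observed)
    expected = [n * p for p in probs]          # E_i = n * p_0i
    q = sum((x - e) ** 2 / e for x, e in zip(observed, expected))
    return q, len(observed) - 1                # df = k - 1


# Hypothetical data: 60 rolls of a die, H0: the die is fair.
counts = [13, 19, 11, 8, 5, 4]
q, df = pearson_q(counts, [1 / 6] * 6)

CRITICAL_95_DF5 = 11.070   # 0.95 quantile of Chi-Square(5), from tables
reject = q > CRITICAL_95_DF5
print(f"Q = {q:.3f}, df = {df}, reject H0 at the 5% level: {reject}")
```

With these made-up counts every E_i is 10, so Q = 156/10 = 15.6, which exceeds 11.070 and leads to rejecting H0 at the 5% level.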
Test | Goodness of fit | Homogeneity | Independence |
Example | 5.7.1, 5.7.2 | 5.7.3 | 5.7.4 |
H0 | The results follow the theoretical distribution | The two samples come from the same distribution | The two attributes of the subjects are independent |
Key fact | p_i = p_0i, so X_i/n is close to p_0i | p_1i = p_2i = p_0i, estimated by E_i | p_ij = p_i. * p_.j |
Source of E_i | multinomial model | MLE | MLE |
Formula of E_i (1<=i<=k) | E_i = n*p_0i (expected count) | E_i = (X_i1+X_i2)/(n1+n2) (pooled probability; expected count is n_j*E_i) | E_ij = (X_i./n) * (X_.j/n) (cell probability; expected count is n*E_ij) |
df | k-1 | k-1 = 2(k-1)-(k-1) | (a-1)(b-1) = (a*b-1)-(a+b-2) |
Statistic | Sum [i=1..k] ( (X_i-E_i)^2 / E_i ) | Sum [j=1,2] [i=1..k] ( (X_ij-n_j*E_i)^2 / (n_j*E_i) ) | Sum [i=1..a] [j=1..b] ( (X_ij-n*E_ij)^2 / (n*E_ij) ) |
Dataset | x_1, x_2, ..., x_n | x_1, x_2, ..., x_n1 and y_1, y_2, ..., y_n2 | contingency table X_ij, 1<=i<=a, 1<=j<=b, k=a*b |
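The homogeneity and independence columns can be computed with one routine, because for two samples the expected count n_j*E_i equals (row total)*(column total)/n. The sketch below assumes the counts are given as a list of rows with no empty row or column; the function name `contingency_q` and the counts are hypothetical.

```python
def contingency_q(table):
    """Return (Q, df) for an a-by-b table of counts X_ij.

    Expected counts use the MLE under H0:
    n * E_ij = (row total) * (column total) / n.
    """
    a, b = len(table), len(table[0])
    if any(len(row) != b for row in table):
        raise ValueError("all rows must have the same length")
    row_tot = [sum(row) for row in table]
    col_tot = [sum(row[j] for row in table) for j in range(b)]
    n = sum(row_tot)
    q = 0.0
    for i in range(a):
        for j in range(b):
            e = row_tot[i] * col_tot[j] / n    # expected count of cell (i, j)
            q += (table[i][j] - e) ** 2 / e
    return q, (a - 1) * (b - 1)


# Hypothetical 2x2 table: rows are levels of one attribute, columns the other.
q, df = contingency_q([[10, 20], [20, 10]])
print(f"Q = {q:.4f}, df = {df}")   # compare Q with 3.841, the 0.95 quantile of Chi-Square(1)
```

For a homogeneity test, put the k cells in the rows and the two samples in the columns; the returned df is then (k-1)(2-1) = k-1, matching the table.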
4. Remarks
- Chi-Square tests are approximate tests, not exact tests
- The statistic is based on the frequencies of the results in each interval, not on the results themselves
- Make sure every expected count E_i is at least about 5; otherwise use Fisher's exact test
- Minimum Chi-Square estimation
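- MLE-based Chi-Square tests have a greater rejection rate than tests based on the minimum Chi-Square estimator, because the minimum Chi-Square estimator gives, by definition, the smallest value of the statistic
- Every estimated parameter (such as a p_0i) costs one df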
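Two of the remarks above translate directly into small checks. This is only a sketch: the helper names `sparse_cells` and `adjusted_df` and the expected counts are hypothetical, and the threshold of 5 is the rule of thumb from the remark, not a hard limit.

```python
def sparse_cells(expected, threshold=5.0):
    """Return the indices of cells whose expected count is below threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]


def adjusted_df(k, n_estimated=0):
    """df for k cells after estimating n_estimated parameters from the data."""
    df = k - 1 - n_estimated
    if df < 1:
        raise ValueError("too many estimated parameters for this many cells")
    return df


# Hypothetical expected counts for six cells.
expected = [12.4, 9.1, 6.0, 4.2, 2.3, 26.0]
flagged = sparse_cells(expected)     # cells where the approximation is doubtful
df = adjusted_df(len(expected), n_estimated=1)
print(flagged, df)
```

Here the cells at indices 3 and 4 fall below the threshold, and estimating one parameter from the data leaves 6-1-1 = 4 df.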