## Testing Association Between Two Categorical Variables

### Study Design Characteristics

- **Two categorical variables** (smoking: yes/no; COPD: yes/no)
- **Cross-sectional design** (not longitudinal)
- **Large sample size** (n = 500)
- **Research question:** Is there an association between smoking and COPD?
- **Data structure:** 2×2 contingency table

### Why Chi-Square Test of Independence?

**Key Point:** The chi-square (χ²) test of independence is the standard non-parametric test for determining whether two categorical variables are statistically independent (unrelated) in a population.

**High-Yield:** The chi-square statistic measures the discrepancy between the observed frequencies and the frequencies expected under the null hypothesis of independence:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O = observed frequency and E = expected frequency, summed over every cell of the table.

### 2×2 Contingency Table Structure

Cells a, b, c and d hold the observed counts.

| | COPD Yes | COPD No | Total |
|---|---|---|---|
| **Smoker** | a | b | a+b |
| **Non-smoker** | c | d | c+d |
| **Total** | a+c | b+d | 500 |

Expected frequency for each cell:

$$E = \frac{(\text{row total}) \times (\text{column total})}{\text{grand total}}$$

For example, the expected count for cell a (smokers with COPD) is (a+b)(a+c)/500.

### Assumptions for Chi-Square Test

1. **Categorical data:** both variables are nominal/ordinal ✓
2. **Independence of observations:** each worker is counted once ✓
3. **Adequate sample size:** n = 500 is large ✓
4. **Expected frequencies:** typically ≥5 in each cell (usually met with n = 500, but check every cell with the formula above, because a rare outcome can still yield small expected counts)

**Mnemonic:** **CAIN**: **C**ategorical data, **A**dequate sample, **I**ndependence of observations, **N**ormal distribution NOT required (unlike the t-test).

### Comparison of Tests

| Test | Data Type | Purpose | Example |
|------|-----------|---------|---------|
| **Chi-square** | Categorical × Categorical | Test independence | Smoking vs. COPD |
| **t-test** | Continuous × Categorical (2 groups) | Compare 2 means | BP in males vs. females |
| **Pearson r** | Continuous × Continuous | Correlation strength | Height vs. weight |
| **ANOVA** | Continuous × Categorical (3+ groups) | Compare 3+ means | Cholesterol across 4 age groups |

**Clinical Pearl:** Chi-square is a non-parametric test: it does not assume normality (or any other distribution) for the underlying data and works directly on the observed frequency counts. Its p-value does rely on a large-sample approximation to the χ² distribution, which is why adequate expected cell counts matter. It is robust and widely applicable in epidemiological surveys.

### Decision Tree

```mermaid
flowchart TD
    A[What are you comparing?]:::decision --> B{Variable types?}:::decision
    B -->|Both categorical| C[Chi-square test]:::action
    B -->|One continuous, one categorical| D{How many categories?}:::decision
    D -->|2 groups| E[Unpaired t-test]:::action
    D -->|3+ groups| F[One-way ANOVA]:::action
    B -->|Both continuous| G[Pearson correlation]:::action
```
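To make the χ² formula and the expected-count rule concrete, here is a minimal Python sketch (standard library only). It assumes a 2×2 table laid out like the contingency table in this note, builds the expected counts as row total × column total / grand total, computes the uncorrected Pearson χ² exactly as the formula is written (no Yates continuity correction), checks the ≥5 expected-count assumption, and converts χ² to a p-value. Because the table is 2×2, degrees of freedom = (2−1)(2−1) = 1; with 1 df, χ² is the square of a standard normal variable, so the upper-tail probability is erfc(√(χ²/2)). The function names `chi_square_2x2` and `p_value_df1` are hypothetical, and the counts in the usage example are made up for illustration, not data from any study.

```python
import math


def p_value_df1(chi2: float) -> float:
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    With df = 1, chi2 = Z**2 for a standard normal Z, so
    P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2))


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test of independence for a 2x2 table (hypothetical helper).

    Layout matches the contingency table in the text:
        rows    = smoker, non-smoker
        columns = COPD yes, COPD no
    Returns (chi2, p_value, expected, expected_ok), where expected_ok is True
    when every expected count is >= 5.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")

    # E = (row total x column total) / grand total, for each cell
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]

    # chi2 = sum over all four cells of (O - E)^2 / E  (no Yates correction)
    chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))

    # Assumption 4: every expected count should be at least 5
    expected_ok = all(e >= 5 for row in expected for e in row)
    return chi2, p_value_df1(chi2), expected, expected_ok


if __name__ == "__main__":
    # Made-up counts for illustration only (n = 500), not real study data
    chi2, p, expected, ok = chi_square_2x2(a=60, b=140, c=30, d=270)
    print("expected counts:", expected)  # [[36.0, 164.0], [54.0, 246.0]]
    print(f"chi2 = {chi2:.2f}, p = {p:.2g}, all expected >= 5: {ok}")
```

With these illustrative counts every expected cell is well above 5 and χ² ≈ 32.52, far beyond the 3.84 critical value for α = 0.05 at 1 df, so independence would be rejected. If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should reproduce the same statistic; its default `correction=True` applies Yates' correction to 2×2 tables and therefore returns a smaller χ².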