Quick Answer
PSM Epidemiology and Biostatistics contributes 8–10 NEET PG questions per paper and is the highest-yield-per-hour topic in the syllabus. The exam-ready framework:
- Study designs — case-control gives OR (rare diseases), cohort gives RR/incidence, RCT is gold standard, cross-sectional gives prevalence.
- Measures of association — OR, RR, attributable risk, population attributable risk, NNT.
- Screening stats — sensitivity rules out (SnNout), specificity rules in (SpPin), PPV depends on prevalence.
- Inference: p < 0.05 means that, if the null hypothesis were true, a result at least this extreme would occur less than 5% of the time; the CI gives precision and direction.
- Bias and confounding — selection, information, recall, lead-time, length-time, confounding (adjust) vs effect modification (stratify).
PSM Epidemiology and Biostatistics is famously low-volume and high-yield: a single afternoon of structured revision can cover the 8–10 marks it carries across NEET PG, INI-CET and FMGE. Examiners rotate the same small set of concept families, including study-design identification, 2×2 table calculations, screening-test math, Bayes' theorem flips, p-value vs CI interpretation, and the bias-confounding-effect-modification trio.
This NEETPGAI deep dive walks through every formula and every classic vignette structure, then layers India-specific examples (NTEP, NFHS, IDSP, polio surveillance) so you can decode exam stems quickly. Pair this with the PSM subject hub and the national health programs guide for full PSM fluency.
Study designs — the identification ladder
Examiners almost always test whether you can identify the study design from a stem. Match the stem wording to the design using this table:
| Question in stem | Likely design |
|---|---|
| "Investigators followed 5,000 smokers and 5,000 non-smokers for 20 years..." | Prospective cohort |
| "Investigators identified 200 cases of lung cancer and 200 matched controls..." | Case-control |
| "10,000 patients randomised 1:1 to drug A vs placebo..." | RCT |
| "Survey of 1,000 households measured BP and salt intake at one time..." | Cross-sectional |
| "Researchers reviewed records from 1990 to 2010 for diabetes onset..." | Retrospective cohort |
| "Single hospital reported 3 unusual cases of..." | Case series |
Strengths and weaknesses (high-yield)
- RCT — gold standard for causation; expensive, ethical limits, poor generalisability.
- Cohort — gives incidence and RR; long, expensive, loss to follow-up.
- Case-control — fast, cheap, good for rare diseases; recall and selection bias, only OR.
- Cross-sectional — gives prevalence, hypothesis-generating; cannot establish temporality.
- Ecological — fast, uses aggregate data; ecological fallacy is the trap.
Levels of evidence (NEET PG order)
- Systematic review and meta-analysis of RCTs (highest)
- Single RCT
- Cohort
- Case-control
- Cross-sectional
- Case series and case reports
- Expert opinion (lowest)
Measures of association — 2×2 table mastery
Most NEET PG association and screening questions can be solved from a 2×2 table. Memorise this layout:
| | Disease + | Disease − |
|---|---|---|
| Exposure + | a | b |
| Exposure − | c | d |
- Relative Risk (RR) = [a/(a+b)] / [c/(c+d)] — cohort and RCT.
- Odds Ratio (OR) = (a×d) / (b×c) — case-control. Approximates RR when disease is rare.
- Attributable Risk (AR) = a/(a+b) − c/(c+d) — risk in exposed due to exposure.
- Population Attributable Risk (PAR) = incidence in population − incidence in unexposed.
- Attributable Risk Percent (AR%) = (RR−1)/RR × 100 — proportion of disease in exposed that is because of exposure.
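The formulas above can be sketched as a small Python helper. This is an illustrative sketch, not part of the original guide: the function name `association_measures` and the cohort counts (30 of 100 exposed and 10 of 100 unexposed develop disease) are hypothetical, and the PAR line assumes the table represents the whole population.

```python
def association_measures(a, b, c, d):
    """Measures of association from a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_population = (a + c) / (a + b + c + d)  # assumes table = whole population
    rr = risk_exposed / risk_unexposed
    return {
        "RR": rr,
        "OR": (a * d) / (b * c),
        "AR": risk_exposed - risk_unexposed,
        "PAR": risk_population - risk_unexposed,
        "AR%": (rr - 1) / rr * 100,
    }


# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop disease.
measures = association_measures(30, 70, 10, 90)
print(round(measures["RR"], 2))   # 3.0
print(round(measures["OR"], 2))   # 3.86
print(round(measures["AR"], 2))   # 0.2
print(round(measures["PAR"], 2))  # 0.1
print(round(measures["AR%"], 1))  # 66.7
```

Note that the OR (3.86) overstates the RR (3.0) here because the disease is not rare in this made-up cohort.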
NNT and NNH
- Absolute Risk Reduction (ARR) = control event rate − treatment event rate.
- NNT = 1 / ARR. Treat NNT patients to prevent one event.
- NNH = 1 / ARI (absolute risk increase) — used for harms.
- Worked example: control mortality 10%, treated 6%, ARR = 0.04, NNT = 25. Treat 25 to save 1 life.
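The worked example can be checked with a few lines of Python. This is a minimal sketch; the function name `nnt` is hypothetical, and rounding up to a whole patient is a common convention rather than something the text above specifies.

```python
import math


def nnt(control_event_rate, treatment_event_rate):
    """Number needed to treat = 1 / absolute risk reduction, rounded up."""
    arr = control_event_rate - treatment_event_rate
    if arr <= 0:
        raise ValueError("No risk reduction: NNT is undefined (consider NNH).")
    # Round first so floating-point noise (e.g. 24.999999999) does not
    # push the ceiling to the wrong integer.
    return math.ceil(round(1 / arr, 6))


# Control mortality 10%, treated mortality 6% (the example above).
print(nnt(0.10, 0.06))  # 25
```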
Screening tests — Sn, Sp, PPV, NPV, LR
| Metric | Formula | Interpretation |
|---|---|---|
| Sensitivity | TP / (TP + FN) | Rules out disease (SnNout) |
| Specificity | TN / (TN + FP) | Rules in disease (SpPin) |
| PPV | TP / (TP + FP) | Probability disease given + test |
| NPV | TN / (TN + FN) | Probability no disease given − test |
| LR+ | Sn / (1 − Sp) | >10 strongly rules in disease |
| LR− | (1 − Sn) / Sp | <0.1 strongly rules out disease |
| Accuracy | (TP + TN) / Total | Single number summary |
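As a rough illustration, every metric in the table can be computed from the four cell counts. The function name `screening_metrics` and the counts (TP 90, FP 50, FN 10, TN 850) are hypothetical values chosen for this sketch.

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening-test metrics from true/false positive/negative counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical screening study of 1,000 people, 100 of whom have the disease.
metrics = screening_metrics(tp=90, fp=50, fn=10, tn=850)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts sensitivity is 0.900 and specificity about 0.944, yet PPV is only about 0.643 because disease is present in just 10% of those screened.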
Prevalence dependence
- Sensitivity and specificity are intrinsic to the test — they don't change with prevalence.
- PPV and NPV depend on prevalence — high-prevalence settings boost PPV, low-prevalence settings boost NPV.
- A NEET PG classic: an HIV ELISA used in a low-prevalence general population has a low PPV, because false positives rival or outnumber true positives even at 99.9% specificity.
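The prevalence effect is easy to see numerically. The sketch below holds the test fixed and varies only prevalence; the function name `ppv` is hypothetical, and the 99.9% sensitivity is an assumption added for illustration (the text above gives only the specificity).

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same test (Sn and Sp both 99.9%, an assumed pairing), different populations.
for prevalence in (0.001, 0.01, 0.10):
    value = ppv(0.999, 0.999, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {value:.1%}")
```

At 0.1% prevalence the PPV is roughly 50%, rising to about 99% at 10% prevalence, with no change in the test itself.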
ROC curves
- Plot of sensitivity (y) vs 1−specificity (x) across cutoff thresholds.
- AUC ranges 0.5 (useless) to 1.0 (perfect). AUC >0.9 = excellent.
- The optimal cutoff lies nearest the upper-left corner of the plot; a common rule is to choose the threshold that maximises Youden's index (J = Sn + Sp − 1).
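Choosing a cutoff by Youden's index can be sketched as follows. The cutoffs and their sensitivity/specificity pairs are invented for illustration and do not come from any real test.

```python
# Hypothetical (cutoff, sensitivity, specificity) triples along one ROC curve.
roc_points = [
    (100, 0.95, 0.50),
    (110, 0.90, 0.70),
    (126, 0.80, 0.90),
    (140, 0.60, 0.97),
]


def youden_index(sensitivity, specificity):
    """Youden's J = Sn + Sp - 1."""
    return sensitivity + specificity - 1


# Pick the cutoff whose (Sn, Sp) pair gives the largest J.
best_cutoff, best_sn, best_sp = max(
    roc_points, key=lambda point: youden_index(point[1], point[2])
)
print(best_cutoff)  # 126
print(round(youden_index(best_sn, best_sp), 2))  # 0.7
```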
Bayes' theorem and post-test probability
- Pre-test probability = prevalence (or clinician's estimate).
- Post-test probability = pre-test probability adjusted by likelihood ratio.
- Quick formula: post-test odds = pre-test odds × LR. Convert odds ↔ probability with p = odds/(1+odds).
Worked NEET PG example
Disease prevalence 1% (pre-test 0.01). Test sensitivity 99%, specificity 99% (LR+ = 99).
- Pre-test odds = 0.01/0.99 = 0.0101.
- Post-test odds = 0.0101 × 99 = 1.0.
- Post-test probability = 1/(1+1) = 50%.
Despite a "99% accurate" test, the PPV in a low-prevalence population is only 50% — Bayes' classic counter-intuitive trap.
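The odds-based calculation above translates directly into code. This is a minimal sketch; the function name `post_test_probability` is hypothetical.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the LR, convert back."""
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


# Worked example: prevalence 1%, sensitivity 99%, specificity 99%.
lr_positive = 0.99 / (1 - 0.99)  # LR+ = Sn / (1 - Sp), about 99
print(round(post_test_probability(0.01, lr_positive), 3))  # 0.5
```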
p-value, confidence interval, and statistical significance
- p-value — probability of observing the data (or more extreme) under the null hypothesis. p <0.05 conventionally significant.
- Type I error (α) — false positive; reject true null. Conventional cutoff 5%.
- Type II error (β) — false negative; fail to reject false null. Power = 1 − β (typical target 80%).
- Confidence interval — range of values consistent with the data at a chosen level (usually 95%). Wider CI = less precision.
- CI vs p-value pearl: a 95% CI excluding the null value (1 for ratio, 0 for difference) corresponds to p <0.05.
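To make the CI pearl concrete, here is a sketch that computes a 95% CI for a relative risk. It uses the standard log-transformation (Katz) method, which the text above does not name, so treat the method choice as an assumption; the function name `rr_with_ci` and the counts are hypothetical.

```python
import math


def rr_with_ci(a, b, c, d, z=1.96):
    """Relative risk with a confidence interval (log method, z=1.96 for 95%).

    a, b = exposed with/without disease; c, d = unexposed with/without disease.
    """
    rr = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical cohort: 30/100 exposed vs 10/100 unexposed develop disease.
rr, lower, upper = rr_with_ci(30, 70, 10, 90)
print(f"RR {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
# The interval is roughly 1.55 to 5.80: it excludes 1, so p < 0.05.
print("significant" if lower > 1 or upper < 1 else "not significant")
```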
Hypothesis tests at a glance
| Test | Use case |
|---|---|
| Student t-test | Compare two means (continuous, normal) |
| Paired t-test | Before-after on same subjects |
| ANOVA | Compare ≥3 means |
| Chi-square | Compare proportions (large samples) |
| Fisher exact | Compare proportions (small samples, expected <5) |
| Mann-Whitney U | Non-parametric two groups |
| Wilcoxon signed-rank | Non-parametric paired |
| Kruskal-Wallis | Non-parametric ≥3 groups |
| Pearson r | Correlation (parametric) |
| Spearman ρ | Correlation (non-parametric) |
| Log-rank | Compare Kaplan-Meier survival curves |
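As one example from the table, a chi-square test on a 2×2 table can be sketched with the standard library alone. This uses the Pearson statistic without continuity correction and the fact that, with 1 degree of freedom, the p-value equals erfc(√(χ²/2)). The function name `chi_square_2x2` and the counts are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical data: 30/100 exposed vs 10/100 unexposed have the outcome.
chi2, p_value = chi_square_2x2(30, 70, 10, 90)
print(round(chi2, 2))  # 12.5
print(p_value < 0.05)  # True
```

With small expected counts (below 5 in any cell), Fisher's exact test is the appropriate choice instead, as the table notes.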
Bias, confounding, and effect modification
Bias (systematic error)
- Selection bias — non-random sampling. Examples: Berkson, healthy worker, prevalence-incidence.
- Information bias — measurement error. Includes recall bias (case-control classic), interviewer bias, misclassification.
- Lead-time bias — screening detects disease earlier; survival appears longer without true mortality benefit.
- Length-time bias — screening preferentially detects slow-growing tumours, inflating apparent survival.
Confounding
- A confounder is associated with both exposure and outcome but is not on the causal pathway.
- Control by: randomisation (RCT), restriction, matching, stratification, multivariate regression.
- Mantel-Haenszel adjusted estimate handles confounding when stratum-specific estimates are similar.
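A small sketch of the Mantel-Haenszel pooled odds ratio shows how confounding appears in numbers. The two strata below are invented so that each has a stratum-specific OR of 2; the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table (a, b = exposed; c, d = unexposed)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR: sum(a*d/n) / sum(b*c/n) across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a third variable; each stratum has OR = 2.
strata = [(20, 10, 10, 10), (10, 40, 10, 80)]

# Crude OR ignores the strata and collapses them into a single table.
crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))
adjusted = mantel_haenszel_or(strata)
print(round(crude, 2))     # 2.7
print(round(adjusted, 2))  # 2.0
```

The crude OR (2.7) differs from the adjusted OR (2.0) while the stratum-specific ORs agree with each other, which is the pattern of confounding rather than effect modification.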
Effect modification
- The effect of exposure differs across strata of a third variable.
- Do not adjust away — report stratum-specific estimates.
- Example: aspirin reduces MI more in men than women (sex modifies effect).
High-yield NEET PG MCQ traps
- OR vs RR confusion — case-control gives OR only; cohort gives RR. Never report RR from a case-control.
- PPV is prevalence-dependent — same test, different population, different PPV.
- Lead-time vs length-time — lead-time is earlier detection; length-time is preferential detection of slow tumours.
- Recall bias — classic in case-control studies of diet and cancer.
- Power calculation — bigger sample, bigger effect size, lower variance, higher α all increase power.
- Ecological fallacy — group-level association does not equal individual-level association.
- Number needed to treat — always reported with the time horizon (NNT = 25 over 5 years, not lifetime).
- Intention-to-treat — analyse patients in the group they were randomised to, regardless of crossover. Preserves randomisation.
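The power trap in the list above can be explored with a rough normal-approximation sketch for comparing two proportions. The formula is a common textbook approximation (it ignores the negligible opposite tail), the function name `approx_power` is hypothetical, and the event rates reuse the 10% vs 6% figures from the NNT example purely for illustration.

```python
from statistics import NormalDist


def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test for two proportions."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    standard_error = (
        p_control * (1 - p_control) / n_per_arm
        + p_treated * (1 - p_treated) / n_per_arm
    ) ** 0.5
    return std_normal.cdf(abs(p_control - p_treated) / standard_error - z_alpha)


# Larger sample, larger effect size, or higher alpha each raise power.
print(f"n=500:       {approx_power(0.10, 0.06, 500):.2f}")
print(f"n=1000:      {approx_power(0.10, 0.06, 1000):.2f}")
print(f"big effect:  {approx_power(0.10, 0.04, 500):.2f}")
print(f"alpha=0.10:  {approx_power(0.10, 0.06, 500, alpha=0.10):.2f}")
```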
Recent updates and Indian context
- NFHS-5 (2019–21) is the current Indian household survey. High-yield items: under-5 mortality 42, IMR 35, NMR 25 (all per 1,000 live births), total fertility rate 2.0, anaemia in women 57%, institutional delivery 89%.
- IDSP (Integrated Disease Surveillance Programme) — weekly P/L/S forms; outbreak alerts via IHIP portal.
- CRS (Civil Registration System) registers births, deaths, stillbirths — basis for vital statistics.
- SRS (Sample Registration System) is the gold standard for IMR, MMR, TFR estimates in India.
- Maternal mortality ratio in India: 97 per 100,000 live births (SRS 2018–20). SDG target: below 70 by 2030.
- Indian context for screening: NPCDCS uses opportunistic screening for diabetes and hypertension at HWCs (Health and Wellness Centres) — a recurring vignette in NEET PG and FMGE.
Frequently Asked Questions
What is the difference between odds ratio and relative risk?
Relative risk (RR) is the ratio of incidence in exposed to incidence in unexposed and is used in cohort studies and RCTs. Odds ratio (OR) is the ratio of odds and is used in case-control studies where incidence cannot be measured. OR approximates RR only when the disease is rare (under 10%).
How do you calculate sensitivity and specificity?
Sensitivity = true positives divided by all diseased (TP / (TP + FN)); a negative result on a highly sensitive test rules out disease. Specificity = true negatives divided by all non-diseased (TN / (TN + FP)); a positive result on a highly specific test rules in disease. Use the SnNout and SpPin mnemonics.
What does a confidence interval crossing 1 mean?
For a ratio measure (OR, RR, hazard ratio), a 95% confidence interval that crosses 1 means the result is not statistically significant — the data are consistent with no effect. For a difference measure (mean difference, risk difference), the no-effect line is 0, not 1.
What is number needed to treat (NNT) and how is it calculated?
NNT is the number of patients you must treat to prevent one additional bad outcome. NNT = 1 divided by absolute risk reduction (ARR). Lower NNT means a more effective intervention. NNT of 10 means treat 10 to prevent 1 event. It is best read alongside NNH (number needed to harm) and a stated time horizon.
How do you differentiate confounding from effect modification?
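Confounding is when a third variable distorts the apparent association; adjust for it (stratification, multivariate regression). Effect modification is when the effect of exposure differs across strata of another variable; report stratum-specific estimates and do not adjust it away. In a stratified analysis, a crude estimate that differs from the Mantel-Haenszel adjusted estimate while the stratum-specific estimates agree suggests confounding; stratum-specific estimates that differ from each other (checked with a test of homogeneity such as Breslow-Day) suggest effect modification.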
This content is for educational purposes for NEET PG exam preparation. It is not a substitute for professional medical advice, diagnosis, or treatment. Clinical information has been reviewed by qualified medical professionals.
Written by: NEETPGAI Editorial Team
Reviewed by: Pending SME Review
Last reviewed: April 2026