Analysis Plan
The AI read your question, dataset profile, and variable definitions and proposed the plan below — including outcome, predictors, and model type. Review and confirm before running.
How the system interpreted your question
AI Parsed
Outcome: survival_90d (binary: 1 = survived at 90 days)
Primary exposure: treatment_group (A = reference, B = intervention)
Direction: superiority (B > A)
Covariates: age · sex · baseline_score · comorbidity_index · site
Objective: Measure the adjusted treatment effect on 90-day survival; test whether Group B is superior to standard care.
If this doesn't match your intent, go back and revise your question or variable definitions — then regenerate the plan.
Model Selection
AI Generated
Model Family
Logistic regression (binomial family, logit link)
Formula
survival_90d ~ treatment_group + age + sex + baseline_score + site + comorbidity_index + treatment_group:baseline_score
Why this model
- Binary outcome (0/1) with ~12% event rate — logistic regression is the canonical choice.
- Interaction term treatment_group × baseline_score included per protocol hypothesis (does benefit differ by baseline severity?).
- Site included as fixed effect (8 levels) given study-level sample size; mixed model explored as sensitivity analysis.
- Complete-case analysis as primary; multiple imputation included as pre-specified sensitivity for age / baseline_score missingness.
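Written out, the formula above corresponds to the following linear predictor on the logit scale. This is a sketch under assumptions that the plan does not state: T is a 0/1 indicator for Group B, sex is coded as a single 0/1 indicator, age, baseline_score (S) and comorbidity_index (C) enter linearly, and site 1 is the reference level for the 8-level fixed effect. The symbols are introduced here for illustration only.

```latex
\operatorname{logit}\Pr(\text{survival\_90d}=1)
  = \beta_0 + \beta_T T + \beta_{\text{age}}\,\text{age} + \beta_{\text{sex}}\,\text{sex}
  + \beta_S S + \beta_C C + \sum_{k=2}^{8} \gamma_k \mathbf{1}[\text{site}=k]
  + \beta_{TS}\,(T \times S)

% Adjusted odds ratio for B vs A at baseline score s:
\mathrm{OR}_{B\,\text{vs}\,A}(s) = \exp\!\left(\beta_T + \beta_{TS}\, s\right)
```

Because of the interaction term, the treatment effect is not a single odds ratio: under this coding, exp(β_T) is the odds ratio at baseline_score = 0 only, and the effect at any other severity level depends on β_TS as well.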
Planned Diagnostics
1. Complete-separation detection (Hauck–Donner effect scan)
2. Variance Inflation Factor (VIF) for all predictors
3. Hosmer–Lemeshow goodness-of-fit (decile groups)
4. ROC / AUC with 1,000-replicate bootstrap 95% CI (seed 42)
5. Calibration plot (predicted vs observed decile frequency)
6. Influence analysis: Cook's distance · dfBetas
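The bootstrap AUC interval in the diagnostics list can be sketched as follows. This is a minimal, standard-library illustration, not the system's implementation: it assumes a simple percentile interval and case resampling, neither of which the plan specifies, and the function names `auc` and `bootstrap_auc_ci` are hypothetical. The defaults mirror the plan (1,000 replicates, seed 42).

```python
import random


def auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, seed=42, alpha=0.05):
    """Return (point estimate, lower, upper) using a percentile bootstrap (an assumption)."""
    rng = random.Random(seed)  # local generator, so the result is reproducible
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # only one class drawn: AUC is undefined, so draw again
        stats.append(auc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), lower, upper
```

With a fixed seed, repeated calls on the same inputs return the same interval, which is the point of pinning seed 42. The pairwise AUC is quadratic in sample size; for large datasets a rank-based formula would be the usual substitute.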
Safeguards & Fallbacks
Active
Separation detected → refit with Firth-penalised logistic regression (logistf / statsmodels penalized).
Max VIF > 10 → flag, suspend interaction term, re-evaluate collinear predictor.
EPP (events per parameter) = 7.8: advisory flag raised because this is below the recommended threshold of 10. Analysis proceeds with caution; interpret coefficient estimates carefully.
All random operations seeded: seed = 42.
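The three rules above amount to a small decision table, which might look like the sketch below. The function names, action labels, and the example counts are hypothetical. The EPP definition used here (size of the rarer outcome class divided by the number of estimated parameters, intercept excluded) is a common convention assumed for illustration; the plan does not say how its 7.8 was computed.

```python
def events_per_parameter(n_events, n_non_events, n_params):
    """EPP under the assumed convention: rarer outcome class / parameters (no intercept)."""
    return min(n_events, n_non_events) / n_params


def safeguard_actions(separation_detected, max_vif, epp, vif_limit=10.0, epp_min=10.0):
    """Map diagnostic results to the fallbacks listed in the plan (labels are hypothetical)."""
    actions = []
    if separation_detected:
        actions.append("refit_firth")           # Firth-penalised refit
    if max_vif > vif_limit:
        actions.append("suspend_interaction")   # flag and re-evaluate collinear predictor
    if epp < epp_min:
        actions.append("epp_advisory")          # proceed, but interpret with caution
    return actions


# Hypothetical inputs: no separation, max VIF of 3.2, and the plan's EPP of 7.8.
print(safeguard_actions(False, 3.2, 7.8))
```

Keeping the thresholds as parameters makes it easy to see that only the EPP rule fires for the values reported in this plan, and that it is advisory rather than blocking.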
Reproducibility
Locked
Random seed: 42
Python: 3.11.8
statsmodels: 0.14.1
pandas: 2.2.0
numpy: 1.26.4
matplotlib: 3.8.3
scikit_learn: 1.4.1
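One way to confirm that a local environment matches the locked versions is to compare them against installed package metadata. The sketch below uses only the standard library; `LOCKED` and `find_mismatches` are hypothetical names, and `scikit-learn` is the PyPI distribution name assumed to correspond to the `scikit_learn` entry above.

```python
import platform
from importlib import metadata

LOCKED_PYTHON = "3.11.8"
LOCKED = {
    "statsmodels": "0.14.1",
    "pandas": "2.2.0",
    "numpy": "1.26.4",
    "matplotlib": "3.8.3",
    "scikit-learn": "1.4.1",  # listed as scikit_learn in the plan
}


def find_mismatches(locked, get_version=metadata.version):
    """Return {package: (locked, installed)} for every package that differs or is missing."""
    mismatches = {}
    for name, wanted in locked.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


def python_matches(locked_python=LOCKED_PYTHON):
    """True when the running interpreter has exactly the locked version."""
    return platform.python_version() == locked_python
```

Calling `find_mismatches(LOCKED)` returns an empty dict when every package matches. The exact string comparison is deliberately strict, so a build suffix on an installed version would be reported as a mismatch.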