These R code chunks are largely equivalent to the SPSS syntax that was used to produce the output that is shown on the workshop slides. If you prefer to use R, you can use the following scripts as a guide. Note that the output produced by this code is not necessarily exactly the same as that produced by the SPSS syntax.
We will use a number of packages including “survival” and “survminer”. Refer to the list below for other packages used. If you haven’t already installed them, you will need to do this before using the library() function.
library(survival)
library(survminer)
library(ggplot2)
library(ggfortify)
library(My.stepwise)
library(gt)
library(gtsummary)
The data file is WHAS500data.csv
This is found at the Wiley website here:
Create the Kaplan-Meier survival object and include Gender as a factor.
We create a Surv() object then parse it to the survfit() function.
kmfit <- survfit(Surv(lenfol, fstat) ~ Gender, data = whas500)
# summary(kmfit) # this is suppressed to save space.
plot(
kmfit,
mark = "+",
main = "Kaplan-Meier curve",
xlab = "Total length of follow up (days)",
ylab = "Cumulative Survival"
)
This includes confidence interval shading by default.
ggplot2::autoplot(kmfit,
main = "Kaplan-Meier curve with ggplot2::autoplot",
xlab = "Total length of follow up (days)",
ylab = "Cumulative Survival"
)
ggsurvplot(
kmfit,
data = whas500,
conf.int = T,
legend = "bottom",
main = "Kaplan-Meier curve with survminer::ggsurvplot",
xlab = "Total length of follow up (days)",
ylab = "Cumulative Survival",
break.time.by = 365,
ggtheme = theme_minimal()
)
Start with stepwise selection of covariates.
# define list of variables to select from
var.select <-
c(
"Gender",
"Age",
"hr",
"sysbp",
"diabp",
"bmi",
"cvd",
"afb",
"sho",
"chf",
"av3",
"miord",
"mitype",
"year"
)
# Use the My.stepwise package to create the stepwise selection model
coxstep <-
My.stepwise.coxph(
Time = "lenfol",
Status = "fstat",
variable.list = var.select,
data = whas500
)
## # --------------------------------------------------------------------------------------------------
## # Initial Model:
## Call:
## coxph(formula = formula, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.066339 1.068589 0.006079 10.91 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.069 0.9358 1.056 1.081
##
## Concordance= 0.731 (se = 0.018 )
## Likelihood ratio test= 142.1 on 1 df, p=<2e-16
## Wald test = 119.1 on 1 df, p=<2e-16
## Score (logrank) test = 126.6 on 1 df, p=<2e-16
##
## # --------------------------------------------------------------------------------------------------
## ### iter num = 1, Forward Selection by LR Test: + chf
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf, data = data,
## method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.058687 1.060443 0.006097 9.626 < 2e-16 ***
## chf 0.862128 2.368195 0.142333 6.057 1.39e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.060 0.9430 1.048 1.073
## chf 2.368 0.4223 1.792 3.130
##
## Concordance= 0.757 (se = 0.017 )
## Likelihood ratio test= 178.2 on 2 df, p=<2e-16
## Wald test = 160.8 on 2 df, p=<2e-16
## Score (logrank) test = 174.4 on 2 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf
## 1.000015 1.000015
## # --------------------------------------------------------------------------------------------------
## ### iter num = 2, Forward Selection by LR Test: + sho
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho, data = data,
## method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.059348 1.061145 0.006169 9.621 < 2e-16 ***
## chf 0.821392 2.273663 0.143303 5.732 9.93e-09 ***
## sho 0.892605 2.441482 0.261365 3.415 0.000637 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.061 0.9424 1.048 1.074
## chf 2.274 0.4398 1.717 3.011
## sho 2.441 0.4096 1.463 4.075
##
## Concordance= 0.762 (se = 0.017 )
## Likelihood ratio test= 187.4 on 3 df, p=<2e-16
## Wald test = 169.6 on 3 df, p=<2e-16
## Score (logrank) test = 187.4 on 3 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho
## 1.001207 1.001959 1.003161
## # --------------------------------------------------------------------------------------------------
## ### iter num = 3, Forward Selection by LR Test: + hr
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr, data = data,
## method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.059619 1.061432 0.006251 9.537 < 2e-16 ***
## chf 0.708127 2.030184 0.146989 4.818 1.45e-06 ***
## sho 0.966808 2.629538 0.261148 3.702 0.000214 ***
## hr 0.009241 1.009284 0.002832 3.263 0.001103 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.061 0.9421 1.049 1.075
## chf 2.030 0.4926 1.522 2.708
## sho 2.630 0.3803 1.576 4.387
## hr 1.009 0.9908 1.004 1.015
##
## Concordance= 0.768 (se = 0.016 )
## Likelihood ratio test= 197.8 on 4 df, p=<2e-16
## Wald test = 176.1 on 4 df, p=<2e-16
## Score (logrank) test = 197.2 on 4 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr
## 1.003032 1.028002 1.014069 1.035399
## # --------------------------------------------------------------------------------------------------
## ### iter num = 4, Forward Selection by LR Test: + diabp
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr +
## diabp, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.054002 1.055487 0.006380 8.464 < 2e-16 ***
## chf 0.691666 1.997040 0.146792 4.712 2.45e-06 ***
## sho 1.061420 2.890474 0.263954 4.021 5.79e-05 ***
## hr 0.012060 1.012133 0.002946 4.094 4.24e-05 ***
## diabp -0.011587 0.988480 0.003480 -3.330 0.00087 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0555 0.9474 1.0424 1.0688
## chf 1.9970 0.5007 1.4977 2.6628
## sho 2.8905 0.3460 1.7230 4.8489
## hr 1.0121 0.9880 1.0063 1.0180
## diabp 0.9885 1.0117 0.9818 0.9952
##
## Concordance= 0.778 (se = 0.016 )
## Likelihood ratio test= 209.2 on 5 df, p=<2e-16
## Wald test = 188.5 on 5 df, p=<2e-16
## Score (logrank) test = 211.4 on 5 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr diabp
## 1.007083 1.029705 1.012017 1.077849 1.048180
## # --------------------------------------------------------------------------------------------------
## ### iter num = 5, Forward Selection by LR Test: + bmi
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr +
## diabp + bmi, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.048046 1.049219 0.006625 7.252 4.10e-13 ***
## chf 0.693164 2.000034 0.146359 4.736 2.18e-06 ***
## sho 1.109646 3.033285 0.264957 4.188 2.81e-05 ***
## hr 0.012054 1.012126 0.002942 4.098 4.17e-05 ***
## diabp -0.011152 0.988910 0.003524 -3.165 0.00155 **
## bmi -0.045766 0.955266 0.016063 -2.849 0.00438 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0492 0.9531 1.0357 1.0629
## chf 2.0000 0.5000 1.5013 2.6645
## sho 3.0333 0.3297 1.8046 5.0985
## hr 1.0121 0.9880 1.0063 1.0180
## diabp 0.9889 1.0112 0.9821 0.9958
## bmi 0.9553 1.0468 0.9257 0.9858
##
## Concordance= 0.783 (se = 0.015 )
## Likelihood ratio test= 217.6 on 6 df, p=<2e-16
## Wald test = 193.6 on 6 df, p=<2e-16
## Score (logrank) test = 218.2 on 6 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr diabp bmi
## 1.096085 1.032883 1.013429 1.078315 1.056359 1.102805
## # --------------------------------------------------------------------------------------------------
## ### iter num = 6, Forward Selection by LR Test: + year
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr +
## diabp + bmi + year, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.046718 1.047827 0.006572 7.109 1.17e-12 ***
## chf 0.671303 1.956785 0.147820 4.541 5.59e-06 ***
## sho 1.139317 3.124633 0.265414 4.293 1.77e-05 ***
## hr 0.011292 1.011356 0.002938 3.843 0.000122 ***
## diabp -0.011239 0.988823 0.003510 -3.202 0.001365 **
## bmi -0.048607 0.952555 0.016193 -3.002 0.002684 **
## year 0.239734 1.270911 0.099118 2.419 0.015577 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0478 0.9544 1.0344 1.0614
## chf 1.9568 0.5110 1.4646 2.6144
## sho 3.1246 0.3200 1.8573 5.2568
## hr 1.0114 0.9888 1.0055 1.0172
## diabp 0.9888 1.0113 0.9820 0.9956
## bmi 0.9526 1.0498 0.9228 0.9833
## year 1.2709 0.7868 1.0465 1.5434
##
## Concordance= 0.787 (se = 0.015 )
## Likelihood ratio test= 223.5 on 7 df, p=<2e-16
## Wald test = 205.2 on 7 df, p=<2e-16
## Score (logrank) test = 228.9 on 7 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr diabp bmi year
## 1.099424 1.040276 1.012958 1.083588 1.058247 1.100654 1.016588
## # --------------------------------------------------------------------------------------------------
## ### iter num = 7, Forward Selection by LR Test: + Gender
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr +
## diabp + bmi + year + Gender, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.048777 1.049987 0.006590 7.402 1.34e-13 ***
## chf 0.711200 2.036433 0.148769 4.781 1.75e-06 ***
## sho 1.162817 3.198933 0.265866 4.374 1.22e-05 ***
## hr 0.011718 1.011787 0.002910 4.027 5.64e-05 ***
## diabp -0.011918 0.988153 0.003471 -3.433 0.000596 ***
## bmi -0.051328 0.949967 0.016580 -3.096 0.001963 **
## year 0.247974 1.281426 0.099540 2.491 0.012731 *
## Gender -0.310662 0.732962 0.144468 -2.150 0.031525 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0500 0.9524 1.0365 1.0636
## chf 2.0364 0.4911 1.5214 2.7259
## sho 3.1989 0.3126 1.8998 5.3865
## hr 1.0118 0.9884 1.0060 1.0176
## diabp 0.9882 1.0120 0.9815 0.9949
## bmi 0.9500 1.0527 0.9196 0.9813
## year 1.2814 0.7804 1.0543 1.5575
## Gender 0.7330 1.3643 0.5522 0.9729
##
## Concordance= 0.787 (se = 0.015 )
## Likelihood ratio test= 228.1 on 8 df, p=<2e-16
## Wald test = 209.6 on 8 df, p=<2e-16
## Score (logrank) test = 233.4 on 8 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr diabp bmi year Gender
## 1.137430 1.054373 1.013467 1.085882 1.060169 1.110415 1.017812 1.083232
## # ==================================================================================================
## *** Stepwise Final Model (in.lr.test: sle = 0.15; out.lr.test: sls = 0.15; variable selection restrict in vif = 999):
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + chf + sho + hr +
## diabp + bmi + year + Gender, data = data, method = "efron")
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.048777 1.049987 0.006590 7.402 1.34e-13 ***
## chf 0.711200 2.036433 0.148769 4.781 1.75e-06 ***
## sho 1.162817 3.198933 0.265866 4.374 1.22e-05 ***
## hr 0.011718 1.011787 0.002910 4.027 5.64e-05 ***
## diabp -0.011918 0.988153 0.003471 -3.433 0.000596 ***
## bmi -0.051328 0.949967 0.016580 -3.096 0.001963 **
## year 0.247974 1.281426 0.099540 2.491 0.012731 *
## Gender -0.310662 0.732962 0.144468 -2.150 0.031525 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0500 0.9524 1.0365 1.0636
## chf 2.0364 0.4911 1.5214 2.7259
## sho 3.1989 0.3126 1.8998 5.3865
## hr 1.0118 0.9884 1.0060 1.0176
## diabp 0.9882 1.0120 0.9815 0.9949
## bmi 0.9500 1.0527 0.9196 0.9813
## year 1.2814 0.7804 1.0543 1.5575
## Gender 0.7330 1.3643 0.5522 0.9729
##
## Concordance= 0.787 (se = 0.015 )
## Likelihood ratio test= 228.1 on 8 df, p=<2e-16
## Wald test = 209.6 on 8 df, p=<2e-16
## Score (logrank) test = 233.4 on 8 df, p=<2e-16
##
## --------------- Variance Inflating Factor (VIF) ---------------
## Multicollinearity Problem: Variance Inflating Factor (VIF) is bigger than 10 (Continuous Variable) or is bigger than 2.5 (Categorical Variable)
## Age chf sho hr diabp bmi year Gender
## 1.137430 1.054373 1.013467 1.085882 1.060169 1.110415 1.017812 1.083232
These will use the survival::coxph() function
Model 1: null model
cox01 <- coxph(Surv(lenfol, fstat) ~ 1, data = whas500)
AIC01 <- extractAIC(cox01)
Model 2: Age
cox02 <- coxph(Surv(lenfol, fstat) ~ Age, data = whas500)
AIC02 <- extractAIC(cox02)
Model 3: Gender
cox03 <- coxph(Surv(lenfol, fstat) ~ Gender, data = whas500)
AIC03 <-extractAIC(cox03)
Model 4: BMI
cox04 <- coxph(Surv(lenfol, fstat) ~ bmi, data = whas500)
AIC04 <- extractAIC(cox04)
Model 5: Heart Rate
cox05 <- coxph(Surv(lenfol, fstat) ~ hr, data = whas500)
AIC05 <- extractAIC(cox05)
Model 6: Age + Gender + BMI
cox06 <- coxph(Surv(lenfol, fstat) ~ Age + Gender + bmi, data = whas500)
AIC06 <- extractAIC(cox06)
Model 7: Age + Gender + HR
cox07 <- coxph(Surv(lenfol, fstat) ~ Age + Gender + hr, data = whas500)
AIC07 <- extractAIC(cox07)
Model 8: Age + BMI + HR
cox08 <- coxph(Surv(lenfol, fstat) ~ Age + bmi + hr, data = whas500)
AIC08 <- extractAIC(cox08)
Model 9: Gender + BMI + HR
cox09 <- coxph(Surv(lenfol, fstat) ~ Gender + bmi + hr, data = whas500)
AIC09 <- extractAIC(cox09)
Model 10: Age+Gender+BMI+HR
cox10 <- coxph(Surv(lenfol, fstat) ~ Age + Gender + bmi + hr, data = whas500)
AIC10 <- extractAIC(cox10)
Create a table of the AIC values.
To be done ####
Create a variable for BMI categories
bmi.cat <- cut(whas500$bmi, breaks = c(0, 18.5, 25, 30, 9999), labels = c("underweight", "normal weight", "overweight", "obese"), right = F)
whas500$bmi.cat <- bmi.cat
Have a look at the Kaplan-Meier curves with BMI categories
km.bmi.cat <- survfit(Surv(lenfol, fstat) ~ bmi.cat, data = whas500)
ggplot2::autoplot(km.bmi.cat,
main = "Kaplan-Meier curve for BMI categories",
xlab = "Total length of follow up (days)",
ylab = "Cumulative Survival"
)
Create models including the BMI category variable
cox11 <- coxph(Surv(lenfol, fstat) ~ Age + Gender + bmi.cat + hr, data = whas500)
AIC11 <- extractAIC(cox11)
cox11.fit <- survfit(cox11)
gtsummary::tbl_regression(cox11, exponentiate = T)
Characteristic | HR1 | 95% CI1 | p-value |
---|---|---|---|
Age | 1.06 | 1.05, 1.08 | <0.001 |
Gender | 0.82 | 0.62, 1.08 | 0.2 |
bmi.cat | |||
underweight | — | — | |
normal weight | 0.78 | 0.47, 1.30 | 0.3 |
overweight | 0.48 | 0.28, 0.83 | 0.008 |
obese | 0.54 | 0.29, 1.00 | 0.050 |
hr | 1.01 | 1.01, 1.02 | <0.001 |
1
HR = Hazard Ratio, CI = Confidence Interval
|
Cox survival curves using survminer::ggadjustedcurves
ggadjustedcurves(cox11, data = whas500, variable = "bmi.cat")
Have a look at the plot for the Gender variable (Model 3) using survminer::ggcoxdiagnostics
ggcoxdiagnostics(cox03, type = "schoenfeld", ox.scale = "time")
The residual plot looks OK. The blue dashed line does not significantly depart from the null assumption of the red dashed line.
Note: This is not available through the SPSS syntax.
cox03.test <- cox.zph(cox03)
cox03.test
## chisq df p
## Gender 0.755 1 0.38
## GLOBAL 0.755 1 0.38
ggcoxzph(cox03.test)
The plot and the chi-sq statistic indicate no concern regarding the assumption of proportional hazards for the Gender variable.
see from slide 72
add KM curves for Gender and BMI_cat
For model 11 we will separately test the following interaction terms: TimeGender TimeAge TimeBMI_cat TimeHR
Refer to the pseudo R sq from coxph()
Looking at the output below, it does not seem to be there. Need to check.
summary(cox11)
## Call:
## coxph(formula = Surv(lenfol, fstat) ~ Age + Gender + bmi.cat +
## hr, data = whas500)
##
## n= 500, number of events= 215
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Age 0.060908 1.062801 0.006523 9.337 < 2e-16 ***
## Gender -0.201298 0.817669 0.143506 -1.403 0.16070
## bmi.catnormal weight -0.249975 0.778820 0.260892 -0.958 0.33798
## bmi.catoverweight -0.737154 0.478474 0.279289 -2.639 0.00831 **
## bmi.catobese -0.613867 0.541254 0.313726 -1.957 0.05038 .
## hr 0.012671 1.012751 0.002777 4.563 5.04e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## Age 1.0628 0.9409 1.0493 1.0765
## Gender 0.8177 1.2230 0.6172 1.0833
## bmi.catnormal weight 0.7788 1.2840 0.4671 1.2987
## bmi.catoverweight 0.4785 2.0900 0.2768 0.8272
## bmi.catobese 0.5413 1.8476 0.2927 1.0010
## hr 1.0128 0.9874 1.0073 1.0183
##
## Concordance= 0.753 (se = 0.017 )
## Likelihood ratio test= 173.2 on 6 df, p=<2e-16
## Wald test = 151.3 on 6 df, p=<2e-16
## Score (logrank) test = 168.4 on 6 df, p=<2e-16