Exercise 2. Comparing survival proportions and mortality rates by stage for cause-specific and all-cause survival

The purpose of this exercise is to study patient survival using two alternative measures: survival proportions and mortality rates. A second purpose is to study the difference between cause-specific and all-cause survival.

Load dependencies

library(biostat3) # loads the survival and muhaz packages
library(dplyr)    # for data manipulation

We start by listing the first few observations to get an idea about the data. We then define two 1/0 variables for the events that we are interested in.

## Examine the data
data(melanoma)
head(melanoma)
##      sex age     stage mmdx yydx surv_mm surv_yy       status
## 1 Female  81 Localised    2 1981    26.5     2.5  Dead: other
## 2 Female  75 Localised    9 1975    55.5     4.5  Dead: other
## 3 Female  78 Localised    2 1978   177.5    14.5  Dead: other
## 4 Female  75   Unknown    8 1975    29.5     2.5 Dead: cancer
## 5 Female  81   Unknown    7 1981    57.5     4.5  Dead: other
## 6 Female  75 Localised    9 1975    19.5     1.5 Dead: cancer
##            subsite        year8594         dx       exit agegrp id
## 1    Head and Neck Diagnosed 75-84 1981-02-02 1983-04-20    75+  1
## 2    Head and Neck Diagnosed 75-84 1975-09-21 1980-05-07    75+  2
## 3            Limbs Diagnosed 75-84 1978-02-21 1992-12-07    75+  3
## 4 Multiple and NOS Diagnosed 75-84 1975-08-25 1978-02-08    75+  4
## 5    Head and Neck Diagnosed 75-84 1981-07-09 1986-04-25    75+  5
## 6            Trunk Diagnosed 75-84 1975-09-03 1977-04-19    75+  6
##        ydx    yexit
## 1 1981.088 1983.298
## 2 1975.720 1980.348
## 3 1978.140 1992.934
## 4 1975.646 1978.104
## 5 1981.517 1986.312
## 6 1975.671 1977.296
## Create 0/1 outcome variables
melanoma <- 
    transform(melanoma,
              death_cancer = ifelse( status == "Dead: cancer", 1, 0),
              death_all = ifelse( status == "Dead: cancer" |
                                  status == "Dead: other", 1, 0))

(a) Plot estimates of the survivor function and hazard function by stage.

We now tabulate the distribution of the melanoma patients by cancer stage at diagnosis.

## Tabulate by stage
Freq <- xtabs(~stage, data=melanoma)
cbind(Freq, Prop=prop.table(Freq))

We then plot the survival by stage. Does it appear that stage is associated with patient survival?

par(mfrow=c(1, 2))
mfit <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = melanoma)

plot(mfit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival")
## legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

hazards <- muhaz2(Surv(surv_mm, death_cancer)~stage, melanoma)
plot(hazards,
     col=1:4, lty=1, xlim=c(0,250), ylim=c(0,0.08),
     legend.args=list(bty="n"))

(b) Estimate the mortality rates for each stage using, for example, the survRate function.

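## Mortality rates by stage using survRate (time in years)
survRate(Surv(surv_mm/12, death_cancer) ~ stage, data=melanoma)
## A comparable calculation by hand: events (D) divided by person-years (M),
## with exact Poisson confidence limits
melanoma %>%
    select(death_cancer, surv_mm, stage) %>%
    group_by(stage) %>%
    summarise(D = sum(death_cancer), M = sum(surv_mm/12), Rate = D/M,
              CI_low = stats::poisson.test(D,M)$conf.int[1],
              CI_high = stats::poisson.test(D,M)$conf.int[2])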

What are the units of the estimated rates? The survRate function, as the name suggests, is used to estimate rates. Look at the help pages if you are not familiar with the function.
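As a reminder of how the units arise, a mortality rate is the number of events divided by the total person-time at risk, so its unit is set by the time scale passed to Surv. The sketch below uses made-up numbers (the values of d and months are hypothetical, not taken from the melanoma data) to show the same data expressed per person-month and per person-year.

```r
## Hypothetical numbers for illustration only (not from the melanoma data)
d      <- 10     # number of events
months <- 600    # total follow-up time in months

rate_per_month <- d / months         # events per person-month
rate_per_year  <- d / (months / 12)  # events per person-year

## The per-year rate is 12 times the per-month rate
c(per_month = rate_per_month, per_year = rate_per_year)
```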

(c) If you haven’t already done so, estimate the mortality rates for each stage per 1000 person-years of follow-up.

survRate(Surv(surv_mm/12/1000, death_cancer) ~ stage, data=melanoma)
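Dividing the time scale by 1000 should be equivalent to multiplying the rate, and its confidence limits, by 1000. A minimal sketch with hypothetical counts (d and pyears are made up) checks this with stats::poisson.test, the function used for the confidence limits in part (b).

```r
## Hypothetical numbers for illustration only
d      <- 10   # number of events
pyears <- 50   # person-years at risk

## Rate and exact Poisson CI per person-year, rescaled to per 1000 person-years
ci_year    <- poisson.test(d, pyears)$conf.int
rate_1000a <- 1000 * d / pyears
ci_1000a   <- 1000 * ci_year

## The same quantities with time measured in units of 1000 years
rate_1000b <- d / (pyears / 1000)
ci_1000b   <- poisson.test(d, pyears / 1000)$conf.int

## Both comparisons should agree up to floating-point error
all.equal(rate_1000a, rate_1000b)
all.equal(as.numeric(ci_1000a), as.numeric(ci_1000b))
```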

(d)

Study whether survival is different for males and females (both by plotting the survivor function and by tabulating mortality rates).

par(mfrow=c(1, 2))
sfit <- survfit(Surv(surv_mm, death_cancer) ~ sex, data = melanoma)

plot(sfit, col=1:2,
     xlab = "Follow-up Time",
     ylab = "Survival")

plot(muhaz2(Surv(surv_mm,death_cancer)~sex, data = melanoma), 
     col = 1:2, lty = 1)

Is there a difference in survival between males and females? If yes, is the difference present throughout the follow-up?
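The code for this part produces only the plots. One way to tabulate the rates by sex, following the same pattern as in part (c), is sketched below. The object name rates_by_sex is introduced here for illustration, and the block recreates death_cancer so that it can be run on its own.

```r
library(biostat3)  # provides the melanoma data and survRate

data(melanoma)
melanoma <- transform(melanoma,
                      death_cancer = ifelse(status == "Dead: cancer", 1, 0))

## Cancer-specific mortality rates per 1000 person-years, by sex
rates_by_sex <- survRate(Surv(surv_mm/12/1000, death_cancer) ~ sex,
                         data = melanoma)
rates_by_sex
```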

(e)

The plots you made above were based on cause-specific survival (i.e., only deaths due to cancer are counted as events, and deaths due to other causes are censored). In the next part of this question we will estimate all-cause survival (i.e., any death is counted as an event). First, however, study the coding of vital status and tabulate vital status by age group.

How many patients die of each cause? Does the distribution of cause of death depend on age?

xtabs(~status+agegrp, melanoma)
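Because the age groups differ in size, raw counts can be hard to compare. A possible addition is to convert each column to proportions with prop.table, so that the distribution of vital status can be read within each age group. The object names tab and col_props are introduced here for illustration.

```r
library(biostat3)  # provides the melanoma data

data(melanoma)

## Counts of vital status by age group
tab <- xtabs(~ status + agegrp, data = melanoma)

## Column proportions: distribution of vital status within each age group
col_props <- prop.table(tab, margin = 2)
round(col_props, 3)
```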

(f)

To get all-cause survival, specify all deaths (both cancer and other) as events.

Now plot the survivor proportion for all-cause survival by stage. We give the plot a title so that it can be told apart from the cause-specific plots. Is the survivor proportion different compared to the cause-specific survival you estimated above? Why?

par(mfrow=c(1, 1))
afit <- survfit(Surv(surv_mm, death_all) ~ stage, data = melanoma)
plot(afit, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nAll-cause")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)
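To see why the two kinds of curve differ, the sketch below fits both Kaplan-Meier estimates to a small made-up data set (the data frame toy and its values are hypothetical; only the survival package is needed). Every death counted as a cancer event is also an all-cause event, so the all-cause estimate cannot lie above the cause-specific one.

```r
library(survival)

## Hypothetical toy data for illustration only (times in months)
toy <- data.frame(
    time   = c(2, 4, 5, 7, 9, 12),
    status = c("Dead: cancer", "Dead: other", "Alive",
               "Dead: cancer", "Dead: other", "Alive"))

## Same 0/1 coding as used for the melanoma data
toy <- transform(toy,
                 death_cancer = ifelse(status == "Dead: cancer", 1, 0),
                 death_all = ifelse(status == "Dead: cancer" |
                                    status == "Dead: other", 1, 0))

## Cause-specific: other-cause deaths are censored; all-cause: any death is an event
km_cancer <- survfit(Surv(time, death_cancer) ~ 1, data = toy)
km_all    <- survfit(Surv(time, death_all) ~ 1, data = toy)

## Side-by-side survival proportions at each observed time
cbind(time = km_cancer$time,
      cause_specific = km_cancer$surv,
      all_cause = km_all$surv)
```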

(g)

It is more common to die from a cause other than cancer at older ages. How does this impact the survivor proportion for different stages? Compare cause-specific and all-cause survival by plotting the survivor proportion by stage for the oldest age group (75+ years) for both cause-specific and all-cause survival.

par(mfrow=c(1, 2))
mfit75 <- survfit(Surv(surv_mm, death_cancer) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(mfit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nCancer | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

afit75 <- survfit(Surv(surv_mm, death_all) ~ stage, data = subset(melanoma,agegrp=="75+"))
plot(afit75, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier survival estimates\nAll-cause | Age 75+")
legend("topright", levels(melanoma$stage), col=1:4, lty = 1)

(h)

Now estimate both cause-specific and all-cause survival for each age group.

par(mfrow=c(1, 2))
mfitage <- survfit(Surv(surv_mm, death_cancer) ~ agegrp, data = melanoma)
plot(mfitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\ncancer survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)

afitage <- survfit(Surv(surv_mm, death_all) ~ agegrp, data = melanoma)
plot(afitage, col=1:4,
     xlab = "Follow-up Time",
     ylab = "Survival",
     main = "Kaplan-Meier estimates of\nall-cause survival by age group")
legend("topright", levels(melanoma$agegrp), col=1:4, lty = 1)