Example 8.2: Bone marrow transplantation in the treatment of leukaemia
We use the following three libraries:
The following data were used in the analysis; see Table 8.2 in Collett (2023).
Note that the column ptime is actually a string, with "." as missing values:
We could convert those values using as.numeric, but that would give a warning for the missing values. A better approach is to convert the string values using type.convert and then set the missing values to a large number (Inf would be a natural choice, but it causes problems later):
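A minimal sketch of this conversion, using a short hypothetical vector rather than the Table 8.2 values. The sentinel 1e6 is an assumption standing in for "a large value"; any finite number beyond the longest follow-up time would serve the same purpose.

```r
# Hypothetical ptime strings (NOT the values from Table 8.2); "." marks a missing value
ptime_raw <- c("13", ".", "18", "29", ".")

# as.numeric(ptime_raw) would work but warns "NAs introduced by coercion".
# type.convert() treats "." as NA via na.strings, so no warning is raised;
# as.is = TRUE keeps the result numeric (it is never turned into a factor)
ptime <- type.convert(ptime_raw, na.strings = ".", as.is = TRUE)

# Replace missing values with a large finite sentinel (assumption: 1e6 days
# is well beyond any follow-up time in the data)
ptime[is.na(ptime)] <- 1e6
print(ptime)
```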
Collett then goes on to show that neither the age of the patient (page) nor the age of the blood donor (dage) significantly predicts patient death. Confirm this observation using the following code. Explain the code, and explain why we have done this in two ways.
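For reference, one common way to check that covariates add nothing is a likelihood ratio test of nested Cox models. The sketch below uses simulated data (not Collett's), computes the likelihood ratio statistic by hand from the fitted log partial likelihoods, and checks it against the value reported by summary(); it is not necessarily the pair of approaches used in the code above.

```r
library(survival)

# Simulated stand-in for the transplant data (hypothetical values):
# page and dage are generated independently of survival
set.seed(1)
n <- 60
dat <- data.frame(
  time   = rexp(n, rate = 0.01),
  status = rbinom(n, 1, 0.7),
  page   = round(runif(n, 20, 50)),
  dage   = round(runif(n, 20, 50))
)

fit <- coxph(Surv(time, status) ~ page + dage, data = dat)

# fit$loglik holds the log partial likelihood at beta = 0 (the null model)
# and at the fitted beta, so twice their difference is the LR statistic
lr_stat <- 2 * diff(fit$loglik)
lr_p <- pchisq(lr_stat, df = 2, lower.tail = FALSE)

# summary() reports the same overall likelihood ratio test
print(summary(fit)$logtest)

# Sequential (type I) tests: page first, then dage given page
print(anova(fit))
```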
Time-dependent modelling using tt
We can model the time-dependent effect of platelets returning to normal using the tt argument in coxph. Interpret the following model, its summary and its anova.
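As a sketch of how the tt argument works, here is a self-contained version on simulated data. The data, the 1e6 sentinel for "never recovered" and the indicator function are illustrative assumptions, not the exact code used for the transplant data.

```r
library(survival)

# Simulated data (hypothetical): ptime is the day platelets return to normal,
# with a large sentinel for patients whose platelets never recover
set.seed(2)
n <- 80
dat <- data.frame(time = rexp(n, rate = 0.01),
                  status = rbinom(n, 1, 0.7),
                  ptime = rexp(n, rate = 0.02))
dat$ptime[sample(n, 10)] <- 1e6

# tt() is evaluated at every event time t: the indicator is 1 once
# platelets have returned to normal (ptime < t) and 0 before that
fit_tt <- coxph(Surv(time, status) ~ tt(ptime), data = dat,
                tt = function(x, t, ...) as.numeric(x < t))
print(summary(fit_tt))
```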
Collett also describes adding group to the analysis. Interpret the following model, its summary and its anova.
Unfortunately, we are not able to use survfit with a coxph object with a tt term. Run the following code to confirm this observation:
Time-dependent modelling by hand
We split the time before and after platelets return to normal. This could be implemented in a number of ways, including survival::survSplit, Epi::splitLexis, data.table or dplyr. In our implementation, we have first taken the time intervals that start at time 0, and then combined them with the time intervals that start at ptime when ptime<time.
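The two steps just described can be sketched in base R on four hypothetical patients (the values are made up, not taken from Table 8.2), with 1e6 again used as the assumed sentinel for platelets that never recover:

```r
# Hypothetical patient records: follow-up time, death indicator, and the day
# platelets returned to normal (1e6 = never within follow-up)
dat <- data.frame(id = 1:4,
                  time = c(50, 120, 30, 200),
                  status = c(1, 1, 0, 1),
                  ptime = c(20, 1e6, 45, 90))

# Intervals starting at time 0: end at recovery or at the end of follow-up,
# whichever comes first; a death only counts here if there was no recovery before it
before <- data.frame(id = dat$id, tstart = 0,
                     tstop = pmin(dat$time, dat$ptime),
                     status = ifelse(dat$ptime < dat$time, 0, dat$status),
                     pind = 0)

# Intervals starting at ptime, only for patients who recovered before time
rec <- dat[dat$ptime < dat$time, ]
after <- data.frame(id = rec$id, tstart = rec$ptime, tstop = rec$time,
                    status = rec$status, pind = 1)

split_dat <- rbind(before, after)
split_dat <- split_dat[order(split_dat$id, split_dat$tstart), ]
print(split_dat)
```

Each death appears exactly once, in the last interval for that patient, so the split data keep the same number of events as the original records.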
We can now use the counting process notation for Surv to model the different time segments. Are these results similar to the results using tt?
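One way to check the comparison is the sketch below on simulated data (hypothetical, not Collett's). It fits the same indicator both ways. Because the indicator switches at ptime in both versions, the risk sets and covariate values at each event time coincide, so the partial likelihoods are the same and the estimates should agree up to numerical tolerance.

```r
library(survival)

set.seed(3)
n <- 80
dat <- data.frame(id = 1:n, time = rexp(n, rate = 0.01),
                  status = rbinom(n, 1, 0.7), ptime = rexp(n, rate = 0.02))

# Time-dependent indicator via tt()
fit_tt <- coxph(Surv(time, status) ~ tt(ptime), data = dat,
                tt = function(x, t, ...) as.numeric(x < t))

# Same indicator via split data in counting-process (start, stop] form
before <- data.frame(id = dat$id, tstart = 0,
                     tstop = pmin(dat$time, dat$ptime),
                     status = ifelse(dat$ptime < dat$time, 0, dat$status),
                     pind = 0)
rec <- dat[dat$ptime < dat$time, ]
after <- data.frame(id = rec$id, tstart = rec$ptime, tstop = rec$time,
                    status = rec$status, pind = 1)
split_dat <- rbind(before, after)
fit_cp <- coxph(Surv(tstart, tstop, status) ~ pind, data = split_dat)

print(cbind(tt = coef(fit_tt), counting = coef(fit_cp)))
```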
Using these data, we are now able to use survfit to predict survival assuming that platelets do not return to normal (pind=0) or that platelets return to normal at t=0 (pind=1):
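A minimal sketch of this prediction step, again on simulated data (an assumption for illustration): survfit returns one curve per row of newdata, so a two-row data frame gives the pind=0 and pind=1 curves side by side.

```r
library(survival)

# Simulated data (hypothetical), split at ptime into counting-process form
set.seed(4)
n <- 80
dat <- data.frame(time = rexp(n, rate = 0.01), status = rbinom(n, 1, 0.7),
                  ptime = rexp(n, rate = 0.02))
before <- data.frame(tstart = 0, tstop = pmin(dat$time, dat$ptime),
                     status = ifelse(dat$ptime < dat$time, 0, dat$status),
                     pind = 0)
rec <- dat[dat$ptime < dat$time, ]
after <- data.frame(tstart = rec$ptime, tstop = rec$time,
                    status = rec$status, pind = 1)
split_dat <- rbind(before, after)
fit_cp <- coxph(Surv(tstart, tstop, status) ~ pind, data = split_dat)

# One predicted curve per row of newdata: never recovered vs recovered at t = 0
sf <- survfit(fit_cp, newdata = data.frame(pind = c(0, 1)))
print(summary(sf, times = c(50, 100)))
```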
In fact, do any patients start with platelets at normal levels? What does this imply about the predicted survival for those with pind=1?
Extension: survival predictions given a time-varying exposure
The previous predictions were for either no platelet recovery or immediate platelet recovery. We may be interested in estimating survival with platelet recovery at a specific time. We are not aware of an R package for these calculations, so we present some code for doing this, assuming platelet recovery at 100 days.
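One way to do this calculation, sketched on simulated data, follows from the Cox model. With baseline cumulative hazard H0(t) and coefficient beta for pind, a fixed recovery day r gives H(t) = H0(t) for t <= r and H(t) = H0(r) + exp(beta) * (H0(t) - H0(r)) afterwards, with S(t) = exp(-H(t)). The data and r = 100 are assumptions, and the sketch treats recovery as a fixed, externally specified path, which is a strong assumption for an internal covariate like platelet recovery.

```r
library(survival)

# Simulated data (hypothetical), split at ptime into counting-process form
set.seed(5)
n <- 80
dat <- data.frame(time = rexp(n, rate = 0.01), status = rbinom(n, 1, 0.7),
                  ptime = rexp(n, rate = 0.02))
before <- data.frame(tstart = 0, tstop = pmin(dat$time, dat$ptime),
                     status = ifelse(dat$ptime < dat$time, 0, dat$status),
                     pind = 0)
rec <- dat[dat$ptime < dat$time, ]
after <- data.frame(tstart = rec$ptime, tstop = rec$time,
                    status = rec$status, pind = 1)
split_dat <- rbind(before, after)
fit_cp <- coxph(Surv(tstart, tstop, status) ~ pind, data = split_dat)

# Baseline cumulative hazard H0(t) at pind = 0 (centered = FALSE)
bh <- basehaz(fit_cp, centered = FALSE)
beta <- coef(fit_cp)[["pind"]]
t_rec <- 100  # assumed recovery day

# H0 is a step function, so H0(t_rec) is its value at the last time <= t_rec
H0_rec <- if (any(bh$time <= t_rec)) max(bh$hazard[bh$time <= t_rec]) else 0

# Baseline hazard before recovery, hazard scaled by exp(beta) afterwards
H <- ifelse(bh$time <= t_rec, bh$hazard,
            H0_rec + exp(beta) * (bh$hazard - H0_rec))
surv_rec <- data.frame(time = bh$time, surv = exp(-H))
print(head(surv_rec))
```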
Compare this with our earlier approach of never versus immediate platelet recovery:
For interval estimation, we present an approach using the bootstrap. Note that we resample patients rather than rows, because each patient can contribute more than one row to the split data.
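The patient-level resampling step can be sketched in base R as follows. The split data and the helper resample_patients are hypothetical illustrations, not the code used for the output below. In a full bootstrap, each resampled data set would be refitted with coxph and the survival predictions recomputed, with percentile intervals taken across replicates.

```r
# Hypothetical split data: several rows per patient, identified by id
split_dat <- data.frame(id = c(1, 1, 2, 3, 3, 4),
                        tstart = c(0, 20, 0, 0, 90, 0),
                        tstop = c(20, 50, 120, 90, 200, 30),
                        status = c(0, 1, 1, 0, 1, 0),
                        pind = c(0, 1, 0, 0, 1, 0))

# Hypothetical helper: resample patients (not rows) with replacement,
# keeping all rows of each sampled patient together; each draw gets a fresh
# id so a patient drawn twice is treated as two distinct individuals
resample_patients <- function(data, id = "id") {
  ids <- unique(data[[id]])
  drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- data[data[[id]] == drawn[k], , drop = FALSE]
    rows[[id]] <- k
    rows
  })
  do.call(rbind, pieces)
}

set.seed(6)
boot_dat <- resample_patients(split_dat)
print(boot_dat)
```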
Based on the warnings, not everything went to plan :(. The output is as follows:
Consider the following plot. What does this suggest to you?