Title: | Mixture Cure Models with Auxiliary Subgroup Survival Probabilities |
---|---|
Description: | Estimate mixture cure models with subgroup survival probabilities as auxiliary information. A reference of the underlying methods is Jie Ding, Jialiang Li and Xiaoguang Wang (2024) <doi:10.1093/jrsssc/qlad106>. |
Authors: | Jie Ding [aut, cre] , Jialiang Li [aut] , Mengxiu Zhang [aut] , Xiaoguang Wang [aut] |
Maintainer: | Jie Ding <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.0.1 |
Built: | 2024-11-04 06:05:41 UTC |
Source: | https://github.com/biostat-jieding/cureauxsp |
This package provides an information synthesis framework that can Estimate mixture cure models with subgroup survival probabilities as auxiliary information. The underlying methods are based on the paper titled "Efficient auxiliary information synthesis for cure rate model", which has been published in Jie Ding, Jialiang Li and Xiaoguang Wang (2024) doi:10.1093/jrsssc/qlad106. project work
Output of SMC.AuxSP
object.
Print.SMC.AuxSP(object)
Print.SMC.AuxSP(object)
object |
an object of SMC.AuxSP |
This function has no return value and is used to print the results in a SMC.AuxSP
class with a better presentation.
Calculate Subgroup survival probabilities basedon the Kaplan-Meier estimation procedure
Probs.Sub(tstar, sdata, G)
Probs.Sub(tstar, sdata, G)
tstar |
time points that the survival probabilities will be estimated at. |
sdata |
a survival dataset (dataframe) in which to interpret the variables named in the |
G |
a matrix used to indicate which subgroups he/she belongs to for each of these subjects. |
It returns the estimated subgroup survival probabilities for a given survival dataset.
Generate simulated dataset from a well-designed PH mixture cure model.
sdata.SMC(n, trace = FALSE)
sdata.SMC(n, trace = FALSE)
n |
the sample size of the simulated dataset. |
trace |
a logical value that indicates whether the information about cure rate and censoring rate should be printed. |
It returns the simulated dataset from a well-designed PH mixture cure model.
Fit the semi-parametric mixture cure model with auxiliary subgroup survival probability information based on the control variate technique.
SMC.AuxSP( formula, cureform, sdata, aux = NULL, hetero = FALSE, N = Inf, latency = "PH", nboot = 400 )
SMC.AuxSP( formula, cureform, sdata, aux = NULL, hetero = FALSE, N = Inf, latency = "PH", nboot = 400 )
formula |
a formula expression, of the form |
cureform |
indicator function a formula expression, of the form |
sdata |
a survival dataset (dataframe) in which to interpret the variables named in the |
aux |
indicates the historical aggregated statistics. It is a list of lists, and each sub-list represents auxiliary information from the same time point.
We combine multiple time points together and each time point contains the following four elements
|
hetero |
denotes a logical value. If it is |
N |
records the sample size of the external sdata that we extract auxiliary information from.
The default value is |
latency |
specifies the model used in latency part.
It can be |
nboot |
specifies the number of bootstrap sampling. The default |
This is a function used to fit the semiparametric mixture cure model with auxilary subgrop survival information. The method used here is the control variate technique. The test statistic for evaluating the homogeneity assumption will also be calculated.
An object of class SMC.AuxSP
is returned.
Specifically, it contains: model
, a description of the model we fit; suffix
, a character that indicates the status of auxiliary information; coefficients
, estimated parameters.
For more friendly presentation, we provide a function Print.SMC.AuxSP()
that can examine this newly defined class.
#--------------------------------------------------------------------# # illustration via simulated dataset (from PH mixture cure model) #### #--------------------------------------------------------------------# ## library library(survival) library(CureAuxSP) ## generate both the internal dataset of interest and the external dataset # - the internal dataset set.seed(1) sdata.internal <- sdata.SMC(n = 300) head(sdata.internal) # - the external dataset set.seed(1) sdata.external <- sdata.SMC(n = 10000) ## prepare the auxiliary information based on the external dataset # - define two functions for subgroup splitting gfunc.t1 <- function(X,Z=NULL){ rbind((X[,1] < 0 & X[,2] == 0), (X[,1] >= 0 & X[,2] == 0), (X[,1] < 0 & X[,2] == 1), (X[,1] >= 0 & X[,2] == 1))} gfunc.t2 <- function(X,Z=NULL){rbind((X[,2] == 0), (X[,2] == 1))} # - calculate subgroup survival rates sprob.t1 <- Probs.Sub(tstar = 1, sdata = sdata.external, G = gfunc.t1(X = sdata.external[,-c(1,2)])) sprob.t2 <- Probs.Sub(tstar = 2, sdata = sdata.external, G = gfunc.t2(X = sdata.external[,-c(1,2)])) cat("Information at t* = 1:", sprob.t1, "\nInformation at t* = 2:", sprob.t2) # - prepare the set that collects information about auxiliary data aux <- list( time1 = list(tstar = 1, gfunc = gfunc.t1, sprob = c(0.73,0.70,0.88,0.83)), time2 = list(tstar = 2, gfunc = gfunc.t2, sprob = c(0.62,0.76)-0.20) ) ## fit the model without auxiliary information set.seed(1) sol.PHMC <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = NULL, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC) ## fit the model with auxiliary information # - ignore heterogeneity set.seed(1) sol.PHMC.Homo <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = aux, hetero = FALSE, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Homo) # - consider heterogeneity set.seed(1) sol.PHMC.Hetero <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = aux, hetero = TRUE, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Hetero) #--------------------------------------------------------------------# # illustration via real breast cancer dataset (from TCGA program) #### # - the R package "TCGAbiolinks" should be downloaded in advance # - see "10.18129/B9.bioc.TCGAbiolinks" for more help #--------------------------------------------------------------------# ## library library(survival) library(CureAuxSP) ## prepare the breast cancer dataset # - download clinical data from the TCGA website query <- TCGAbiolinks::GDCquery(project = "TCGA-BRCA", data.category = "Clinical", file.type = "xml") TCGAbiolinks::GDCdownload(query) clinical <- TCGAbiolinks::GDCprepare_clinic(query, clinical.info = "patient") # - a preparation sdata.pre <- data.frame( yobs = ifelse(!is.na(clinical[,'days_to_death']),clinical[,'days_to_death'], clinical[,'days_to_last_followup'])/365, delta = ifelse(!is.na(clinical[,'days_to_death']),1,0), ER = ifelse( clinical[,'breast_carcinoma_estrogen_receptor_status']=='Positive',1, ifelse(clinical[,'breast_carcinoma_estrogen_receptor_status']=='Negative',0,NA)), Age = clinical[,'age_at_initial_pathologic_diagnosis'], Race = ifelse(clinical[,'race_list']=='BLACK OR AFRICAN AMERICAN','black', ifelse(clinical[,'race_list']=='WHITE','white','other')), Gender = ifelse(clinical[,'gender']=='FEMALE','Female', ifelse(clinical[,'gender']=='MALE','Male',NA)), Stage = sapply( clinical[,'stage_event_pathologic_stage'], function(x,pattern='Stage X|Stage IV|Stage [I]*'){ ifelse(grepl(pattern,x),regmatches(x,regexpr(pattern,x)),NA)},USE.NAMES = FALSE) ) # - extract covariates and remove undesiable subjects and NA sdata.TCGA <- na.omit( sdata.pre[ sdata.pre[,'yobs'] > 0 & sdata.pre[,'Age'] <= 75 & sdata.pre[,'Gender'] == "Female" & sdata.pre[,'Race'] %in% c('white') & sdata.pre[,'Stage'] %in% c('Stage I','Stage II','Stage III'), c('yobs','delta','Age','ER'), ] ) rownames(sdata.TCGA) <- NULL # - summary statistics of the internal dataset summary(sdata.TCGA) ## plot a figure to show the existence of a cure fraction # pdf("Figure_KM_TCGA_BRCA.pdf",width=8.88,height=6.66); { plot( survival::survfit(survival::Surv(yobs, delta) ~ 1, data = sdata.TCGA), conf.int = T, mark.time = TRUE, lwd = 2, ylab = "Survival Probability", xlab = "Survival Time (in Years)", xlim = c(0,25), ylim = c(0,1) ) # }; dev.off() ## fit the model without auxiliary information # - rescale the Age variable Age.Min <- min(sdata.TCGA$Age); Age.Max <- max(sdata.TCGA$Age) sdata.TCGA$Age <- (sdata.TCGA$Age-Age.Min)/(Age.Max-Age.Min) # - fit the model set.seed(1) sol.PHMC <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = NULL, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC) ## fit the model with auxiliary information # - prepare the auxiliary information Age.Cut <- c(0,(c(40,50,60)-Age.Min)/(Age.Max-Age.Min),1) gfunc.t1 <- function(X,Z){ rbind((X[,1] >= Age.Cut[1] & X[,1] < Age.Cut[2] & X[,2] == 1), (X[,1] >= Age.Cut[2] & X[,1] < Age.Cut[3] & X[,2] == 1), (X[,1] >= Age.Cut[3] & X[,1] < Age.Cut[4] & X[,2] == 1), (X[,1] >= Age.Cut[4] & X[,1] <= Age.Cut[5] & X[,2] == 1), (X[,1] >= Age.Cut[1] & X[,1] < Age.Cut[2] & X[,2] == 0), (X[,1] >= Age.Cut[2] & X[,1] < Age.Cut[3] & X[,2] == 0), (X[,1] >= Age.Cut[3] & X[,1] < Age.Cut[4] & X[,2] == 0), (X[,1] >= Age.Cut[4] & X[,1] <= Age.Cut[5] & X[,2] == 0))} gfunc.t2 <- function(X,Z){rbind((X[,2] == 1), (X[,2] == 0))} aux <- list( time1 = list(tstar = 5, gfunc = gfunc.t1, sprob = c(0.810,0.935,0.925,0.950,0.695,0.780,0.830,0.850)), time2 = list(tstar = 10, gfunc = gfunc.t2, sprob = c(0.825,0.705)) ) # - ignore heterogeneity set.seed(1) sol.PHMC.Homo <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = aux, hetero = FALSE, N = 1910, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Homo) # - consider heterogeneity set.seed(1) sol.PHMC.Hetero <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = aux, hetero = TRUE, N = 1910, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Hetero)
#--------------------------------------------------------------------# # illustration via simulated dataset (from PH mixture cure model) #### #--------------------------------------------------------------------# ## library library(survival) library(CureAuxSP) ## generate both the internal dataset of interest and the external dataset # - the internal dataset set.seed(1) sdata.internal <- sdata.SMC(n = 300) head(sdata.internal) # - the external dataset set.seed(1) sdata.external <- sdata.SMC(n = 10000) ## prepare the auxiliary information based on the external dataset # - define two functions for subgroup splitting gfunc.t1 <- function(X,Z=NULL){ rbind((X[,1] < 0 & X[,2] == 0), (X[,1] >= 0 & X[,2] == 0), (X[,1] < 0 & X[,2] == 1), (X[,1] >= 0 & X[,2] == 1))} gfunc.t2 <- function(X,Z=NULL){rbind((X[,2] == 0), (X[,2] == 1))} # - calculate subgroup survival rates sprob.t1 <- Probs.Sub(tstar = 1, sdata = sdata.external, G = gfunc.t1(X = sdata.external[,-c(1,2)])) sprob.t2 <- Probs.Sub(tstar = 2, sdata = sdata.external, G = gfunc.t2(X = sdata.external[,-c(1,2)])) cat("Information at t* = 1:", sprob.t1, "\nInformation at t* = 2:", sprob.t2) # - prepare the set that collects information about auxiliary data aux <- list( time1 = list(tstar = 1, gfunc = gfunc.t1, sprob = c(0.73,0.70,0.88,0.83)), time2 = list(tstar = 2, gfunc = gfunc.t2, sprob = c(0.62,0.76)-0.20) ) ## fit the model without auxiliary information set.seed(1) sol.PHMC <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = NULL, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC) ## fit the model with auxiliary information # - ignore heterogeneity set.seed(1) sol.PHMC.Homo <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = aux, hetero = FALSE, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Homo) # - consider heterogeneity set.seed(1) sol.PHMC.Hetero <- SMC.AuxSP( formula = Surv(yobs,delta) ~ X1 + X2, cureform = ~ X1, sdata = sdata.internal, aux = aux, hetero = TRUE, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Hetero) #--------------------------------------------------------------------# # illustration via real breast cancer dataset (from TCGA program) #### # - the R package "TCGAbiolinks" should be downloaded in advance # - see "10.18129/B9.bioc.TCGAbiolinks" for more help #--------------------------------------------------------------------# ## library library(survival) library(CureAuxSP) ## prepare the breast cancer dataset # - download clinical data from the TCGA website query <- TCGAbiolinks::GDCquery(project = "TCGA-BRCA", data.category = "Clinical", file.type = "xml") TCGAbiolinks::GDCdownload(query) clinical <- TCGAbiolinks::GDCprepare_clinic(query, clinical.info = "patient") # - a preparation sdata.pre <- data.frame( yobs = ifelse(!is.na(clinical[,'days_to_death']),clinical[,'days_to_death'], clinical[,'days_to_last_followup'])/365, delta = ifelse(!is.na(clinical[,'days_to_death']),1,0), ER = ifelse( clinical[,'breast_carcinoma_estrogen_receptor_status']=='Positive',1, ifelse(clinical[,'breast_carcinoma_estrogen_receptor_status']=='Negative',0,NA)), Age = clinical[,'age_at_initial_pathologic_diagnosis'], Race = ifelse(clinical[,'race_list']=='BLACK OR AFRICAN AMERICAN','black', ifelse(clinical[,'race_list']=='WHITE','white','other')), Gender = ifelse(clinical[,'gender']=='FEMALE','Female', ifelse(clinical[,'gender']=='MALE','Male',NA)), Stage = sapply( clinical[,'stage_event_pathologic_stage'], function(x,pattern='Stage X|Stage IV|Stage [I]*'){ ifelse(grepl(pattern,x),regmatches(x,regexpr(pattern,x)),NA)},USE.NAMES = FALSE) ) # - extract covariates and remove undesiable subjects and NA sdata.TCGA <- na.omit( sdata.pre[ sdata.pre[,'yobs'] > 0 & sdata.pre[,'Age'] <= 75 & sdata.pre[,'Gender'] == "Female" & sdata.pre[,'Race'] %in% c('white') & sdata.pre[,'Stage'] %in% c('Stage I','Stage II','Stage III'), c('yobs','delta','Age','ER'), ] ) rownames(sdata.TCGA) <- NULL # - summary statistics of the internal dataset summary(sdata.TCGA) ## plot a figure to show the existence of a cure fraction # pdf("Figure_KM_TCGA_BRCA.pdf",width=8.88,height=6.66); { plot( survival::survfit(survival::Surv(yobs, delta) ~ 1, data = sdata.TCGA), conf.int = T, mark.time = TRUE, lwd = 2, ylab = "Survival Probability", xlab = "Survival Time (in Years)", xlim = c(0,25), ylim = c(0,1) ) # }; dev.off() ## fit the model without auxiliary information # - rescale the Age variable Age.Min <- min(sdata.TCGA$Age); Age.Max <- max(sdata.TCGA$Age) sdata.TCGA$Age <- (sdata.TCGA$Age-Age.Min)/(Age.Max-Age.Min) # - fit the model set.seed(1) sol.PHMC <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = NULL, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC) ## fit the model with auxiliary information # - prepare the auxiliary information Age.Cut <- c(0,(c(40,50,60)-Age.Min)/(Age.Max-Age.Min),1) gfunc.t1 <- function(X,Z){ rbind((X[,1] >= Age.Cut[1] & X[,1] < Age.Cut[2] & X[,2] == 1), (X[,1] >= Age.Cut[2] & X[,1] < Age.Cut[3] & X[,2] == 1), (X[,1] >= Age.Cut[3] & X[,1] < Age.Cut[4] & X[,2] == 1), (X[,1] >= Age.Cut[4] & X[,1] <= Age.Cut[5] & X[,2] == 1), (X[,1] >= Age.Cut[1] & X[,1] < Age.Cut[2] & X[,2] == 0), (X[,1] >= Age.Cut[2] & X[,1] < Age.Cut[3] & X[,2] == 0), (X[,1] >= Age.Cut[3] & X[,1] < Age.Cut[4] & X[,2] == 0), (X[,1] >= Age.Cut[4] & X[,1] <= Age.Cut[5] & X[,2] == 0))} gfunc.t2 <- function(X,Z){rbind((X[,2] == 1), (X[,2] == 0))} aux <- list( time1 = list(tstar = 5, gfunc = gfunc.t1, sprob = c(0.810,0.935,0.925,0.950,0.695,0.780,0.830,0.850)), time2 = list(tstar = 10, gfunc = gfunc.t2, sprob = c(0.825,0.705)) ) # - ignore heterogeneity set.seed(1) sol.PHMC.Homo <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = aux, hetero = FALSE, N = 1910, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Homo) # - consider heterogeneity set.seed(1) sol.PHMC.Hetero <- SMC.AuxSP( formula = Surv(yobs,delta) ~ Age + ER, cureform = ~ Age + ER, sdata = sdata.TCGA, aux = aux, hetero = TRUE, N = 1910, latency = "PH" ) Print.SMC.AuxSP(object = sol.PHMC.Hetero)