In this vignette we are going to show with an example how to use this package.
We are going to consider a data set with time to event data and 10 categorical covariates. These categorical covariates define 25 subgroups. We are interested in estimating the subgroup treatment effect (in this case the subgroup hazard ratio) of each one of these 25 subgroups. Here we will show an example where we only use 2 categorical covariates in order to save run time. To do so we are going to use all the methods available in this package and we are going to compare their results using a forest plot.
As it was mentioned before we are going to use survival data as an
example of how to use the package. In our data (that should be of class
data.frame) we should have columns with the following
variables:
Treatment: factor variable with two levels (the first level represents the control group and the second one the treatment group). It is important to make sure that this variable is a factor and that the levels are in the correct order.
Response: in the case of survival data this should be a numeric variable with the survival time.
Status: in the case of survival data this variable should be present and is a numeric variable with 0 and 1 indicating whether the observation was censored or not.
Categorical variables: factor variables with information about the individuals. The levels of these variables are going to be the subgroups that we are interested in studying.
In our case we have the following structure of the data:
str(example_data)
#> 'data.frame':    1000 obs. of  14 variables:
#>  $ id    : int  1 2 3 4 5 6 7 8 9 10 ...
#>  $ arm   : Factor w/ 2 levels "0","1": 1 1 1 1 2 2 2 2 2 1 ...
#>  $ x_1   : Factor w/ 2 levels "a","b": 2 2 1 1 2 1 1 2 2 1 ...
#>  $ x_2   : Factor w/ 2 levels "a","b": 1 2 1 1 1 2 2 1 2 2 ...
#>  $ x_3   : Factor w/ 2 levels "a","b": 2 1 2 1 2 2 2 1 2 2 ...
#>  $ x_4   : Factor w/ 3 levels "a","b","c": 2 1 3 3 3 3 3 1 3 3 ...
#>  $ x_5   : Factor w/ 4 levels "a","b","c","d": 4 4 1 1 4 1 3 4 4 3 ...
#>  $ x_6   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 2 2 2 2 ...
#>  $ x_7   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 1 2 2 2 ...
#>  $ x_8   : Factor w/ 3 levels "a","b","c": 3 2 1 3 3 3 1 1 2 3 ...
#>  $ x_9   : Factor w/ 2 levels "a","b": 2 1 2 2 2 1 1 2 2 2 ...
#>  $ x_10  : Factor w/ 3 levels "a","b","c": 3 3 3 3 3 1 2 2 2 3 ...
#>  $ tt_pfs: num  0.9795 3.4762 1.7947 0.0197 2.2168 ...
#>  $ ev_pfs: num  1 0 1 1 0 0 0 0 0 0 ...We have that arm is our treatment variable,
x_1 to x_10 are the categorical covariates,
tt_pfs is the response variable and ev_pfs is
the status variable.
Once that we are sure that our data is in the correct format and contains all the necessary variables, we are going to fit the different models in order to obtain the subgroup treatment effects.
Let’s start by fitting the model that will lead to an overall treatment effect estimation.
naivepop_model <- naivepop(
  resp = "tt_pfs",
  trt = "arm",
  data = example_data,
  resptype = "survival",
  status = "ev_pfs"
)This new naivepop object contains the fitted model, the
kind of model that was fitted, the response type of the data and the
data.
We can take the summary of this object to obtain the overall treatment effect estimate (in this case the overall hazard ratio).
Now we are going to fit the model to obtain the naive
subgroup-specific treatment effects. We have to indicate which
categorical variables we want to consider to obtain the subgroup
treatment effects. If we add variable x_1 we are going to
obtain the subgroup treatment effect of the subgroups x_1a
and x_1b. We do the same for x_2.
naive_model <- naive(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2"),
  data = example_data, resptype = "survival",
  status = "ev_pfs"
)This naive object contains the fitted models for each
one of the subgroups, the main information about the coefficients
associated to treatment of these fitted models, the kind of models
fitted, the response type and the data.
We can take the summary of this object to obtain the subgroup treatment effects. We can also include a value for the confidence level in order to obtain confidence intervals for these subgroup treatment effect estimates. By default this confidence level is of 95%.
summary_naive <- summary(naive_model, conf = 0.90)
summary_naive
#>   subgroup trt.estimate   trt.low  trt.high
#> 1     x_1a    0.6741880 0.4975147 0.9135999
#> 2     x_1b    0.6645867 0.4936858 0.8946489
#> 3     x_2a    0.7087873 0.5060265 0.9927928
#> 4     x_2b    0.6424496 0.4886396 0.8446746We can add a forest plot with the estimated treatment effects:
We are going to fit a model considering an elastic net penalization
on the subgroup treatment interaction coefficients. Depending on the
value of alpha we are going to have different kinds of
penalties. If we put alpha to 0 we consider a ridge penalty
and if we put alpha to 1 we consider a lasso penalty. We
are going to fit both lasso and ridge.
We have to add the covars argument which indicates which
categorical variables we want to include in our model. It is important
that all the variables that are in subgr are also in
covars. The idea is that we can include many variables but
then only find the subgroup treatment effect of some of them.
ridge_model <- elastic_net(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2"),
  covars = c(
    "x_1", "x_2", "x_3", "x_4", "x_5",
    "x_6", "x_7", "x_8", "x_9", "x_10"
  ),
  data = example_data, resptype = "survival",
  alpha = 0, status = "ev_pfs"
)
lasso_model <- elastic_net(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2"),
  covars = c(
    "x_1", "x_2", "x_3", "x_4", "x_5",
    "x_6", "x_7", "x_8", "x_9", "x_10"
  ),
  data = example_data, resptype = "survival",
  alpha = 1, status = "ev_pfs"
)These elastic_net models contain the fitted models, the
response type, the data, the value of alpha, the design and
the dummy matrices (that are later going to be used to obtain the
subgroup treatment effects), the response and status variables and the
names of the subgroups.
We are now going to obtain the summary of these fitted objects to find the subgroup hazard ratio estimates.
summary_ridge <- summary(ridge_model)
summary_ridge
#>   subgroup trt.estimate
#> 1     x_1a    0.6502032
#> 2     x_1b    0.6494442
#> 3     x_2a    0.6495571
#> 4     x_2b    0.6502464
summary_lasso <- summary(lasso_model)
summary_lasso
#>   subgroup trt.estimate
#> 1     x_1a    0.6500483
#> 2     x_1b    0.6493824
#> 3     x_2a    0.6493749
#> 4     x_2b    0.6501893We can obtain a forest plot for each one of these fitted models:
We are now going to fit a Bayesian model with a horseshoe prior on
the subgroup-treatment interactions. Fitting this kind of models usually
takes a bit of time. We can modify some parameters like the number of
Markov chains, the number of iterations or the number of warmup
iterations (between others). The parameters that we can change are found
in the documentation of the brm function from the brms
package.
horseshoe_model <- horseshoe(
  resp = "tt_pfs", trt = "arm",
  subgr = c("x_1", "x_2"),
  covars = c(
    "x_1", "x_2", "x_3", "x_4", "x_5",
    "x_6", "x_7", "x_8", "x_9", "x_10"
  ),
  data = example_data,
  resptype = "survival",
  status = "ev_pfs",
  chains = 2,
  seed = 0,
  iter = 1000,
  warmup = 800,
  control = list(adapt_delta = 0.95)
)Once that the model is fitted we have to check if there are
convergence problems. We might get divergent transitions after warmup.
In general if there are few divergent transitions (taking into account
the total number of iterations) and there are no other problems like
high Rhat values we can continue with our analysis.
We can obtain a summary of the posterior distributions of the coefficients of the fitted model:
horseshoe_model$fit
#>  Family: cox 
#>   Links: mu = log 
#> Formula: tt_pfs | cens(1 - ev_pfs) ~ a + b 
#>          a ~ 0 + arm0 + arm1 + x_1b + x_2b + x_3b + x_4b + x_4c + x_5b + x_5c + x_5d + x_6b + x_7b + x_8b + x_8c + x_9b + x_10b + x_10c
#>          b ~ 0 + x_1a_arm + x_1b_arm + x_2a_arm + x_2b_arm + x_3a_arm + x_3b_arm + x_4a_arm + x_4b_arm + x_4c_arm + x_5a_arm + x_5b_arm + x_5c_arm + x_5d_arm + x_6a_arm + x_6b_arm + x_7a_arm + x_7b_arm + x_8a_arm + x_8b_arm + x_8c_arm + x_9a_arm + x_9b_arm + x_10a_arm + x_10b_arm + x_10c_arm
#>    Data: data_model (Number of observations: 1000) 
#>   Draws: 2 chains, each with iter = 1000; warmup = 800; thin = 1;
#>          total post-warmup draws = 400
#> 
#> Regression Coefficients:
#>             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> a_arm0         -0.28      0.41    -1.06     0.56 1.04       95      102
#> a_arm1         -0.76      0.54    -1.83     0.29 1.03       97      118
#> a_x_1b         -0.08      0.15    -0.35     0.18 1.02      347      254
#> a_x_2b          0.07      0.13    -0.20     0.32 1.00      531      313
#> a_x_3b         -0.32      0.17    -0.64     0.01 1.01      382      270
#> a_x_4b         -0.27      0.18    -0.64     0.07 1.00      351      282
#> a_x_4c         -0.56      0.19    -0.94    -0.21 1.01      340      270
#> a_x_5b          0.04      0.29    -0.44     0.80 1.00      152       88
#> a_x_5c         -0.13      0.25    -0.55     0.36 1.01      143       93
#> a_x_5d         -0.22      0.23    -0.60     0.30 1.00      137       82
#> a_x_6b          0.20      0.15    -0.12     0.51 1.00      313      262
#> a_x_7b          0.07      0.16    -0.25     0.35 1.00      332      312
#> a_x_8b          0.00      0.20    -0.39     0.37 1.00      289      205
#> a_x_8c         -0.01      0.18    -0.36     0.38 1.00      247      130
#> a_x_9b          0.10      0.20    -0.27     0.45 1.01      303      237
#> a_x_10b         0.18      0.22    -0.24     0.57 1.00      403      365
#> a_x_10c         0.15      0.20    -0.23     0.51 1.00      360      299
#> b_x_1a_arm     -0.00      0.10    -0.20     0.19 1.00      457      316
#> b_x_1b_arm     -0.01      0.11    -0.29     0.20 1.01      473      363
#> b_x_2a_arm      0.01      0.09    -0.16     0.25 1.00      325      281
#> b_x_2b_arm     -0.01      0.10    -0.27     0.18 1.00      435      215
#> b_x_3a_arm     -0.05      0.15    -0.43     0.09 1.02      270      225
#> b_x_3b_arm      0.04      0.13    -0.09     0.43 1.01      325      352
#> b_x_4a_arm     -0.02      0.10    -0.33     0.16 1.00      381      230
#> b_x_4b_arm     -0.02      0.13    -0.35     0.24 1.00      435      367
#> b_x_4c_arm      0.04      0.12    -0.10     0.42 1.01      324      255
#> b_x_5a_arm      0.10      0.25    -0.10     0.92 1.00      130       94
#> b_x_5b_arm     -0.07      0.19    -0.68     0.12 1.01      293      226
#> b_x_5c_arm     -0.03      0.10    -0.33     0.13 1.00      453      377
#> b_x_5d_arm      0.01      0.10    -0.26     0.22 1.00      515      366
#> b_x_6a_arm      0.02      0.11    -0.15     0.30 1.00      412      306
#> b_x_6b_arm     -0.03      0.11    -0.35     0.17 1.00      384      315
#> b_x_7a_arm     -0.07      0.16    -0.58     0.08 1.01      262      223
#> b_x_7b_arm      0.05      0.15    -0.13     0.55 1.01      270      220
#> b_x_8a_arm     -0.03      0.11    -0.41     0.13 1.00      420      335
#> b_x_8b_arm     -0.01      0.11    -0.28     0.19 1.02      336      317
#> b_x_8c_arm      0.04      0.12    -0.10     0.39 1.00      251      249
#> b_x_9a_arm     -0.04      0.14    -0.48     0.12 1.01      226      258
#> b_x_9b_arm      0.04      0.16    -0.19     0.54 1.01      418      308
#> b_x_10a_arm    -0.01      0.10    -0.26     0.22 1.00      440      313
#> b_x_10b_arm     0.00      0.08    -0.18     0.20 1.00      249      364
#> b_x_10c_arm    -0.02      0.10    -0.28     0.14 1.01      515      341
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).Apart from the fitted model this horseshoe object also
contains the data, the response, the design and dummy matrices, the kind
of response and the subgroup names.
We are now going to call summary of this object to obtain the subgroup hazard ratio estimates. With this summary we are also going to obtain the samples of the approximate posterior distribution of the subgroup hazard ratios. The estimates are just the median of this approximate posterior distribution. We should select a confidence level in order to obtain credible intervals for the subgroup treatment effects. The default confidence level is 95%.
summary_horseshoe <- summary(horseshoe_model, conf = 0.9)
summary_horseshoe
#>   subgroup trt.estimate   trt.low  trt.high
#> 1     x_1a    0.6548058 0.5396242 0.8285532
#> 2     x_1b    0.6555175 0.5250172 0.7978729
#> 3     x_2a    0.6527021 0.5215938 0.8296019
#> 4     x_2b    0.6479122 0.5427992 0.7976948We can obtain a forest plot with the treatment effect estimates and the credible intervals.
A last useful thing that we can do is to compare the different treatment effect estimates. For that we are first going to generate a data set with all the estimated hazard ratios and then we are going to plot all of them in a common forest plot.
comparison_data <- compare(naivepop_model, naive_model, ridge_model, lasso_model, horseshoe_model)
comparison_data
#> $data
#>    subgroup trt.estimate   trt.low  trt.high                 model
#> 1      x_1a    0.6741880 0.4693783 0.9683648                 Naive
#> 2      x_1b    0.6645867 0.4663568 0.9470762                 Naive
#> 3      x_2a    0.7087873 0.4743926 1.0589952                 Naive
#> 4      x_2b    0.6424496 0.4636818 0.8901394                 Naive
#> 5      x_1a    0.6502032 0.6502032 0.6502032 Elastic net alpha = 0
#> 6      x_1b    0.6494442 0.6494442 0.6494442 Elastic net alpha = 0
#> 7      x_2a    0.6495571 0.6495571 0.6495571 Elastic net alpha = 0
#> 8      x_2b    0.6502464 0.6502464 0.6502464 Elastic net alpha = 0
#> 9      x_1a    0.6500483 0.6500483 0.6500483 Elastic net alpha = 1
#> 10     x_1b    0.6493824 0.6493824 0.6493824 Elastic net alpha = 1
#> 11     x_2a    0.6493749 0.6493749 0.6493749 Elastic net alpha = 1
#> 12     x_2b    0.6501893 0.6501893 0.6501893 Elastic net alpha = 1
#> 13     x_1a    0.6548058 0.5217772 0.8529706             Horseshoe
#> 14     x_1b    0.6555175 0.5070357 0.8349754             Horseshoe
#> 15     x_2a    0.6527021 0.5113236 0.9065121             Horseshoe
#> 16     x_2b    0.6479122 0.5240480 0.8121235             Horseshoe
#> 
#> $overall_trt
#> [1] 0.666919
#> 
#> $resptype
#> [1] "survival"
#> 
#> attr(,"class")
#> [1] "compare.data"Now we plot all the estimated subgroup hazard ratios and we add a vertical line indicating the value of the overall hazard ratio.
In the case of having survival data the procedure would be analogous
but instead of having survival resptype we would have
binary. Also the response variable should be a numeric variable with 1
and 0 and there would be no status variable.