Help for package Greg

Version:

2.0.3

Title:

Regression Helper Functions

Maintainer:

Max Gordon <max@gforge.se>

Description:

Methods for manipulating regression models and for describing these in a style adapted for medical journals. Contains functions for generating an HTML table with crude and adjusted estimates, plotting hazard ratio, plotting model estimates and confidence intervals using forest plots, extending this to comparing multiple models in a single forest plots. In addition to the descriptive methods, there are functions for the robust covariance matrix provided by the 'sandwich' package, a function for adding non-linearities to a model, and a wrapper around the 'Epi' package's Lexis() functions for time-splitting a dataset when modeling non-proportional hazards in Cox regressions.

License:

GPL (≥ 3)

URL:

http://gforge.se

BugReports:

https://github.com/gforge/Greg/issues

Biarch:

yes

Encoding:

UTF-8

Imports:

broom, Epi, dplyr, glue, graphics, grDevices, htmlTable (≥ 2.0.0), Hmisc, knitr, methods, nlme, purrr, rlang, rms, sandwich, stats, stringr, tibble, tidyr, tidyselect, utils

Depends:

Gmisc (≥ 1.0.3), forestplot, R (≥ 4.1.0)

Suggests:

boot, testthat, cmprsk, survival, ggplot2, parallel, rmarkdown, rmeta, tidyverse

VignetteBuilder:

knitr

RoxygenNote:

7.3.2

NeedsCompilation:

Packaged:

2025-08-19 06:31:38 UTC; max

Author:

Max Gordon [aut, cre], Reinhard Seifert [aut] (Author of original plotHR)

Repository:

CRAN

Date/Publication:

2025-08-19 07:40:02 UTC

Regression Helper Functions

Description

This package provides utilities that streamline working with regression models. Standard regression output offers a wealth of information, but for publication—especially in journals such as *NEJM* and *BMJ*—results often need additional simplification. The functions here help you get there.

Output functions

The package contains functions that automatically print crude (unadjusted) estimates alongside adjusted estimates, a presentation style expected in many medical papers.

Forest-plot wrappers let you visualise regression estimates—handy when you’re dealing with a long list of covariates. Utilities are also included for comparing models (e.g., patient subsets or different regression types).

Time splitter

Cox models sometimes violate proportional-hazards assumptions. When the 'tt()' approach becomes unwieldy on large data sets, you can time-split the data and use start time as an interaction term. See timeSplitter() and vignette("timeSplitter").

Other regression functions

The package extends linear regression by enabling robust covariance matrices through the sandwich framework for rms::ols(). Additional helpers target stats and survival workflows, including coxph().

Important notice

Extensive tests guard against regressions, but always verify results in your own context. If you use the package with other model classes—and have tests to share—please let me know.

Author(s)

Max Gordon

Add a nonlinear function to the model

Description

This function takes a model and adds a non-linear function if the likelihood-ratio supports this (via the anova(..., test = "chisq") test for stats while for rms you need to use the rcs() spline that is automatically evaluated for non-linearity).

Usage

addNonlinearity(
  model,
  variable,
  spline_fn,
  flex_param = 2:7,
  min_fn = AIC,
  sig_level = 0.05,
  verbal = FALSE,
  workers,
  ...
)

## S3 method for class 'negbin'
addNonlinearity(model, ...)

Arguments

model

The model that is to be evaluated and adapted for non-linearity

variable

The name of the parameter that is to be tested for non-linearity. Note that the variable should be included plain (i.e. as a linear variable) form in the model.

spline_fn

Either a string or a function that is to be used for testing alternative non-linearity models

flex_param

A vector with values that are to be tested as the default second parameter for the non-linearity function that you want to evaluate. This defaults to 2:7, for the ns() it tests the degrees of freedom ranging between 2 and 7.

min_fn

This is the function that we want to minimized if the variable supports the non-linearity assumption. E.g. BIC() or AIC, note that the BIC() will in the majority of cases support a lower complexity than the AIC().

sig_level

The significance level for which the non-linearity is deemed as significant, defaults to 0.05.

verbal

Set this to TRUE if you want print statements with the anova test and the chosen knots.

workers

The function tries to run everything in parallel. Under some circumstances you may want to restrict the number of parallel threads to less than the default detectCores() - 1, e.g. you may run out of memory then you can provide this parameter. If you do not want to use parallel then simply set workers to FALSE. The cluster created using makeCluster() function.

...

Passed onto internal prNlChooseDf() function.

Examples

library(Greg)
data("melanoma", package = "boot", envir = environment())

library(dplyr)
melanoma <- mutate(melanoma, 
                   status = factor(status,
                                   levels = 1:3,
                                   labels = c("Died from melanoma", 
                                              "Alive", 
                                              "Died from other causes")),
                   ulcer = factor(ulcer,
                                  levels = 0:1,
                                  labels = c("Absent", "Present")),
                   time = time/365.25, # All variables should be in the same time unit
                   sex = factor(sex,
                                levels = 0:1,
                                labels = c("Female", "Male")))

library(survival)
model <- coxph(Surv(time, status == "Died from melanoma") ~ sex + age,
  data = melanoma
)

nl_model <- addNonlinearity(model, "age",
  spline_fn = "pspline",
  verbal = TRUE,
  workers = FALSE
)
# Note that there is no support for nonlinearity in this case

Getting the bread for the `vcovHC`

Description

The original bread.lm uses the summary.lm function it seems like a quick fix and I've therefore created the original bread definition: $(X'X)^-1$

Usage

## S3 method for class 'ols'
bread(x, ...)

Arguments

x

The ols model fit

...

arguments passed to methods.

Value

matrix The bread for the sandwich vcovHC function

Examples

# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)

library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")

# Main function
f <- ols(y ~ rcs(x1, 3) + x2)

# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)

options(org.op)

A function for gathering all the description options

Description

Since there are so many different description options for the printCrudeAndAdjustedModel() function they have been gathered into a list. This function is simply a helper in order to generate a valid list.

Usage

caDescribeOpts(
  show_tot_perc = FALSE,
  numb_first = TRUE,
  continuous_fn = describeMean,
  prop_fn = describeFactors,
  factor_fn = describeFactors,
  digits = 1,
  colnames = c("Total", "Event")
)

Arguments

show_tot_perc

Show percentages for the total column

numb_first

Whether to show the number before the percentages

continuous_fn

Stat function used for the descriptive statistics, defaults to describeMean()

prop_fn

Stat function used for the descriptive statistics, defaults to describeFactors() since there has to be a reference in the current setup.

factor_fn

Stat function used for the descriptive statistics, defaults to describeFactors()

digits

Number of digits to use in the descriptive columns. Defaults to the general digits if not specified.

colnames

The names of the two descriptive columns. By default Total and Event.

Value

list Returns a list with all the options

A `confint` function for the `ols`

Description

This function checks that there is a df.residual before running the qt(). If not found it then defaults to the qnorm() function. Otherwise it is a copy of the confint() function.

Usage

## S3 method for class 'ols'
confint(object, parm, level = 0.95, ...)

Arguments

object

a fitted ols-model object.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

...

additional argument(s) for methods.

Value

A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in

Examples

# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)

library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")

# Main function
f <- ols(y ~ rcs(x1, 3) + x2)

# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)

options(org.op)

The confint function adapted for vcovHC

Description

The confint.lm uses the t-distribution as the default confidence interval estimator. When there is reason to believe that the normal distribution is violated an alternative approach using the vcovHC() may be more suitable.

Usage

confint_robust(
  object,
  parm,
  level = 0.95,
  HC_type = "HC3",
  t_distribution = FALSE,
  ...
)

Arguments

object

The regression model object, either an ols or lm object

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

the confidence level required.

HC_type

See options for vcovHC()

t_distribution

A boolean for if the t-distribution should be used or not. Defaults to FALSE. According to Cribari-Nieto and Lima's study from 2009 this should not be the case.

...

Additional parameters that are passed on to vcovHC()

Value

matrix A matrix (or vector) with columns giving lower and upper confidence limits for each parameter. These will be labelled as (1-level)/2 and 1 - (1-level)/2 in

References

F. Cribari-Neto and M. da G. A. Lima, "Heteroskedasticity-consistent interval estimators", Journal of Statistical Computation and Simulation, vol. 79, no. 6, pp. 787-803, 2009 (doi:10.1080/00949650801935327)

Examples

n <- 50
x <- runif(n)
y <- x + rnorm(n)

fit <- lm(y~x)
library("sandwich")
confint_robust(fit, HC_type = "HC4m")

Fix for the Extract Empirical Estimating Functions

Description

As missing data is handled a little different for the ols than for the lm we need to change the estfun to work with the ols().

Usage

## S3 method for class 'ols'
estfun(x, ...)

Arguments

x

A fitted ols model object.

...

arguments passed to methods.

Details

I have never worked with weights and this should probably be checked as this just uses the original estfun.lm as a template.

Value

matrix A matrix containing the empirical estimating functions.

Examples

# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)

library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")

# Main function
f <- ols(y ~ rcs(x1, 3) + x2)

# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)

options(org.op)

Compares different scores in different regression objects.

Description

Creates a composite from different regression objects into one forestplot where you can choose the variables of interest to get an overview and easier comparison.

Usage

forestplotCombineRegrObj(
  regr.obj,
  variablesOfInterest.regexp = NULL,
  estimate.txt = NULL,
  add_first_as_ref = FALSE,
  ref_txt = "ref.",
  digits = 1,
  post_process_data = function(x) x,
  is.summary = NULL,
  xlab = NULL,
  zero = NULL,
  xlog = NULL,
  exp = xlog,
  ...
)

Arguments

regr.obj

A list with all the fits that have variables that are to be identified through the regular expression

variablesOfInterest.regexp

A regular expression identifying the variables that are of interest of comparing. For instance it can be "(score|index|measure)" that finds scores in different models that should be compared.

estimate.txt

The text of the estimate, usually HR for hazard ratio, OR for odds ratio

add_first_as_ref

If you want that the first variable should be reference for that group of variables. The ref is a variable with the estimate 1 or 0 depending if exp() and the confidence interval 0.

ref_txt

Text instead of estimate number

digits

Number of digits to use for the estimate output

post_process_data

A function that takes the data frame just prior to calling 'forestplot' and allows you to manipulate it. Primarily used for changing the 'column_label' that has the names shown in the final plot.

is.summary

A vector indicating by TRUE/FALSE if the value is a summary value which means that it will have a different font-style

xlab

x-axis label

zero

Indicates what is zero effect. For survival/logistic fits the zero is 1 while in most other cases it's 0.

xlog

If TRUE, x-axis tick marks are to follow a logarithmic scale, e.g. for logistic regression (OR), survival estimates (HR), Poisson regression etc. Note: This is an intentional break with the original forestplot function as I've found that exponentiated ticks/clips/zero effect are more difficult to for non-statisticians and there are sometimes issues with rounding the tick marks properly.

exp

Report in exponential form. Default true since the function was built for use with survival models.

...

Passed to forestplot()

Examples

org.par <- par("ask" = TRUE)

# simulated data to test
library(tidyverse)
set.seed(10)
cov <- tibble(ftime = rexp(200),
              fstatus = sample(0:1, 200, replace = TRUE),
              x1 = runif(200),
              x2 = runif(200),
              x3 = runif(200)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")

library(rms)
ddist <- datadist(cov)
options(datadist = "ddist")

fit1 <- cph(Surv(ftime, fstatus) ~ x1 + x2, data = cov)
fit2 <- cph(Surv(ftime, fstatus) ~ x1 + x3, data = cov)

list(`First model` = fit1, 
     `Second model` = fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)") |> 
  fp_set_style(lines = "steelblue",
               box = "darkblue")

# How to add expressions to the plot label
list(fit1, fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)",
                           reference.names = c("First model", "Second model"),
                           post_process_data = \(data) {
                             data$column_label[4] <- c(rlang::expr(expression(Fever >= 38.5)))
                             return(data)
                           })

par(org.par)

Forest plot for multiple models

Description

Plot different model fits with similar variables in order to compare the model's estimates and confidence intervals. Each model is represented by a separate line on top of eachother and are therefore ideal for comparing different models. This extra appealing when you have lots of variables included in the models.

Usage

forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  ...
)

## Default S3 method:
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  ...
)

## S3 method for class 'coxph'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Hazard Ratio",
  estimate.txt = "HR",
  xlog = TRUE,
  zero = 1,
  exp = TRUE,
  ...
)

## S3 method for class 'lrm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Odds ratio",
  estimate.txt = "HR",
  xlog = TRUE,
  zero = 1,
  exp = TRUE,
  ...
)

## S3 method for class 'lm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = "Effect",
  estimate.txt = "Coef",
  xlog = FALSE,
  zero = 0,
  exp = FALSE,
  ...
)

## S3 method for class 'glm'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = NULL,
  xlog = NULL,
  zero = NULL,
  estimate.txt = NULL,
  exp = NULL,
  ...
)

## S3 method for class 'list'
forestplotRegrObj(
  regr.obj,
  postprocess_estimates.fn = function(x) x,
  rowname = "Variable",
  ci.txt = "CI",
  ci.glue = "{lower} to {higher}",
  digits = 1,
  get_box_size = fpBoxSize,
  xlab = NULL,
  xlog = NULL,
  zero = NULL,
  estimate.txt = NULL,
  exp = NULL,
  ...
)

fpBoxSize(p_values, variable_count, boxsize, significant = 0.05)

Arguments

regr.obj

A regression model object. It should be of coxph, crr or glm class. Warning: The glm is not fully tested.

postprocess_estimates.fn

A function that takes the regression outputs and returns the same data with modifications. The input columns are:

* 'Rowname' * 'Coef' * 'Lower' * 'Upper' * 'Sort'

rowname

The name of the variables

ci.txt

The text above the confidence interval, defaults to '"CI"'

ci.glue

The string used for [glue::glue()] the 'lower' and 'higher' confidence intervals together.

digits

The number of digits to round presented values to

get_box_size

A function for extracting the box sizes

...

Passed to forestplot()

xlab

x-axis label

estimate.txt

The text above the estimate, e.g. Est, HR

xlog

zero

Indicates what is zero effect. For survival/logistic fits the zero is 1 while in most other cases it's 0.

exp

Report in exponential form. Default true since the function was built for use with survival models.

p_values

The p-values that will work as the foundation for the box size

variable_count

The number of variables

boxsize

The default box size

significant

Level of significance .05

Examples

org.par <- par("ask" = TRUE)

library(tidyverse)
# simulated data to test
set.seed(102)
cov <- tibble(ftime = rexp(200)) |> 
  mutate(x1 = runif(n()),
         x2 = runif(n()),
         x3 = runif(n()),
         fstatus1 = if_else(x1 * 1 + 
                              x2 * 0.2 + 
                              x3 * 0.5 + 
                              runif(n()) * 0.5 > 1, 
                            1, 0),
         fstatus2 = if_else(x1 * 0.2 + 
                              x2 * 0.5 + 
                              x3 * 0.1 + 
                              runif(n()) * 2 > 1, 
                            1, 0)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")

library(rms)
dd <- datadist(cov)
options(datadist = "dd")

fit1 <- cph(Surv(ftime, fstatus1 == 1) ~ x1 + x2 + x3, data = cov)

fit1 |> 
  forestplotRegrObj() |> 
  fp_set_zebra_style("#f0f0f0")

fit2 <- update(fit1, Surv(ftime, fstatus2 == 1) ~ .)
list("Frist model" = fit1, "Second model"  = fit2) |> 
  forestplotRegrObj(legend_args = fpLegend(title = "Type of regression"),
                    postprocess_estimates.fn = function(x) {
                      x |> 
                        filter(str_detect(column_term, "(x2|x3)"))
                    }) |> 
  fp_set_style(box = rep(c("darkblue", "darkred"), each = 3))


par(org.par)

This function helps with printing regression models

Description

This function is used for getting the adjusted and unadjusted values for a regression model. It takes a full model and walks through each variable, removes in the regression all variables except one then reruns that variable to get the unadjusted value. This functions not intended for direct use, it's better to use printCrudeAndAdjustedModel() that utilizes this function.

Usage

getCrudeAndAdjustedModelData(
  model,
  level = 0.95,
  remove_interaction_vars = TRUE,
  remove_strata = FALSE,
  remove_cluster = FALSE,
  var_select,
  ...
)

## S3 method for class 'getCrudeAndAdjustedModelData'
x[i, j, ...]

Arguments

model

The regression model

level

The confidence interval level

remove_interaction_vars

Removes the interaction terms as they in the raw state are difficult to understand

remove_strata

Strata should most likely not be removed in the crude version. If you want to force the removal of stratas you can specify the remove_strata = TRUE

remove_cluster

Cluster information should most likely also retain just as the remove_strata option. Clusters are sometimes used in cox regression models, cluster()

var_select

A vector with regular expressions for choosing what variables to return (the same format as for the order argument in printCrudeAndAdjustedModel() call). It can be useful when working with large datasets only to report a subsection of all tested variables. This makes the function both run faster and the data presentation more concise.

...

Not used

Details

This function saves a lot of time creating tables since it compiles a fully unadjusted list of all your used covariates.

If the model is an exponential poisson/logit/cox regression model then it automatically reports the exp() values instead of the original values

The function skips by default all spline variables since this becomes very complicated and there is no simple

\beta

to display. For the same reason it skips any interaction variables since it's probably better to display these as a contrast table.

Note that the rms regression has a separate function that uses the rms:::summaryrms function that returns a matrix that is then pruned.

Value

Returns a matrix with the columns: c("Crude", "2.5 %", "97.5 %", "Adjusted", "2.5 %", "97.5 %"). The row order is not changed from the original model. The percentages can vary depending on the set level.

Examples

# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  x1 = runif(200),
  x2 = runif(200),
  x3 = runif(200),
  x4 = runif(200),
  x5 = runif(200)
)

library(rms)
dd <- datadist(ds)
options(datadist = "dd")

s <- Surv(ds$ftime, ds$fstatus == 1)
fit <- cph(s ~ x1 + x2 + x3, data = ds)

data_matrix <- getCrudeAndAdjustedModelData(fit)

print(data_matrix)

# If we have interaction then those variable are not
# reported
fit <- cph(s ~ x1 + x2 + x3 + x4 * x5, data = ds)
data_matrix <- getCrudeAndAdjustedModelData(fit)

print(data_matrix)

Get model data

Description

A helper function for forestplotCombineRegrObj(). Extracts the data from the regression model fits and returns a list with model data gathered by the function [broom::tidy()]

Usage

getModelData4Forestplot(
  regr.obj,
  exp = TRUE,
  variablesOfInterest.regexp = NULL,
  add_first_as_ref = FALSE
)

Arguments

regr.obj

A list with all the fits that have variables that are to be identified through the regular expression

exp

Report in exponential form. Default true since the function was built for use with survival models.

variablesOfInterest.regexp

A regular expression identifying the variables that are of interest of comparing. For instance it can be "(score|index|measure)" that finds scores in different models that should be compared.

add_first_as_ref

If you want that the first variable should be reference for that group of variables. The ref is a variable with the estimate 1 or 0 depending if exp() and the confidence interval 0.

Examples

org.par <- par("ask" = TRUE)

# simulated data to test
library(tidyverse)
set.seed(10)
cov <- tibble(ftime = rexp(200),
              fstatus = sample(0:1, 200, replace = TRUE),
              x1 = runif(200),
              x2 = runif(200),
              x3 = runif(200)) |> 
  # Add some column labels
  Gmisc::set_column_labels(x1 = "First variable",
                           x2 = "Second variable")

library(rms)
ddist <- datadist(cov)
options(datadist = "ddist")

fit1 <- cph(Surv(ftime, fstatus) ~ x1 + x2, data = cov)
fit2 <- cph(Surv(ftime, fstatus) ~ x1 + x3, data = cov)

list(`First model` = fit1, 
     `Second model` = fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)") |> 
  fp_set_style(lines = "steelblue",
               box = "darkblue")

# How to add expressions to the plot label
list(fit1, fit2) |> 
  forestplotCombineRegrObj(variablesOfInterest.regexp = "(x2|x3)",
                           reference.names = c("First model", "Second model"),
                           post_process_data = \(data) {
                             data$column_label[4] <- c(rlang::expr(expression(Fever >= 38.5)))
                             return(data)
                           })

par(org.par)

Get the hat matrix for the OLS

Description

The hat matrix comes from the residual definition:

\hat{\epsilon} = y-X\hat{\beta} = \{I_n-X(X'X)X'\}y = (I_n-H)y

where the H is called the hat matrix since

Hy = \hat{y}

. The hat values are actually the diagonal elements of the matrix that sum up to p (the rank of X, i.e. the number of parameters + 1). See ols.influence().

Usage

## S3 method for class 'ols'
hatvalues(model, ...)

Arguments

model

The ols model fit

...

arguments passed to methods.

Value

vector

Examples

# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)

library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")

# Main function
f <- ols(y ~ rcs(x1, 3) + x2)

# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)

options(org.op)

Functions for checking regression type

Description

The isFitCoxPH A simple check if object inherits either "coxph" or "crr" class indicating that it is a survival function.

Usage

isFitCoxPH(fit)

isFitLogit(fit)

Arguments

fit

Regression object

Value

boolean Returns TRUE if the object is of that type otherwise it returns FALSE.

Examples

# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  x1 = runif(200),
  x2 = runif(200),
  x3 = runif(200)
)

library(survival)
library(rms)

dd <- datadist(ds)
options(datadist = "dd")

s <- Surv(ds$ftime, ds$fstatus == 1)
fit <- cph(s ~ x1 + x2 + x3, data = ds)

if (isFitCoxPH(fit)) {
  print("Correct, the cph is of cox PH hazard type")
}

fit <- coxph(s ~ x1 + x2 + x3, data = ds)
if (isFitCoxPH(fit)) {
  print("Correct, the coxph is of cox PH hazard type")
}

library(cmprsk)
set.seed(10)
ftime <- rexp(200)
fstatus <- sample(0:2, 200, replace = TRUE)
cov <- matrix(runif(600), nrow = 200)
dimnames(cov)[[2]] <- c("x1", "x2", "x3")
fit <- crr(ftime, fstatus, cov)

if (isFitCoxPH(fit)) {
  print(paste(
    "Correct, the competing risk regression is",
    "considered a type of cox regression",
    "since it has a Hazard Ratio"
  ))
}
# ** Borrowed code from the lrm example **

# Fit a logistic model containing predictors age, blood.pressure, sex
# and cholesterol, with age fitted with a smooth 5-knot restricted cubic
# spline function and a different shape of the age relationship for males
# and females.

n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c("female", "male"), n, TRUE))
label(age) <- "Age" # label is in Hmisc
label(cholesterol) <- "Total Cholesterol"
label(blood.pressure) <- "Systolic Blood Pressure"
label(sex) <- "Sex"
units(cholesterol) <- "mg/dl" # uses units.default in Hmisc
units(blood.pressure) <- "mmHg"

# To use prop. odds model, avoid using a huge number of intercepts by
# grouping cholesterol into 40-tiles

# Specify population model for log odds that Y = 1
L <- .4 * (sex == "male") + .045 * (age - 50) +
  (log(cholesterol - 10) - 5.2) * (-2 * (sex == "female") + 2 * (sex == "male"))
# Simulate binary y to have Prob(y = 1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
cholesterol[1:3] <- NA # 3 missings, at random

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist = "ddist")

fit_lrm <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
  x = TRUE, y = TRUE
)

if (isFitLogit(fit_lrm) == TRUE) {
  print("Correct, the lrm is a logistic regression")
}

fit_lm <- lm(blood.pressure ~ sex)
if (isFitLogit(fit_lm) == FALSE) {
  print("Correct, the lm is not a logistic regression")
}

fit_glm_logit <- glm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
  family = binomial()
)

if (isFitLogit(fit_glm_logit) == TRUE) {
  print("Correct, the glm with a family of binomial is a logistic regression")
}

fit_glm <- glm(blood.pressure ~ sex)
if (isFitLogit(fit_glm) == FALSE) {
  print("Correct, the glm without logit as a family is not a logistic regression")
}

A fix for the `model.matrix`

Description

The model.matrix.lm() that the ols() falls back upon "forgets" the intercept value and behaves unreliable in the vcovHC() functions. I've therefore created this sub-function to generate the actual model.matrix() by just accessing the formula.

Usage

## S3 method for class 'ols'
model.matrix(object, ...)

Arguments

object

A Model

...

Parameters passed on

Value

matrix

Plot a spline in a Cox regression model

Description

This function is a more specialized version of the termplot() function. It creates a plot with the spline against hazard ratio. The plot can additianally have indicator of variable density and have multiple lines.

Usage

plotHR(
  models,
  term = 1,
  se = TRUE,
  cntrst = ifelse(inherits(models, "rms") || inherits(models[[1]], "rms"), TRUE, FALSE),
  polygon_ci = TRUE,
  rug = "density",
  xlab = "",
  ylab = "Hazard Ratio",
  main = NULL,
  xlim = NULL,
  ylim = NULL,
  col.term = "#08519C",
  col.se = "#DEEBF7",
  col.dens = grey(0.9),
  lwd.term = 3,
  lty.term = 1,
  lwd.se = lwd.term,
  lty.se = lty.term,
  x.ticks = NULL,
  y.ticks = NULL,
  ylog = TRUE,
  cex = 1,
  y_axis_side = 2,
  plot.bty = "n",
  axes = TRUE,
  alpha = 0.05,
  ...
)

## S3 method for class 'plotHR'
print(x, ...)

## S3 method for class 'plotHR'
plot(x, y, ...)

Arguments

models

A single model or a list() with several models

term

The term of interest. Can be either the name or the number of the covariate in the model.

se

Boolean if you want the confidence intervals or not

cntrst

By contrasting values you can have the median as a reference point making it easier to compare hazard ratios.

polygon_ci

If you want a polygon as indicator for your confidence interval. This can also be in the form of a vector if you have several models. Sometimes you only want one model to have a polygon and the rest to be dotted lines. This gives the reader an indication of which model is important.

rug

The rug is the density of the population along the spline variable. Often this is displayed as a jitter with bars that are thicker & more common when there are more observations in that area or a smooth density plot that looks like a mountain. Use "density" for the mountain view and "ticks" for the jitter format.

xlab

The label of the x-axis

ylab

The label of the y-axis

main

The main title of the plot

xlim

A vector with 2 elements containing the upper & the lower bound of the x-axis

ylim

A vector with 2 elements containing the upper & the lower bound of the y-axis

col.term

The color of the estimate line. If multiple lines you can have different colors by giving a vector.

col.se

The color of the confidence interval. If multiple lines you can have different colors by giving a vector.

col.dens

The color of the density plot. Ignored if you're using jitter

lwd.term

The width of the estimated line. If you have more than one model then provide the function with a vector if you want to have different lines for different width for each model.

lty.term

The typeof the estimated line, see lty. If you have more than one model then provide the function with a vector if you want to have different line types for for each model.

lwd.se

The line width of your confidence interval. This is ignored if you're using polygons for all the confidence intervals.

lty.se

The line type of your confidence interval. This is ignored if you're using polygons for all the confidence intervals.

x.ticks

The ticks for the x-axis if you desire other than the default.

y.ticks

The ticks for the y-axis if you desire other than the default.

ylog

Show a logarithmic y-axis. Not having a logarithmic axis might seem easier to understand but it's actually not really a good idea. The distance between HR 0.5 and 2.0 should be the same. This will only show on a logarithmic scale and therefore it is strongly recommended to use the logarithmic scale.

cex

Increase if you want larger font size in the graph.

y_axis_side

The side that the y axis is to be plotted, see axis() for details

plot.bty

Type of box that you want. See the bty description in graphical parameters (par). If bty is one of "o" (the default), "l", "7", "c", "u", or "]" the resulting box resembles the corresponding upper case letter. A value of "n" suppresses the box.

axes

A boolean that is used to identify if axes are to be plotted

alpha

The alpha level for the confidence intervals

...

Any additional values that are to be sent to the plot() function

x

Sent the 'plotHR' object to plot

y

Ignored in plot

Value

The function does not return anything

Multiple models in one plot

The function allows for plotting multiple splines in one graph. Sometimes you might want to show more than one spline for the same variable. This allows you to create that comparison.

Examples of a situation where I've used multiple splines in one plot is when I want to look at a variables behavior in different time periods. This is another way of looking at the proportional hazards assumption. The Schoenfeld residuals can be a little tricky to look at when you have the splines.

Another example of when I've used this is when I've wanted to plot adjusted and unadjusted splines. This can very nicely demonstrate which of the variable span is mostly confounded. For instance - younger persons may exhibit a higher risk for a procedure but when you put in your covariates you find that the increased hazard changes back to the basic

Author(s)

Reinhard Seifert, Max Gordon

Examples

org_par <- par(xaxs = "i", ask = TRUE)
library(survival)
library(rms)
library(dplyr)
library(Gmisc)

# Get data for example
n <- 1000
set.seed(731)

ds <- tibble(age = round(50 + 12 * rnorm(n), 1),
             smoking = sample(c("Yes", "No"), n, rep = TRUE, prob = c(.2, .75)),
             sex = sample(c("Male", "Female"), n, rep = TRUE, prob = c(.6, .4))) |>
  # Build outcome
  mutate(h = .02 * exp(.02 * (age - 50) + .1 *
                         ((age - 50) / 10)^3 + .8 *
                         (sex == "Female") + 2 *
                         (smoking == "Yes")),
         cens = 15 * runif(n),
         dt = -log(runif(n)) / h,
         e = if_else(dt <= cens, 1, 0),
         dt = pmin(dt, cens),
         # Add missing data to smoking
         smoking = case_when(runif(n) < 0.05 ~ NA_character_,
                             TRUE ~ smoking)) |>
  set_column_labels(age = "Age",
                    dt = "Follow-up time") |>
  set_column_units(dt = "Year")


library(splines)
fit.coxph <- coxph(Surv(dt, e) ~ bs(age, 3) + sex + smoking, data = ds)

plotHR(fit.coxph, term = "age", plot.bty = "o", xlim = c(30, 70), xlab = "Age")

dd <- datadist(ds)
options(datadist = "dd")
fit.cph <- cph(Surv(dt, e) ~ rcs(age, 4) + sex + smoking, data = ds, x = TRUE, y = TRUE)

plotHR(fit.cph,
       term = 1,
       plot.bty = "L",
       xlim = c(30, 70),
       ylim = 2^c(-3, 3),
       xlab = "Age"
)

plotHR(fit.cph,
       term = "age",
       plot.bty = "l",
       xlim = c(30, 70),
       ylog = FALSE,
       rug = "ticks",
       xlab = "Age"
)

unadjusted_fit <- cph(Surv(dt, e) ~ rcs(age, 4), data = ds, x = TRUE, y = TRUE)
plotHR(list(fit.cph, unadjusted_fit),
       term = "age",
       xlab = "Age",
       polygon_ci = c(TRUE, FALSE),
       col.term = c("#08519C", "#77777799"),
       col.se = c("#DEEBF7BB", grey(0.6)),
       lty.term = c(1, 2),
       plot.bty = "l", xlim = c(30, 70)
)
par(org_par)

Add reference according to the model

Description

This is of course for factored variables and not in general.

Usage

prCaAddRefAndStat(
  model,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect,
  values,
  ds,
  desc_column,
  desc_args,
  use_labels
)

Arguments

model

A regression model fit, i.e. the returned object from your regression function, or the output from getCrudeAndAdjustedModelData()

var_order

The output from the prMapVariable2Name

add_references

True if it should use the data set to look for references, otherwise supply the function with a vector with names. Sometimes you want to indicate the reference row for each group. This needs to be just as many as the groups as the order identified. Use NA if you don't want to have a reference for that particular group.

add_references_pos

The position where a reference should be added. Sometimes you don't want the reference to be at the top, for instance if you have age groups then you may have < 25, 25-39, 40-55, > 55 and you have the reference to be 25-39 then you should set the reference list for age_groups as add_references_pos = list(age_groups = 2) so that you have the second group as the position for the reference.

reference_zero_effect

Used with references, tells if zero effect is in exponential form, i.e. exp(0) = 1, or in regular format, i.e. 0 = 0 (can be set to any value)

values

The values that are to be outputted

ds

The dataset

desc_column

Add descriptive column to the crude and adjusted table

desc_args

The description arguments that are to be used for the the description columns. The options/arguments should be generated by the caDescribeOpts function.

use_labels

If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor.

Value

list

Adds a reference to value matrix

Description

Adds a reference to value matrix

Usage

prCaAddReference(
  vn,
  var_order,
  values,
  add_references_pos,
  reference_zero_effect,
  ds,
  use_labels
)

Arguments

vn

Variable name

var_order

The output from the prMapVariable2Name

values

The value matrix

add_references_pos

reference_zero_effect

Used with references, tells if zero effect is in exponential form, i.e. exp(0) = 1, or in regular format, i.e. 0 = 0 (can be set to any value)

ds

The data set

use_labels

If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor.

Value

matrix A matrix with rgroup and n.rgroup attributes

Adds references

Description

Adds references

Usage

prCaAddUserReferences(
  reordered_groups,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect
)

Arguments

reordered_groups

The value matrix that needs refrences

var_order

The output from the prMapVariable2Name

add_references

add_references_pos

reference_zero_effect

Used with references, tells if zero effect is in exponential form, i.e. exp(0) = 1, or in regular format, i.e. 0 = 0 (can be set to any value)

Value

matrix The reordered_groups with references and the attribute "var_order" in order to keep track of no. of variables per row.

Get the confidence intervals

Description

These are functions that get the estimates and the confidence intervals. Due to package differences there are some local modifications.

Usage

prCaDefaultGetCoefAndCI(model, level, skip_intercept = FALSE)

prCaRmsGetCoefAndCI(model, level, vn, data)

Arguments

model

The regression model

level

The confidence interval level

skip_intercept

If the model should remove the intercept from the returned values.

vn

The variable names

data

The data set

Value

matrix Returns a n x 3 matrix where the n equals the number of variables.

The default

Gets the estimate and confidence interval using the confint and coef.

The rms

The rms-package does not have confint implemented and it is therefore a better option to go through the summary function (rms:::summary.rms). Infortunately skip intercept is not an option as the summary doesn't include the intercept for the rms regression outputs

Function for retrieving the imputation arguments

Description

Function for retrieving the imputation arguments

Usage

prCaGetImputationCols(impute_args, output_mtrx, model, data)

Arguments

impute_args

The imputation arguments from printCrudeAndAdjustedModel function call.

output_mtrx

The reordered groups matrix (a nx4 matrix) that have been prepared in for the printCrudeAndAdjustedModel function. It is important that the references if any have been added.

model

The imputation model. Currently only fit.mult.impute is supported by the function.

data

The data that has been used for generating the model.

Value

matrix Returns a matrix with the requested columns

Gets the labelled rowname if it exists

Description

Looks for matches inside factors if rowname contains the name of the column.

Usage

prCaGetRowname(vn, use_labels, dataset)

Arguments

vn

The variable name

use_labels

If labels should be used

dataset

The dataset

Value

string The rowname

Gets the variable stats

Description

Gets the variable stats

Usage

prCaGetVnStats(
  model,
  vn,
  outcome,
  ds,
  add_references,
  add_references_pos,
  desc_args
)

Arguments

model

The model

vn

The variable name

outcome

The outcome vector

ds

The dataset

add_references

add_references_pos

desc_args

The description arguments that are to be used for the the description columns. The options/arguments should be generated by the caDescribeOpts function.

Value

matrix A matrix from getDescriptionStatsBy or prGetStatistics

Prettify the text

Description

Sets the number of digits, formats the confidence interval and changes the number of cols into 4 where the upper and lower CI meet in one string column

Usage

prCaPrepareCrudeAndAdjusted(x, ci_lim, digits, sprintf_ci_str)

Arguments

x

The value matrix from getCrudeAndAdjusted

ci_lim

The limits of the confidence interval

digits

The number of decimal digits to use

sprintf_ci_str

The sprintf code for the confidence interval

Value

matrix A string matrix with the values formated

Reorder according to the requested variables

Description

Uses the prCaSelectAndOrderVars for finding the orders according to the order argument.

Usage

prCaReorder(mtrx2reorder, var_order, order)

Arguments

mtrx2reorder

The matrix to reorder

var_order

The variables representing different rows prMapVariable2Name

order

A vector of strings used for prCaSelectAndOrderVars

Value

matrix Returns the mtrx2reorder rearranged with the attribute "greps" for the greps from prCaSelectAndOrderVars and the attribute "var_order" for the new var_order

Adds the ordering, references, and descriptions

Description

This is a wrapper function around some more basic functions that printCrudeAndAdjustedModel() uses.

Usage

prCaReorderReferenceDescribe(
  x,
  model,
  order,
  var_order,
  add_references,
  add_references_pos,
  reference_zero_effect,
  ds,
  desc_column,
  desc_args,
  use_labels
)

Arguments

x

The main value matrix from the prCaPrepareCrudeAndAdjusted()

model

The model

order

A vector A vector with regular expressions for each group.

var_order

The output from the prMapVariable2Name()

add_references

add_references_pos

reference_zero_effect

Used with references, tells if zero effect is in exponential form, i.e. exp(0) = 1, or in regular format, i.e. 0 = 0 (can be set to any value)

ds

The dataset from the model

desc_column

Add descriptive column to the crude and adjusted table

desc_args

The description arguments that are to be used for the the description columns. The options/arguments should be generated by the caDescribeOpts function.

use_labels

If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor.

Value

The reordered groups as a matrix

Re-order variables

Description

Re-order variables

Usage

prCaSelectAndOrderVars(names, order, ok2skip = FALSE)

Arguments

names

The names of the variables

order

The order regular expression

ok2skip

If you have the intercept then it should be ok for the function to skip that variable if it isn't found among the variable list

Value

vector A vector containing the greps

Sets the rownames of the reordered_groups

Description

Sets the rownames of the reordered_groups

Usage

prCaSetRownames(reordered_groups, var_order, rowname.fn, use_labels, ds)

Arguments

reordered_groups

The value matrix that needs refrences

var_order

The output from the prMapVariable2Name

rowname.fn

A rowname function for tailoring names

use_labels

Whether to use labels or not

ds

The model data set

Value

matrix Returns the reordered_groups

Removes the printCrudeAndAdjusted class from arguments

Description

Removes the printCrudeAndAdjusted class from arguments

Usage

prClearPCAclass(pca)

Value

list

A function for converting a useNA variable

Description

The variable is suppose to be directly compatible with table(..., useNA = useNA). It throughs an error if not compatible

Usage

prConvertShowMissing(useNA)

Arguments

useNA

Boolean or "no", "ifany", "always"

Value

string

Runs an `fastDoCall()` within the environment of the model

Description

Sometimes the function can't find some of the variables that were available when running the original variable. This function uses the as.formula() together with environment() in order to get the environment that the original code used.

Usage

prEnvModelCall(model, what, ...)

Arguments

model

The model used

what

The function or non-empty character string used for fastDoCall()

...

Additional arguments passed to the function

Get model outcome

Description

Uses the model to extract the outcome variable. Throws error if unable to find the outcome.

Usage

prExtractOutcomeFromModel(model, mf)

Arguments

model

The fitted model

mf

The dataset that the model is fitted to - if missing it uses the model.frame() dataset. This can cause length issues as there may be variables that are excluded from the model for different reasons.

Value

vector

Looks for unique rowname match without grep

Description

Since a rowname may contain characters reserved by regular expressions I've found it easier to deal with the rowname finding by just checking for matching strings at the beginning of the name while at the same time excluding names that have the same stem, i.e. DM and DM_COMP will cause an issue since DM will match both rows.

Usage

prFindRownameMatches(rnames, vn, vars)

Arguments

rnames

A vector with the rownames that are looked for

vn

The variable name that is of interest

vars

A vector with all the names and the potentially competing names

Value

integer A vector containing the position of the matches

TODO: remove this function in favor of the more powerful prMapVariable2Name

Get model data.frame

Description

Returns the raw variables from the original data frame using the get_all_vars() but with the twist that it also performs any associated subsetting based on the model's subset() argument.

Usage

prGetModelData(x, terms_only = FALSE, term.label)

Arguments

x

The fitted model.

terms_only

Only use the right side of the equation by selecting the terms

term.label

Sometimes need to retrieve specific spline labels that are not among the 'labels(terms(x))'

Value

data.frame

Get the models variables

Description

This function extract the modelled variables. Any interaction terms are removed as those should already be represented by the individual terms.

Usage

prGetModelVariables(
  model,
  remove_splines = TRUE,
  remove_interaction_vars = FALSE,
  add_intercept = FALSE
)

Arguments

model

A model fit

remove_splines

If splines, etc. should be cleaned from the variables as these no longer are "pure" variables

remove_interaction_vars

If interaction variables are not interesting then these should be removed. Often in the case of printCrudeAndAdjustedModel() it is impossible to properly show interaction variables and it's better to show these in a separate table

add_intercept

Adds the intercept if it exists

Value

vector with names

Get statistics according to the type

Description

A simple function applied by the getDescriptionStatsBy() for the total column. This function is also used by printCrudeAndAdjustedModel() in case of a basic linear regression is asked for a raw stat column

Usage

prGetStatistics(
  x,
  show_perc = FALSE,
  html = TRUE,
  digits = 1,
  numbers_first = TRUE,
  useNA = "no",
  show_all_values = FALSE,
  continuous_fn = describeMean,
  factor_fn = describeFactors,
  prop_fn = factor_fn,
  percentage_sign = percentage_sign
)

Arguments

x

The variable that we want the statistics for

show_perc

If this is a factor/proportion variable then we might want to show the percentages

html

If the output should be in html or LaTeX formatting

digits

Number of decimal digits

numbers_first

If number is to be prior to the percentage

useNA

If missing should be included

show_all_values

This is by default false as for instance if there is no missing and there is only one variable then it is most sane to only show one option as the other one will just be a complement to the first. For instance sex - if you know gender then automatically you know the distribution of the other sex as it's 100 % - other %.

continuous_fn

A function for describing continuous variables defaults to describeMean()

factor_fn

A function for describing factors, defaults to describeFactors()

prop_fn

A function for describing proportions, defaults to the factor function

percentage_sign

If you want to suppress the percentage sign you can set this variable to FALSE. You can also choose something else that the default % if you so wish by setting this variable.

Value

A matrix or a vector depending on the settings

TODO: Use the Gmisc function instead of this copy

A function that tries to resolve what variable corresponds to what row

Description

As both the getCrudeAndAdjustedModelData() and the printCrudeAndAdjustedModel() need to now exactly what name from the coef()/summary.rms() correspond to we for generalizeability this rather elaborate function.

Usage

prMapVariable2Name(var_names, available_names, data, force_match = TRUE)

Arguments

var_names

The variable names that are saught after

available_names

The names that are available to search through

data

The data set that is saught after

force_match

Whether all variables need to be identified or not. E.g. you may only want to use some variables and already pruned the available_names and therefore wont have matches. This is the case when getCrudeAndAdjustedModelData() has been used together with the var_select argument.

Value

list Returns a list with each element has the corresponding variable name and a subsequent list with the parameters no_rows and location indiciting the number of rows corresponding to that element and where those rows are located. For factors the list also contains lvls and no_lvls.

Chooses the degrees of freedom for the non-linearity

Description

Looks for the model with the minimal min_fn within the flex_param span.

Usage

prNlChooseDf(
  model,
  flex_param,
  variable,
  spline_fn,
  min_fn,
  simplest_nonlinear,
  verbal,
  workers,
  libraries
)

Arguments

model

The model that is to be evaluated and adapted for non-linearity

flex_param

variable

The name of the parameter that is to be tested for non-linearity. Note that the variable should be included plain (i.e. as a linear variable) form in the model.

spline_fn

Either a string or a function that is to be used for testing alternative non-linearity models

min_fn

simplest_nonlinear

The simplest non-linear form that the ANOVA has been tested against

verbal

Set this to TRUE if you want print statements with the anova test and the chosen knots.

workers

libraries

If we use the parallel approach we need to make sure that the right libraries are available in the threads

Plots the confidence intervals

Description

Uses polygon() or lines() to plot confidence intervals.

Usage

prPhConfIntPlot(model_data, color, polygon, lwd, lty)

Arguments

model_data

A data frame with 'xvalues', 'upper', and 'lower' columns.

color

The color of the line/polygon

polygon

Boolean indicating polygon or line

lwd

Line width - see gpar()

lty

Line type - see gpar()

Value

void The function performs the print

Plot a density on the datapoints

Description

Plot a density on the datapoints

Usage

prPhDensityPlot(xvalues, color)

Arguments

xvalues

The xvalues that are used for the density

color

The color of the density polygon

Value

void

Gets the non-linear function's estimate

Description

The function uses predict if not specified contrast in order to attain the estimate, upper, and lower confidence interval

Usage

prPhEstimate(model, term.label, ylog, cntrst, xlim, alpha, new_data)

Arguments

model

The fit of the model to be plotted

term.label

The name of the label

ylog

If the outcome should be presented in the anti-log form, i.e. exp().

cntrst

A boolean that indicates if the contrast() function is to be deployed for rms generated functions. The nice thing is that you get the median as a reference by default.

xlim

The xlim if provided

new_data

If not provided the function looks for the most common values i.e. median for continuous and mode for factors.

Value

data.frame with the columns xvalues, fit, ucl, lcl

A function for retrieving new_data argument for predict

Description

A function for retrieving new_data argument for predict

Usage

prPhNewData(model, term.label, xlim)

Arguments

model

The model fit from coxph() or cph()

term.label

The label that is the one that plotHR() intends to plot.

xlim

The x-limits for the plot if any

Value

data.frame

Plot a rug on the datapoints

Description

Plot a rug on the datapoints

Usage

prPhRugPlot(xvalues)

Arguments

xvalues

The xvalues that are used for the density

Value

void

Prep for printing

Description

Since we have both the print() and the knit_print() that we need to call it is useful to have a common string preparation. Note: Currently knit_print doesn't work as expected...

Usage

prPrintCAstring(x, ...)

Arguments

x

The output object from the printCrudeAndAdjustedModel function

...

outputs from printCrudeAndAdjusted. If mixed then it defaults to rbind.data.frame

Output crude and adjusted model data

Description

Prints table for a fitted object. It prints by default a latex table but can also be converted into a HTML table that should be more compatible with common word processors. For details run vignette("printCrudeAndAdjustedModel")

Usage

printCrudeAndAdjustedModel(
  model,
  order,
  digits = 2,
  ci_lim = c(-Inf, Inf),
  sprintf_ci_str = getOption("sprintf_ci_str", "%s to %s"),
  add_references,
  add_references_pos,
  reference_zero_effect,
  groups,
  rowname.fn,
  use_labels = TRUE,
  desc_column = FALSE,
  desc_args = caDescribeOpts(digits = digits),
  impute_args,
  ...
)

## S3 method for class 'printCrudeAndAdjusted'
rbind(..., alt.names, deparse.level = 1)

## S3 method for class 'printCrudeAndAdjusted'
print(x, ...)

## S3 method for class 'printCrudeAndAdjusted'
htmlTable(x, ...)

## S3 method for class 'printCrudeAndAdjusted'
x[i, j, ...]

## S3 method for class 'printCrudeAndAdjusted'
cbind(..., alt.names, deparse.level = 1)

## S3 method for class 'printCrudeAndAdjusted'
knit_print(x, ...)

## S3 method for class 'printCrudeAndAdjusted'
latex(object, ...)

Arguments

model

A regression model fit, i.e. the returned object from your regression function, or the output from getCrudeAndAdjustedModelData()

order

A vector with regular expressions for each group, use if youe want to reorder the groups in another way than what you've used in your original function. You can also use this in order to skip certain variables from the output.

digits

The number of digits to round to

ci_lim

A limit vector number that specifies if any values should be abbreviated above or below this value, for instance a value of 1000 would give a value of > -1000 for a value of 1001. This gives a prettier table when you have very wide confidence intervals.

sprintf_ci_str

A string according to sprintf() to write the confidence interval where the first %s is the lower and the second the upper. You can choose to set this through setting the option sprintf_ci_str, e.g. options(sprintf_ci_str = "%s - %s").

add_references

add_references_pos

reference_zero_effect

Used with references, tells if zero effect is in exponential form, i.e. exp(0) = 1, or in regular format, i.e. 0 = 0 (can be set to any value)

groups

If you wish to have other than the default rgroup names for the grouping parameter

rowname.fn

A function that takes a row name and sees if it needs beautifying. The function has only one parameter the coefficients name and should return a string or expression.

use_labels

If the rowname.fn function doesn't change the name then the label should be used instead of the name, that is if there is a label and it isn't a factor.

desc_column

Add descriptive column to the crude and adjusted table

desc_args

The description arguments that are to be used for the the description columns. The options/arguments should be generated by the caDescribeOpts function.

impute_args

A list with additional arguments if the provided input is a imputed object. Currently the list options coef_change and variance.inflation are supported. If you want both columns then the simplest way is to provide the list: list(coef_change = TRUE, variance.inflation = TRUE). The coef_change adds a column with the change in coefficients due to the imputation, the the "raw" model is subtracted from the imputed results. The "raw" model is the unimputed model, coef(imputed_model) - coef(raw_model). The variance.inflation adds the variance.inflation.impute from the fit.mult.impute() to a separate column. See the description for the variance.inflation.impute in in the fit.mult.impute() description. Both arguments can be customized by providing a list. The list can have the elements type, name, out_str, and/or digits. The type can for coef_change/variance.impute be either "percent" or "ratio", note that variance.inflation.impute was not originally intended to be interpreted as %. The default for coef_change is to have "diff", that gives the absolute difference in the coefficient. The name provides the column name, the out_str should be a string that is compatible with sprintf() and also contains an argument for accepting a float value, e.g. " column. The digits can be used if you are not using the out_str argument, it simply specifies the number of digits to show. See the example for how for a working example. Note that currently only the fit.mult.impute() is supported by this option.

...

outputs from printCrudeAndAdjusted. If mixed then it defaults to rbind.data.frame

alt.names

If you don't want to use named arguments for the tspanner attribute in the rbind or the cgroup in the cbind but a vector with names then use this argument.

deparse.level

backward compatibility

x

The output object from the printCrudeAndAdjustedModel function

object

The output object from the printCrudeAndAdjustedModel function

Value

matrix Returns a matrix of class printCrudeAndAdjusted that has a default print method associated with

Warning

If you call this function and you've changed any of the variables used in the original call, i.e. the premises are changed, this function will not remember the original values and the statistics will be faulty!

Examples

# simulated data to use
set.seed(10)
ds <- data.frame(
  ftime = rexp(200),
  fstatus = sample(0:1, 200, replace = TRUE),
  Variable1 = runif(200),
  Variable2 = runif(200),
  Variable3 = runif(200),
  Variable4 = factor(sample(LETTERS[1:4], size = 200, replace = TRUE))
)

library(rms)
dd <- datadist(ds)
options(datadist = "dd")

fit <- cph(Surv(ftime, fstatus) ~ Variable1 + Variable3 + Variable2 + Variable4,
  data = ds, x = TRUE, y = TRUE
)
printCrudeAndAdjustedModel(fit, order = c("Variable[12]", "Variable3"))
printCrudeAndAdjustedModel(fit,
  order = c("Variable3", "Variable4"),
  add_references = TRUE,
  desc_column = TRUE
)

# Now to a missing example
n <- 500
ds <- data.frame(
  x1 = factor(sample(LETTERS[1:4], size = n, replace = TRUE)),
  x2 = rnorm(n, mean = 3, 2),
  x3 = factor(sample(letters[1:3], size = n, replace = TRUE))
)

ds$Missing_var1 <- factor(sample(letters[1:4], size = n, replace = TRUE))
ds$Missing_var2 <- factor(sample(letters[1:4], size = n, replace = TRUE))
ds$y <- rnorm(nrow(ds)) +
  (as.numeric(ds$x1) - 1) * 1 +
  (as.numeric(ds$Missing_var1) - 1) * 1 +
  (as.numeric(ds$Missing_var2) - 1) * .5

# Create a messy missing variable
non_random_missing <- sample(which(ds$Missing_var1 %in% c("b", "d")),
  size = 150, replace = FALSE
)
# Restrict the non-random number on the x2 variables
non_random_missing <- non_random_missing[non_random_missing %in%
  which(ds$x2 > mean(ds$x2) * 1.5) &
  non_random_missing %in%
    which(ds$x2 > mean(ds$y))]
ds$Missing_var1[non_random_missing] <- NA

# Simple missing variable
ds$Missing_var2[sample(1:nrow(ds), size = 50)] <- NA

# Setup the rms environment
ddist <- datadist(ds)
options(datadist = "ddist")

impute_formula <-
  as.formula(paste(
    "~",
    paste(colnames(ds),
      collapse = "+"
    )
  ))

imp_ds <- aregImpute(impute_formula, data = ds, n.impute = 10)

fmult <- fit.mult.impute(y ~ x1 + x2 + x3 +
  Missing_var1 + Missing_var2,
fitter = ols, xtrans = imp_ds, data = ds
)

printCrudeAndAdjustedModel(fmult,
  impute_args = list(
    variance.inflation = TRUE,
    coef_change = list(
      type = "diff",
      digits = 3
    )
  )
)


# Use some labels and style to prettify the output
# fro the mtcars dataset
data("mtcars")

label(mtcars$mpg) <- "Gas"
units(mtcars$mpg) <- "Miles/(US) gallon"

label(mtcars$wt) <- "Weight"
units(mtcars$wt) <- "10^3 kg" # not sure the unit is correct

mtcars$am <- factor(mtcars$am, levels = 0:1, labels = c("Automatic", "Manual"))
label(mtcars$am) <- "Transmission"

mtcars$gear <- factor(mtcars$gear)
label(mtcars$gear) <- "Gears"

# Make up some data for making it slightly more interesting
mtcars$col <- factor(sample(c("red", "black", "silver"), size = NROW(mtcars), replace = TRUE))
label(mtcars$col) <- "Car color"

require(splines)
fit_mtcar <- lm(mpg ~ wt + gear + col, data = mtcars)
printCrudeAndAdjustedModel(fit_mtcar,
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )) |>
  htmlTable::addHtmlTableStyle(css.rgroup = "",
                               css.header = "font-weight: normal")

printCrudeAndAdjustedModel(fit_mtcar,
  add_references = TRUE,
  desc_column = TRUE,
  order = c("Interc", "gear")
)

# Alterntive print - just an example, doesn't make sense to skip reference
printCrudeAndAdjustedModel(fit_mtcar,
  order = c("col", "gear"),
  groups = c("Color", "Gears"),
  add_references = c("black", NA),
  ctable = TRUE
)

# Now we can also combine models into one table using rbind()
mpg_model <- printCrudeAndAdjustedModel(lm(mpg ~ wt + gear + col, data = mtcars),
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )
)

wt_model <- printCrudeAndAdjustedModel(lm(wt ~ mpg + gear + col, data = mtcars),
  add_references = TRUE,
  ctable = TRUE,
  desc_column = TRUE,
  digits = 1,
  desc_args = caDescribeOpts(
    digits = 1,
    colnames = c("Avg.")
  )
)

library(htmlTable)
rbind(Miles = mpg_model, Weight = wt_model) |>
  addHtmlTableStyle(pos.caption = "bottom") |>
  htmlTable(caption = paste("Combining models together with a table spanner element",
                            "separating each model"))

Robust covariance matrix based upon the 'sandwich'-package

Description

This is an alternative to the 'rms'-package robust covariance matrix that uses the 'sandwich' package vcovHC() function instead of the 'rms'-built-in estimator. The advantage being that many more estimation types are available.

Usage

robcov_alt(fit, type = "HC3", ...)

Arguments

fit

The ols fit that

type

a character string specifying the estimation type. See vcovHC() for options.

...

You should specify type= followed by some of the alternative available for the vcovHC() function.

Value

model The fitted model with adjusted variance and df.residual set to NULL

Examples

# Generate some data
n <- 500
x1 <- runif(n) * 2
x2 <- runif(n)
y <- x1^3 + x2 + rnorm(n)

library(rms)
library(sandwich)
dd <- datadist(x1, x2, y)
org.op <- options(datadist = "dd")

# Main function
f <- ols(y ~ rcs(x1, 3) + x2)

# Check the bread
bread(f)
# Check the HC-matrix
vcovHC(f, type = "HC4m")
# Adjust the model so that it uses the HC4m variance
f_rob <- robcov_alt(f, type = "HC4m")
# Get the new HC4m-matrix
# - this function just returns the f_rob$var matrix
vcov(f_rob)
# Now check the confidence interval for the function
confint(f_rob)

options(org.op)

A simpler latex output of the latex.anova.rms

Description

The original problem is that the anova default function output is very detailed and cause a complaint in Sweave/knitr that \hbox is overfull. It basically changes capitalized TOTAL, TOTAL INTERACTION and TOTAL NONLINEAR INTERACTION into lower case letters. It also deletes the (Factor + Higher Order Factors).

Usage

simpleRmsAnova(
  anova_output,
  subregexps,
  digits = 4,
  pval_lim.sig = 10^-4,
  rowlabel = "",
  ...
)

## S3 method for class 'simpleRmsAnova'
print(x, html = TRUE, ...)

Arguments

anova_output

An object from the anova() function

subregexps

A 2 column matrix with sub() regular expressions to search for and their substitutions. The regular expression should be located in column 1 and the substitution in column 2.

digits

Number of digits in using the round

pval_lim.sig

The threshold before setting "<", default is < 0.0001

rowlabel

The label of the rows

...

Passed on to latex() or htmlTable

x

The output object from the SimpleRmsAnova function

html

If HTML output through the htmlTable should be used instead of traditional latex() function

Value

void See the latex() function

Examples

# ** Borrowed code from the lrm example **

# Fit a logistic model containing predictors age, blood.pressure, sex
# and cholesterol, with age fitted with a smooth 5-knot restricted cubic
# spline function and a different shape of the age relationship for males
# and females.

library(rms)
n <- 1000 # define sample size
set.seed(17) # so can reproduce the results
age <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol <- rnorm(n, 200, 25)
sex <- factor(sample(c("female", "male"), n, TRUE))
label(age) <- "Age" # label is in Hmisc
label(cholesterol) <- "Total Cholesterol"
label(blood.pressure) <- "Systolic Blood Pressure"
label(sex) <- "Sex"
units(cholesterol) <- "mg/dl" # uses units.default in Hmisc
units(blood.pressure) <- "mmHg"

# To use prop. odds model, avoid using a huge number of intercepts by
# grouping cholesterol into 40-tiles

# Specify population model for log odds that Y = 1
L <- .4 * (sex == "male") + .045 * (age - 50) +
     (log(cholesterol - 10) - 5.2) * (-2 * (sex == "female") + 2 * (sex == "male"))
# Simulate binary y to have Prob(y = 1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)
cholesterol[1:3] <- NA # 3 missings, at random

ddist <- datadist(age, blood.pressure, cholesterol, sex)
options(datadist = "ddist")

fit_lrm <- lrm(y ~ blood.pressure + sex * (age + rcs(cholesterol, 4)),
     x = TRUE, y = TRUE
)

a_out <- anova(fit_lrm,
     dec.F = 1,
     ss = FALSE
)

simpleRmsAnova(a_out,
     subregexps = rbind(
          c("age", "Age"),
          c("cholesterol", "Cholesterol"),
          c("sex", "Sex")
     ),
     caption = "Anova output for a logistic regression model"
)

Tidy a(n) rms model object

Description

Tidy summarizes information about the components of a model. A model component might be a single term in a regressions. Exactly what tidy considers to be a model component varies across models but is usually self-evident. If a model has several distinct types of components, you will need to specify which components to return.

Usage

## S3 method for class 'rms'
tidy(
  x,
  conf.int = FALSE,
  conf.level = 0.95,
  exponentiate = FALSE,
  ...,
  .add_print_p_and_stat_values = getOption("Greg.tidy_add_p_and_stat_values", default =
    FALSE)
)

Arguments

x

An rms model, e.g. ['rms::cph()'], ['rms::lrm()']

conf.int

Logical indicating whether or not to include a confidence interval in the tidied output. Defaults to FALSE.

conf.level

The confidence level to use for the confidence interval if conf.int = TRUE. Must be strictly greater than 0 and less than 1. Defaults to 0.95, which corresponds to a 95 percent confidence interval.

exponentiate

Logical indicating whether or not to exponentiate the the coefficient estimates. This is typical for logistic and multinomial regressions, but a bad idea if there is no log or logit link. Defaults to FALSE.

...

Additional arguments. Not used. Needed to match generic signature only. Cautionary note: Misspelled arguments will be absorbed in ..., where they will be ignored. If the misspelled argument has a default value, the default value will be used. For example, if you pass conf.lvel = 0.9, all computation will proceed using conf.level = 0.95. Two exceptions here are:

tidy() methods will warn when supplied an exponentiate argument if it will be ignored.
augment() methods will warn when supplied a newdata argument if it will be ignored.

.add_print_p_and_stat_values

For estimating print values there is a workaround that relies on capturing output from the 'print(x)' and is not considered safe.

Details

This is a quick fix for addressing the lack of 'rms'-compatibility with the 'broom' package, see [broom issue 30](https://github.com/tidymodels/broom/issues/30).

Value

A tibble::tibble() with columns: - 'term' The name of the regression term. - 'factor' The factor if the term is a character/factor term. - 'column_term' The full name as in the original input data - 'estimate' The estimated value of the regression term. - 'conf.high' Upper bound on the confidence interval for the estimate.c - 'conf.low' Lower bound on the confidence interval for the estimate. - 'p.value' The two-sided p-value associated with the observed statistic. - 'statistic' The value of a statistic to use in a hypothesis that the regression term is non-zero. - 'std.error' The standard error of the regression term.

Examples

library(rms)
library(broom)
library(tidyverse)

set.seed(10)
cov <- tibble(x1 = runif(200)) |> 
  mutate(x_bool_fact = if_else(x1 > 0.5,
                               "Yes",
                               sample(c("Yes", "No"), size = n(), replace = TRUE)),
         x_multi_fact = sample(c("Strange", "Factor", "Names"), size = n(), replace = TRUE),
         ftime = rexp(n()),
         fstatus = sample(0:1, size = n(), replace = TRUE),
         x_good_predictor = fstatus * runif(n()))

ddist <- datadist(cov)
options(datadist = "ddist")

cph_fit <- cph(Surv(ftime, fstatus) ~ x1 + x_bool_fact + 
                 x_multi_fact + x_good_predictor, data = cov)
tidy(cph_fit)

A function for splitting a time according to time periods

Description

If we have a violation of the cox proprtional hazards assumption we need to split an individual's followup time into several. See vignette("timeSplitter", package = "Greg") for a detailed description.

Usage

timeSplitter(
  data,
  by,
  time_var,
  event_var,
  event_start_status,
  time_related_vars,
  time_offset
)

Arguments

data

The dataset that you want to split according to the time_var option.

by

The time period that you want to split the dataset by. The size of the variable must be in proportion to the the time_var. The by variable can also be a vector for each time split, useful if the effect has large varyations over time.

time_var

The name of the main time variable in the dataset. This variable must be a numeric variable.

event_var

The event variable

event_start_status

The start status of the event status, e.g. "Alive"

time_related_vars

A dataset often contains other variabels that you want to update during the split, most commonly these are age or calendar time.

time_offset

If you want to skip the initial years you can offset the entire dataset by setting this variable. See detailed description below.

Details

Important note: The time variables must have the same time unit. I.e. function can not dedu if all variables are in years or if one happens to be in days.

Value

data.frame with the split data. The starting time for each period is named Start_time and the ending time is called Stop_time. Note that the resulting event_var will now contain the time-splitted eventvar.

The time_offset - details

Both time_var and other variables will be adjusted by the time_offset, e.g. if we the time scale is in years and we want to skip the first 4 years we set the time_offset = 4. In the outputted dataset the smallest time_var will be 0. Note: 0 will not be included as we generally want to look at those that survived the start date, e.g. if a patient dies on the 4-year mark we would not include him/her in our study.

Examples

test_data <- data.frame(
  id = 1:4,
  time = c(4, 3.5, 1, 5),
  event = c("alive", "censored", "dead", "dead"),
  age = c(62.2, 55.3, 73.7, 46.3),
  date = as.Date(
    c("2003-01-01", 
      "2010-04-01", 
      "2013-09-20",
      "2002-02-23")),
  stringsAsFactors = TRUE
)
timeSplitter(test_data, .5, 
             time_var = "time",
             time_related_vars = c("age", "date"),
             event_var = "event")

Regression Helper Functions

Description

Output functions

Time splitter

Other regression functions

Important notice

Author(s)

See Also

Add a nonlinear function to the model

Description

Usage

Arguments

Examples

Getting the bread for the vcovHC

Description

Usage

Arguments

Value

Examples

A function for gathering all the description options

Description

Usage

Arguments

Value

A confint function for the ols

Description

Usage

Arguments

Value

Examples

The confint function adapted for vcovHC

Description

Usage

Arguments

Value

References

Examples

Fix for the Extract Empirical Estimating Functions

Description

Usage

Arguments

Details

Value

Examples

Compares different scores in different regression objects.

Description

Usage

Arguments

See Also

Examples

Forest plot for multiple models

Description

Usage

Arguments

See Also

Examples

This function helps with printing regression models

Description

Usage

Arguments

Details

Value

See Also

Examples

Get model data

Description

Usage

Arguments

Examples

Get the hat matrix for the OLS

Description

Usage

Arguments

Value

Examples

Functions for checking regression type

Description

Usage

Arguments

Value

Getting the bread for the `vcovHC`

A `confint` function for the `ols`

A fix for the `model.matrix`