Type: | Package |
Title: | Matching on Generalized Propensity Scores with Continuous Exposures |
Version: | 0.5.0 |
Maintainer: | Naeem Khoshnevis <nkhoshnevis@g.harvard.edu> |
Description: | Provides a framework for estimating causal effects of a continuous exposure using observational data, and implementing matching and weighting on the generalized propensity score. Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29. |
License: | GPL-3 |
Language: | en-US |
URL: | https://github.com/NSAPH-Software/CausalGPS |
BugReports: | https://github.com/NSAPH-Software/CausalGPS/issues |
Copyright: | Harvard University |
Imports: | parallel, data.table, SuperLearner, xgboost, gam, MASS, polycor, wCorr, stats, ggplot2, rlang, logger, Rcpp, gnm, locpol, Ecume, KernSmooth, cowplot |
Encoding: | UTF-8 |
RoxygenNote: | 7.2.3 |
Suggests: | covr, knitr, rmarkdown, ranger, earth, testthat, gridExtra |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5.0) |
LinkingTo: | Rcpp |
NeedsCompilation: | yes |
Packaged: | 2024-06-19 18:12:02 UTC; rstudio |
Author: | Naeem Khoshnevis |
Repository: | CRAN |
Date/Publication: | 2024-06-19 18:30:02 UTC |
The 'CausalGPS' package.
Description
An R package for implementing matching and weighting on generalized propensity scores with continuous exposures.
Details
We developed an innovative approach for estimating causal effects using observational data in settings with continuous exposures, and introduce a new framework for GPS caliper matching.
Author(s)
Naeem Khoshnevis
Xiao Wu
Danielle Braun
References
Wu, X., Mealli, F., Kioumourtzoglou, M.A., Dominici, F. and Braun, D., 2022. Matching on generalized propensity scores with continuous exposures. Journal of the American Statistical Association, pp.1-29.
Kennedy, E.H., Ma, Z., McHugh, M.D. and Small, D.S., 2017. Non-parametric methods for doubly robust estimation of continuous treatment effects. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 79(4), pp.1229-1245.
Check covariate balance using absolute approach
Description
Checks covariate balance based on absolute correlations for given data sets.
Usage
absolute_corr_fun(w, c)
Arguments
w |
A vector of observed continuous exposure variable. |
c |
A data.frame of observed covariates variable. |
Value
The function returns a list including:
-
absolute_corr
: the absolute correlations for each pre-exposure covariates; -
mean_absolute_corr
: the average absolute correlations for all pre-exposure covariates.
Examples
set.seed(291)
n <- 100
mydata <- generate_syn_data(sample_size=100)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n,
replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
cor_val <- absolute_corr_fun(mydata[,2], mydata[, 3:length(mydata)])
print(cor_val$mean_absolute_corr)
Check Weighted Covariate Balance Using Absolute Approach
Description
Checks covariate balance based on absolute weighted correlations for given data sets.
Usage
absolute_weighted_corr_fun(w, vw, c)
Arguments
w |
A vector of observed continuous exposure variable. |
vw |
A vector of weights. |
c |
A data.table of observed covariates variable. |
Value
The function returns a list saved the measure related to covariate balance
absolute_corr
: the absolute correlations for each pre-exposure
covairates;
mean_absolute_corr
: the average absolute correlations for all
pre-exposure covairates.
Examples
set.seed(639)
n <- 100
mydata <- generate_syn_data(sample_size=100)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n,
replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
cor_val <- absolute_weighted_corr_fun(mydata[,2],
runif(n),
mydata[, 3:length(mydata)])
print(cor_val$mean_absolute_corr)
A helper function for cgps_cw object
Description
A helper function to plot cgps_cw object using ggplot2 package.
Usage
## S3 method for class 'cgps_cw'
autoplot(object, ...)
Arguments
object |
A cgps_cw object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
A helper function for cgps_erf object
Description
A helper function to plot cgps_erf object using ggplot2 package.
Usage
## S3 method for class 'cgps_erf'
autoplot(object, ...)
Arguments
object |
A cgps_erf object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
A helper function for cgps_gps object
Description
A helper function to plot cgps_gps object using ggplot2 package.
Usage
## S3 method for class 'cgps_gps'
autoplot(object, ...)
Arguments
object |
A cgps_gps object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
A helper function for cgps_pspop object
Description
A helper function to plot cgps_pspop object using ggplot2 package.
Usage
## S3 method for class 'cgps_pspop'
autoplot(object, ...)
Arguments
object |
A cgps_pspop object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot object.
Check covariate balance
Description
Checks the covariate balance of original population or pseudo population.
Usage
check_covar_balance(
w,
c,
ci_appr,
counter_weight = NULL,
covar_bl_method = "absolute",
covar_bl_trs = 0.1,
covar_bl_trs_type = "mean"
)
Arguments
w |
A vector of observed continuous exposure variable. |
c |
A data.frame of observed covariates variable. |
ci_appr |
The causal inference approach. |
counter_weight |
A weight vector in different situations. If the matching approach is selected, it is an integer data.table of counters. In the case of the weighting approach, it is weight data.table. |
covar_bl_method |
Covariate balance method. Available options: - 'absolute' |
covar_bl_trs |
Covariate balance threshold. |
covar_bl_trs_type |
Covariate balance type (mean, median, maximal). |
Value
output object:
corr_results
absolute_corr
mean_absolute_corr
pass (TRUE,FALSE)
Examples
set.seed(422)
n <- 100
mydata <- generate_syn_data(sample_size=n)
year <- sample(x=c("2001","2002","2003","2004","2005"),size = n,
replace = TRUE)
region <- sample(x=c("North", "South", "East", "West"),size = n,
replace = TRUE)
mydata$year <- as.factor(year)
mydata$region <- as.factor(region)
mydata$cf5 <- as.factor(mydata$cf5)
m_xgboost <- function(nthread = 1,
ntrees = 35,
shrinkage = 0.3,
max_depth = 5,
...) {SuperLearner::SL.xgboost(
nthread = nthread,
ntrees = ntrees,
shrinkage=shrinkage,
max_depth=max_depth,
...)}
data_with_gps <- estimate_gps(.data = mydata,
.formula = w ~ cf1 + cf2 + cf3 + cf4 + cf5 +
cf6 + year + region,
sl_lib = c("m_xgboost"),
gps_density = "kernel")
cw_object_matching <- compute_counter_weight(gps_obj = data_with_gps,
ci_appr = "matching",
bin_seq = NULL,
nthread = 1,
delta_n = 0.1,
dist_measure = "l1",
scale = 0.5)
pseudo_pop <- generate_pseudo_pop(.data = mydata,
cw_obj = cw_object_matching,
covariate_col_names = c("cf1", "cf2", "cf3",
"cf4", "cf5", "cf6",
"year", "region"),
covar_bl_trs = 0.1,
covar_bl_trs_type = "maximal",
covar_bl_method = "absolute")
adjusted_corr_obj <- check_covar_balance(w = pseudo_pop$.data[, c("w")],
c = pseudo_pop$.data[ ,
pseudo_pop$params$covariate_col_names],
counter = pseudo_pop$.data[,
c("counter_weight")],
ci_appr = "matching",
covar_bl_method = "absolute",
covar_bl_trs = 0.1,
covar_bl_trs_type = "mean")
Check Kolmogorov-Smirnov (KS) statistics
Description
Checks the Kolmogorov-Smirnov (KS) statistics for exposure and confounders in the pseudo-population
Usage
check_kolmogorov_smirnov(w, c, ci_appr, counter_weight = NULL)
Arguments
w |
A vector of observed continuous exposure variable. |
c |
A data.frame of observed covariates variable. |
ci_appr |
The causal inference approach. |
counter_weight |
A weight vector in different situations. If the matching approach is selected, it is an integer data.table of counters. In the case of the weighting approach, it is weight data.table. |
Value
output object is list including:
ks_stat
maximal_val
mean_val
median_val
Compile pseudo population
Description
Compiles pseudo population based on the original population and estimated GPS value.
Usage
compile_pseudo_pop(
data_obj,
ci_appr,
gps_density,
exposure_col_name,
nthread,
...
)
Arguments
data_obj |
A S3 object including the following:
|
ci_appr |
Causal inference approach. |
gps_density |
Model type which is used for estimating GPS value,
including |
exposure_col_name |
Exposure data column name. |
nthread |
An integer value that represents the number of threads to be used by internal packages. |
... |
Additional parameters. |
Details
For matching approach, use an extra parameter, bin_seq
, which is sequence
of w (treatment) to generate pseudo population. If NULL
is passed the
default value will be used, which is
seq(min(w)+delta_n/2,max(w), by=delta_n)
.
Value
compile_pseudo_pop
returns the pseudo population data that is compiled based
on the selected causal inference approach.
Examples
set.seed(112)
m_d <- generate_syn_data(sample_size = 100)
m_xgboost <- function(nthread = 1,
ntrees = 35,
shrinkage = 0.3,
max_depth = 5,
...) {SuperLearner::SL.xgboost(
nthread = nthread,
ntrees = ntrees,
shrinkage=shrinkage,
max_depth=max_depth,
...)}
data_with_gps <- estimate_gps(.data = m_d,
.formula = w ~ cf1 + cf2 + cf3 +
cf4 + cf5 + cf6,
gps_density = "normal",
sl_lib = c("m_xgboost")
)
pd <- compile_pseudo_pop(data_obj = data_with_gps,
ci_appr = "matching",
gps_density = "normal",
bin_seq = NULL,
exposure_col_name = c("w"),
nthread = 1,
dist_measure = "l1",
covar_bl_method = 'absolute',
covar_bl_trs = 0.1,
covar_bl_trs_type= "mean",
delta_n = 0.5,
scale = 1)
Find the closest data in subset to the original data
Description
A function to compute the closest data in subset of data to the original data based on two attributes: vector and scalar (vector of size one).
Usage
compute_closest_wgps(a, b, c, d, sc, nthread)
Arguments
a |
Vector of the first attribute values for subset of data. |
b |
Vector of the first attribute values for all data. |
c |
Vector of the second attribute values for subset of data. |
d |
Vector of size one for the second attribute value. |
sc |
Scale parameter to give weight for two mentioned measurements. |
nthread |
Number of available cores. |
Value
The function returns index of subset data that is closest to the original data sample.
Compute counter or weight of data samples
Description
Computes counter (for matching approach) or weight (for weighting) approach.
Usage
compute_counter_weight(gps_obj, ci_appr, nthread = 1, ...)
Arguments
gps_obj |
A gps object that is generated with |
ci_appr |
The causal inference approach. Possible values are:
|
nthread |
An integer value that represents the number of threads to be used by internal packages. |
... |
Additional arguments passed to different models. |
Details
Additional parameters
Causal Inference Approach (ci_appr)
if ci_appr = 'matching':
-
bin_seq: A sequence of w (treatment) to generate pseudo population. If
NULL
is passed the default value will be used, which isseq(min(w)+delta_n/2,max(w), by=delta_n)
. -
dist_measure: Matching function. Available options:
l1: Manhattan distance matching
-
delta_n: caliper parameter.
-
scale: a specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS.
-
Value
Returns a counter_weight (cgps_cw) object that includes .data
and params
attributes.
-
.data
: includesid
andcounter_weight
columns. In case ofmatching
thecounter_weight
column is integer values, which represent how many times the provided observational data was mached during the matching process. In case ofweighting
the column is double values. -
params
: Include related parameters that is used for the process.
Examples
m_d <- generate_syn_data(sample_size = 100)
gps_obj <- estimate_gps(.data = m_d,
.formula = w ~ cf1 + cf2 + cf3 + cf4 + cf5 + cf6,
gps_density = "normal",
sl_lib = c("SL.xgboost"))
cw_object <- compute_counter_weight(gps_obj = gps_obj,
ci_appr = "matching",
bin_seq = NULL,
nthread = 1,
delta_n = 0.1,
dist_measure = "l1",
scale = 0.5)
Approximate density based on another vector
Description
A function to impute missing values based on density estimation of another vector or itself after removing the missing values.
Usage
compute_density(x0, x1)
Arguments
x0 |
vector |
x1 |
vector |
Value
Returns approximation of density value of vector x1 based on vector x0.
Compute minimum and maximum
Description
Function to compute minimum and maximum of the input vector
Usage
compute_min_max(x)
Arguments
x |
vector |
Value
Returns a vector of length 2. The first element is min value, and the second element is max value.
Computes distance on all possible combinations
Description
Computes the distance between all combination of elements in two vector. a is vector of size n, and b is a vector of size m, the result, will be a matrix of size(n,m)
Usage
compute_outer(a, b, op)
Arguments
a |
first vector (size n) |
b |
second vector (size m) |
op |
operator (e.g., '-', '+', '/', ...) |
Value
A n by m matrix that includes abs difference between elements of vector a and b.
Compute residual
Description
Function to compute residual
Usage
compute_resid(a, b, c)
Arguments
a |
A vector |
b |
A vector |
c |
A vector |
Value
returns a residual values.
Compute risk value
Description
Calculates the cross-validated risk for the optimal bandwidth selection in kernel smoothing approach.
Usage
compute_risk(h, matched_Y, matched_w, matched_cw, x_eval, w_vals, kernel_appr)
Arguments
h |
A scalar representing the bandwidth value. |
matched_Y |
A vector of outcome variable in the matched set. |
matched_w |
A vector of continuous exposure variable in the matched set. |
matched_cw |
A vector of counter or weight variable in the matched set. |
w_vals |
A vector of values that you want to calculate the values of the ERF at. |
kernel_appr |
Internal kernel approach. Available options are |
Value
returns a cross-validated risk value for the input bandwidth
Create pseudo population using matching casual inference approach
Description
Generates pseudo population based on matching casual inference method.
Usage
create_matching(
.data,
exposure_col_name,
matching_fn,
dist_measure = dist_measure,
gps_density = gps_density,
delta_n = delta_n,
scale = scale,
bin_seq = NULL,
nthread = 1
)
Arguments
.data |
TBD |
gps_density |
Model type which is used for estimating GPS value, including
|
bin_seq |
Sequence of w (treatment) to generate pseudo population. If
NULL is passed the default value will be used, which is
|
nthread |
Number of available cores. |
Value
Returns data.table of matched set.
Create pseudo population using weighting casual inference approach
Description
Generates pseudo population based on weighting casual inference method.
Usage
create_weighting(dataset, exposure_col_name)
Arguments
dataset |
A gps object data. |
exposure_col_name |
The exposure column name. |
Value
Returns a data table which includes the following columns:
Y
w
gps
counter
row_index
ipw
covariates
Estimate Exposure Response Function
Description
Estimates the exposure-response function (ERF) for a matched and weighted dataset using parametric, semiparametric, and nonparametric models.
Usage
estimate_erf(.data, .formula, weights_col_name, model_type, w_vals, ...)
Arguments
.data |
A data frame containing an observed continuous exposure variable, weights,
and an observed outcome variable. Includes an |
.formula |
A formula specifying the relationship between the exposure variable and the outcome variable. For example, Y ~ w. |
weights_col_name |
A string representing the weight or counter column
name in |
model_type |
A string representing the model type based on preliminary
assumptions, including |
w_vals |
A numeric vector of values at which you want to calculate the ERF. |
... |
Additional arguments passed to the model. |
Value
Returns an S3 object containing the following data and parameters:
.data_original <- result_data_original
.data_prediction <- result_data_prediction
params
Estimate generalized propensity score (GPS) values
Description
Estimates GPS value for each observation using normal or kernel approaches.
Usage
estimate_gps(
.data,
.formula,
gps_density = "normal",
sl_lib = c("SL.xgboost"),
...
)
Arguments
.data |
A data frame of observed continuous exposure variable and
observed covariates variable. Also includes |
.formula |
A formula specifying the relationship between the exposure variable and the covariates. For example, w ~ I(cf1^2) + cf2. |
gps_density |
Model type which is used for estimating GPS value,
including |
sl_lib |
A vector of prediction algorithms to be used by the SuperLearner packageg. |
... |
Additional arguments passed to the model. |
Value
The function returns a S3 object. Including the following:
-
.data
:id
,exposure_var
,gps
,e_gps_pred
,e_gps_std_pred
,w_resid
-
params
: Including the following fields:gps_mx (min and max of gps)
w_mx (min and max of w).
.formula
gps_density
sl_lib
fcall (function call)
Examples
m_d <- generate_syn_data(sample_size = 100)
data_with_gps <- estimate_gps(.data = m_d,
.formula = w ~ cf1 + cf2 + cf3 + cf4 + cf5 + cf6,
gps_density = "normal",
sl_lib = c("SL.xgboost")
)
Estimate hat (fitted) values
Description
Estimates the fitted values based on bandwidth value
Usage
estimate_hat_vals(bw, matched_w, w_vals)
Arguments
bw |
The bandwidth value. |
matched_w |
A vector of continuous exposure variable in the matched set. |
w_vals |
A vector of values that you want to calculate the values of the ERF at. |
Value
Returns fitted values, or the prediction made by the model for each observation.
Estimate smoothed exposure-response function (ERF) for pseudo population
Description
Estimate smoothed exposure-response function (ERF) for matched and weighted data set using non-parametric models.
Usage
estimate_npmetric_erf(
m_Y,
m_w,
counter_weight,
bw_seq,
w_vals,
nthread,
kernel_appr = "locpol"
)
Arguments
m_Y |
A vector of outcome variable in the matched set. |
m_w |
A vector of continuous exposure variable in the matched set. |
counter_weight |
A vector of counter or weight variable in the matched set. |
bw_seq |
A vector of bandwidth values. |
w_vals |
A vector of values that you want to calculate the values of the ERF at. |
nthread |
The number of available cores. |
kernel_appr |
Internal kernel approach. Available options are |
Details
Estimate Functions Using Local Polynomial kernel regression.
Value
The function returns a gpsm_erf object. The object includes the following attributes:
params
m_Y
m_w
bw_seq
w_vals
erf
fcall
Estimate Parametric Exposure Response Function
Description
Estimate a constant effect size for matched and weighted data set using parametric models
Usage
estimate_pmetric_erf(formula, family, data, ...)
Arguments
formula |
a vector of outcome variable in matched set. |
family |
a description of the error distribution (see ?gnm) |
data |
dataset that formula is build upon (Note that there should be a
|
... |
Additional parameters for further fine tuning the gnm model. |
Details
This method uses generalized nonlinear model (gnm) from gnm package.
Value
returns an object of class gnm
Estimate semi-exposure-response function (semi-ERF).
Description
Estimates the smoothed exposure-response function using a generalized additive model with splines.
Usage
estimate_semipmetric_erf(formula, family, data, ...)
Arguments
formula |
a vector of outcome variable in matched set. |
family |
a description of the error distribution (see ?gam). |
data |
dataset that formula is build upon Note that there should be a
|
... |
Additional parameters for further fine tuning the gam model. |
Details
This approach uses Generalized Additive Model (gam) using mgcv package.
Value
returns an object of class gam
Generate kernel function
Description
Generates a kernel function
Usage
generate_kernel(t)
Arguments
t |
A standardized vector (z-score) |
Value
probability distribution
Generate pseudo population
Description
Generates pseudo population data set based on user-defined causal inference approach. The function uses an adaptive approach to satisfies covariate balance requirements. The function terminates either by satisfying covariate balance or completing the requested number of iteration, whichever comes first.
Usage
generate_pseudo_pop(
.data,
cw_obj,
covariate_col_names,
covar_bl_trs = 0.1,
covar_bl_trs_type = "maximal",
covar_bl_method = "absolute"
)
Arguments
.data |
A data.frame of observation data with |
cw_obj |
An S3 object of counter_weight. |
covariate_col_names |
A list of covariate columns. |
covar_bl_trs |
Covariate balance threshold |
covar_bl_trs_type |
Type of the covariance balance threshold. |
covar_bl_method |
Covariate balance method. |
Value
Returns a pseudo population (gpsm_pspop) object that is generated or augmented based on the selected causal inference approach (ci_appr). The object includes the following objects:
params
ci_appr
params
pseudo_pop
adjusted_corr_results
original_corr_results
best_gps_used_params
effect size of generated pseudo population
Examples
set.seed(967)
m_d <- generate_syn_data(sample_size = 200)
m_d$id <- seq_along(1:nrow(m_d))
m_xgboost <- function(nthread = 4,
ntrees = 35,
shrinkage = 0.3,
max_depth = 5,
...) {SuperLearner::SL.xgboost(
nthread = nthread,
ntrees = ntrees,
shrinkage=shrinkage,
max_depth=max_depth,
...)}
data_with_gps_1 <- estimate_gps(
.data = m_d,
.formula = w ~ I(cf1^2) + cf2 + I(cf3^2) + cf4 + cf5 + cf6,
sl_lib = c("m_xgboost"),
gps_density = "normal")
cw_object_matching <- compute_counter_weight(gps_obj = data_with_gps_1,
ci_appr = "matching",
bin_seq = NULL,
nthread = 1,
delta_n = 0.1,
dist_measure = "l1",
scale = 0.5)
pseudo_pop <- generate_pseudo_pop(.data = m_d,
cw_obj = cw_object_matching,
covariate_col_names = c("cf1", "cf2",
"cf3", "cf4",
"cf5", "cf6"),
covar_bl_trs = 0.1,
covar_bl_trs_type = "maximal",
covar_bl_method = "absolute")
Generate synthetic data for the CausalGPS package
Description
Generates synthetic data set based on different GPS models and covariates.
Usage
generate_syn_data(
sample_size = 1000,
outcome_sd = 10,
gps_spec = 1,
cova_spec = 1,
vectorized_y = FALSE
)
Arguments
sample_size |
A positive integer number that represents a number of data samples. |
outcome_sd |
A positive double number that represents standard deviation used to generate the outcome in the synthetic data set. |
gps_spec |
A numerical integer values ranging from 1 to 7. The
complexity and form of the relationship between covariates and treatment
variables are determined by the
|
cova_spec |
A numerical value (1 or 2) to modify the covariates. It
determines how the covariates in the synthetic data set are transformed.
If |
vectorized_y |
A Boolean value indicates how Y internally is generated.
(Default = |
Value
synthetic_data
: The function returns a data.frame saved the
constructed synthetic data.
Examples
set.seed(298)
s_data <- generate_syn_data(sample_size = 100,
outcome_sd = 10,
gps_spec = 1,
cova_spec = 1)
Get Logger Settings
Description
Returns current logger settings.
Usage
get_logger()
Value
Returns a list that includes logger_file_path and logger_level.
Examples
set_logger("mylogger.log", "INFO")
log_meta <- get_logger()
Log system information
Description
Logs system related information into the log file.
Usage
log_system_info()
Value
No return value. This function is called for side effects.
Match observations
Description
Matching function using L1 distance on single exposure level w
Usage
matching_fn(
w,
dataset,
exposure_col_name,
e_gps_pred,
e_gps_std_pred,
w_resid,
gps_mx,
w_mx,
dist_measure = "l1",
gps_density = "normal",
delta_n = 1,
scale = 0.5,
nthread = 1
)
Arguments
w |
the targeted single exposure levels. |
dataset |
a completed observational data frame or matrix containing (Y, w, gps, counter, row_index, c). |
e_gps_pred |
a vector of predicted gps values obtained by Machine learning methods. |
e_gps_std_pred |
a vector of predicted std of gps obtained by Machine learning methods. |
w_resid |
the standardized residuals for w. |
gps_mx |
a vector with length 2, includes min(gps), max(gps) |
w_mx |
a vector with length 2, includes min(w), max(w). |
gps_density |
Model type which is used for estimating GPS value, including
|
delta_n |
a specified caliper parameter on the exposure (Default is 1). |
scale |
a specified scale parameter to control the relative weight that is attributed to the distance measures of the exposure versus the GPS estimates (Default is 0.5). |
nthread |
Number of available cores. |
Value
dp
: The function returns a data.table saved the matched points on
by single exposure
level w by the proposed GPS matching approaches.
Extend generic plot functions for cgps_cw class
Description
A wrapper function to extend generic plot functions for cgps_cw class.
Usage
## S3 method for class 'cgps_cw'
plot(x, ...)
Arguments
x |
A cgps_cw object. |
... |
Additional arguments passed to customize the plot. |
Details
Additional parameters:
-
every_n: Puts label to ID at every n interval (default = 10)
-
subset_id: A vector of range of ids to be included in the plot (default = NULL)
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend generic plot functions for cgps_cw class
Description
A wrapper function to extend generic plot functions for cgps_cw class.
Usage
## S3 method for class 'cgps_erf'
plot(x, ...)
Arguments
x |
A cgps_erf object. |
... |
Additional arguments passed to customize the plot. |
Details
TBD
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend generic plot functions for cgps_gps class
Description
A wrapper function to extend generic plot functions for cgps_gps class.
Usage
## S3 method for class 'cgps_gps'
plot(x, ...)
Arguments
x |
A cgps_gps object. |
... |
Additional arguments passed to customize the plot. |
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend generic plot functions for cgps_pspop class
Description
A wrapper function to extend generic plot functions for cgps_pspop class.
Usage
## S3 method for class 'cgps_pspop'
plot(x, ...)
Arguments
x |
A cgps_pspop object. |
... |
Additional arguments passed to customize the plot. |
Details
Additional parameters
-
include_details: If set to TRUE, the plot will include run details (Default = FALSE).
Value
Returns a ggplot2 object, invisibly. This function is called for side effects.
Extend print function for cgps_cw object
Description
Extend print function for cgps_cw object
Usage
## S3 method for class 'cgps_cw'
print(x, ...)
Arguments
x |
A cgps_cw object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Extend print function for cgps_erf object
Description
Extend print function for cgps_erf object
Usage
## S3 method for class 'cgps_erf'
print(x, ...)
Arguments
x |
A cgps_erf object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Extend print function for cgps_gps object
Description
Extend print function for cgps_gps object
Usage
## S3 method for class 'cgps_gps'
print(x, ...)
Arguments
x |
A cgps_gps object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Extend print function for cgps_pspop object
Description
Extend print function for cgps_pspop object
Usage
## S3 method for class 'cgps_pspop'
print(x, ...)
Arguments
x |
A cgps_pspop object. |
... |
Additional arguments passed to customize the results. |
Value
No return value. This function is called for side effects.
Set Logger Settings
Description
Updates logger settings, including log level and location of the file.
Usage
set_logger(logger_file_path = "CausalGPS.log", logger_level = "INFO")
Arguments
logger_file_path |
A path (including file name) to log the messages. (Default: CausalGPS.log) |
logger_level |
The log level. Available levels include:
|
Value
No return value. This function is called for side effects.
Examples
set_logger("Debug")
Smooth exposure response function
Description
Smooths exposure response function based on bandwidth
Usage
smooth_erf(matched_Y, bw, matched_w, matched_cw, x_eval, kernel_appr)
Arguments
matched_Y |
A vector of the outcome variable in the matched set. |
bw |
The bandwidth value. |
matched_w |
A vector of continuous exposure variable in the matched set. |
matched_cw |
A vector of counter or weight variable in the matched set. |
kernel_appr |
Internal kernel approach. Available options are |
Value
Smoothed value of ERF
Compute smoothed erf with kernsmooth approach
Description
Compute smoothed erf with kernsmooth approach
Usage
smooth_erf_kernsmooth(matched_Y, matched_w, matched_cw, x_eval, bw)
Arguments
matched_Y |
A vector of outcome value. |
matched_w |
A vector of treatment value. |
matched_cw |
A vector of weight or count. |
bw |
A scaler number indicating the bandwidth. |
Value
A vector of smoothed ERF.
Compute smoothed erf with locpol approach
Description
Compute smoothed erf with locpol approach
Usage
smooth_erf_locpol(matched_Y, matched_w, matched_cw, x_eval, bw)
Arguments
matched_Y |
A vector of outcome value. |
matched_w |
A vector of treatment value. |
matched_cw |
A vector of weight or count. |
bw |
A scaler number indicating the bandwidth. |
Value
A vector of smoothed ERF.
print summary of cgps_cw object
Description
print summary of cgps_cw object
Usage
## S3 method for class 'cgps_cw'
summary(object, ...)
Arguments
object |
A cgps_cw object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data
print summary of cgps_erf object
Description
print summary of cgps_erf object
Usage
## S3 method for class 'cgps_erf'
summary(object, ...)
Arguments
object |
A cgps_erf object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data
print summary of cgps_gps object
Description
print summary of cgps_gps object
Usage
## S3 method for class 'cgps_gps'
summary(object, ...)
Arguments
object |
A cgps_gps object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data
print summary of cgps_pspop object
Description
print summary of cgps_pspop object
Usage
## S3 method for class 'cgps_pspop'
summary(object, ...)
Arguments
object |
A cgps_pspop object. |
... |
Additional arguments passed to customize the results. |
Value
Returns summary of data
Public data set for air pollution and health studies, case study: 2010 county-Level data set for the contiguous United States
Description
A dataset containing exposure, confounders, and outcome for causal inference studies. The dataset is hosted on Harvard dataverse doi:10.7910/DVN/L7YF2G. This dataset was produced from five different resources. Please see https://github.com/NSAPH-Projects/synthetic_data/ for the data processing pipelines. In the following
Exposure Data
The exposure parameter is PM2.5. Di et al. (2019) provided daily, and annual PM2.5 estimates at 1 km×1 km grid cells in the entire United States. The data can be downloaded from Di et al. (2021). Features in this category starts with qd_ prefix.
Census Data
The main reference for getting the census data is the United States Census
Bureau. There are numerous studies and surveys for different geographical
resolutions. We use 2010 county level American County Survey at the county
level (acs5
). Features in this category starts with cs_ prefix.
CDC Data
The Centers for Disease Control and Prevention (CDC), provides the Behavioral Risk Factor Surveillance System (Centers for Disease Control and Prevention (2021)), which is the nation’s premier system of health-related telephone surveys that collect state data about U.S. residents regarding their health-related risk behaviors.
GridMET Data
Climatology Lab at the University of California, Merced, provides the GridMET data (Abatzoglou (2013)). The data set is daily surface meteorological data covering the contiguous United States.
CMS Data
The Centers for Medicare and Medicaid Services(CMS) provides synthetic data at the county level for 2008-2010 (Centers for Medicare & Medicaid Services (2021)).
The definition of each variables are provided below. All data are collected for 2010 and aggregated into the county level and in the contiguous United States.
Usage
data(synthetic_us_2010)
Format
A data frame with 3109 rows and 46 variables:
- qd_mean_pm25
Mean PM2.5 (microgram/m3)
- cs_poverty
The proportion of below poverty level population among 65+ years old.
- cs_hispanic
The proportion of Hispanic or Latino population among 65+ years old.
- cs_black
The proportion of Black or African American population among 65+ years old.
- cs_white
The proportion of White population among 65 years and over.
- cs_native
The proportion of American Indian or Alaska native population among 65 years and over.
- cs_asian
The proportion of Asian population among 65 years and over.
- cs_other
The proportion of other races population among 65 years and over.
- cs_ed_below_highschool
The proportion of the population with below high school level education among 65 years and over.
- cs_household_income
Median Household income in the past 12 months (in 2010 inflation-adjusted dollars) where householder is 65 years and over.
- cs_median_house_value
Median house value (USD)
- cs_total_population
Total Population
- cs_area
Area of each county (square miles)
- cs_population_density
The number of the population in one square mile.
- cdc_mean_bmi
Body Mass Index.
- cdc_pct_cusmoker
The proportion of current smokers.
- cdc_pct_sdsmoker
The proportion of some days smokers.
- cdc_pct_fmsmoker
The proportion of former smokers.
- cdc_pct_nvsmoker
The proportion of never smokers.
- cdc_pct_nnsmoker
The proportion of not known smokers.
- gmet_mean_tmmn
Annual mean of daily minimum temperature (K)
- gmet_mean_summer_tmmn
The mean of daily minimum temperature during summer (K)
- gmet_mean_winter_tmmn
The mean of daily minimum temperature during winter (K)
- gmet_mean_tmmx
Annual mean of daily maximum temperature (K)
- gmet_mean_summer_tmmx
The mean of daily maximum temperature during summer (K)
- gmet_mean_winter_tmmx
The mean of daily maximum temperature during winter (K)
- gmet_mean_rmn
Annual mean of daily minimum relative humidity (%)
- gmet_mean_summer_rmn
The mean of daily minimum relative humidity during summer (%)
- gmet_mean_winter_rmn
The mean of daily minimum relative humidity during winter (%)
- gmet_mean_rmx
Annual mean of daily maximum relative humidity (%)
- gmet_mean_summer_rmx
The mean of daily maximum relative humidity during summer (%)
- gmet_mean_winter_rmx
The mean of daily maximum relative humidity during winter (%)
- gmet_mean_sph
Annual mean of daily mean specific humidity (kg/kg)
- gmet_mean_summer_sph
The mean of daily mean specific humidity during summer(kg/kg)
- gmet_mean_winter_sph
The mean of daily mean specific humidity during winter(kg/kg)
- cms_mortality_pct
The proportion of deceased patients.
- cms_white_pct
The proportion of White patients.
- cms_black_pct
The proportion of Black patients.
- cms_hispanic_pct
The proportion of Hispanic patients.
- cms_others_pct
The proportion of Other patients.
- cms_female_pct
The proportion of Female patients.
- region
The region that the county is located in.
NORTHEAST=("NY","MA","PA","RI","NH","ME","VT","CT","NJ") SOUTH=("DC","VA","NC","WV","KY","SC","GA","FL","AL","TN","MS","AR","MD","DE","OK","TX","LA") MIDWEST=c("OH","IN","MI","IA","MO","WI","MN","SD","ND","IL","KS","NE") WEST=c("MT","CO","WY","ID","UT","NV","CA","OR","WA","AZ","NM")
- FIPS
Federal Information Processing Standards, a unique ID for each county.
- NAME
County, State name.
- STATE
State abbreviation.
- STATE_CODE
State numerical code.
References
Abatzoglou, John T. 2013. “Development of Gridded Surface Meteorological Data for Ecological Applications and Modelling.” International Journal of Climatology 33 (1): 121–31. doi:10.1002/joc.3413.
Centers for Disease Control and Prevention. 2021. “Behavioral Risk Factor Surveillance System.” https://www.cdc.gov/brfss/annual_data/annual_2010.htm/.
Centers for Medicare & Medicaid Services. 2021. “CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF).” https://www.cms.gov/data-research/statistics-trends-and-reports/medicare-claims-synthetic-public-use-files/cms-2008-2010-data-entrepreneurs-synthetic-public-use-file-de-synpuf.
Di, Qian, Heresh Amini, Liuhua Shi, Itai Kloog, Rachel Silvern, James Kelly, M Benjamin Sabath, et al. 2019. “An Ensemble-Based Model of Pm2. 5 Concentration Across the Contiguous United States with High Spatiotemporal Resolution.” Environment International 130: 104909. doi:10.1016/j.envint.2019.104909.
Di, Qian, Yaguang Wei, Alexandra Shtein, Carolynne Hultquist, Xiaoshi Xing, Heresh Amini, Liuhua Shi, et al. 2021. “Daily and Annual Pm2.5 Concentrations for the Contiguous United States, 1-Km Grids, V1 (2000 - 2016).” NASA Socioeconomic Data; Applications Center (SEDAC). doi:10.7927/0rvr-4538.
Generate Prediction Model
Description
Function to develop prediction model based on user's preferences.
Usage
train_it(target, input, sl_lib_internal = NULL, ...)
Arguments
target |
A vector of target data. |
input |
A vector, matrix, or dataframe of input data. |
sl_lib_internal |
The internal library to be used by SuperLearner |
... |
Model related parameters should be provided. |
Value
prediction model
Trim a data frame or an S3 object
Description
Trims a data frame or an S3 object's .data
attributs.
Usage
trim_it(data_obj, trim_quantiles, variable)
Arguments
data_obj |
A data frame or an S3 object containing the data to be
trimmed. For a data frame, the function operates directly on it. For an S3
object, the function expects a |
trim_quantiles |
A numeric vector of length 2 specifying the lower and upper quantiles used for trimming the data. |
variable |
The name of the variable in the data on which the trimming is to be applied. |
Value
Returns a trimmed data frame or an S3 object with the $.data attribute trimmed, depending on the input type.
Examples
# Example usage with a data frame
df <- data.frame(id = 1:10, value = rnorm(100))
trimmed_df <- trim_it(df, c(0.1, 0.9), "value")
# Example usage with an S3 object
data_obj <- list()
class(data_obj) <- "myobject"
data_obj$.data <- df
trimmed_data_obj <- trim_it(data_obj, c(0.1, 0.9), "value")
Helper function
Description
Helper function
Usage
w_fun(bw, matched_w, w_vals)
Arguments
bw |
bandwidth value |
matched_w |
a vector of continuous exposure variable in matched set. |
w_vals |
a vector of values that you want to calculate the values of the ERF at. |
Value
return value (TODO)