Title: | Correcting Misclassified Binary Outcomes in Association Studies |
Version: | 1.2.0 |
Author: | Kimberly Hochstedler Webb [aut, cre] |
Maintainer: | Kimberly Hochstedler Webb <kah343@cornell.edu> |
Description: | Use frequentist and Bayesian methods to estimate parameters from a binary outcome misclassification model. These methods correct for the problem of "label switching" by assuming that the sum of outcome sensitivity and specificity is at least 1. A description of the analysis methods is available in Hochstedler and Wells (2023) <doi:10.48550/arXiv.2303.10215>. |
Depends: | R (≥ 4.2.0) |
Imports: | dplyr (≥ 1.0.10), tidyr (≥ 1.2.1), Matrix (> 1.4-1), rjags (≥ 4-13), turboEM (≥ 2021.1), SAMBA (≥ 0.9.0), utils (≥ 4.2.0) |
Suggests: | knitr (≥ 1.40), testthat (≥ 3.0.0), devtools (≥ 2.4.5), xtable (≥ 1.8.0) |
SystemRequirements: | JAGS (http://mcmc-jags.sourceforge.net) |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
VignetteBuilder: | knitr |
Collate: | 'sum_every_n1.R' 'sum_every_n.R' 'pistar_compute.R' 'pi_compute.R' 'COMBO_data.R' 'expit.R' 'perfect_sensitivity_EM.R' 'w_j.R' 'q_gamma_f.R' 'q_beta_f.R' 'em_function.R' 'loglik.R' 'COMBO_EM.R' 'pitilde_compute.R' 'COMBO_data_2stage.R' 'w_j_2stage.R' 'q_delta_f.R' 'em_function_2stage.R' 'loglik_2stage.R' 'COMBO_EM_2stage.R' 'COMBO_EM_data.R' 'label_switch.R' 'check_and_fix_chains.R' 'mean_pistarjj_compute.R' 'pistar_compute_for_chains.R' 'pistar_by_chain.R' 'naive_jags_picker.R' 'naive_model_picker.R' 'jags_picker.R' 'model_picker.R' 'COMBO_MCMC.R' 'model_picker_2stage.R' 'COMBO_MCMC_2stage.R' 'LSAC_data.R' 'VPRAI_synthetic_data.R' 'check_and_fix_chains_2stage.R' 'jags_picker_2stage.R' 'label_switch_2stage.R' 'misclassification_prob.R' 'misclassification_prob2.R' 'naive_jags_picker_2stage.R' 'naive_loglik_2stage.R' 'naive_model_picker_2stage.R' 'pistar_by_chain_2stage.R' 'pistar_compute_for_chains_2stage.R' 'pitilde_by_chain.R' 'pitilde_compute_for_chains.R' 'true_classification_prob.R' |
LazyData: | true |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2024-10-30 15:03:41 UTC; KIW58 |
Repository: | CRAN |
Date/Publication: | 2024-10-30 15:50:02 UTC |
EM-Algorithm Estimation of the Binary Outcome Misclassification Model
Description
Jointly estimate \beta
and \gamma
parameters from the true outcome
and observation mechanisms, respectively, in a binary outcome misclassification
model.
Usage
COMBO_EM(
Ystar,
x_matrix,
z_matrix,
beta_start,
gamma_start,
tolerance = 1e-07,
max_em_iterations = 1500,
em_method = "squarem"
)
Arguments
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma_start |
A numeric vector or matrix of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
em_method |
A character string specifying which EM algorithm will be applied.
Options are |
Value
COMBO_EM
returns a data frame containing four columns. The first
column, Parameter
, represents a unique parameter value for each row.
The next column contains the parameter Estimates
, followed by the standard
error estimates, SE
. The final column, Convergence
, reports
whether or not the algorithm converged for a given parameter estimate.
Estimates are provided for the binary misclassification model, as well as two
additional cases. The "SAMBA" parameter estimates are from the R Package,
SAMBA, which uses the EM algorithm to estimate a binary outcome misclassification
model that assumes there is perfect specificity. The "PSens" parameter estimates
are estimated using the EM algorithm for the binary outcome misclassification
model that assumes there is perfect sensitivitiy. The "Naive" parameter
estimates are from a simple logistic regression Y* ~ X
.
References
Beesley, L. and Mukherjee, B. (2020). Statistical inference for association studies using electronic health records: Handling both selection bias and outcome misclassification. Biometrics, 78, 214-226.
Examples
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE)
z_matrix = matrix(rgamma(n, z_shape), ncol = 1)
Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE)
exp_xb = exp(X %*% true_beta)
pi_result = exp_xb[,1] / (exp_xb[,1] + 1)
pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)
true_Y <- rep(NA, n)
for(i in 1:n){
true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1)
}
exp_zg = exp(Z %*% true_gamma)
pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE)
pistar_result = exp_zg / pistar_denominator
pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1],
pistar_result[,2], 1 - pistar_result[,2]),
ncol = 2, byrow = FALSE)
obs_Y <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_Y[i] = which(rmultinom(1, 1,
pistar_matrix[c(i, n + i),
true_j]) == 1)
}
Ystar <- obs_Y
starting_values <- rep(1,6)
beta_start <- matrix(starting_values[1:2], ncol = 1)
gamma_start <- matrix(starting_values[3:6], ncol = 2, nrow = 2, byrow = FALSE)
EM_results <- COMBO_EM(Ystar, x_matrix = x_matrix, z_matrix = z_matrix,
beta_start = beta_start, gamma_start = gamma_start)
EM_results
EM-Algorithm Estimation of the Two-Stage Binary Outcome Misclassification Model
Description
Jointly estimate \beta
, \gamma^{(1)}
, \gamma^{(2)}
parameters from the true outcome,
first-stage observation, and second-stage observation mechanisms, respectively,
in a two-stage binary outcome misclassification model.
Usage
COMBO_EM_2stage(
Ystar1,
Ystar2,
x_matrix,
z1_matrix,
z2_matrix,
beta_start,
gamma1_start,
gamma2_start,
tolerance = 1e-07,
max_em_iterations = 1500,
em_method = "squarem"
)
Arguments
Ystar1 |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ystar2 |
A numeric vector of indicator variables (1, 2) for the second-stage
observed outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z1_matrix |
A numeric matrix of covariates in the first-stage observation mechanism.
|
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
beta_start |
A numeric vector or column matrix of starting values for the |
gamma1_start |
A numeric vector or matrix of starting values for the |
gamma2_start |
A numeric array of starting values for the |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
em_method |
A character string specifying which EM algorithm will be applied.
Options are |
Value
COMBO_EM_2stage
returns a data frame containing four columns. The first
column, Parameter
, represents a unique parameter value for each row.
The next column contains the parameter Estimates
, followed by the standard
error estimates, SE
. The final column, Convergence
, reports
whether or not the algorithm converged for a given parameter estimate.
Estimates are provided for the two-stage binary misclassification model.
Examples
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
my_data <- COMBO_data_2stage(sample_size = n,
x_mu = x_mu, x_sigma = x_sigma,
z1_shape = z1_shape, z2_shape = z2_shape,
beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2)
table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
beta_start <- rnorm(length(c(true_beta)))
gamma1_start <- rnorm(length(c(true_gamma1)))
gamma2_start <- rnorm(length(c(true_gamma2)))
EM_results <- COMBO_EM_2stage(Ystar1 = my_data[["obs_Ystar1"]],
Ystar2 = my_data[["obs_Ystar2"]],
x_matrix = my_data[["x"]],
z1_matrix = my_data[["z1"]],
z2_matrix = my_data[["z2"]],
beta_start = beta_start,
gamma1_start = gamma1_start,
gamma2_start = gamma2_start)
EM_results
Test data for the COMBO_EM function
Description
A dataset for testing the COMBO_EM
function, generated from the
COMBO_data
function.
Usage
COMBO_EM_data
Format
A list containing 6 variables for 1000 observations:
- Y
The true outcome variable
- Ystar
The observed outcome variable
- x_matrix
A matrix of predictor values in the true outcome mechanism
- z_matrix
A matrix of predictor values in the observed outcome mechanism
- true_beta
Beta parameter values used for data generation in the true outcome mechanism
- true_gamma
Gamma parameter values used for data generation in the observed outcome mechanism
Examples
## Not run:
data("COMBO_EM_data")
head(COMBO_EM_data)
## End(Not run)
MCMC Estimation of the Binary Outcome Misclassification Model
Description
Jointly estimate \beta
and \gamma
parameters from the true outcome
and observation mechanisms, respectively, in a binary outcome misclassification
model.
Usage
COMBO_MCMC(
Ystar,
x_matrix,
z_matrix,
prior,
beta_prior_parameters,
gamma_prior_parameters,
number_MCMC_chains = 4,
MCMC_sample = 2000,
burn_in = 1000,
display_progress = TRUE
)
Arguments
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
prior |
A character string specifying the prior distribution for the
|
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute.
The default is |
MCMC_sample |
An integer specifying the number of MCMC samples to draw.
The default is |
burn_in |
An integer specifying the number of MCMC samples to discard
for the burn-in period. The default is |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
COMBO_MCMC
returns a list of the posterior samples and posterior
means for both the binary outcome misclassification model and a naive logistic
regression of the observed outcome, Y*
, predicted by the matrix x
.
The list contains the following components:
posterior_sample_df |
A data frame containing three columns. The first
column indicates the chain from which a sample is taken, from 1 to |
posterior_means_df |
A data frame containing three columns. The first
column specifies the parameter associated with a given row. Parameters are
indexed as in the |
naive_posterior_sample_df |
A data frame containing three columns.
The first column indicates the chain from which a sample is taken, from
1 to |
naive_posterior_means_df |
A data frame containing three columns. The first
column specifies the naive parameter associated with a given row. Parameters are
indexed as in the |
Examples
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE)
z_matrix = matrix(rgamma(n, z_shape), ncol = 1)
Z = matrix(c(rep(1, n), z_matrix[,1]), ncol = 2, byrow = FALSE)
exp_xb = exp(X %*% true_beta)
pi_result = exp_xb[,1] / (exp_xb[,1] + 1)
pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)
true_Y <- rep(NA, n)
for(i in 1:n){
true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1)
}
exp_zg = exp(Z %*% true_gamma)
pistar_denominator = matrix(c(1 + exp_zg[,1], 1 + exp_zg[,2]), ncol = 2, byrow = FALSE)
pistar_result = exp_zg / pistar_denominator
pistar_matrix = matrix(c(pistar_result[,1], 1 - pistar_result[,1],
pistar_result[,2], 1 - pistar_result[,2]),
ncol = 2, byrow = FALSE)
obs_Y <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_Y[i] = which(rmultinom(1, 1,
pistar_matrix[c(i, n + i),
true_j]) == 1)
}
Ystar <- obs_Y
unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)
unif_lower_gamma <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
dim = c(2,2,2))
unif_upper_gamma <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
dim = c(2,2,2))
beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma_prior_parameters <- list(lower = unif_lower_gamma, upper = unif_upper_gamma)
MCMC_results <- COMBO_MCMC(Ystar, x = x_matrix, z = z_matrix,
prior = "uniform",
beta_prior_parameters = beta_prior_parameters,
gamma_prior_parameters = gamma_prior_parameters,
number_MCMC_chains = 2,
MCMC_sample = 200, burn_in = 100)
MCMC_results$posterior_means_df
MCMC Estimation of the Two-Stage Binary Outcome Misclassification Model
Description
Jointly estimate \beta
, \gamma^{(1)}
, and \gamma^{(2)}
parameters from the true outcome
first-stage observation, and second-stage observation mechanisms, respectively,
in a two-stage binary outcome misclassification model.
Usage
COMBO_MCMC_2stage(
Ystar1,
Ystar2,
x_matrix,
z1_matrix,
z2_matrix,
prior,
beta_prior_parameters,
gamma1_prior_parameters,
gamma2_prior_parameters,
naive_gamma2_prior_parameters,
number_MCMC_chains = 4,
MCMC_sample = 2000,
burn_in = 1000,
display_progress = TRUE
)
Arguments
Ystar1 |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
Ystar2 |
A numeric vector of indicator variables (1, 2) for the second-stage
observed outcome |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
z1_matrix |
A numeric matrix of covariates in the observation mechanism.
|
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
prior |
A character string specifying the prior distribution for the
|
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma1_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma2_prior_parameters |
A numeric list of prior distribution parameters
for the |
naive_gamma2_prior_parameters |
A numeric list of prior distribution parameters
for the naive model |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute.
The default is |
MCMC_sample |
An integer specifying the number of MCMC samples to draw.
The default is |
burn_in |
An integer specifying the number of MCMC samples to discard
for the burn-in period. The default is |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
COMBO_MCMC_2stage
returns a list of the posterior samples and posterior
means for both the binary outcome misclassification model and a naive logistic
regression of the observed outcome, Y*
, predicted by the matrix x
.
The list contains the following components:
posterior_sample_df |
A data frame containing three columns. The first
column indicates the chain from which a sample is taken, from 1 to |
posterior_means_df |
A data frame containing three columns. The first
column specifies the parameter associated with a given row. Parameters are
indexed as in the |
naive_posterior_sample_df |
A data frame containing three columns.
The first column indicates the chain from which a sample is taken, from
1 to |
naive_posterior_means_df |
A data frame containing three columns. The first
column specifies the naive parameter associated with a given row. Parameters are
indexed as in the |
Examples
# Helper functions
sum_every_n <- function(x, n){
vector_groups = split(x,
ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups)
return(sum_x)
}
sum_every_n1 <- function(x, n){
vector_groups = split(x,
ceiling(seq_along(x) / n))
sum_x = Reduce(`+`, vector_groups) + 1
return(sum_x)
}
# Example
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
x_matrix = matrix(rnorm(n, x_mu, x_sigma), ncol = 1)
X = matrix(c(rep(1, n), x_matrix[,1]), ncol = 2, byrow = FALSE)
z1_matrix = matrix(rgamma(n, z1_shape), ncol = 1)
Z1 = matrix(c(rep(1, n), z1_matrix[,1]), ncol = 2, byrow = FALSE)
z2_matrix = matrix(rgamma(n, z2_shape), ncol = 1)
Z2 = matrix(c(rep(1, n), z2_matrix[,1]), ncol = 2, byrow = FALSE)
exp_xb = exp(X %*% true_beta)
pi_result = exp_xb[,1] / (exp_xb[,1] + 1)
pi_matrix = matrix(c(pi_result, 1 - pi_result), ncol = 2, byrow = FALSE)
true_Y <- rep(NA, n)
for(i in 1:n){
true_Y[i] = which(stats::rmultinom(1, 1, pi_matrix[i,]) == 1)
}
exp_z1g1 = exp(Z1 %*% true_gamma1)
pistar1_denominator = matrix(c(1 + exp_z1g1[,1], 1 + exp_z1g1[,2]),
ncol = 2, byrow = FALSE)
pistar1_result = exp_z1g1 / pistar1_denominator
pistar1_matrix = matrix(c(pistar1_result[,1], 1 - pistar1_result[,1],
pistar1_result[,2], 1 - pistar1_result[,2]),
ncol = 2, byrow = FALSE)
obs_Y1 <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_Y1[i] = which(rmultinom(1, 1,
pistar1_matrix[c(i, n + i),
true_j]) == 1)
}
Ystar1 <- obs_Y1
exp_z2g2_1 = exp(Z2 %*% true_gamma2[,,1])
exp_z2g2_2 = exp(Z2 %*% true_gamma2[,,2])
pi_denominator1 = apply(exp_z2g2_1, FUN = sum_every_n1, n, MARGIN = 2)
pi_result1 = exp_z2g2_1 / rbind(pi_denominator1)
pi_denominator2 = apply(exp_z2g2_2, FUN = sum_every_n1, n, MARGIN = 2)
pi_result2 = exp_z2g2_2 / rbind(pi_denominator2)
pistar2_matrix1 = rbind(pi_result1,
1 - apply(pi_result1,
FUN = sum_every_n, n = n,
MARGIN = 2))
pistar2_matrix2 = rbind(pi_result2,
1 - apply(pi_result2,
FUN = sum_every_n, n = n,
MARGIN = 2))
pistar2_array = array(c(pistar2_matrix1, pistar2_matrix2),
dim = c(dim(pistar2_matrix1), 2))
obs_Y2 <- rep(NA, n)
for(i in 1:n){
true_j = true_Y[i]
obs_k = Ystar1[i]
obs_Y2[i] = which(rmultinom(1, 1,
pistar2_array[c(i,n+ i),
obs_k, true_j]) == 1)
}
Ystar2 <- obs_Y2
unif_lower_beta <- matrix(c(-5, -5, NA, NA), nrow = 2, byrow = TRUE)
unif_upper_beta <- matrix(c(5, 5, NA, NA), nrow = 2, byrow = TRUE)
unif_lower_gamma1 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
dim = c(2,2,2))
unif_upper_gamma1 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
dim = c(2,2,2))
unif_upper_gamma2 <- array(rep(c(5, NA), 8), dim = c(2,2,2,2))
unif_lower_gamma2 <- array(rep(c(-5, NA), 8), dim = c(2,2,2,2))
unif_lower_naive_gamma2 <- array(data = c(-5, NA, -5, NA, -5, NA, -5, NA),
dim = c(2,2,2))
unif_upper_naive_gamma2 <- array(data = c(5, NA, 5, NA, 5, NA, 5, NA),
dim = c(2,2,2))
beta_prior_parameters <- list(lower = unif_lower_beta, upper = unif_upper_beta)
gamma1_prior_parameters <- list(lower = unif_lower_gamma1, upper = unif_upper_gamma1)
gamma2_prior_parameters <- list(lower = unif_lower_gamma2, upper = unif_upper_gamma2)
naive_gamma2_prior_parameters <- list(lower = unif_lower_naive_gamma2,
upper = unif_upper_naive_gamma2)
MCMC_results <- COMBO_MCMC_2stage(Ystar1, Ystar2,
x_matrix = x_matrix, z1_matrix = z1_matrix,
z2_matrix = z2_matrix,
prior = "uniform",
beta_prior_parameters = beta_prior_parameters,
gamma1_prior_parameters = gamma1_prior_parameters,
gamma2_prior_parameters = gamma2_prior_parameters,
naive_gamma2_prior_parameters = naive_gamma2_prior_parameters,
number_MCMC_chains = 2,
MCMC_sample = 200, burn_in = 100)
MCMC_results$posterior_means_df
Generate Data to use in COMBO Functions
Description
Generate Data to use in COMBO Functions
Usage
COMBO_data(sample_size, x_mu, x_sigma, z_shape, beta, gamma)
Arguments
sample_size |
An integer specifying the sample size of the generated data set. |
x_mu |
A numeric value specifying the mean of |
x_sigma |
A positive numeric value specifying the standard deviation of
|
z_shape |
A positive numeric value specifying the shape parameter of
|
beta |
A column matrix of |
gamma |
A numeric matrix of |
Value
COMBO_data
returns a list of generated data elements:
obs_Y |
A vector of observed outcomes. |
true_Y |
A vector of true outcomes. |
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
x |
A vector of generated predictor values in the true outcome mechanism, from the Normal distribution. |
z |
A vector of generated predictor values in the observation mechanism from the Gamma distribution. |
x_design_matrix |
The design matrix for the |
z_design_matrix |
The design matrix for the |
Examples
set.seed(123)
n <- 500
x_mu <- 0
x_sigma <- 1
z_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
my_data <- COMBO_data(sample_size = n,
x_mu = x_mu, x_sigma = x_sigma,
z_shape = z_shape,
beta = true_beta, gamma = true_gamma)
table(my_data[["obs_Y"]], my_data[["true_Y"]])
Generate data to use in two-stage COMBO Functions
Description
Generate data to use in two-stage COMBO Functions
Usage
COMBO_data_2stage(
sample_size,
x_mu,
x_sigma,
z1_shape,
z2_shape,
beta,
gamma1,
gamma2
)
Arguments
sample_size |
An integer specifying the sample size of the generated data set. |
x_mu |
A numeric value specifying the mean of |
x_sigma |
A positive numeric value specifying the standard deviation of
|
z1_shape |
A positive numeric value specifying the shape parameter of
|
z2_shape |
A positive numeric value specifying the shape parameter of
|
beta |
A column matrix of |
gamma1 |
A numeric matrix of |
gamma2 |
A numeric array of |
Value
COMBO_data_2stage
returns a list of generated data elements:
obs_Ystar1 |
A vector of first-stage observed outcomes. |
obs_Ystar2 |
A vector of second-stage observed outcomes. |
true_Y |
A vector of true outcomes. |
obs_Ystar1_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ystar2_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
x |
A vector of generated predictor values in the true outcome mechanism, from the Normal distribution. |
z1 |
A vector of generated predictor values in the first-stage observation mechanism from the Gamma distribution. |
z2 |
A vector of generated predictor values in the second-stage observation mechanism from the Gamma distribution. |
x_design_matrix |
The design matrix for the |
z1_design_matrix |
The design matrix for the |
z2_design_matrix |
The design matrix for the |
Examples
set.seed(123)
n <- 1000
x_mu <- 0
x_sigma <- 1
z1_shape <- 1
z2_shape <- 1
true_beta <- matrix(c(1, -2), ncol = 1)
true_gamma1 <- matrix(c(.5, 1, -.5, -1), nrow = 2, byrow = FALSE)
true_gamma2 <- array(c(1.5, 1, .5, .5, -.5, 0, -1, -1), dim = c(2, 2, 2))
my_data <- COMBO_data_2stage(sample_size = n,
x_mu = x_mu, x_sigma = x_sigma,
z1_shape = z1_shape, z2_shape = z2_shape,
beta = true_beta, gamma1 = true_gamma1, gamma2 = true_gamma2)
table(my_data[["obs_Ystar2"]], my_data[["obs_Ystar1"]], my_data[["true_Y"]])
Example data from The Law School Admissions Council's (LSAC) National Bar Passage Study (Linda Wightman, 1998)
Description
Example data from The Law School Admissions Council's (LSAC) National Bar Passage Study (Linda Wightman, 1998)
Usage
LSAC_data
Format
A dataframe 39 columns, including background and demographic information, as well as if the candidates passed the bar exam to become lawyers in the USA.
Source
Examples
## Not run:
data("LSAC_data")
head(LSAC_data)
## End(Not run)
Synthetic example data of pretrial failure risk factors and outcomes, VPRAI recommendations, and judge decisions
Description
Synthetic example data of pretrial failure risk factors and outcomes, VPRAI recommendations, and judge decisions
Usage
VPRAI_synthetic_data
Format
A dataframe 1990 columns, including defendant race, risk factors, VPRAI recommendations, judge decisions, and pretrial failure outcomes.
Examples
## Not run:
data("VPRAI_synthetic_data")
head(VPRAI_synthetic_data)
## End(Not run)
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
Description
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
Usage
check_and_fix_chains(
n_chains,
chains_list,
pistarjj_matrix,
dim_x,
dim_z,
n_cat
)
Arguments
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
pistarjj_matrix |
A numeric matrix of the average
conditional probability |
dim_x |
The number of columns of the design matrix of the true outcome mechanism, |
dim_z |
The number of columns of the design matrix of the observation mechanism, |
n_cat |
The number of categorical values that the true outcome, |
Value
check_and_fix_chains
returns a numeric list of the samples from
n_chains
MCMC chains which have been corrected for label switching if
the following assumption is not met: P(Y^* = j | Y = j, Z) > 0.50 \forall j
.
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
Description
Check Assumption and Fix Label Switching if Assumption is Broken for a List of MCMC Samples
Usage
check_and_fix_chains_2stage(
n_chains,
chains_list,
pistarjj_matrix,
pitildejjj_matrix,
dim_x,
dim_z,
dim_v,
n_cat
)
Arguments
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
pistarjj_matrix |
A numeric matrix of the average
conditional probability |
pitildejjj_matrix |
A numeric matrix of the average conditional probability
|
dim_x |
The number of columns of the design matrix of the true outcome mechanism, |
dim_z |
The number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
The number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
The number of categorical values that the true outcome, |
Value
check_and_fix_chains
returns a numeric list of the samples from
n_chains
MCMC chains which have been corrected for label switching if
the following assumption is not met: P(Y^* = j | Y = j, Z) > 0.50 \forall j
.
EM-Algorithm Function for Estimation of the Misclassification Model
Description
EM-Algorithm Function for Estimation of the Misclassification Model
Usage
em_function(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
Arguments
param_current |
A numeric vector of regression parameters, in the order
|
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
em_function
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
EM-Algorithm Function for Estimation of the Two-Stage Misclassification Model
Description
EM-Algorithm Function for Estimation of the Two-Stage Misclassification Model
Usage
em_function_2stage(
param_current,
obs_Ystar_matrix,
obs_Ytilde_matrix,
X,
Z,
V,
sample_size,
n_cat
)
Arguments
param_current |
A numeric vector of regression parameters, in the order
|
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrices, |
n_cat |
The number of categorical values that the true outcome, |
Value
em_function_2stage
returns a numeric vector of updated parameter
estimates from one iteration of the EM-algorithm.
Expit function
Description
\frac{\exp\{x\}}{1 + \exp\{x\}}
Usage
expit(x)
Arguments
x |
A numeric value or vector to compute the expit function on. |
Value
expit
returns the result of the function
f(x) = \frac{\exp\{x\}}{1 + \exp\{x\}}
for a given x
.
Set up a Binary Outcome Misclassification jags.model
Object for a Given Prior
Description
Set up a Binary Outcome Misclassification jags.model
Object for a Given Prior
Usage
jags_picker(
prior,
sample_size,
dim_x,
dim_z,
n_cat,
Ystar,
X,
Z,
beta_prior_parameters,
gamma_prior_parameters,
number_MCMC_chains,
model_file,
display_progress = TRUE
)
Arguments
prior |
A character string specifying the prior distribution for the
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
jags_picker
returns a jags.model
object for a binary
outcome misclassification model. The object includes the specified
prior distribution, model, number of chains, and data.
Set up a Two-Stage Binary Outcome Misclassification jags.model
Object for a Given Prior
Description
Set up a Two-Stage Binary Outcome Misclassification jags.model
Object for a Given Prior
Usage
jags_picker_2stage(
prior,
sample_size,
dim_x,
dim_z,
dim_v,
n_cat,
Ystar,
Ytilde,
X,
Z,
V,
beta_prior_parameters,
gamma_prior_parameters,
delta_prior_parameters,
number_MCMC_chains,
model_file,
display_progress = TRUE
)
Arguments
prior |
A character string specifying the prior distribution for the
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ytilde |
A numeric vector of indicator variables (1, 2) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
gamma_prior_parameters |
A numeric list of prior distribution parameters
for the |
delta_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
jags_picker
returns a jags.model
object for a two-stage binary
outcome misclassification model. The object includes the specified
prior distribution, model, number of chains, and data.
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
Description
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
Usage
label_switch(chain_matrix, dim_x, dim_z, n_cat)
Arguments
chain_matrix |
A numeric matrix containing the posterior samples for all
parameters in a given MCMC chain. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Value
label_switch
returns a named matrix of MCMC posterior samples for
all parameters after performing label switching according the following pattern:
all \beta
terms are multiplied by -1, all \gamma
terms are "swapped"
with the opposite j
index.
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
Description
Fix Label Switching in MCMC Results from a Binary Outcome Misclassification Model
Usage
label_switch_2stage(chain_matrix, dim_x, dim_z, dim_v, n_cat)
Arguments
chain_matrix |
A numeric matrix containing the posterior samples for all
parameters in a given MCMC chain. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
dim_z |
An integer specifying the number of columns of the design matrix of the first-stage observation mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage observation mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Value
label_switch_2stage
returns a named matrix of MCMC posterior samples for
all parameters after performing label switching according the following pattern:
all \beta
terms are multiplied by -1, all \gamma
and \delta
terms are "swapped"
with the opposite j
index.
Expected Complete Data Log-Likelihood Function for Estimation of the Misclassification Model
Description
Expected Complete Data Log-Likelihood Function for Estimation of the Misclassification Model
Usage
loglik(param_current, obs_Y_matrix, X, Z, sample_size, n_cat)
Arguments
param_current |
A numeric vector of regression parameters, in the order
|
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the observation mechanism. |
sample_size |
Integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
loglik
returns the negative value of the expected log-likelihood function,
Q = \sum_{i = 1}^N \Bigl[ \sum_{j = 1}^2 w_{ij} \text{log} \{ \pi_{ij} \} + \sum_{j = 1}^2 \sum_{k = 1}^2 w_{ij} y^*_{ik} \text{log} \{ \pi^*_{ikj} \}\Bigr]
,
at the provided inputs.
Expected Complete Data Log-Likelihood Function for Estimation of the Two-Stage Misclassification Model
Description
Expected Complete Data Log-Likelihood Function for Estimation of the Two-Stage Misclassification Model
Usage
loglik_2stage(
param_current,
obs_Ystar_matrix,
obs_Ytilde_matrix,
X,
Z,
V,
sample_size,
n_cat
)
Arguments
param_current |
A numeric vector of regression parameters, in the order
|
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
Z |
A numeric design matrix for the first-stage observation mechanism. |
V |
A numeric design matrix for the second-stage observation mechanism. |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrices, |
n_cat |
The number of categorical values that the true outcome, |
Value
loglik_2stage
returns the negative value of the expected log-likelihood function,
Q = \sum_{i = 1}^N \Bigl[ \sum_{j = 1}^2 w_{ij} \text{log} \{ \pi_{ij} \} + \sum_{j = 1}^2 \sum_{k = 1}^2 w_{ij} y^*_{ik} \text{log} \{ \pi^*_{ikj} \} +
\sum_{j = 1}^2 \sum_{k = 1}^2 \sum_{\ell = 1}^2 w_{ij} y^*_{ik} \tilde{y}_{i \ell} \text{log} \{ \tilde{\pi}_{i \ell kj} \}\Bigr]
,
at the provided inputs.
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects
Description
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects
Usage
mean_pistarjj_compute(pistar_matrix, j, sample_size)
Arguments
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
j |
An integer value representing the true outcome category to compute
the average conditional probability of correct classification for.
|
sample_size |
An integer value specifying the number of observations in the sample. |
Value
mean_pistarjj_compute
returns a numeric value equal to the average
conditional probability P(Y^* = j | Y = j, Z)
across all subjects.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome, for Every Subject
Description
Compute the conditional probability of observing outcome Y^* \in \{1, 2 \}
given
the latent true outcome Y \in \{1, 2 \}
as
\frac{\text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}{1 + \text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}
for each of the i = 1, \dots,
n
subjects.
Usage
misclassification_prob(gamma_matrix, z_matrix)
Arguments
gamma_matrix |
A numeric matrix of estimated regression parameters for the
observation mechanism, |
z_matrix |
A numeric matrix of covariates in the observation mechanism.
|
Value
misclassification_prob
returns a dataframe containing four columns.
The first column, Subject
, represents the subject ID, from 1
to n
,
where n
is the sample size, or equivalently, the number of rows in z_matrix
.
The second column, Y
, represents a true, latent outcome category Y \in \{1, 2 \}
.
The third column, Ystar
, represents an observed outcome category Y^* \in \{1, 2 \}
.
The last column, Probability
, is the value of the equation
\frac{\text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}{1 + \text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}
computed for each subject, observed outcome category, and true, latent outcome category.
Examples
set.seed(123)
sample_size <- 1000
cov1 <- rnorm(sample_size)
cov2 <- rnorm(sample_size, 1, 2)
z_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE)
estimated_gammas <- matrix(c(1, -1, .5, .2, -.6, 1.5), ncol = 2)
P_Ystar_Y <- misclassification_prob(estimated_gammas, z_matrix)
head(P_Ystar_Y)
Compute Conditional Probability of Each Second-Stage Observed Outcome Given Each True Outcome and First-Stage Observed Outcome, for Every Subject
Description
Compute the conditional probability of observing second-stage outcome Y^{*(2)} \in \{1, 2 \}
given
the latent true outcome Y \in \{1, 2 \}
and the first-stage outcome Y^{*(1)} \in \{1, 2\}
as
\frac{\text{exp}\{\gamma^{(2)}_{\ell kj0} + \gamma^{(2)}_{\ell kjZ^{(2)}} Z^{(2)}\}}{1 + \text{exp}\{\gamma^{(2)}_{\ell kj0} + \gamma^{(2)}_{\ell kjZ^{(2)}} Z^{(2)}_i\}}
for each of the i = 1, \dots,
n
subjects.
Usage
misclassification_prob2(gamma2_array, z2_matrix)
Arguments
gamma2_array |
A numeric array of estimated regression parameters for the
observation mechanism, |
z2_matrix |
A numeric matrix of covariates in the second-stage observation mechanism.
|
Value
misclassification_prob2
returns a dataframe containing five columns.
The first column, Subject
, represents the subject ID, from 1
to n
,
where n
is the sample size, or equivalently, the number of rows in z2_matrix
.
The second column, Y
, represents a true, latent outcome category Y \in \{1, 2 \}
.
The third column, Ystar1
, represents a first-stage observed outcome category Y^{*(1)} \in \{1, 2 \}
.
The fourth column, Ystar2
, represents a second-stage observed outcome category Y^{*(2)} \in \{1, 2 \}
.
The last column, Probability
, is the value of the equation
\frac{\text{exp}\{\gamma^{(2)}_{\ell kj0} + \gamma^{(2)}_{\ell kjZ^{(2)}} Z^{(2)}\}}{1 + \text{exp}\{\gamma^{(2)}_{\ell kj0} + \gamma^{(2)}_{\ell kjZ^{(2)}} Z^{(2)}_i\}}
computed for each subject, first-stage observed outcome category, second-stage
observed outcome category, and true, latent outcome category.
Examples
set.seed(123)
sample_size <- 1000
cov1 <- rnorm(sample_size)
cov2 <- rnorm(sample_size, 1, 2)
z2_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE)
estimated_gamma2 <- array(c(1, -1, .5, .2, -.6, 1.5,
-1, .5, -1, -.5, -1, -.5), dim = c(3,2,2))
P_Ystar2_Ystar1_Y <- misclassification_prob2(estimated_gamma2, z2_matrix)
head(P_Ystar2_Ystar1_Y)
Select a Binary Outcome Misclassification Model for a Given Prior
Description
Select a Binary Outcome Misclassification Model for a Given Prior
Usage
model_picker(prior)
Arguments
prior |
A character string specifying the prior distribution for the
|
Value
model_picker
returns a character string specifying the binary
outcome misclassification model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Select a Two-Stage Binary Outcome Misclassification Model for a Given Prior
Description
Select a Two-Stage Binary Outcome Misclassification Model for a Given Prior
Usage
model_picker_2stage(prior)
Arguments
prior |
A character string specifying the prior distribution for the
|
Value
model_picker
returns a character string specifying the two-stage binary
outcome misclassification model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Set up a Naive Logistic Regression jags.model
Object for a Given Prior
Description
Set up a Naive Logistic Regression jags.model
Object for a Given Prior
Usage
naive_jags_picker(
prior,
sample_size,
dim_x,
n_cat,
Ystar,
X,
beta_prior_parameters,
number_MCMC_chains,
naive_model_file,
display_progress = TRUE
)
Arguments
prior |
character string specifying the prior distribution for the naive
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the true outcome mechanism, |
n_cat |
An integer specifying the number of categorical values that the true outcome, |
Ystar |
A numeric vector of indicator variables (1, 2) for the observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
naive_model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
naive_jags_picker
returns a jags.model
object for a naive
logistic regression model predicting the potentially misclassified Y*
from the predictor matrix x
. The object includes the specified
prior distribution, model, number of chains, and data.
Set up a Naive Two-Stage Regression jags.model
Object for a Given Prior
Description
Set up a Naive Two-Stage Regression jags.model
Object for a Given Prior
Usage
naive_jags_picker_2stage(
prior,
sample_size,
dim_x,
dim_v,
n_cat,
Ystar,
Ytilde,
X,
V,
beta_prior_parameters,
delta_prior_parameters,
number_MCMC_chains,
naive_model_file,
display_progress = TRUE
)
Arguments
prior |
character string specifying the prior distribution for the naive
|
sample_size |
An integer value specifying the number of observations in the sample. |
dim_x |
An integer specifying the number of columns of the design matrix of the first-stage outcome mechanism, |
dim_v |
An integer specifying the number of columns of the design matrix of the second-stage outcome mechanism, |
n_cat |
An integer specifying the number of categorical values that the observed outcomes can take. |
Ystar |
A numeric vector of indicator variables (1, 2) for the first-stage observed
outcome |
Ytilde |
A numeric vector of indicator variables (1, 2) for the second-stage observed
outcome |
X |
A numeric design matrix for the true outcome mechanism. |
V |
A numeric design matrix for the second-stage outcome mechanism. |
beta_prior_parameters |
A numeric list of prior distribution parameters
for the |
delta_prior_parameters |
A numeric list of prior distribution parameters
for the naive |
number_MCMC_chains |
An integer specifying the number of MCMC chains to compute. |
naive_model_file |
A .BUG file and used
for MCMC estimation with |
display_progress |
A logical value specifying whether messages should be
displayed during model compilation. The default is |
Value
naive_jags_picker_2stage
returns a jags.model
object for a naive
two-stage regression model predicting the potentially misclassified Y*
from the predictor matrix x
and the potentially misclassified \tilde{Y} | Y^*
from the predictor matrix v
. The object includes the specified
prior distribution, model, number of chains, and data.
Observed Data Log-Likelihood Function for Estimation of the Naive Two-Stage Misclassification Model
Description
Observed Data Log-Likelihood Function for Estimation of the Naive Two-Stage Misclassification Model
Usage
naive_loglik_2stage(
param_current,
X,
V,
obs_Ystar_matrix,
obs_Ytilde_matrix,
sample_size,
n_cat
)
Arguments
param_current |
A numeric vector of regression parameters, in the order
|
X |
A numeric design matrix for the first-stage observed mechanism. |
V |
A numeric design matrix for the second-stage observed mechanism. |
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the first-stage observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the second-stage observed
outcome |
sample_size |
Integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the first- and second-stage outcomes,
|
Value
naive_loglik_2stage
returns the negative value of the observed data log-likelihood function,
\sum_{i = 1}^N \Bigl[ \sum_{k = 1}^2 \sum_{k = 1}^2 \sum_{\ell = 1}^2 y^*_{ik} \tilde{y_i} \text{log} \{ P(\tilde{Y}_{i} = \ell, Y^*_i = k | x_i, v_i) \}\Bigr]
,
at the provided inputs.
Select a Logisitic Regression Model for a Given Prior
Description
Select a Logisitic Regression Model for a Given Prior
Usage
naive_model_picker(prior)
Arguments
prior |
A character string specifying the prior distribution for the naive
|
Value
naive_model_picker
returns a character string specifying the
logistic regression model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
Select a Naive Two-Stage Regression Model for a Given Prior
Description
Select a Naive Two-Stage Regression Model for a Given Prior
Usage
naive_model_picker_2stage(prior)
Arguments
prior |
A character string specifying the prior distribution for the naive
|
Value
naive_model_picker_2stage
returns a character string specifying the
logistic regression model to be turned into a .BUG file and used
for MCMC estimation with rjags
.
EM-Algorithm Estimation of the Binary Outcome Misclassification Model while Assuming Perfect Sensitivity
Description
Code is adapted by the SAMBA R package from Lauren Beesley and Bhramar Mukherjee.
Usage
perfect_sensitivity_EM(
Ystar,
Z,
X,
start,
beta0_fixed = NULL,
weights = NULL,
expected = TRUE,
tolerance = 1e-07,
max_em_iterations = 1500
)
Arguments
Ystar |
A numeric vector of indicator variables (1, 0) for the observed
outcome |
Z |
A numeric matrix of covariates in the true outcome mechanism.
|
X |
A numeric matrix of covariates in the observation mechanism.
|
start |
Numeric vector of starting values for parameters in the true
outcome mechanism ( |
beta0_fixed |
Optional numeric vector of values of the observation mechanism
intercept to profile over. If a single value is entered, this corresponds to
fixing the intercept at the specified value. The default is |
weights |
Optional vector of row-specific weights used for selection bias
adjustment. The default is |
expected |
A logical value indicating whether or not to calculate the
covariance matrix via the expected Fisher information matrix. The default is |
tolerance |
A numeric value specifying when to stop estimation, based on
the difference of subsequent log-likelihood estimates. The default is |
max_em_iterations |
An integer specifying the maximum number of
iterations of the EM algorithm. The default is |
Value
perfect_sensitivity_EM
returns a list containing nine elements.
The elements are detailed in ?SAMBA::obsloglikEM
documentation. Code
is adapted from the SAMBA::obsloglikEM
function.
References
Beesley, L. and Mukherjee, B. (2020). Statistical inference for association studies using electronic health records: Handling both selection bias and outcome misclassification. Biometrics, 78, 214-226.
Compute Probability of Each True Outcome, for Every Subject
Description
Compute Probability of Each True Outcome, for Every Subject
Usage
pi_compute(beta, X, n, n_cat)
Arguments
beta |
A numeric column matrix of regression parameters for the
|
X |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pi_compute
returns a matrix of probabilities,
P(Y_i = j | X_i) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}
for each of the i = 1, \dots,
n
subjects. Rows of the matrix
correspond to each subject. Columns of the matrix correspond to the true outcome
categories j = 1, \dots,
n_cat
.
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain
Description
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain
Usage
pistar_by_chain(n_chains, chains_list, Z, n, n_cat)
Arguments
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pistar_by_chain
returns a numeric matrix of the average
conditional probability P(Y^* = j | Y = j, Z)
across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to n_chains
.
The first column contains the conditional probability P(Y^* = 1 | Y = 1, Z)
.
The second column contains the conditional probability P(Y^* = 2 | Y = 2, Z)
.
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain for a 2-stage model
Description
Compute the Mean Conditional Probability of Correct Classification, by True Outcome Across all Subjects for each MCMC Chain for a 2-stage model
Usage
pistar_by_chain_2stage(n_chains, chains_list, Z, n, n_cat)
Arguments
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pistar_by_chain
returns a numeric matrix of the average
conditional probability P(Y^* = j | Y = j, Z)
across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to n_chains
.
The first column contains the conditional probability P(Y^* = 1 | Y = 1, Z)
.
The second column contains the conditional probability P(Y^* = 2 | Y = 2, Z)
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome, for Every Subject
Description
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome, for Every Subject
Usage
pistar_compute(gamma, Z, n, n_cat)
Arguments
gamma |
A numeric matrix of regression parameters for the observed
outcome mechanism, |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pistar_compute
returns a matrix of conditional probabilities,
P(Y_i^* = k | Y_i = j, Z_i) = \frac{\text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}{1 + \text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}
for each of the i = 1, \dots,
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject i
and observed category $1$ occurs at row i
. The probability
for subject i
and observed category $2$ occurs at row i +
n
.
Columns of the matrix correspond to the true outcome categories j = 1, \dots,
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
Description
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
Usage
pistar_compute_for_chains(chain_colMeans, Z, n, n_cat)
Arguments
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pistar_compute_for_chains
returns a matrix of conditional probabilities,
P(Y_i^* = k | Y_i = j, Z_i) = \frac{\text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}{1 + \text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}
for each of the i = 1, \dots,
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject i
and observed category $0$ occurs at row i
. The probability
for subject i
and observed category $1$ occurs at row i +
n
.
Columns of the matrix correspond to the true outcome categories j = 1, \dots,
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject for 2-stage models
Description
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject for 2-stage models
Usage
pistar_compute_for_chains_2stage(chain_colMeans, Z, n, n_cat)
Arguments
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
Z |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pistar_compute_for_chains
returns a matrix of conditional probabilities,
P(Y_i^* = k | Y_i = j, Z_i) = \frac{\text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}{1 + \text{exp}\{\gamma_{kj0} + \gamma_{kjZ} Z_i\}}
for each of the i = 1, \dots,
n
subjects. Rows of the matrix
correspond to each subject and observed outcome. Specifically, the probability
for subject i
and observed category $0$ occurs at row i
. The probability
for subject i
and observed category $1$ occurs at row i +
n
.
Columns of the matrix correspond to the true outcome categories j = 1, \dots,
n_cat
.
Compute the Mean Conditional Probability of Second-Stage Correct Classification, by First-Stage and True Outcome Across all Subjects for each MCMC Chain
Description
Compute the Mean Conditional Probability of Second-Stage Correct Classification, by First-Stage and True Outcome Across all Subjects for each MCMC Chain
Usage
pitilde_by_chain(n_chains, chains_list, V, n, n_cat)
Arguments
n_chains |
An integer specifying the number of MCMC chains to compute over. |
chains_list |
A numeric list containing the samples from |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pitilde_by_chain
returns a numeric matrix of the average
conditional probability P( \tilde{Y} = j | Y^* = j, Y = j, V)
across all subjects for
each MCMC chain. Rows of the matrix correspond to MCMC chains, up to n_chains
.
The first column contains the conditional probability P( \tilde{Y} = 1 | Y^* = 1, Y = 1, V)
.
The second column contains the conditional probability P( \tilde{Y} = 2 | Y^* = 2, Y = 2, V)
.
Compute Conditional Probability of Each Second-Stage Observed Outcome Given Each True Outcome and First-Stage Observed Outcome, for Every Subject
Description
Compute Conditional Probability of Each Second-Stage Observed Outcome Given Each True Outcome and First-Stage Observed Outcome, for Every Subject
Usage
pitilde_compute(delta, V, n, n_cat)
Arguments
delta |
A numeric array of regression parameters for the second-stage observed
outcome mechanism, |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pitilde_compute
returns an array of conditional probabilities,
P(\tilde{Y}_i = \ell | Y^*_i = k, Y_i = j, V_i) = \frac{\text{exp}\{\delta_{\ell kj0} + \delta_{\ell kjV} V_i\}}{1 + \text{exp}\{\delta_{\ell kj0} + \delta_{\ell kjV} V_i\}}
for each of the i = 1, \dots,
n
subjects. Rows of the matrix
correspond to each subject and second-stage observed outcome. Specifically, the probability
for subject i
and observed category $1$ occurs at row i
. The probability
for subject i
and observed category $2$ occurs at row i +
n
.
Columns of the matrix correspond to the first-stage outcome categories, k = 1, \dots,
n_cat
.
The third dimension of the array corresponds to the true outcome categories,
j = 1, \dots,
n_cat
.
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
Description
Compute Conditional Probability of Each Observed Outcome Given Each True Outcome for a given MCMC Chain, for Every Subject
Usage
pitilde_compute_for_chains(chain_colMeans, V, n, n_cat)
Arguments
chain_colMeans |
A numeric vector containing the posterior means for all
sampled parameters in a given MCMC chain. |
V |
A numeric design matrix. |
n |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
pitilde_compute_for_chains
returns a matrix of conditional probabilities,
P(\tilde{Y}_i = \ell | Y^*_i = k, Y_i = j, V_i) = \frac{\text{exp}\{\delta_{\ell kj0} + \delta_{\ell kjV} V_i\}}{1 + \text{exp}\{\delta_{\ell kj0} + \delta_{\ell kjV} V_i\}}
corresponding to each subject and observed outcome. Specifically, the probability
for subject i
and second-stage observed category $1$ occurs at row i
. The probability
for subject i
and second-stage observed category $2$ occurs at row i +
n
.
Columns of the matrix correspond to the first-stage outcome categories j = 1, \dots,
n_cat
.
The third dimension of the array corresponds to the true outcome categories,
j = 1, \dots,
n_cat
.
M-Step Expected Log-Likelihood with respect to Beta
Description
Objective function of the form:
Q_\beta = \sum_{i = 1}^N \Bigl[ \sum_{j = 0}^1 w_{ij} \text{log} \{ \pi_{ij} \}\Bigr]
.
Used to obtain estimates of \beta
parameters.
Usage
q_beta_f(beta, X, w_mat, sample_size, n_cat)
Arguments
beta |
A numeric vector of regression parameters for the
|
X |
A numeric design matrix. |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
q_beta_f
returns the negative value of the expected log-likelihood function,
Q_\beta = \sum_{i = 1}^N \Bigl[ \sum_{j = 1}^2 w_{ij} \text{log} \{ \pi_{ij} \}\Bigr]
,
at the provided inputs.
M-Step Expected Log-Likelihood with respect to Delta
Description
Objective function of the form:
Q_{\delta} = \sum_{i = 1}^N \Bigl[\sum_{j = 1}^2 \sum_{k = 1}^2 \sum_{\ell = 1}^2 w_{ij} y^*_{ik} \tilde{y}_{i \ell} \text{log} \{ \tilde{\pi}_{i \ell kj} \}\Bigr]
.
Used to obtain estimates of \delta
parameters.
Usage
q_delta_f(
delta_v,
V,
obs_Ystar_matrix,
obs_Ytilde_matrix,
w_mat,
sample_size,
n_cat
)
Arguments
delta_v |
A numeric array of regression parameters for the second-stage observed
outcome mechanism, |
V |
A numeric design matrix. |
obs_Ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
obs_Ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
q_beta_f
returns the negative value of the expected log-likelihood function,
Q_{\delta} = \sum_{i = 1}^N \Bigl[\sum_{j = 1}^2 \sum_{k = 1}^2 \sum_{\ell = 1}^2 w_{ij} y^*_{ik} \tilde{y}_{i \ell} \text{log} \{ \tilde{\pi}_{i \ell kj} \}\Bigr]
,
at the provided inputs.
M-Step Expected Log-Likelihood with respect to Gamma
Description
Objective function of the form:
Q_{\gamma} = \sum_{i = 1}^N \Bigl[\sum_{j = 1}^2 \sum_{k = 1}^2 w_{ij} y^*_{ik} \text{log} \{ \pi^*_{ikj} \}\Bigr]
.
Used to obtain estimates of \gamma
parameters.
Usage
q_gamma_f(gamma_v, Z, obs_Y_matrix, w_mat, sample_size, n_cat)
Arguments
gamma_v |
A numeric vector of regression parameters for the observed
outcome mechanism, |
Z |
A numeric design matrix. |
obs_Y_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
w_mat |
Matrix of E-step weights obtained from |
sample_size |
An integer value specifying the number of observations in the sample.
This value should be equal to the number of rows of the design matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
q_beta_f
returns the negative value of the expected log-likelihood function,
Q_{\gamma} = \sum_{i = 1}^N \Bigl[\sum_{j = 1}^2 \sum_{k = 1}^2 w_{ij} y^*_{ik} \text{log} \{ \pi^*_{ikj} \}\Bigr]
,
at the provided inputs.
Sum Every "n"th Element
Description
Sum Every "n"th Element
Usage
sum_every_n(x, n)
Arguments
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
Value
sum_every_n
returns a vector of sums of every n
th element of the vector x
.
Sum Every "n"th Element, then add 1
Description
Sum Every "n"th Element, then add 1
Usage
sum_every_n1(x, n)
Arguments
x |
A numeric vector to sum over |
n |
A numeric value specifying the distance between the reference index and the next index to be summed |
Value
sum_every_n1
returns a vector of sums of every n
th element of the vector x
, plus 1.
Compute Probability of Each True Outcome, for Every Subject
Description
Compute the probability of the latent true outcome Y \in \{1, 2 \}
as
P(Y_i = j | X_i) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}
for each of the i = 1, \dots,
n
subjects.
Usage
true_classification_prob(beta_matrix, x_matrix)
Arguments
beta_matrix |
A numeric column matrix of estimated regression parameters for the
true outcome mechanism, |
x_matrix |
A numeric matrix of covariates in the true outcome mechanism.
|
Value
true_classification_prob
returns a dataframe containing three columns.
The first column, Subject
, represents the subject ID, from 1
to n
,
where n
is the sample size, or equivalently, the number of rows in x_matrix
.
The second column, Y
, represents a true, latent outcome category Y \in \{1, 2 \}
.
The last column, Probability
, is the value of the equation
P(Y_i = j | X_i) = \frac{\exp(X_i \beta)}{1 + \exp(X_i \beta)}
computed
for each subject and true, latent outcome category.
Examples
set.seed(123)
sample_size <- 1000
cov1 <- rnorm(sample_size)
cov2 <- rnorm(sample_size, 1, 2)
x_matrix <- matrix(c(cov1, cov2), nrow = sample_size, byrow = FALSE)
estimated_betas <- matrix(c(1, -1, .5), ncol = 1)
P_Y <- true_classification_prob(estimated_betas, x_matrix)
head(P_Y)
Compute E-step for Binary Outcome Misclassification Model Estimated With the EM-Algorithm
Description
Compute E-step for Binary Outcome Misclassification Model Estimated With the EM-Algorithm
Usage
w_j(ystar_matrix, pistar_matrix, pi_matrix, sample_size, n_cat)
Arguments
ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
outcome matrix, |
n_cat |
The number of categorical values that the true outcome, |
Value
w_j
returns a matrix of E-step weights for the EM-algorithm,
computed as follows:
\sum_{k = 1}^2 \frac{y^*_{ik} \pi^*_{ikj} \pi_{ij}}{\sum_{\ell = 1}^2 \pi^*_{i k \ell} \pi_{i \ell}}
.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true outcome categories j = 1, \dots,
n_cat
.
Compute E-step for Two-Stage Binary Outcome Misclassification Model Estimated With the EM-Algorithm
Description
Compute E-step for Two-Stage Binary Outcome Misclassification Model Estimated With the EM-Algorithm
Usage
w_j_2stage(
ystar_matrix,
ytilde_matrix,
pitilde_array,
pistar_matrix,
pi_matrix,
sample_size,
n_cat
)
Arguments
ystar_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
ytilde_matrix |
A numeric matrix of indicator variables (0, 1) for the observed
outcome |
pitilde_array |
A numeric array of conditional probabilities obtained from
the internal function |
pistar_matrix |
A numeric matrix of conditional probabilities obtained from
the internal function |
pi_matrix |
A numeric matrix of probabilities obtained from the internal
function |
sample_size |
An integer value specifying the number of observations in
the sample. This value should be equal to the number of rows of the observed
outcome matrices, |
n_cat |
The number of categorical values that the true outcome, |
Value
w_j
returns a matrix of E-step weights for the EM-algorithm,
computed as follows:
\sum_{k = 1}^2 \sum_{\ell = 1}^2 \frac{y^*_{ik} \tilde{y}_{i \ell} \tilde{\pi}_{i \ell kj} \pi^*_{ikj} \pi_{ij}}{\sum_{h = 1}^2 \tilde{\pi}_{i \ell kh} \pi^*_{ikh} \pi_{ih}}
.
Rows of the matrix correspond to each subject. Columns of the matrix correspond
to the true outcome categories j = 1, \dots,
n_cat
.