Title: Detect Clinical Trial Sites Over- or Under-Reporting Clinical Events
Version: 1.0.0
Description: Monitoring reporting rates of subject-level clinical events (e.g. adverse events, protocol deviations) reported by clinical trial sites is an important aspect of a risk-based quality monitoring strategy. Sites that are under-reporting or over-reporting events can be detected using bootstrap simulations during which patients are redistributed between sites. The simulations generate site-specific distributions of event reporting rates, which are used to assign probabilities to the observed reporting rates (Koneswarakantha 2024 <doi:10.1007/s43441-024-00631-8>).
URL: https://openpharma.github.io/simaerep/, https://github.com/openpharma/simaerep/
License: MIT + file LICENSE
Encoding: UTF-8
Depends: R (≥ 4.0), ggplot2
Imports: dplyr (≥ 1.1.0), tidyr (≥ 1.1.0), magrittr, purrr, rlang, stringr, forcats, cowplot, RColorBrewer, furrr (≥ 0.2.1), progressr, knitr, tibble, dbplyr, glue
Suggests: testthat, devtools, pkgdown, spelling, haven, vdiffr, lintr, DBI, duckdb, ggExtra
RoxygenNote: 7.3.2
Language: en-US
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-10-28 11:22:11 UTC; koneswab
Author: Bjoern Koneswarakantha [aut, cre, cph], F. Hoffmann-La Roche Ltd [cph]
Maintainer: Bjoern Koneswarakantha <bjoern.koneswarakantha@roche.com>
Repository: CRAN
Date/Publication: 2025-10-28 11:40:02 UTC

Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Value

returns output of rhs function


Aggregate duplicated visits.

Description

Internal function called by check_df_visit().

Usage

aggr_duplicated_visits(df_visit, event_names = "ae")

Arguments

df_visit

dataframe with columns: study_id, site_number, patnum, visit, n_ae

event_names

vector, contains the event names, default = "ae"

Value

df_visit corrected


Integrity check for df_visit.

Description

Internal function used by all functions that accept df_visit as a parameter. Checks for NA columns, numeric visits and AEs, implicitly missing and duplicated visits.

Usage

check_df_visit(df_visit, event_names = c("event"))

Arguments

df_visit

dataframe with columns: study_id, site_number, patnum, visit, n_ae

event_names

vector, contains the event names, default = "event"

Value

corrected df_visit

See Also

simaerep

Examples


df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id
  )

df_visit_filt <- df_visit %>%
  dplyr::filter(visit != 3)

df_visit_corr <- check_df_visit(df_visit_filt)
3 %in% df_visit_corr$visit
nrow(df_visit_corr) == nrow(df_visit)

df_visit_corr <- check_df_visit(dplyr::bind_rows(df_visit, df_visit))
nrow(df_visit_corr) == nrow(df_visit)


Evaluate sites.

Description

Correct under-reporting probabilities using p.adjust.

Usage

eval_sites(
  df_sim_sites,
  method = "BH",
  under_only = TRUE,
  visit_med75 = TRUE,
  ...
)

Arguments

df_sim_sites

dataframe generated by sim_sites() or sim_inframe()

method

character, passed to stats::p.adjust(), if NULL no multiplicity correction will be made.

under_only

Logical, compute under-reporting probabilities only. Only applies to the classic algorithm, in which a one-sided evaluation can save computation time. Default: TRUE

visit_med75

Logical, should evaluation point visit_med75 be used. Compatible with the inframe and classic versions of the algorithm. Default: TRUE

...

use to pass r_sim_sites parameter to eval_sites_deprecated()

Value

dataframe with the following columns:

study_id

study identification

site_number

site identification

visit_med75

median(max(visit)) * 0.75

mean_ae_site_med75

mean AE at visit_med75 site level

mean_ae_study_med75

mean AE at visit_med75 study level

pval

p-value as returned by poisson.test

prob

bootstrapped probability

See Also

site_aggr, sim_sites, sim_inframe, p.adjust

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_eval <- eval_sites(df_sim_sites)
df_eval
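
The multiplicity correction named in the description can be illustrated in isolation. The following is a minimal sketch with made-up probabilities for five hypothetical sites. It only calls stats::p.adjust(), which eval_sites() is documented to use, and does not reproduce the remaining eval_sites() logic.

```r
# Made-up lower-tail probabilities for five hypothetical sites
# (illustrative values only, not simaerep output).
prob_low <- c(S1 = 0.001, S2 = 0.02, S3 = 0.03, S4 = 0.40, S5 = 0.90)

# Benjamini-Hochberg correction, the default method = "BH" in eval_sites()
prob_low_adj <- stats::p.adjust(prob_low, method = "BH")

# Adjusted values are never smaller than the raw values
all(prob_low_adj >= prob_low)

round(prob_low_adj, 3)
```

Sites with small raw probabilities keep small adjusted values only if enough other sites are also small, which limits false alerts when many sites are evaluated at once.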


Expose implicitly missing visits.

Description

Internal function called by check_df_visit().

Usage

exp_implicit_missing_visits(df_visit, event_names = "ae")

Arguments

df_visit

dataframe with columns: study_id, site_number, patnum, visit, n_ae

event_names

vector, contains the event names, default = "ae"

Value

df_visit corrected
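
As an illustration of what exposing implicitly missing visits can mean, here is a minimal base R sketch for a single patient. The helper fill_missing_visits() is hypothetical and not part of simaerep. It assumes that a gap in the visit sequence is filled by carrying the last observed cumulative event count forward; the package implementation may differ.

```r
# Hypothetical helper (not part of simaerep): fill visit gaps for one patient.
fill_missing_visits <- function(visit, n_ae) {
  all_visits <- seq(min(visit), max(visit))
  filled <- rep(NA_real_, length(all_visits))
  filled[match(visit, all_visits)] <- n_ae
  # Assumption: carry the last observed cumulative count into each gap.
  # The first position is always observed, so i - 1 is never 0 here.
  for (i in seq_along(filled)) {
    if (is.na(filled[i])) filled[i] <- filled[i - 1]
  }
  data.frame(visit = all_visits, n_ae = filled)
}

# Visit 3 is implicitly missing in the input
df_filled <- fill_missing_visits(visit = c(1, 2, 4, 5), n_ae = c(0, 1, 3, 3))
df_filled
```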


Get cumulative mean event development

Description

Calculate average increase of events per visit and cumulative average increase.

Usage

get_cum_mean_event_dev(
  df_visit,
  group = c("site_number", "study_id"),
  event_names = c("ae")
)

Arguments

df_visit

Data frame with columns: study_id, site_number, patnum, visit, n_ae.

group

character, grouping variable, one of: c("site_number", "study_id")

event_names

vector, contains the event names, default = "ae"

Details

This is more stable than using mean cumulative patient count per visit as only a few patients will contribute to later visits. Here the impact of the later visits is reduced as they can only add or subtract to the results from earlier visits and not shift the mean independently.
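
The stability argument can be made concrete with a small base R sketch on toy data. The column names follow the internal naming used in this manual, but the calculation is an illustration under the assumption that the per-visit increase is averaged over the patients who reached that visit. It is not the package implementation.

```r
# Toy cumulative event counts for three patients with different follow-up
df <- data.frame(
  patnum = c("P1", "P1", "P1", "P2", "P2", "P2", "P2", "P3", "P3"),
  visit  = c(1, 2, 3, 1, 2, 3, 4, 1, 2),
  n_ae   = c(0, 1, 1, 1, 2, 4, 4, 0, 0)
)
df <- df[order(df$patnum, df$visit), ]

# Per-patient increase since the previous visit (first visit: the count itself)
df$dev <- ave(df$n_ae, df$patnum, FUN = function(x) c(x[1], diff(x)))

# Average increase per visit, then cumulative average increase
mean_dev <- as.vector(tapply(df$dev, df$visit, mean))
cum_mean_dev <- cumsum(mean_dev)

# Naive alternative: mean cumulative count per visit
naive_mean <- as.vector(tapply(df$n_ae, df$visit, mean))

cum_mean_dev
naive_mean
```

At visit 4 only one patient with a high count remains. The naive mean jumps to that patient's count, while the cumulative mean development only adds that patient's increase at visit 4 to the result from the earlier visits.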

Examples


df_visit <- sim_test_data_study(n_pat = 1000, n_sites = 10) %>%
  dplyr::rename(
    site_number = site_id,
    patnum = patient_id,
    n_ae = n_event
  )

get_cum_mean_event_dev(df_visit)
get_cum_mean_event_dev(df_visit, group = "study_id")


Get df_visit_test

Description

Get df_visit_test

Usage

get_df_visit_test()

Get df_visit_test mapped

Description

Get df_visit_test mapped

Usage

get_df_visit_test_mapped()

Replace cowplot::get_legend, to silence the warning: Multiple components found; returning the first one. To return all, use 'return_all = TRUE'.

Description

Replace cowplot::get_legend, to silence the warning: Multiple components found; returning the first one. To return all, use 'return_all = TRUE'.

Usage

get_legend(p)

Get Portfolio Configuration

Description

Get Portfolio configuration from a df_visit input dataframe. Will filter studies with only a few sites and patients and will anonymize IDs. Portfolio configuration can be used by sim_test_data_portfolio to generate data for an artificial portfolio.

Usage

get_portf_config(
  df_visit,
  check = TRUE,
  min_pat_per_study = 100,
  min_sites_per_study = 10,
  anonymize = TRUE,
  pad_width = 4
)

Arguments

df_visit

input dataframe with columns study_id, site_id, patient_id, visit, n_events. Can also be a lazy database table.

check

logical, perform standard checks on df_visit, Default: TRUE

min_pat_per_study

minimum number of patients per study, Default: 100

min_sites_per_study

minimum number of sites per study, Default: 10

anonymize

logical, Default: TRUE

pad_width

padding width for newly created IDs, Default: 4

Value

dataframe with the following columns:

study_id

study identification

event_per_visit_mean

mean event per visit per study

site_id

site

max_visit_sd

standard deviation of maximum patient visits per site

max_visit_mean

mean of maximum patient visits per site

n_pat

number of patients

See Also

sim_test_data_study, get_portf_config, sim_test_data_portfolio

Examples

df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.4, factor_event_rate = - 0.6,
                                 study_id = "A")


df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.2, factor_event_rate = - 0.1,
                                 study_id = "B")


df_visit <- dplyr::bind_rows(df_visit1, df_visit2)


get_portf_config(df_visit)


# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df_visit, "visit")
tbl_visit <- dplyr::tbl(con, "visit")
get_portf_config(tbl_visit)
DBI::dbDisconnect(con)


Get Portfolio Event Rates

Description

Calculates mean event rates per study and visit in a df_visit simaerep input dataframe.

Usage

get_portf_event_rates(df_visit, check = TRUE, anonymize = TRUE, pad_width = 4)

Arguments

df_visit

input dataframe with columns study_id, site_id, patient_id, visit, n_events. Can also be a lazy database table.

check

logical, perform standard checks on df_visit, Default: TRUE

anonymize

logical, Default: TRUE

pad_width

padding width for newly created IDs, Default: 4

Examples


df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.4, factor_event_rate = - 0.6,
                                 study_id = "A")


df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.2, factor_event_rate = - 0.1,
                                 study_id = "B")


df_visit <- dplyr::bind_rows(df_visit1, df_visit2)


get_portf_event_rates(df_visit)


# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
dplyr::copy_to(con, df_visit, "visit")
tbl_visit <- dplyr::tbl(con, "visit")
get_portf_event_rates(tbl_visit)
DBI::dbDisconnect(con)


Get site mean ae development.

Description

Internal function used by site_aggr(), returns mean AE development from visit 0 to visit_med75.

Usage

get_site_mean_ae_dev(df_visit, df_pat, df_site, event_names = c("ae"))

Arguments

df_visit

dataframe

df_pat

dataframe as returned by pat_aggr()

df_site

dataframe as returned by site_aggr()

event_names

vector, contains the event names, default = "ae"

Value

dataframe


Get visit_med75.

Description

Internal function used by site_aggr().

Usage

get_visit_med75(df_pat, method = "med75_adj", min_pat_pool = 0.2)

Arguments

df_pat

dataframe as returned by pat_aggr()

method

character, one of c("med75", "med75_adj", "max"), method used to define the evaluation point visit_med75 (see details), Default: "med75_adj"

min_pat_pool

double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2

Value

dataframe
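
The evaluation point is described elsewhere in this manual as median(max(visit)) * 0.75. A minimal sketch of that formula for one hypothetical site follows. It covers only the plain formula. Any rounding and the adjustments made by the "med75_adj" method or by min_pat_pool are not reproduced here.

```r
# Hypothetical maximum visit per patient at one site
max_visit_per_pat <- c(10, 12, 16, 20, 24)

# Plain formula from this manual: median(max(visit)) * 0.75
visit_med75 <- median(max_visit_per_pat) * 0.75

# Patients who reached the evaluation point
n_pat_with_med75 <- sum(max_visit_per_pat >= visit_med75)

visit_med75
n_pat_with_med75
```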


is orivisit class

Description

internal function

Usage

is_orivisit(x)

Arguments

x

object

Value

logical


is simaerep class

Description

internal function

Usage

is_simaerep(x)

Arguments

x

object

Value

logical


Calculate Max Rank

Description

like rank() with ties.method = "max", works on tbl objects

Usage

max_rank(df, col, col_new)

Arguments

df

dataframe

col

character, name of the column to rank

col_new

character column name for rankings

Details

This is needed for the Benjamini-Hochberg p-value adjustment: when multiple sites have the same p-value, they must all be assigned the highest rank among the ties.
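
A maximum rank can be expressed as a count: the rank of a value is the number of values less than or equal to it. The base R sketch below checks that equivalence. That a count of this kind is how the rank is obtained on database tables is an assumption made for illustration, not a statement about the simaerep internals.

```r
s <- c(1, 2, 2, 2, 5, 10)

# For each value, count how many values are less than or equal to it
max_rank_count <- vapply(s, function(x) sum(s <= x), numeric(1))

# Same result as the base R ranking with ties.method = "max"
identical(max_rank_count, as.numeric(rank(s, ties.method = "max")))

max_rank_count
```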

Examples


df <- tibble::tibble(s = c(1, 2, 2, 2, 5, 10)) %>%
 dplyr::mutate(
   rank = rank(s, ties.method = "max")
 )

df %>%
 simaerep:::max_rank("s", "max_rank")

# Database
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")

dplyr::copy_to(con, df, "df")
simaerep:::max_rank(dplyr::tbl(con, "df"), "s", "max_rank")

DBI::dbDisconnect(con)


create orivisit object

Description

Internal S3 object, stores lazy reference to original visit data.

Usage

orivisit(
  df_visit,
  call = NULL,
  env = parent.frame(),
  event_names = c("event"),
  col_names = list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id",
    visit = "visit")
)

Arguments

df_visit

Data frame with columns: study_id, site_number, patnum, visit, n_ae.

call

optional, provide call, Default: NULL

env

Optional, provide environment of original visit data. Default: parent.frame().

event_names

vector, contains the event names, default = "event"

col_names

named list, indicate study_id, site_id, patient_id and visit column in df_visit input dataframe. Default: list( study_id = "study_id", site_id = "site_id", patient_id = "patient_id", visit = "visit" )

Details

Saves variable name of original visit data, checks whether it can be retrieved from parent environment and stores summary. Original data can be retrieved using as.data.frame(x).
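
The idea of a lazy reference can be sketched in a few lines of base R. The functions make_ref() and deref() are hypothetical and only illustrate the mechanism described above (store the variable name, check that it can be found, retrieve the data later). The actual orivisit object stores more information and is retrieved with as.data.frame().

```r
# Hypothetical helpers (not part of simaerep) illustrating a lazy reference.
make_ref <- function(df, env = parent.frame()) {
  # Capture the name of the variable that was passed, not its content
  name <- deparse(substitute(df))
  # Check that the data can be retrieved from the given environment
  stopifnot(exists(name, envir = env))
  list(name = name, dim = dim(df))
}

deref <- function(ref, env = parent.frame()) {
  # Look the original data up again by name
  get(ref$name, envir = env)
}

df_visit_toy <- data.frame(
  patient_id = c("P1", "P1", "P2"),
  visit = c(1, 2, 1),
  n_event = c(0, 1, 0)
)

ref <- make_ref(df_visit_toy)
ref$name
identical(deref(ref), df_visit_toy)
```

Because only the name and a small summary are stored, the reference stays small even when the visit data is large. The trade-off is that retrieval fails if the original variable is removed or renamed.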

Value

orivisit object

Examples


df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)

visit <- orivisit(df_visit)

object.size(df_visit)
object.size(visit)

as.data.frame(visit)


Benjamini-Hochberg p-value correction using table operations

Description

Benjamini-Hochberg p-value correction using table operations

Usage

p_adjust_bh_inframe(df_eval, cols)
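
The Benjamini-Hochberg correction can be written with ranks and minima only, which is the kind of formulation that maps onto table operations. The sketch below is a base R illustration under that assumption and is checked against stats::p.adjust(). It is not the code of p_adjust_bh_inframe().

```r
p <- c(0.01, 0.04, 0.04, 0.03, 0.20)
n <- length(p)

# Maximum rank: tied p-values all receive the highest rank among the ties
r <- rank(p, ties.method = "max")

# Raw Benjamini-Hochberg value, capped at 1
raw <- pmin(1, p * n / r)

# Enforce monotonicity: take the minimum raw value over all p-values
# that are at least as large as the current one
p_adj <- vapply(p, function(x) min(raw[p >= x]), numeric(1))

all.equal(p_adj, stats::p.adjust(p, method = "BH"))
```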

Aggregate visit to patient level.

Description

Internal function used by site_aggr() and plot_visit_med75(), adds the maximum visit for each patient.

Usage

pat_aggr(df_visit)

Arguments

df_visit

dataframe

Value

dataframe
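
The aggregation to patient level can be pictured with a short base R sketch on toy data. It only derives the maximum visit per patient; the columns actually returned by pat_aggr() are not listed in this manual, so the output shape here is an assumption.

```r
# Toy visit-level data using the internal column names
df_visit_toy <- data.frame(
  patnum = c("P1", "P1", "P1", "P2", "P2"),
  visit  = c(1, 2, 3, 1, 2),
  n_ae   = c(0, 1, 1, 0, 0)
)

# Maximum visit per patient
max_visit <- tapply(df_visit_toy$visit, df_visit_toy$patnum, max)

df_pat_toy <- data.frame(
  patnum = names(max_visit),
  max_visit = as.vector(max_visit)
)
df_pat_toy
```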


Create a study specific patient pool for sampling

Description

Internal function for sim_sites(). Filters all visits greater than max_visit_med75_study and returns a dataframe with one column for studies and one column with nested patient data.

Usage

pat_pool(df_visit, df_site)

Arguments

df_visit

dataframe, created by sim_sites

df_site

dataframe created by site_aggr

Value

dataframe with nested pat_pool column

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_pat_pool <- simaerep:::pat_pool(df_visit, df_site)

df_pat_pool

plot AE under-reporting simulation results

Description

generic plot function for simaerep objects

Usage

## S3 method for class 'simaerep'
plot(
  x,
  ...,
  study = NULL,
  what = c("prob", "med75"),
  n_sites = 16,
  df_visit = NULL,
  env = parent.frame(),
  plot_event = x$event_names[1]
)

Arguments

x

simaerep object

...

additional parameters passed to plot_study() or plot_visit_med75()

study

character specifying study to be plotted, Default: NULL

what

one of c("prob", "med75"), specifying whether to plot site AE under-reporting or visit_med75 values, Default: 'prob'

n_sites

number of sites to plot, Default: 16

df_visit

optional, pass original visit data if it cannot be retrieved from parent environment, Default: NULL

env

optional, pass environment from which to retrieve original visit data, Default: parent.frame()

plot_event

vector containing the events that should be plotted, default = x$event_names[1]

Details

see plot_study() and plot_visit_med75()

Value

ggplot object

Examples


df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)

evrep <- simaerep(df_visit)

plot(evrep, what = "prob", study = "A")
plot(evrep, what = "med75", study = "A")


Plots AE per site as dots.

Description

This plot is meant to supplement the package documentation.

Usage

plot_dots(
  df,
  nrow = 10,
  ncols = 10,
  col_group = "site",
  thresh = NULL,
  color_site_a = "#BDBDBD",
  color_site_b = "#757575",
  color_site_c = "gold3",
  color_high = "#00695C",
  color_low = "#25A69A",
  size_dots = 10
)

Arguments

df

dataframe, cols = c('site', 'patients', 'n_ae')

nrow

integer, number of rows, Default: 10

ncols

integer, number of columns, Default: 10

col_group

character, grouping column, Default: 'site'

thresh

numeric, threshold to determine color of mean_ae annotation, Default: NULL

color_site_a

character, hex color value, Default: '#BDBDBD'

color_site_b

character, hex color value, Default: '#757575'

color_site_c

character, hex color value, Default: 'gold3'

color_high

character, hex color value, Default: '#00695C'

color_low

character, hex color value, Default: '#25A69A'

size_dots

integer, Default: 10

Value

ggplot object

Examples

study <- tibble::tibble(
  site = LETTERS[1:3],
  patients = c(list(seq(1, 50, 1)), list(seq(1, 40, 1)), list(seq(1, 10, 1)))
) %>%
  tidyr::unnest(patients) %>%
  dplyr::mutate(n_ae = as.integer(runif(min = 0, max = 10, n = nrow(.))))

plot_dots(study)

Plot simulation example.

Description

This plot supplements the package documentation.

Usage

plot_sim_example(
  substract_ae_per_pat = 0,
  size_dots = 10,
  size_raster_label = 12,
  color_site_a = "#BDBDBD",
  color_site_b = "#757575",
  color_site_c = "gold3",
  color_high = "#00695C",
  color_low = "#25A69A",
  title = TRUE,
  legend = TRUE,
  seed = 5
)

Arguments

substract_ae_per_pat

integer, subtract aes from patients at site C, Default: 0

size_dots

integer, Default: 10

size_raster_label

integer, Default: 12

color_site_a

character, hex color value, Default: '#BDBDBD'

color_site_b

character, hex color value, Default: '#757575'

color_site_c

character, hex color value, Default: 'gold3'

color_high

character, hex color value, Default: '#00695C'

color_low

character, hex color value, Default: '#25A69A'

title

logical, include title, Default: TRUE

legend

logical, include legend, Default: TRUE

seed

pass seed for simulations Default: 5

Details

Uses plot_dots() and adds two simulation panels. Uses a made-up site configuration with three sites (A, B, C) and simulates site C.

Value

ggplot

See Also

get_legend, plot_grid

Examples


plot_sim_example(size_dots = 5)


Plot multiple simulation examples.

Description

This plot is meant to supplement the package documentation.

Usage

plot_sim_examples(substract_ae_per_pat = c(0, 1, 3), ...)

Arguments

substract_ae_per_pat

integer, Default: c(0, 1, 3)

...

parameters passed to plot_sim_example()

Details

This function is a wrapper for plot_sim_example()

Value

ggplot

See Also

ggdraw, draw_label, plot_grid

Examples


plot_sim_examples(size_dots = 3, size_raster_label = 10)
plot_sim_examples()


Plot ae development of study and sites highlighting at risk sites.

Description

Most suitable visual representation of the AE under-reporting statistics.

Usage

plot_study(
  df_visit,
  df_site,
  df_eval,
  study,
  n_sites = 16,
  prob_col = "prob",
  event_names = c("ae"),
  plot_event = "ae",
  mult_corr = FALSE,
  delta = TRUE
)

Arguments

df_visit

dataframe, created by sim_sites()

df_site

dataframe created by site_aggr()

df_eval

dataframe created by eval_sites()

study

character, study to plot, as found in the study_id column

n_sites

integer number of most at risk sites, Default: 16

prob_col

character, denotes probability column, Default: "prob"

event_names

vector, contains the event names, default = "ae"

plot_event

vector containing the events that should be plotted, default = "ae"

mult_corr

Logical, multiplicity correction, Default: FALSE

delta

logical, show delta events on plot

Details

Left panel shows mean AE reporting per site (light blue and dark blue lines) against mean AE reporting of the entire study (golden line). Single sites are plotted on the right panel in descending order by AE under-reporting probability; grey lines denote the cumulative AE count of single patients. Grey dots in the left panel indicate sites that were picked for single plotting. Dark blue lines mark sites whose AE under-reporting probability crossed the threshold of 95%. Numbers in the upper left corner indicate the ratio of patients used for the analysis to the total number of patients. Patients who have not been on the study long enough to reach the evaluation point (visit_med75) are ignored.

Value

ggplot

Examples


df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_eval <- eval_sites(df_sim_sites)

simaerep:::plot_study(df_visit, df_site, df_eval, study = "A")



Plot patient visits against visit_med75.

Description

Plots cumulative AEs against visits for patients at sites of given study and compares against visit_med75.

Usage

plot_visit_med75(
  df_visit,
  df_site = NULL,
  study_id_str,
  n_sites = 6,
  min_pat_pool = 0.2,
  verbose = TRUE,
  event_names = "ae",
  plot_event = "ae",
  ...
)

Arguments

df_visit

dataframe

df_site

dataframe, as returned by site_aggr()

study_id_str

character, specify study in study_id column

n_sites

integer, Default: 6

min_pat_pool

double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2

verbose

logical, Default: TRUE

event_names

vector, contains the event names, default = "ae"

plot_event

vector containing the events that should be plotted, default = "ae"

...

not used

Value

ggplot

Examples

df_visit <- sim_test_data_study(
  n_pat = 120,
  n_sites = 6,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
 ) %>%
 dplyr::rename(
  site_number = site_id,
  patnum = patient_id,
  n_ae = n_event
 )

df_site <- site_aggr(df_visit)

simaerep:::plot_visit_med75(df_visit, df_site, study_id_str = "A", n_sites = 6)

Poisson test for vector with site AEs vs vector with study AEs.

Description

Internal function used by simaerep.

Usage

poiss_test_site_ae_vs_study_ae(site_ae, study_ae, visit_med75)

Arguments

site_ae

vector with AE numbers

study_ae

vector with AE numbers

visit_med75

integer

Details

Sets pvalue = 1 if the mean AE of the site is greater than the mean AE of the study, or if the test returns an error.
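
The behaviour described above can be sketched with stats::poisson.test(), which compares two Poisson rates when given two counts and two time bases. The function pval_sketch() is hypothetical. Using the summed AE counts as events and the number of patients multiplied by visit_med75 as the time base is an assumption made for illustration, and the internal function may set up the test differently.

```r
# Hypothetical sketch (not the simaerep implementation)
pval_sketch <- function(site_ae, study_ae, visit_med75) {
  # No test needed when the site reports more than the study on average
  if (mean(site_ae) > mean(study_ae)) {
    return(1)
  }
  res <- tryCatch(
    stats::poisson.test(
      # Assumption: events are the summed AE counts
      x = c(sum(site_ae), sum(study_ae)),
      # Assumption: time base is patients times visits up to visit_med75
      T = c(length(site_ae), length(study_ae)) * visit_med75
    ),
    error = function(e) NULL
  )
  # Fall back to 1 if the test could not be computed
  if (is.null(res)) 1 else res$p.value
}

pval_low <- pval_sketch(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)
pval_high <- pval_sketch(
  site_ae = c(10, 10),
  study_ae = c(9, 8, 7, 9, 6, 7, 8),
  visit_med75 = 10
)
pval_low
pval_high
```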

Value

pval

See Also

sim_sites()

Examples

simaerep:::poiss_test_site_ae_vs_study_ae(
   site_ae = c(5, 3, 3, 2, 1, 6),
   study_ae = c(9, 8, 7, 9, 6, 7, 8),
   visit_med75 = 10
)

simaerep:::poiss_test_site_ae_vs_study_ae(
   site_ae = c(11, 9, 8, 6, 3),
   study_ae = c(9, 8, 7, 9, 6, 7, 8),
   visit_med75 = 10
)

Prepare data for simulation.

Description

Internal function called by sim_sites. Collect AEs per patient at visit_med75 for site and study as a vector of integers.

Usage

prep_for_sim(df_site, df_visit)

Arguments

df_site

dataframe created by site_aggr

df_visit

dataframe, created by sim_sites

Value

dataframe

See Also

sim_sites, sim_after_prep

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_prep <- simaerep:::prep_for_sim(df_site, df_visit)
df_prep

Print method for orivisit objects

Description

Print method for orivisit objects

Usage

## S3 method for class 'orivisit'
print(x, ..., n = 10)

Arguments

x

An object of class 'orivisit'

...

Additional arguments passed to print (not used)

n

Number of rows to display from the data frame (default: 10)


Print method for simaerep objects

Description

Print method for simaerep objects

Usage

## S3 method for class 'simaerep'
print(x, ..., n = 10)

Arguments

x

An object of class 'simaerep'

...

Additional arguments passed to print (not used)

n

Number of rows to display from df_eval (default: 10)


Calculate bootstrapped probability for obtaining a lower site mean AE number.

Description

Internal function used by sim_sites()

Usage

prob_lower_site_ae_vs_study_ae(site_ae, study_ae, r = 1000, under_only = TRUE)

Arguments

site_ae

vector with AE numbers

study_ae

vector with AE numbers

r

integer, denotes number of simulations, default = 1000

under_only

compute under-reporting probabilities only, default = TRUE

Details

sets pvalue=1 if mean AE site is greater than mean AE study
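
A bootstrapped probability of this kind can be sketched in base R. The function prob_lower_sketch() is hypothetical: it draws as many patients as the site has from the study vector, with replacement, and reports the share of simulated means that are at or below the observed site mean. Whether the internal function samples in exactly this way is an assumption.

```r
# Hypothetical sketch (not the simaerep implementation)
prob_lower_sketch <- function(site_ae, study_ae, r = 1000) {
  mean_site <- mean(site_ae)
  # Shortcut from the details above: no under-reporting signal possible
  if (mean_site > mean(study_ae)) {
    return(1)
  }
  # Draw r simulated sites of the same size from the study patients
  sim_means <- replicate(
    r,
    mean(sample(study_ae, size = length(site_ae), replace = TRUE))
  )
  # Share of simulated sites with a mean at or below the observed one
  mean(sim_means <= mean_site)
}

set.seed(1)
prob_a <- prob_lower_sketch(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8)
)
prob_b <- prob_lower_sketch(
  site_ae = c(10, 10),
  study_ae = c(9, 8, 7, 9, 6, 7, 8)
)
prob_a
prob_b
```

In the first call every study value exceeds the site mean, so no simulated site can reach it and the estimate is 0 regardless of the seed. A larger r gives a finer resolution of the estimate at the cost of run time.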

Value

pval

See Also

sim_sites()

Examples

simaerep:::prob_lower_site_ae_vs_study_ae(
  site_ae = c(5, 3, 3, 2, 1, 6),
  study_ae = c(9, 8, 7, 9, 6, 7, 8)
)

prune visits to visit_med75 using table operations

Description

prune visits to visit_med75 using table operations

Usage

prune_to_visit_med75_inframe(df_visit, df_site)

Arguments

df_visit

Data frame with columns: study_id, site_number, patnum, visit, n_ae.

df_site

dataframe, as returned by site_aggr()


Execute a purrr or furrr function with a progress bar.

Description

Internal utility function.

Usage

purrr_bar(
  ...,
  .purrr,
  .f,
  .f_args = list(),
  .purrr_args = list(),
  .steps,
  .slow = FALSE,
  .progress = TRUE
)

Arguments

...

iterable arguments passed to .purrr

.purrr

purrr or furrr function

.f

function to be executed over iterables

.f_args

list of arguments passed to .f, Default: list()

.purrr_args

list of arguments passed to .purrr, Default: list()

.steps

integer number of iterations

.slow

logical slows down execution, Default: FALSE

.progress

logical, show progress bar, Default: TRUE

Details

Call still needs to be wrapped in with_progress() or with_progress_cnd()

Value

result of function passed to .f

Examples

# purrr::map
progressr::with_progress(
  purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5)
)


# purrr::walk
progressr::with_progress(
 purrr_bar(rep(0.25, 5), .purrr = purrr::walk,.f = Sys.sleep, .steps = 5)
)

# progress bar off
progressr::with_progress(
  purrr_bar(
    rep(0.25, 5), .purrr = purrr::walk,.f = Sys.sleep, .steps = 5, .progress = FALSE
  )
)

# purrr::map2
progressr::with_progress(
  purrr_bar(
    rep(1, 5), rep(2, 5),
    .purrr = purrr::map2,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
 )
)

# purrr::pmap
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = `+`,
    .steps = 5,
    .slow = TRUE
 )
)

# define function within purrr_bar() call
progressr::with_progress(
  purrr_bar(
    list(rep(1, 5), rep(2, 5)),
    .purrr = purrr::pmap,
    .f = function(x, y) {
      paste0(x, y)
    },
    .steps = 5,
    .slow = TRUE
 )
)

# with mutate
progressr::with_progress(
 tibble::tibble(x = rep(0.25, 5)) %>%
  dplyr::mutate(x = purrr_bar(x, .purrr = purrr::map, .f = Sys.sleep, .steps = 5))
)


renames internal simaerep col_names to externally applied colnames

Description

renames internal simaerep col_names to externally applied colnames

Usage

remap_col_names(df, col_names)

Start simulation after preparation.

Description

Internal function called by sim_sites after prep_for_sim

Usage

sim_after_prep(
  df_sim_prep,
  r = 1000,
  poisson_test = FALSE,
  prob_lower = TRUE,
  progress = FALSE,
  under_only = TRUE
)

Arguments

df_sim_prep

dataframe as returned by prep_for_sim

r

integer, denotes number of simulations, default = 1000

poisson_test

logical, calculates poisson.test pvalue

prob_lower

logical, calculates probability for getting a lower value

progress

logical, display progress bar, Default = FALSE

under_only

logical, compute under-reporting probabilities only, Default: TRUE

Value

dataframe

See Also

sim_sites, prep_for_sim

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_prep <- simaerep:::prep_for_sim(df_site, df_visit)

df_sim <- simaerep:::sim_after_prep(df_prep)

df_sim

Calculate prob for study sites using table operations

Description

Calculate prob for study sites using table operations

Usage

sim_inframe(df_visit, r = 1000, df_site = NULL, event_names = c("ae"))

Arguments

df_visit

Data frame with columns: study_id, site_number, patnum, visit, n_ae.

r

Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000.

df_site

dataframe as returned by site_aggr(). If provided, will switch to visit_med75. Default: NULL

event_names

vector, contains the event names, default = "ae"

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
) %>%
dplyr::rename(
  site_number = site_id,
  patnum = patient_id,
  n_ae = n_event
)

df_sim <- simaerep:::sim_inframe(df_visit)

simulate under-reporting

Description

we remove a fraction of events from a specific site

Usage

sim_out(df_visit, study_id, site_id, factor_event)

Arguments

df_visit

dataframe

study_id

character

site_id

character

factor_event

double, negative values for under-reporting, positive values for over-reporting.

Details

We determine the absolute number of events per patient for removal. Then we remove them at the first visit. We intentionally allow fractions.

Examples

df_visit <- sim_test_data_study(n_pat = 100, n_sites = 10)


df_ur <- sim_out(df_visit, "A", site_id = "S0001", factor_event = - 0.35)

# Example cumulated event for first patient with 35% under-reporting
df_ur[df_ur$site_id == "S0001" & df_ur$patient_id == "P000001",]$n_event

# Example cumulated event for first patient with no under-reporting
df_visit[df_visit$site_id == "S0001" & df_visit$patient_id == "P000001",]$n_event


Simulate patients and events for sites. Supports constant and non-constant event rates.

Description

Simulate patients and events for sites. Supports constant and non-constant event rates.

Usage

sim_pat(vs_max, vs_sd, is_out, event_rates, event_names, factor_event_rate)

Calculate prob_lower and poisson.test pvalue for study sites.

Description

Collects the number of AEs of all eligible patients that meet visit_med75 criteria of site. Then calculates poisson.test pvalue and bootstrapped probability of having a lower mean value. Used by simaerep_classic()

Usage

sim_sites(
  df_site,
  df_visit,
  r = 1000,
  poisson_test = TRUE,
  prob_lower = TRUE,
  progress = TRUE,
  under_only = TRUE
)

Arguments

df_site

dataframe created by site_aggr

df_visit

dataframe, created by sim_sites

r

integer, denotes number of simulations, default = 1000

poisson_test

logical, calculates poisson.test pvalue

prob_lower

logical, calculates probability for getting a lower value

progress

logical, display progress bar, Default = TRUE

under_only

logical, compute under-reporting probabilities only, Default: TRUE

Value

dataframe with the following columns:

study_id

study identification

site_number

site identification

n_pat

number of patients at site

visit_med75

median(max(visit)) * 0.75

n_pat_with_med75

number of patients at site with med75

mean_ae_site_med75

mean AE at visit_med75 site level

mean_ae_study_med75

mean AE at visit_med75 study level

n_pat_with_med75_study

number of patients at study with med75 excl. site

pval

p-value as returned by poisson.test

prob_low

bootstrapped probability for having mean_ae_site_med75 or lower

See Also

sim_sites, site_aggr, pat_pool, prob_lower_site_ae_vs_study_ae, poiss_test_site_ae_vs_study_ae, prep_for_sim, simaerep_classic

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_sim_sites <- sim_sites(df_site, df_visit, r = 100)

df_sim_sites %>%
 knitr::kable(digits = 2)

simulate test data events

Description

generates multi-event data using sim_test_data_study()

Usage

sim_test_data_events(
  n_pat = 100,
  n_sites = 5,
  event_rates = c(NULL),
  event_names = list("event")
)

Arguments

n_pat

integer, number of patients, Default: 100

n_sites

integer, number of sites, Default: 5

event_rates

vector with visit-specific event rates, Default: NULL

event_names

vector, contains the event names, default = "event"

Value

tibble with columns site_id, patient_id, is_ur, max_visit_mean, max_visit_sd, visit, and event data (events_per_visit_mean and n_events)


simulate patient event reporting test data

Description

helper function for sim_test_data_study()

Usage

sim_test_data_patient(
  .f_sample_max_visit = function() rnorm(1, mean = 20, sd = 4),
  .f_sample_event_per_visit = function(max_visit) rpois(max_visit, 0.5)
)

Arguments

.f_sample_max_visit

function used to sample the maximum number of visits, Default: function() rnorm(1, mean = 20, sd = 4)

.f_sample_event_per_visit

function used to sample the events for each visit, Default: function(max_visit) rpois(max_visit, 0.5)

Value

vector containing cumulative events

Examples

replicate(5, sim_test_data_patient())
replicate(5, sim_test_data_patient(
    .f_sample_event_per_visit = function(x) rpois(x, 1.2))
  )
replicate(5, sim_test_data_patient(
    .f_sample_max_visit = function() rnorm(1, mean = 5, sd = 5))
  )
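
The two sampling functions can be read as the two steps of a simple simulation. The base R sketch below is a hypothetical re-implementation for illustration only (sim_patient_sketch() is not part of the package); rounding the sampled visit count and enforcing at least one visit are assumptions.

```r
# Illustrative re-implementation, not the package source
sim_patient_sketch <- function(
  f_max_visit = function() rnorm(1, mean = 20, sd = 4),
  f_event_per_visit = function(max_visit) rpois(max_visit, 0.5)
) {
  # round the sampled visit count and keep at least one visit (assumption)
  max_visit <- max(1L, as.integer(round(f_max_visit())))
  # one event count per visit
  events <- f_event_per_visit(max_visit)
  # cumulative events over the patient's visits
  cumsum(events)
}

set.seed(1)
cum_events <- sim_patient_sketch()
cum_events
```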

Simulate Portfolio Test Data

Description

Simulate visit level data from a portfolio configuration.

Usage

sim_test_data_portfolio(
  df_config,
  df_event_rates = NULL,
  progress = TRUE,
  parallel = TRUE
)

Arguments

df_config

dataframe as returned by get_portf_config

df_event_rates

dataframe with event rates. Default: NULL

progress

logical, Default: TRUE

parallel

logical, activate parallel processing, see Details. Default: TRUE

Details

Uses sim_test_data_study. The furrr package is used to implement parallel processing, as these simulations can take a long time to run. For this to work, a plan for how the code should run needs to be specified, e.g. plan(multisession, workers = 3).
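
A plan could be registered as sketched below. This assumes that the future package, on which furrr builds, is installed; the worker count is arbitrary and the simulation call is left commented out because it needs a df_config object.

```r
# Sketch: register a parallel backend before running the simulation.
if (requireNamespace("future", quietly = TRUE)) {
  # plan() returns the previous plan, keep it so it can be restored
  old_plan <- future::plan(future::multisession, workers = 3)

  # number of workers available under the current plan
  n_workers <- future::nbrOfWorkers()

  # df_portf <- sim_test_data_portfolio(df_config, parallel = TRUE)

  # restore the previous plan
  future::plan(old_plan)
}
```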

Value

dataframe with the following columns:

study_id

study identification

event_per_visit_mean

mean event per visit per study

site_id

site

max_visit_sd

standard deviation of maximum patient visits per site

max_visit_mean

mean of maximum patient visits per site

patient_id

patient identification

visit

visit number

n_event

cumulative sum of events

See Also

sim_test_data_study, get_portf_config, sim_test_data_portfolio

Examples


df_visit1 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.4, factor_event_rate = 0.6,
                                 study_id = "A")

df_visit2 <- sim_test_data_study(n_pat = 100, n_sites = 10,
                                 ratio_out = 0.2, factor_event_rate = 0.1,
                                 study_id = "B")


df_visit <- dplyr::bind_rows(df_visit1, df_visit2)

df_config <- get_portf_config(df_visit)

df_config

df_portf <- sim_test_data_portfolio(df_config)

df_portf



simulate study test data

Description

Evenly distributes a given number of patients across a given number of sites, then simulates event reporting for each patient. The number of reported events is modified for patients assigned to outlier sites (reduced for under-reporting, increased for over-reporting).

Usage

sim_test_data_study(
  n_pat = 1000,
  n_sites = 20,
  ratio_out = 0,
  factor_event_rate = 0,
  max_visit_mean = 20,
  max_visit_sd = 4,
  event_rates = dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5 + 0.1,
  event_names = c("event"),
  study_id = "A"
)

Arguments

n_pat

integer, number of patients, Default: 1000

n_sites

integer, number of sites, Default: 20

ratio_out

ratio of sites with outlier, Default: 0

factor_event_rate

event reporting rate factor for outlier sites, modifies the mean event per visit rate used for outlier sites. Negative values simulate under-reporting, positive values over-reporting, e.g. -0.4 -> 40% under-reporting, +0.4 -> 40% over-reporting. Default: 0

max_visit_mean

mean of the maximum number of visits of each patient, Default: 20

max_visit_sd

standard deviation of maximum number of visits of each patient, Default: 4

event_rates

list or vector with visit-specific event rates. Use list for multiple event names, Default: dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5 + 0.1

event_names

vector, contains the event names, default = "event"

study_id

character, Default: "A"

Details

The maximum visit number is sampled from a normal distribution defined by max_visit_mean and max_visit_sd, while the events per visit are sampled from a Poisson distribution described by events_per_visit_mean.
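
The sampling scheme can be sketched in base R for a single patient. The helper sim_patient_rates_sketch() is hypothetical and not the package implementation; bounding the sampled maximum visit to the number of available rates and applying factor_event_rate as a multiplier of (1 + factor) are assumptions made for the illustration.

```r
set.seed(1)

# visit-specific event rates, copied from the documented default
event_rates <- dgamma(seq(1, 20, 0.5), shape = 5, rate = 2) * 5 + 0.1

# Illustrative helper, not a simaerep function
sim_patient_rates_sketch <- function(event_rates, factor_event_rate = 0,
                                     max_visit_mean = 20, max_visit_sd = 4) {
  # sample the last visit, bounded to the available rates (assumption)
  max_visit <- round(rnorm(1, mean = max_visit_mean, sd = max_visit_sd))
  max_visit <- min(max(max_visit, 1), length(event_rates))
  # scale the rates: -0.5 halves them, +0.5 increases them by 50%
  rates <- event_rates[seq_len(max_visit)] * (1 + factor_event_rate)
  data.frame(
    visit = seq_len(max_visit),
    # Poisson draw per visit, accumulated over visits
    n_event = cumsum(rpois(max_visit, lambda = rates))
  )
}

pat_regular <- sim_patient_rates_sketch(event_rates)
pat_under   <- sim_patient_rates_sketch(event_rates, factor_event_rate = -0.5)
head(pat_under)
```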

Value

tibble with columns site_id, patient_id, is_out, max_visit_mean, max_visit_sd, event_per_visit_mean, visit, n_event

Examples

set.seed(1)
# no outlier
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5)
df_visit[which(df_visit$patient_id == "P000001"),]

# under-reporting outlier
df_visit <- sim_test_data_study(n_pat = 100, n_sites = 5,
    ratio_out = 0.2, factor_event_rate = -0.5)
df_visit[which(df_visit$patient_id == "P000001"),]

# constant event rates
sim_test_data_study(n_pat = 100, n_sites = 5, event_rates = 0.5)

# non-constant event rates for two event types
event_rates_ae <- c(0.7, rep(0.5, 8), rep(0.3, 5))
event_rates_pd <- c(0.3, rep(0.4, 6), rep(0.1, 5))

sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  event_names = c("ae", "pd"),
  event_rates = list(event_rates_ae, event_rates_pd)
)


Create simaerep object

Description

Simulate AE under-reporting probabilities.

Usage

simaerep(
  df_visit,
  r = 1000,
  check = TRUE,
  under_only = FALSE,
  visit_med75 = FALSE,
  inframe = TRUE,
  progress = TRUE,
  mult_corr = TRUE,
  poisson_test = FALSE,
  env = parent.frame(),
  event_names = c("event"),
  col_names = list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id",
    visit = "visit")
)

simaerep_inframe(
  df_visit,
  r = 1000,
  under_only = FALSE,
  visit_med75 = FALSE,
  check = TRUE,
  env = parent.frame(),
  event_names = c("event"),
  mult_corr = FALSE,
  col_names = list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id",
    visit = "visit")
)

simaerep_classic(
  df_visit,
  check = TRUE,
  progress = TRUE,
  env = parent.frame(),
  under_only = TRUE,
  r = 1000,
  mult_corr = FALSE,
  poisson_test = FALSE,
  event_names = "event",
  col_names = list(study_id = "study_id", site_id = "site_id", patient_id = "patient_id",
    visit = "visit")
)

Arguments

df_visit

Data frame with columns: study_id, site_id, patient_id, visit, n_event. Other column names can be mapped with col_names and event_names.

r

Integer or tbl_object, number of repetitions for bootstrap simulation. Pass a tbl object referring to a table with one column and as many rows as desired repetitions. Default: 1000.

check

Logical, perform data check and attempt repair with check_df_visit(). Computationally expensive on large data sets. Default: TRUE.

under_only

Logical, compute under-reporting probabilities only. Only applies to the classic algorithm, in which a one-sided evaluation can save computation time. Default: FALSE

visit_med75

Logical, should evaluation point visit_med75 be used. Compatible with inframe and classic version of the algorithm. Default: FALSE

inframe

Logical, when FALSE classic simaerep algorithm will be used. The default inframe method uses only table operations and is compatible with dbplyr supported database backends. Default: TRUE

progress

Logical, display progress bar. Default: TRUE.

mult_corr

Logical, multiplicity correction, Default: TRUE

poisson_test

logical, compute p-value with poisson test, only supported by the classic algorithm using visit_med75. Default: FALSE

env

Optional, provide environment of original visit data. Default: parent.frame().

event_names

vector, contains the event names, default = "event"

col_names

named list, indicate study_id, site_id, patient_id and visit column in df_visit input dataframe. Default: list( study_id = "study_id", site_id = "site_id", patient_id = "patient_id", visit = "visit" )

Details

Executes site_aggr(), sim_sites(), and eval_sites() on original visit data and stores all intermediate results. Stores lazy reference to original visit data for facilitated plotting using generic plot(x).

Value

A simaerep object. Results are contained in the attached df_eval dataframe.

Column Name               Description                                  Type
study_id                  The study ID                                 Character
site_id                   The site ID                                  Character
(event)_count             Site event count                             Numeric
(event)_per_visit_site    Site ratio of event count divided by visits  Numeric
visits                    Site visit count                             Numeric
n_pat                     Site patient count                           Numeric
(event)_per_visit_study   Simulated study ratio                        Numeric
(event)_prob              Site event ratio probability from -1 to 1    Numeric
(event)_delta             Difference expected vs reported events       Numeric

See Also

site_aggr, sim_sites, eval_sites, orivisit, plot.simaerep, print.simaerep, simaerep_inframe

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6
)

evrep <- simaerep(df_visit)
evrep
str(evrep)

# simaerep classic algorithm

evrep <- simaerep(df_visit, inframe = FALSE, under_only = TRUE, mult_corr = TRUE)
evrep

# multiple events

df_visit_events_test <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = - 0.6,
  event_rates = list(0.5, 0.3),
  event_names = c("ae", "pd")
)

evsrep <- simaerep(df_visit_events_test, inframe = TRUE, event_names = c("ae", "pd"))

evsrep


# Database example
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
df_r <- tibble::tibble(rep = seq(1, 1000))
dplyr::copy_to(con, df_visit, "visit")
dplyr::copy_to(con, df_r, "r")
tbl_visit <- dplyr::tbl(con, "visit")
tbl_r <- dplyr::tbl(con, "r")
simaerep(tbl_visit, r = tbl_r)
DBI::dbDisconnect(con)


Aggregate from visit to site level.

Description

Calculates visit_med75, n_pat_with_med75 and mean_ae_site_med75. Used by simaerep_classic().

Usage

site_aggr(
  df_visit,
  method = "med75_adj",
  min_pat_pool = 0.2,
  event_names = c("ae")
)

Arguments

df_visit

dataframe with columns: study_id, site_number, patnum, visit, n_ae

method

character, one of c("med75", "med75_adj", "max"), method used to determine the evaluation point visit_med75 (see Details). Default: "med75_adj"

min_pat_pool

double, minimum ratio of patients available for sampling. Determines the maximum visit_med75 value, see Details. Default: 0.2

event_names

vector, contains the event names, default = "ae"

Details

To determine the visit number at which AE reporting is evaluated, we take the maximum visit of each patient at the site and calculate the median. Multiplying the median by 0.75 gives a cut-off point that determines which patients will be evaluated. Of the patients to be evaluated, we take the minimum of all maximum visits, which ensures that we use the highest visit number possible without excluding more patients from the analysis. To ensure that the sampling pool for that visit is large enough, we limit the visit number by the 80% quantile of the maximum visits of all patients in the study (with the default min_pat_pool of 0.2). The "max" method determines the maximum visit at the site, flags the patients that concluded that visit, and counts all patients as well as the patients that concluded the maximum visit.
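
The steps can be traced on a small example. The base R sketch below uses made-up maximum visits and is not the package source; the variable names are hypothetical, and rounding the study-level quantile down to a whole visit is an assumption.

```r
# Hypothetical maximum visit per patient (illustrative values only)
max_visit_site  <- c(8, 12, 15, 20, 22)   # patients at one site
max_visit_study <- c(max_visit_site, 10, 14, 16, 18, 19, 21, 24, 25, 9, 11)

min_pat_pool <- 0.2

# step 1: median of the maximum visits, multiplied by 0.75
cut_off <- median(max_visit_site) * 0.75

# step 2: patients that reached the cut-off are evaluated
eligible <- max_visit_site[max_visit_site >= cut_off]

# step 3: highest visit that all evaluated patients have reached
visit_med75 <- min(eligible)

# step 4: cap by the study-level quantile so that the sampling pool stays
# large enough (flooring the quantile is an assumption)
visit_cap <- floor(
  quantile(max_visit_study, probs = 1 - min_pat_pool, names = FALSE)
)
visit_med75_adj <- min(visit_med75, visit_cap)

# patients at the site that reached the evaluation point
n_pat_with_med75 <- sum(max_visit_site >= visit_med75_adj)
```

In this example the median is 15 and the cut-off 11.25, so the patient with only 8 visits is excluded and the remaining four patients are evaluated at visit 12.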

Value

dataframe with the following columns:

study_id

study identification

site_number

site identification

n_pat

number of patients, site level

visit_med75

adjusted median(max(visit)) * 0.75 see Details

n_pat_with_med75

number of patients that meet visit_med75 criterion, site level

mean_ae_site_med75

mean AE at visit_med75, site level

See Also

simaerep_classic()

Examples

df_visit <- sim_test_data_study(
  n_pat = 100,
  n_sites = 5,
  ratio_out = 0.4,
  factor_event_rate = 0.6
  ) %>%
  # internal functions require internal column names
  dplyr::rename(
    n_ae = n_event,
    site_number = site_id,
    patnum = patient_id
  )

df_site <- site_aggr(df_visit)

df_site %>%
  knitr::kable(digits = 2)

Conditional with_progress.

Description

Internal function. Use instead of with_progress within custom functions with progress bars.

Usage

with_progress_cnd(ex, progress = TRUE)

Arguments

ex

expression

progress

logical, Default: TRUE

Details

This wrapper adds a progress parameter to with_progress so that we can control the progress bar in the user-facing functions. The progress bar only shows in interactive mode.
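
The idea of such a conditional wrapper can be sketched as follows. The function with_progress_cnd_sketch() is hypothetical and not the package implementation; it assumes that progressr is only needed when progress = TRUE and, unlike the documented function, it passes on the value of the expression.

```r
# Illustrative wrapper, not the package source
with_progress_cnd_sketch <- function(ex, progress = TRUE) {
  if (progress && requireNamespace("progressr", quietly = TRUE)) {
    # progress updates signalled inside ex are displayed
    progressr::with_progress(ex)
  } else {
    # evaluate the expression without progress reporting
    ex
  }
}

# progress reporting switched off, the expression is simply evaluated
res <- with_progress_cnd_sketch(sum(1:3), progress = FALSE)
res
```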

Value

No return value, called for side effects

See Also

with_progress

Examples

if (interactive()) {

 with_progress_cnd(
   purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
   progress = TRUE
 )

 with_progress_cnd(
   purrr_bar(rep(0.25, 5), .purrr = purrr::map, .f = Sys.sleep, .steps = 5),
   progress = FALSE
 )

# wrap a function with progress bar with another call with progress bar

f1 <- function(x, progress = TRUE) {
  with_progress_cnd(
    purrr_bar(x, .purrr = purrr::walk, .f = Sys.sleep, .steps = length(x), .progress = progress),
    progress = progress
  )
}

# inner progress bar blocks outer progress bar
progressr::with_progress(
  purrr_bar(
    rep(rep(1, 3),3), .purrr = purrr::walk, .f = f1, .steps = 3,
    .f_args = list(progress = TRUE)
  )
)

# inner progress bar turned off
progressr::with_progress(
  purrr_bar(
    rep(list(rep(0.25, 3)), 5), .purrr = purrr::walk, .f = f1, .steps = 5,
    .f_args = list(progress = FALSE)
  )
)
}