Type: Package
Title: Isotonic Subgroup Selection
Version: 1.0.0
Description: Methodology for subgroup selection in the context of isotonic regression including methods for sub-Gaussian errors, classification, homoscedastic Gaussian errors and quantile regression. See the documentation of ISS(). Details can be found in the paper by Müller, Reeve, Cannings and Samworth (2023) <doi:10.48550/arXiv.2305.04852>.
License: GPL (≥ 3)
Encoding: UTF-8
RoxygenNote: 7.2.3
Imports: parallel, stats, Rdpack (≥ 0.7)
RdMacros: Rdpack
NeedsCompilation: no
Packaged: 2023-07-06 12:18:47 UTC; manuel
Author: Manuel M. Müller [aut, cre], Henry W. J. Reeve [aut], Timothy I. Cannings [aut], Richard J. Samworth [aut]
Maintainer: Manuel M. Müller <mm2559@cam.ac.uk>
Repository: CRAN
Date/Publication: 2023-07-06 22:10:02 UTC
ISS
Description
The function implements the combination of p-value calculation and familywise error rate control through DAG testing procedures described in Müller et al. (2023).
Usage
ISS(
  X,
  y,
  tau,
  alpha = 0.05,
  m = nrow(X),
  p_value = c("sub-Gaussian-normalmixture", "sub-Gaussian", "Gaussian", "classification",
    "quantile"),
  sigma2,
  rho = 1/2,
  FWER_control = c("ISS", "Holm", "MG all", "MG any", "split", "split oracle"),
  minimal = FALSE,
  split_proportion = 1/2,
  eta = NA,
  theta = 1/2
)
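The vector-valued defaults for p_value and FWER_control list the admissible choices. Assuming the package resolves them with the standard match.arg() idiom (an assumption, not confirmed on this page), omitting the argument selects the first entry. The function choose_method below is a hypothetical stand-in used only to illustrate the idiom.

```r
# Hypothetical stand-in illustrating the match.arg() idiom; not package code.
choose_method <- function(p_value = c("sub-Gaussian-normalmixture", "sub-Gaussian",
                                      "Gaussian", "classification", "quantile")) {
  # With the argument missing, match.arg() returns the first choice;
  # otherwise it validates the supplied string against the choices.
  match.arg(p_value)
}

choose_method()            # falls back to the first listed choice
choose_method("quantile")  # an explicit, valid choice is passed through
```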
Arguments
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
tau |
a single numeric value specifying the threshold of interest. |
alpha |
a numeric value in (0, 1] specifying the Type I error rate. |
m |
an integer value between 1 and |
p_value |
one of |
sigma2 |
a single positive numeric value specifying the variance parameter (only needed if |
rho |
a single positive numeric value serving as hyperparameter (only used if |
FWER_control |
one of |
minimal |
a logical value determining whether the output should be reduced to the minimal number of points leading to the same selected set. |
split_proportion |
when |
eta |
when |
theta |
a single numeric value in (0, 1) specifying the quantile of interest when |
Value
A numeric matrix giving the points in X determined to lie in the tau-superlevel set of the regression function with probability at least 1 - alpha or, if minimal == TRUE, a subset of points thereof that have the same upper hull.
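One way to read the returned matrix, assuming the upper hull is the set of points that are coordinate-wise greater than or equal to at least one returned row: a new covariate point belongs to the selected set exactly when it dominates some row. The helper in_selected_set and the toy matrix X_sel are hypothetical and only illustrate this reading.

```r
# Hypothetical helper, not part of the package: TRUE if x lies in the
# upper hull of the rows of X_sel (coordinate-wise >= at least one row).
in_selected_set <- function(x, X_sel) {
  any(apply(X_sel, 1, function(s) all(x >= s)))
}

# Toy stand-in for an ISS() result with two selected points.
X_sel <- rbind(c(0.6, 0.2), c(0.3, 0.7))
in_selected_set(c(0.7, 0.5), X_sel)  # dominates the row (0.6, 0.2)
in_selected_set(c(0.4, 0.4), X_sel)  # dominates neither row
```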
References
Meijer RJ, Goeman JJ (2015). “A multiple testing method for hypotheses structured in a directed acyclic graph.” Biometrical Journal, 57(1), 123–143.
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852v2.
Examples
d <- 2
n <- 1000
m <- 100
sigma2 <- (1 / 4)^2
tau <- 0.5
alpha <- 0.05
X <- matrix(runif(n * d), nrow = n)
eta_X <- apply(X, MARGIN = 1, max)
y <- eta_X + rnorm(n, sd = sqrt(sigma2))
X_rej <- ISS(X = X, y = y, tau = tau, alpha = alpha, m = m, sigma2 = sigma2)
if (d == 2) {
  plot(0, type = "n", xlim = c(0, 1), ylim = c(0, 1), xlab = NA, ylab = NA)
  # Shade the upper-right rectangle of every selected point;
  # seq_len() avoids a spurious iteration when no point is selected.
  for (i in seq_len(nrow(X_rej))) {
    rect(
      xleft = X_rej[i, 1], xright = 1, ybottom = X_rej[i, 2], ytop = 1,
      border = NA, col = "indianred"
    )
  }
  points(X, pch = 16, cex = 0.5, col = "gray")
  points(X[1:m, ], pch = 16, cex = 0.5, col = "black")
  # Boundary of the tau-superlevel set of the regression function
  lines(x = c(0, tau), y = c(tau, tau), lty = 2)
  lines(x = c(tau, tau), y = c(tau, 0), lty = 2)
  legend(
    x = "bottomleft",
    legend = c(
      "superlevel set boundary",
      "untested covariate points",
      "tested covariate points",
      "selected set"
    ),
    col = c("black", "gray", "black", "indianred"),
    lty = c(2, NA, NA, NA),
    lwd = c(1, NA, NA, NA),
    pch = c(NA, 16, 16, NA),
    fill = c(NA, NA, NA, "indianred"),
    border = c(NA, NA, NA, "indianred")
  )
}
dag_test_FS
Description
Implements the fixed sequence testing procedure for familywise error rate control. Hypotheses are tested in the order obtained by sorting the elements of p_order increasingly, or decreasingly if decreasing = TRUE.
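As a rough illustration of the fixed sequence idea, the sketch below visits hypotheses in the order given by p_order and stops at the first p-value that exceeds alpha. The name fixed_sequence_sketch and the non-strict comparison p <= alpha are assumptions made for illustration; the package's implementation may differ in such details.

```r
# Illustrative sketch of fixed sequence testing; not the package's code.
fixed_sequence_sketch <- function(p_order, p, alpha, decreasing = FALSE) {
  ord <- order(p_order, decreasing = decreasing)  # testing sequence
  rejected <- logical(length(p))                  # all FALSE initially
  for (i in ord) {
    if (p[i] > alpha) break   # first failure ends the procedure
    rejected[i] <- TRUE
  }
  rejected
}

p_order <- c(0.5, 0, 1)
p <- c(0.01, 0.1, 0.02)
# Sequence is hypothesis 3, then 1, then 2; testing stops at hypothesis 2.
fixed_sequence_sketch(p_order, p, alpha = 0.05, decreasing = TRUE)
```

Because each test is carried out at the full level alpha, a small p-value late in the sequence cannot be rejected once an earlier hypothesis has been retained.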
Usage
dag_test_FS(p_order, p, alpha, decreasing = FALSE)
Arguments
p_order |
a numeric vector or matrix with one column whose order determines the sequence of tests. |
p |
a numeric vector taking values in (0, 1] such that |
alpha |
a numeric value in (0, 1] specifying the Type I error rate. |
decreasing |
a logical value; if TRUE, the sequence of tests is obtained by sorting p_order in decreasing order. |
Value
A logical vector of the same length as p with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
Examples
p_order <- c(0.5, 0, 1)
p <- c(0.01, 0.1, 0.05)
alpha <- 0.05
dag_test_FS(p_order, p, alpha, decreasing = TRUE)
dag_test_Holm
Description
Given a vector of p-values, each concerning a row in the matrix X0, dag_test_Holm() first applies Holm's method to the p-values and then also rejects hypotheses corresponding to points coordinate-wise greater than or equal to any point whose hypothesis has been rejected.
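The two steps described above can be sketched in base R with stats::p.adjust(). The function holm_upward_sketch is a hypothetical name, and comparing the Holm-adjusted p-values with alpha via <= is an assumption; this is a minimal sketch rather than the package's implementation.

```r
# Illustrative sketch: Holm's method followed by upward closure.
holm_upward_sketch <- function(X0, p, alpha) {
  # Step 1: Holm's step-down procedure via adjusted p-values
  rejected <- p.adjust(p, method = "holm") <= alpha
  # Step 2: also reject every point that is coordinate-wise >= a rejected point
  for (i in which(rejected)) {
    dominates <- apply(X0, 1, function(x) all(x >= X0[i, ]))
    rejected <- rejected | dominates
  }
  rejected
}

X0 <- rbind(c(0.5, 0.5), c(0.8, 0.9), c(0.4, 0.6))
p <- c(0.01, 0.1, 0.05)
# Holm rejects only the first hypothesis; the second point dominates the
# first and is therefore rejected as well, while the third is not.
res <- holm_upward_sketch(X0, p, alpha = 0.05)
res
```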
Usage
dag_test_Holm(X0, p, alpha)
Arguments
X0 |
a numeric matrix giving points corresponding to hypotheses. |
p |
a numeric vector taking values in (0, 1] such that |
alpha |
a numeric value in (0, 1] specifying the Type I error rate. |
Value
A logical vector of the same length as p with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
Examples
X0 <- rbind(c(0.5, 0.5), c(0.8, 0.9), c(0.4, 0.6))
p <- c(0.01, 0.1, 0.05)
alpha <- 0.05
dag_test_Holm(X0, p, alpha)
dag_test_ISS
Description
Implements the DAG testing procedure given in Algorithm 1 by Müller et al. (2023).
Usage
dag_test_ISS(X0, p, alpha)
Arguments
X0 |
a numeric matrix giving points corresponding to hypotheses. |
p |
a numeric vector taking values in (0, 1] such that |
alpha |
a numeric value in (0, 1] specifying the Type I error rate. |
Value
A logical vector of the same length as p with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
X0 <- rbind(c(0.5, 0.6), c(0.8, 0.9), c(0.9, 0.8))
p <- c(0.02, 0.025, 0.1)
alpha <- 0.05
dag_test_ISS(X0, p, alpha)
dag_test_MG
Description
Implements the graph-testing procedures proposed by Meijer and Goeman (2015) for one-way logical relationships, specialised here to the application of isotonic subgroup selection.
Usage
dag_test_MG(
  X0,
  p,
  alpha,
  version = c("all", "any"),
  leaf_weights,
  sparse = FALSE
)
Arguments
X0 |
a numeric matrix giving points corresponding to hypotheses. |
p |
a numeric vector taking values in (0, 1] such that |
alpha |
a numeric value in (0, 1] specifying the Type I error rate. |
version |
either |
leaf_weights |
optional weights for the leaf nodes. If supplied, this must be a numeric vector
of the same length as there are leaf nodes in the DAG (resp. polytree, see |
sparse |
a logical value specifying whether |
Value
A logical vector of the same length as p with each element being TRUE if the corresponding hypothesis is rejected and FALSE otherwise.
References
Meijer RJ, Goeman JJ (2015). “A multiple testing method for hypotheses structured in a directed acyclic graph.” Biometrical Journal, 57(1), 123–143.
Examples
X0 <- rbind(c(0.5, 0.6), c(0.8, 0.9), c(0.9, 0.8))
p <- c(0.02, 0.025, 0.1)
alpha <- 0.05
dag_test_MG(X0, p, alpha)
dag_test_MG(X0, p, alpha, version = "any")
dag_test_MG(X0, p, alpha, sparse = TRUE)
get_DAG
Description
This function is used to construct the induced DAG, induced polyforest and reverse topological orderings thereof from a numeric matrix X0. See Definition 2 in Müller et al. (2023).
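To give some intuition for a graph induced by a point set, the sketch below builds the coordinate-wise partial order on a few distinct points and then reduces it to its covering relation (direct neighbours only). This is one plausible reading under stated assumptions: the edge direction, the treatment of duplicated points and the polyforest construction are fixed by Definition 2 of the paper and may differ from what is shown. The object names below and cover are hypothetical.

```r
# Illustrative sketch of a partial-order graph on points; not package code.
X0 <- rbind(c(0.1, 0.1), c(0.2, 0.5), c(0.3, 0.6), c(0.25, 0.2))
n <- nrow(X0)

# below[i, j] is TRUE when point i is coordinate-wise <= point j and i != j
below <- outer(seq_len(n), seq_len(n), Vectorize(function(i, j) {
  i != j && all(X0[i, ] <= X0[j, ])
}))

# Covering relation: drop i -> j whenever some k lies between i and j
cover <- below
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (below[i, j] && any(below[i, ] & below[, j])) cover[i, j] <- FALSE
  }
}

which(below, arr.ind = TRUE)  # all order relations
which(cover, arr.ind = TRUE)  # direct relations only
```

Here the relation from point 1 to point 3 is implied by the path through point 2, so it appears in below but not in cover.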
Usage
get_DAG(X0, sparse = FALSE, twoway = FALSE)
Arguments
X0 |
a numeric matrix. |
sparse |
logical. Either the induced DAG ( |
twoway |
logical. If |
Value
A list with named elements giving the leaves, parents, ancestors and reverse topological ordering of the constructed graph and, if twoway == TRUE, additionally its roots, children and descendants.
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
X <- rbind(
  c(0.2, 0.8), c(0.2, 0.8), c(0.1, 0.7),
  c(0.2, 0.1), c(0.3, 0.5), c(0.3, 0)
)
get_DAG(X0 = X)
get_DAG(X0 = X, sparse = TRUE, twoway = TRUE)
get_boundary_points
Description
Given a set of points, returns the minimal subset with the same upper hull.
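Assuming the upper hull of a point set is the set of all points that are coordinate-wise greater than or equal to at least one of its members, the minimal subset with the same upper hull consists of the distinct points that no other point lies below. The function minimal_points_sketch is a hypothetical base R illustration of that idea, not the package's implementation.

```r
# Illustrative sketch: keep the distinct, coordinate-wise minimal rows of X.
minimal_points_sketch <- function(X) {
  X <- unique(X)  # duplicated rows add nothing to the upper hull
  keep <- vapply(seq_len(nrow(X)), function(i) {
    # Is some other row <= row i in every coordinate and < in at least one?
    dominated <- apply(X, 1, function(x) all(x <= X[i, ]) && any(x < X[i, ]))
    !any(dominated)
  }, logical(1))
  X[keep, , drop = FALSE]
}

X <- rbind(c(0, 1), c(1, 0), c(1, 0), c(1, 1))
# (1, 1) lies above (0, 1), and (1, 0) is duplicated, so two rows remain.
B <- minimal_points_sketch(X)
B
```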
Usage
get_boundary_points(X)
Arguments
X |
a numeric matrix with one point per row. |
Value
A numeric matrix with the same number of columns as X.
Examples
X <- rbind(c(0, 1), c(1, 0), c(1, 0), c(1, 1))
get_boundary_points(X)
get_p_Gaussian
Description
Calculate the p-value in Definition 19 of Müller et al. (2023).
Usage
get_p_Gaussian(X, y, x0, tau)
Arguments
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
x0 |
a numeric vector specifying the point of interest, such that
|
tau |
a single numeric value specifying the threshold of interest. |
Value
A single numeric value in (0, 1].
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 1)
get_p_Gaussian(X, y, x0 = c(1, 1), tau = 1)
get_p_Gaussian(X, y, x0 = c(1, 1), tau = -1)
get_p_classification
Description
Calculate the p-value in Definition 21 of Müller et al. (2023).
Usage
get_p_classification(X, y, x0, tau)
Arguments
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
x0 |
a numeric vector specifying the point of interest, such that
|
tau |
a single numeric value in [0,1) specifying the threshold of interest. |
Value
A single numeric value in (0, 1].
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
X_eta <- apply(X, MARGIN = 1, FUN = function(x) 1 / (1 + exp(-eta(x))))
y <- as.numeric(runif(n) < X_eta)
get_p_classification(X, y, x0 = c(1, 1), tau = 0.6)
get_p_classification(X, y, x0 = c(1, 1), tau = 0.9)
get_p_subGaussian
Description
Calculate the p-value in Definition 1 of Müller et al. (2023).
Usage
get_p_subGaussian(X, y, x0, sigma2, tau)
Arguments
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
x0 |
a numeric vector specifying the point of interest, such that
|
sigma2 |
a single positive numeric value specifying the variance parameter. |
tau |
a single numeric value specifying the threshold of interest. |
Value
A single numeric value in (0, 1].
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 0.5)
get_p_subGaussian(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1)
get_p_subGaussian(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 3)
get_p_subGaussian_NM
Description
Calculate the p-value in Definition 18 of Müller et al. (2023).
Usage
get_p_subGaussian_NM(X, y, x0, sigma2, tau, rho = 0.5)
Arguments
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
x0 |
a numeric vector specifying the point of interest, such that
|
sigma2 |
a single positive numeric value specifying the variance parameter. |
tau |
a single numeric value specifying the threshold of interest. |
rho |
a single positive numeric value serving as hyperparameter. |
Value
A single numeric value in (0, 1].
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
y <- apply(X, MARGIN = 1, FUN = eta) + rnorm(n, sd = 0.5)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 3)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1)
get_p_subGaussian_NM(X, y, x0 = c(1, 1), sigma2 = 0.25, tau = 1, rho = 2)
get_p_value
Description
A wrapper function used to call the correct function for calculating the p-value.
Usage
get_p_value(
  p_value_method = c("sub-Gaussian-normalmixture", "sub-Gaussian", "Gaussian",
    "classification", "quantile"),
  X,
  y,
  x0,
  tau,
  sigma2,
  rho = 1/2,
  theta = 1/2
)
Arguments
p_value_method |
one of |
X |
a numeric matrix specifying the covariates. |
y |
a numeric vector with |
x0 |
a numeric vector specifying the point of interest, such that |
tau |
a single numeric value specifying the threshold of interest. |
sigma2 |
a single positive numeric value specifying the variance parameter (required only if |
rho |
a single positive numeric value serving as hyperparameter (required only if |
theta |
a single numeric value in (0, 1) specifying the quantile of interest when |
Value
A single numeric value in (0, 1].
References
Müller MM, Reeve HWJ, Cannings TI, Samworth RJ (2023). “Isotonic subgroup selection.” arXiv preprint arXiv:2305.04852.
Examples
set.seed(123)
n <- 100
d <- 2
X <- matrix(runif(d * n), ncol = d)
eta <- function(x) sum(x)
X_eta <- apply(X, MARGIN = 1, FUN = function(x) 1 / (1 + exp(-eta(x))))
y <- as.numeric(runif(n) < X_eta)
get_p_value(p_value_method = "classification", X, y, x0 = c(1, 1), tau = 0.6)
get_p_value(p_value_method = "classification", X, y, x0 = c(1, 1), tau = 0.9)
X_eta <- apply(X, MARGIN = 1, FUN = eta)
y <- X_eta + rcauchy(n)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 1/2)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 3)
get_p_value(p_value_method = "quantile", X, y, x0 = c(1, 1), tau = 3, theta = 0.95)