Type: | Package |
Title: | General Hypothesis Testing Problems for Multivariate Coefficients of Variation |
Version: | 0.1.0 |
Date: | 2023-02-02 |
Description: | Performs test procedures for general hypothesis testing problems for four multivariate coefficients of variation (Ditzhaus and Smaga, 2023 <doi:10.48550/arXiv.2301.12009>). We can verify the global hypothesis about equality as well as the particular hypotheses defined by contrasts, e.g., we can conduct post hoc tests. We also provide the simultaneous confidence intervals for contrasts. |
License: | LGPL-2 | LGPL-3 | GPL-2 | GPL-3 |
Imports: | Rcpp (≥ 1.0.9), mvtnorm, doParallel, MASS, foreach, Matrix, stringr, HSAUR |
LinkingTo: | Rcpp, RcppArmadillo |
RoxygenNote: | 7.2.3 |
Encoding: | UTF-8 |
NeedsCompilation: | yes |
Packaged: | 2023-02-02 13:54:19 UTC; ls |
Author: | Marc Ditzhaus [aut], Lukasz Smaga [aut, cre] |
Maintainer: | Lukasz Smaga <ls@amu.edu.pl> |
Repository: | CRAN |
Date/Publication: | 2023-02-02 17:00:02 UTC |
Inference for four multivariate coefficients of variation
Description
The function GFDmcv() calculates the Wald-type statistic for global null hypotheses and max-type statistics for multiple local null hypotheses, both in terms of the four variants of the multivariate coefficient of variation. The respective p-values are obtained by a \chi^2-approximation, a pooled bootstrap strategy and a pooled permutation approach (only for the Wald-type statistic).
Usage
GFDmcv(
x,
h_mct,
h_wald,
alpha = 0.05,
n_perm = 1000,
n_boot = 1000,
parallel = FALSE,
n_cores = NULL
)
Arguments
x |
a list of length |
h_mct |
a |
h_wald |
a |
alpha |
a significance level (then |
n_perm |
a number of permutation replicates. |
n_boot |
a number of bootstrap replicates. |
parallel |
a logical indicating whether to use parallelization. |
n_cores |
if |
Details
The function GFDmcv() calculates the Wald-type statistic for global null hypotheses of the form
\mathcal H_0: \mathbf{H} (C_1,\ldots,C_k)^\top = \mathbf{0}\,\,\text{and}\,\,\mathcal H_0: \mathbf{H} (B_1,\ldots,B_k)^\top = \mathbf{0},
where \mathbf{H} is a contrast matrix reflecting the research question of interest and C_i (B_i) are the subgroup-specific MCVs (and their reciprocals) by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) or Albert and Zhang (2010, AZ), respectively. We refer to the function e_mcv() for the detailed definitions of the different variants. The p-value of the Wald-type statistic relies on a \chi^2-approximation, a (pooled) bootstrap or a permutation approach.
Furthermore, the function GFDmcv() calculates a max-type test statistic for the multiple comparison of q local null hypotheses:
\mathcal H_{0,\ell}: \mathbf{h_\ell}^\top \mathbf{C} = \mathbf{0}\,\,\text{or}\,\,\mathcal H_{0,\ell}: \mathbf{h_\ell}^\top \mathbf{B} = \mathbf{0}, \,\,\ell=1,\ldots,q,
where \mathbf{C}=(C_1,\ldots,C_k)^\top and \mathbf{B}=(B_1,\ldots,B_k)^\top. The p-values are determined by a Gaussian approximation and a bootstrap approach, respectively. In addition to the local test decisions, multiple adjusted confidence intervals for the contrasts \mathbf{h_\ell}^\top \mathbf{C} and \mathbf{h_\ell}^\top \mathbf{B}, respectively, are calculated.
Please have a look at the plot and summary functions designed for this package. They can be used to simplify the output of GFDmcv().
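To make the two kinds of hypotheses more concrete, the following base R sketch applies a centering matrix (global hypothesis) and pairwise contrast vectors (local hypotheses) to a vector of MCV values. The numbers in C are purely hypothetical and the matrices are built by hand rather than with the package functions; no test statistic or p-value is computed here.

```r
# Hypothetical MCV values for k = 3 groups (illustrative numbers only)
C <- c(0.20, 0.20, 0.35)
k <- length(C)

# Global hypothesis: H is the centering matrix P_k = I_k - J_k / k
H <- diag(k) - matrix(1, k, k) / k

# H %*% C holds the deviations of each C_i from the average of C;
# it is the zero vector exactly when all C_i are equal
global_contrast <- as.vector(H %*% C)

# Local hypotheses: each row h_l compares one pair of groups
h <- rbind(c(-1,  1, 0),
           c(-1,  0, 1),
           c( 0, -1, 1))

# One value per local hypothesis; a zero means that pair has equal MCVs
local_contrasts <- as.vector(h %*% C)

print(global_contrast)
print(local_contrasts)
```

In this toy setting the first local contrast is zero (groups 1 and 2 agree), while the global contrast vector is nonzero because group 3 differs.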
Value
A list of class gfdmcv containing the following components:
overall_res |
a list of two elements representing the results for testing
the global null hypothesis. The first one is a matrix |
mct_res |
all results of MCT tests for particular hypothesis in |
h_mct |
an argument |
h_wald |
an argument |
alpha |
an argument |
References
Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.
Ditzhaus M., Smaga L. (2022) Permutation test for the multivariate coefficient of variation in factorial designs. Journal of Multivariate Analysis 187, 104848.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda, Stockholm Contributions in Geology, vol 7.
Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.
Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.
Examples
# Some of the examples may take some time to run.
# one-way analysis for MCV and CV
# d > 1 (MCV)
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
# testing with parallel computing
library(doParallel)
res <- GFDmcv(data_set, h_mct, h_wald, parallel = TRUE, n_cores = 2)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
# two-way analysis for CV (based on the example in Ditzhaus and Smaga, 2022)
library(HSAUR)
data_set <- lapply(list(BtheB$bdi.pre[BtheB$drug == "No" & BtheB$length == "<6m"],
                        BtheB$bdi.pre[BtheB$drug == "No" & BtheB$length == ">6m"],
                        BtheB$bdi.pre[BtheB$drug == "Yes" & BtheB$length == "<6m"],
                        BtheB$bdi.pre[BtheB$drug == "Yes" & BtheB$length == ">6m"]),
                   as.matrix)
# estimators and confidence intervals of CV and its reciprocal
lapply(data_set, e_mcv)
# interaction
h_mct <- contr_mat(4, type = "Tukey")
h_wald <- kronecker(contr_mat(2, type = "center"),
                    contr_mat(2, type = "center"))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
# main effect drug
h_mct <- matrix(c(1, 1, -1, -1), nrow = 1)
h_wald <- kronecker(contr_mat(2, type = "center"), 0.5 * matrix(1, 1, 2))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
# main effect length
h_mct <- matrix(c(1, -1, 1, -1), nrow = 1)
h_wald <- kronecker(0.5 * matrix(1, 1, 2), contr_mat(2, type = "center"))
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)
oldpar <- par(mar = c(4, 6, 2, 0.1))
plot(res)
par(oldpar)
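As a rough illustration of the Kronecker-product hypothesis matrices used in the two-way examples above, the sketch below rebuilds them in base R. It assumes that the centering matrix for two levels equals I_2 - J_2/2, as stated in the Details of contr_mat(), and that the four groups are ordered as in data_set. The CV values in C are hypothetical and chosen so that only the factor length has an effect.

```r
# Centering matrix P_2 = I_2 - J_2 / 2, built by hand (base R only)
P2 <- diag(2) - matrix(1, 2, 2) / 2
# Averaging row vector (1/2, 1/2)
avg <- 0.5 * matrix(1, 1, 2)

# Group order: (No, <6m), (No, >6m), (Yes, <6m), (Yes, >6m)
H_interaction <- kronecker(P2, P2)   # 4 x 4 matrix
H_drug        <- kronecker(P2, avg)  # 2 x 4 matrix, averages over length
H_length      <- kronecker(avg, P2)  # 2 x 4 matrix, averages over drug

# Hypothetical CVs: identical for both drug levels, different for length
C <- c(0.4, 0.6, 0.4, 0.6)

drug_effect        <- as.vector(H_drug %*% C)         # all zeros
interaction_effect <- as.vector(H_interaction %*% C)  # all zeros
length_effect      <- as.vector(H_length %*% C)       # nonzero entries

print(drug_effect)
print(interaction_effect)
print(length_effect)
```

Each hypothesis matrix maps the vector of group-wise CVs to the zero vector exactly when the corresponding effect is absent, which is what the Wald-type test assesses.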
Contrasts' matrices
Description
Returns one of several contrast matrices for a given number of groups: the centering matrix or the matrix of Dunnett's or Tukey's contrasts.
Usage
contr_mat(k, type = c("center", "Dunnett", "Tukey"))
Arguments
k |
an integer denoting a number of groups |
type |
a character string denoting the type of contrasts. The possible values are |
Details
The centering matrix is \mathbf{P}_k = \mathbf{I}_k - \mathbf{J}_k/k, where \mathbf{I}_k is the identity matrix and \mathbf{J}_k=\mathbf{1}\mathbf{1}^{\top} \in R^{k\times k} is the matrix consisting of 1's only.
The matrix of Dunnett's contrasts:
\left(\begin{array}{rrrrrr}
-1 & 1 & 0 & \ldots & \ldots & 0\\
-1 & 0 & 1 & 0 & \ldots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
-1 & 0 & \ldots & \ldots & 0 & 1\\
\end{array}\right)\in R^{(k-1)\times k}.
The matrix of Tukey's contrasts:
\left(\begin{array}{rrrrrrr}
-1 & 1 & 0 & \ldots & \ldots & 0 & 0\\
-1 & 0 & 1 & 0 & \ldots & \ldots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
-1 & 0 & 0 & 0 & \ldots & \ldots & 1 \\
0 & -1 & 1 & 0 &\ldots & \ldots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
0 & \ldots & \ldots & \ldots & 0 & -1 & 1\\
\end{array}\right)\in R^{{k \choose 2}\times k}.
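The three matrices described above can be reproduced in a few lines of base R. The helper names center_mat, dunnett_mat and tukey_mat are hypothetical and not part of the package; this is only a sketch of the definitions, assuming k >= 2, and not the implementation behind contr_mat().

```r
# Hypothetical helpers (not package functions), assuming k >= 2

# Centering matrix P_k = I_k - J_k / k
center_mat <- function(k) {
  diag(k) - matrix(1, k, k) / k
}

# Dunnett's contrasts: each row compares group j (j = 2, ..., k) with group 1
dunnett_mat <- function(k) {
  cbind(-1, diag(k - 1))
}

# Tukey's contrasts: one row per pair (i, j) with i < j
tukey_mat <- function(k) {
  pairs <- utils::combn(k, 2)  # columns are the pairs, in lexicographic order
  h <- matrix(0, nrow = ncol(pairs), ncol = k)
  for (r in seq_len(ncol(pairs))) {
    h[r, pairs[1, r]] <- -1
    h[r, pairs[2, r]] <- 1
  }
  h
}

print(center_mat(4))
print(dunnett_mat(4))
print(tukey_mat(4))
```

Every row of each matrix sums to zero, which is the defining property of a contrast.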
Value
The matrix of contrasts.
References
Ditzhaus M., Smaga L. (2022) Permutation test for the multivariate coefficient of variation in factorial designs. Journal of Multivariate Analysis 187, 104848.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Dunnett C. (1955) A multiple comparison procedure for comparing several treatments with a control. Journal of the American Statistical Association 50, 1096-1121.
Tukey J.W. (1953) The problem of multiple comparisons. Princeton University.
Examples
contr_mat(4, type = "center")
contr_mat(4, type = "Dunnett")
contr_mat(4, type = "Tukey")
Estimators and confidence intervals of four multivariate coefficients of variation and their reciprocals
Description
Calculates the estimators with respective (1-\alpha)-confidence intervals for the four different variants of the multivariate coefficient of variation (MCV) and their reciprocals by Reyment (1960), Van Valen (1974), Voinov and Nikulin (1996) and Albert and Zhang (2010).
Usage
e_mcv(x, conf_level = 0.95)
Arguments
x |
a matrix of data of size |
conf_level |
a confidence level. By default, it is equal to 0.95. |
Details
The function e_mcv() calculates four different variants of the multivariate coefficient of variation for d-dimensional data. These variants were introduced by Reyment (1960, RR), Van Valen (1974, VV), Voinov and Nikulin (1996, VN) and Albert and Zhang (2010, AZ):
{\widehat C}^{RR}=\sqrt{\frac{(\det\mathbf{\widehat\Sigma})^{1/d}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{VV}=\sqrt{\frac{\mathrm{tr}\mathbf{\widehat\Sigma}}{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{VN}=\sqrt{\frac{1}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}}},\
{\widehat C}^{AZ}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}},
where n is the sample size, \boldsymbol{\widehat\mu} is the empirical mean vector and \mathbf{\widehat \Sigma} is the empirical covariance matrix:
\boldsymbol{\widehat\mu} = \frac{1}{n}\sum_{j=1}^{n} \mathbf{X}_{j},\; \mathbf{\widehat \Sigma} =\frac{1}{n}\sum_{j=1}^{n} (\mathbf{X}_{j} - \boldsymbol{\widehat \mu})(\mathbf{X}_{j} - \boldsymbol{\widehat \mu})^{\top}.
In the univariate case (d=1), all four variants reduce to the coefficient of variation. Furthermore, their reciprocals, the so-called standardized means, are determined:
{\widehat B}^{RR}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{(\det\mathbf{\widehat\Sigma})^{1/d}}},\
{\widehat B}^{VV}=\sqrt{\frac{\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu}}{\mathrm{tr}\mathbf{\widehat\Sigma}}},\
{\widehat B}^{VN}=\sqrt{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}^{-1}\boldsymbol{\widehat\mu}},\
{\widehat B}^{AZ}=\sqrt{\frac{(\boldsymbol{\widehat\mu}^{\top}\boldsymbol{\widehat\mu})^2}{\boldsymbol{\widehat\mu}^{\top}\mathbf{\widehat\Sigma}\boldsymbol{\widehat\mu}}}.
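The formulas above translate fairly directly into base R. The function mcv_sketch below is a hypothetical helper written only to illustrate the point estimators; it is not the package's e_mcv(), it computes no confidence intervals, and for d = 1 it still returns four (identical) rows.

```r
# Hypothetical helper illustrating the estimator formulas (not e_mcv())
mcv_sketch <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- ncol(x)

  # Empirical mean vector
  mu <- colMeans(x)

  # Empirical covariance matrix with divisor n (stats::cov() divides by n - 1)
  xc <- sweep(x, 2, mu)
  sigma <- crossprod(xc) / n

  mu2 <- sum(mu^2)  # mu' mu

  C <- c(RR = sqrt(det(sigma)^(1 / d) / mu2),
         VV = sqrt(sum(diag(sigma)) / mu2),
         VN = sqrt(1 / drop(t(mu) %*% solve(sigma, mu))),
         AZ = sqrt(drop(t(mu) %*% sigma %*% mu) / mu2^2))

  # The standardized means are the reciprocals of the MCVs
  data.frame(C_est = C, B_est = 1 / C)
}

# Usage on one group of the iris data
print(mcv_sketch(iris[iris$Species == "setosa", 1:3]))
```

For a single column the four rows coincide and equal the usual coefficient of variation computed with the divisor-n standard deviation.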
In addition to the estimators, the respective confidence intervals [C_lwr, C_upr] for a given confidence level 1-\alpha are calculated by the e_mcv() function. These confidence intervals are based on an asymptotic approximation by a normal distribution; see Ditzhaus and Smaga (2023) for the technical details. These approximations do not rely on any specific (semi-)parametric assumption on the distribution and are valid nonparametrically, even for tied data.
Value
When d>1 (respectively d=1), a data frame with four rows (one row) corresponding to the four MCVs (the univariate CV) and six columns containing the estimators C_est for the MCV (CV) and the estimators B_est for their reciprocals as well as the lower and upper bounds of the corresponding confidence intervals [C_lwr, C_upr] and [B_lwr, B_upr].
References
Albert A., Zhang L. (2010) A novel definition of the multivariate coefficient of variation. Biometrical Journal 52:667-675.
Ditzhaus M., Smaga L. (2023) Inference for all variants of the multivariate coefficient of variation in factorial designs. Preprint https://arxiv.org/abs/2301.12009.
Reyment R.A. (1960) Studies on Nigerian Upper Cretaceous and Lower Tertiary Ostracoda: part 1. Senonian and Maastrichtian Ostracoda, Stockholm Contributions in Geology, vol 7.
Van Valen L. (1974) Multivariate structural statistics in natural history. Journal of Theoretical Biology 45:235-247.
Voinov V., Nikulin M. (1996) Unbiased Estimators and Their Applications, Vol. 2, Multivariate Case. Kluwer, Dordrecht.
Examples
# d > 1 (MCVs)
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
lapply(data_set, e_mcv)
# d = 1 (CV)
data_set <- lapply(list(iris[iris$Species == "setosa", 1],
                        iris[iris$Species == "versicolor", 1],
                        iris[iris$Species == "virginica", 1]),
                   as.matrix)
lapply(data_set, e_mcv)
Plot simultaneous confidence intervals for contrasts
Description
Simultaneous confidence intervals for contrasts for CV and MCVs and their reciprocals are plotted.
Usage
## S3 method for class 'gfdmcv'
plot(x, ...)
Arguments
x |
a "gfdmcv" object. |
... |
additional arguments not used. |
Value
No return value, called for side effects.
Examples
# Some of the examples may take some time to run.
# For more examples, see the documentation of the GFDmcv() function.
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
oldpar <- par(mar = c(4, 5, 2, 0.3))
plot(res)
par(oldpar)
Print "gfdmcv" object
Description
Prints the summary of the inference methods for CV and MCVs.
Usage
## S3 method for class 'gfdmcv'
summary(object, ...)
Arguments
object |
a "gfdmcv" object. |
... |
integer indicating the number of decimal places to be used to present the numerical results.
It can be named |
Details
The function prints out the information about the significance level, contrast matrices, test statistics, p-values, critical values and simultaneous confidence intervals for contrasts produced by the GFDmcv() function.
Value
No return value, called for side effects.
Examples
# Some of the examples may take some time to run.
# For more examples, see the documentation of the GFDmcv() function.
data_set <- lapply(list(iris[iris$Species == "setosa", 1:3],
                        iris[iris$Species == "versicolor", 1:3],
                        iris[iris$Species == "virginica", 1:3]),
                   as.matrix)
# estimators and confidence intervals of MCVs and their reciprocals
lapply(data_set, e_mcv)
# contrast matrices
k <- length(data_set)
# Tukey's contrast matrix
h_mct <- contr_mat(k, type = "Tukey")
# centering matrix P_k
h_wald <- contr_mat(k, type = "center")
# testing without parallel computing
res <- GFDmcv(data_set, h_mct, h_wald)
summary(res, digits = 3)