Title: | Distributed Online Goodness-of-Fit Tests for Distributed Datasets |
Date: | 2025-06-16 |
Version: | 0.2 |
Description: | Distributed Online Goodness-of-Fit Test can process the distributed datasets. The philosophy of the package is described in Guo G.(2024) <doi:10.1016/j.apm.2024.115709>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Imports: | stats |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | no |
Packaged: | 2025-06-16 15:04:59 UTC; ASUS |
Author: | Guangbao Guo |
Maintainer: | Guangbao Guo <ggb11111111@163.com> |
Depends: | R (≥ 3.5.0) |
Repository: | CRAN |
Date/Publication: | 2025-06-24 09:50:08 UTC |
Two-Sample Anderson-Darling Test (Bootstrap Version)
Description
Performs a two-sample Anderson-Darling (AD) goodness-of-fit test using bootstrap resampling to compare whether two samples come from the same distribution. This test is sensitive to differences in both location and shape between the two distributions.
Usage
AD2gof(
x,
y,
alternative = c("two.sided", "less", "greater"),
nboots = 2000,
keep.boots = FALSE
)
Arguments
x |
A numeric vector of data values from the first sample. |
y |
A numeric vector of data values from the second sample. |
alternative |
Character string specifying the alternative hypothesis. One of '"two.sided"' (default), '"less"', or '"greater"'. |
nboots |
Integer. Number of bootstrap replicates to compute the null distribution (default: 2000). |
keep.boots |
Logical. If 'TRUE', returns the full vector of bootstrap statistics (default: 'FALSE'). |
Details
The test computes the Anderson-Darling statistic using the pooled empirical distribution functions (ECDFs) of the two samples. A bootstrap procedure resamples the group labels to approximate the null distribution and compute a p-value. If 'p.value = 0', it is adjusted to '1 / (2 * nboots)' for stability.
Value
A list of class '"htest"' containing:
- statistic
The observed Anderson-Darling test statistic.
- p.value
The estimated bootstrap p-value.
- alternative
The alternative hypothesis used.
- method
A character string describing the test.
- bootstraps
(Optional) A numeric vector of bootstrap statistics if 'keep.boots = TRUE'.
Examples
set.seed(123)
x <- rnorm(100, mean = 0, sd = 4)
y <- rnorm(100, mean = 2, sd = 4)
AD2gof(x, y)
Anderson-Darling Goodness-of-Fit Test for a Specified Distribution
Description
Performs the Anderson-Darling (AD) goodness-of-fit test for a given univariate distribution. The function computes the AD statistic and returns an approximate p-value based on adjusted formulas.
Usage
ADgof(
x,
dist = c("norm", "exp", "unif", "lnorm", "weibull", "gamma", "t", "chisq"),
...,
eps = 1e-15
)
Arguments
x |
A numeric vector of sample observations. |
dist |
A character string specifying the null distribution. Options are
|
... |
Additional named parameters passed to the corresponding distribution functions
(e.g., |
eps |
A small positive constant to avoid log(0) during computation (default: |
Details
This implementation supports several common distributions. Parameters of the null distribution
must be supplied via ...
. The p-value is calculated using the approximations suggested
by Stephens (1986) and other refinements. For small samples or custom distributions, a bootstrap
version may be preferred.
Value
A list of class "htest"
with components:
- statistic
The value of the Anderson-Darling test statistic.
- p.value
The approximate p-value computed using adjustment formulas.
- method
A description of the test performed.
- data.name
The name of the input data.
Examples
set.seed(123)
x1 <- rnorm(500, mean = 5, sd = 2)
ADgof(x1, dist = "norm", mean = 5, sd = 2)
x2 <- rexp(400, rate = 1.5)
ADgof(x2, dist = "exp")
ADgof(x2, dist = "exp", rate = 1.5)
x3 <- runif(300, min = -2, max = 4)
ADgof(x3, dist = "unif", min = -2, max = 4)
Two-Sample Cramér–von Mises Test (Bootstrap Version)
Description
Performs a nonparametric two-sample Cramér–von Mises test using a permutation-based bootstrap method to assess whether two samples come from the same distribution.
Usage
CVM2gof(
x,
y,
alternative = c("two.sided", "less", "greater"),
nboots = 2000,
keep.boots = FALSE
)
Arguments
x |
Numeric vector of observations from the first sample. |
y |
Numeric vector of observations from the second sample. |
alternative |
Character string specifying the alternative hypothesis.
Must be one of |
nboots |
Number of bootstrap replicates to approximate the null distribution (default: 2000). |
keep.boots |
Logical. If |
Details
The test compares two empirical cumulative distribution functions (ECDFs). The bootstrap procedure permutes group labels to generate the null distribution. Tailored one-sided tests use one-sided squared differences of ECDFs.
Value
An object of class "htest"
with elements:
- statistic
Observed Cramér–von Mises test statistic.
- p.value
Bootstrap-based p-value.
- alternative
The alternative hypothesis used.
- method
A description of the test.
- bootstraps
(Optional) Vector of bootstrap test statistics if
keep.boots = TRUE
.
Examples
set.seed(123)
x <- rnorm(100, mean = 0, sd = 4)
y <- rnorm(100, mean = 2, sd = 4)
CVM2gof(x, y)
# One-sided test
CVM2gof(x, y, alternative = "greater")
# Store bootstrap replicates
res <- CVM2gof(x, y, keep.boots = TRUE)
hist(res$bootstraps, main = "Bootstrap Distribution", xlab = "Test Statistic")
One-Sample Cramér–von Mises Goodness-of-Fit Test
Description
Performs the one-sample Cramér–von Mises goodness-of-fit (GoF) test to assess whether a sample comes from a specified distribution using asymptotic p-value approximations.
Usage
CVMgof2(
x,
dist = c("norm", "exp", "unif", "lnorm", "weibull", "gamma", "t", "chisq"),
...,
eps = 1e-15
)
Arguments
x |
A numeric vector of observations. |
dist |
A character string specifying the theoretical distribution. Must be one of
|
... |
Distribution parameters passed to the corresponding |
eps |
A small value to truncate extreme p-values (default is |
Details
The test uses the Cramér–von Mises statistic to assess how well the empirical distribution function (EDF) of the sample agrees with the cumulative distribution function (CDF) of the specified theoretical distribution. The p-value is computed using approximation formulas derived from the asymptotic distribution of the test statistic.
Value
An object of class "htest"
with the following components:
- statistic
The computed Cramér–von Mises test statistic.
- p.value
The asymptotic p-value.
- method
A description of the test and distribution.
- data.name
The name of the data vector.
Examples
set.seed(123)
x1 <- rnorm(500, mean = 0, sd = 1)
CVMgof2(x1, dist = "norm", mean = 0, sd = 1)
x2 <- rexp(500, rate = 2)
CVMgof2(x2, dist = "exp", rate = 2)
x3 <- runif(200, min = -1, max = 3)
CVMgof2(x3, dist = "unif", min = -1, max = 3)
Two-Sample Kolmogorov–Smirnov Test with Bootstrap
Description
Performs a two-sample Kolmogorov–Smirnov (KS) test using a bootstrap method to assess whether two independent samples come from the same distribution.
Usage
KS2gof(
x,
y,
alternative = c("two.sided", "less", "greater"),
nboots = 5000,
keep.boots = FALSE
)
Arguments
x , y |
Numeric vectors of data values for the two independent samples. |
alternative |
Character string specifying the alternative hypothesis,
must be one of |
nboots |
Number of bootstrap resamples used to approximate the null distribution (default: 5000). |
keep.boots |
Logical; if |
Details
This implementation performs a nonparametric KS test for equality of distributions by resampling under the null hypothesis. It supports one-sided and two-sided alternatives.
If keep.boots = TRUE
, the function returns all bootstrap statistics,
which can be used for further analysis (e.g., plotting).
If the p-value is zero due to no bootstrap statistic exceeding the observed value,
it is adjusted to 1 / (2 * nboots)
to avoid a zero p-value.
Value
An object of class "htest"
with the following components:
- statistic
The observed KS statistic.
- p.value
The p-value based on the bootstrap distribution.
- alternative
The alternative hypothesis.
- method
Description of the test used.
Examples
set.seed(123)
x <- rnorm(100, mean = 0, sd = 4)
y <- rnorm(100, mean = 2, sd = 4)
KS2gof(x, y)
One-sample Kolmogorov-Smirnov goodness-of-fit test
Description
Performs the one-sample Kolmogorov-Smirnov test for a specified theoretical distribution.
Usage
KSgof2(
x,
dist = c("norm", "exp", "unif", "lnorm", "weibull", "gamma", "t", "chisq"),
...,
eps = 1e-15
)
Arguments
x |
Numeric vector of observations. |
dist |
Character string specifying the distribution to test against.
One of |
... |
Additional parameters passed to the distribution’s cumulative distribution function (CDF).
For example, |
eps |
Numeric lower and upper bound for tail probabilities to avoid numerical issues (default: |
Details
The test compares the empirical distribution function of x
with the cumulative distribution function
of a specified theoretical distribution using the Kolmogorov-Smirnov statistic.
For large sample sizes, a p-value approximation based on the asymptotic distribution is used.
A correction is applied when sample size exceeds 100, adjusting the test statistic to approximate a fixed sample size. For very small or very large statistics, piecewise polynomial approximations are used to compute the p-value.
Value
An object of class "htest"
containing the test statistic, p-value, method description, and data name.
Examples
set.seed(123)
x <- rnorm(1000, mean = 5, sd = 2)
KSgof2(x, dist = "norm", mean = 5, sd = 2)
y <- rexp(500, rate = 0.5)
KSgof2(y, dist = "exp", rate = 0.5)
u <- runif(300, min = 0, max = 10)
KSgof2(u, dist = "unif", min = 0, max = 10)
Two-Sample Kuiper Test with Bootstrap
Description
Performs a two-sample Kuiper test using bootstrap resampling to test whether two independent samples come from the same distribution.
Usage
Kuiper2gof(
x,
y,
alternative = c("two.sided", "less", "greater"),
nboots = 2000,
keep.boots = FALSE
)
Arguments
x , y |
Numeric vectors of data values for the two samples. |
alternative |
Character string indicating the alternative hypothesis. Must be one of |
nboots |
Integer. Number of bootstrap resamples to compute the empirical null distribution (default: 2000). |
keep.boots |
Logical. If |
Details
The Kuiper test is a nonparametric test similar to the Kolmogorov–Smirnov test, but sensitive to discrepancies in both location and shape between two distributions. This implementation uses bootstrap resampling to estimate the p-value.
The two.sided
test uses the sum of maximum positive and negative ECDF differences.
The greater
and less
options use one-sided variations.
If the observed test statistic exceeds all bootstrap values, the p-value is set to 1 / (2 * nboots)
to avoid zero.
Value
An object of class "htest"
containing:
- statistic
The observed Kuiper statistic.
- p.value
The p-value computed from the bootstrap distribution.
- alternative
The specified alternative hypothesis.
- method
A character string describing the test.
- bootstraps
(If requested) A numeric vector of bootstrap statistics.
Examples
set.seed(123)
x <- rnorm(100, 0, 4)
y <- rnorm(100, 2, 4)
Kuiper2gof(x, y)
Watson goodness-of-fit test Performs the Watson test for goodness-of-fit to a specified distribution.
Description
Watson goodness-of-fit test Performs the Watson test for goodness-of-fit to a specified distribution.
Usage
Wgof(x, dist = c("norm", "exp", "unif", "lnorm", "gamma"), ..., eps = 1e-15)
Arguments
x |
Numeric vector of observations. |
dist |
Character string specifying the distribution to test against.
One of |
... |
Additional parameters passed to the distribution's cumulative distribution function (CDF).
For example, |
eps |
Numeric tolerance for probability bounds to avoid extremes (default: 1e-15). |
Details
The Watson test is a modification of the Cramér–von Mises test, adjusting for mean deviations. It measures the squared distance between the empirical distribution function of the data and the specified theoretical cumulative distribution function, with a correction for location.
Value
An object of class "htest"
containing the test statistic, p-value, method description, data name,
and any distribution parameters used.
Examples
set.seed(123)
x_norm <- rnorm(1000, mean = 5, sd = 2)
Wgof(x_norm, dist = "norm", mean = 5, sd = 2)
x_exp <- rexp(500, rate = 0.5)
Wgof(x_exp, dist = "exp", rate = 0.5)
x_unif <- runif(300, min = 0, max = 10)
Wgof(x_unif, dist = "unif", min = 0, max = 10)
x_lnorm <- rlnorm(200, meanlog = 0, sdlog = 1)
Wgof(x_lnorm, dist = "lnorm", meanlog = 0, sdlog = 1)
x_gamma <- rgamma(400, shape = 1, rate = 1)
Wgof(x_gamma, dist = "gamma", shape = 1, rate = 1)
Perform the Cramer-von Mises Goodness-of-Fit Test for Normality
Description
Perform the Cramer-von Mises Goodness-of-Fit Test for Normality
Usage
cvmgof(x)
Arguments
x |
A numeric vector containing the sample data. |
Value
statistic |
The value of the Cramer-von Mises test statistic. |
p.value |
The p-value for the test. |
method |
A character string describing the test. |
Examples
# Example usage:
set.seed(123)
x <- rnorm(100) # Generate a sample from a normal distribution
result <- cvmgof(x)
print(result)
# Example with non-normal data:
y <- rexp(100) # Generate a sample from an exponential distribution
result <- cvmgof(y)
print(result)
Perform the Lilliefors (Kolmogorov-Smirnov) Goodness-of-Fit Test for Normality
Description
Perform the Lilliefors (Kolmogorov-Smirnov) Goodness-of-Fit Test for Normality
Usage
ksgof(x)
Arguments
x |
A numeric vector containing the sample data. |
Value
statistic |
The value of the Lilliefors (Kolmogorov-Smirnov) test statistic. |
p.value |
The p-value for the test. |
method |
A character string describing the test. |
Examples
# Example usage:
set.seed(123)
x <- rnorm(100) # Generate a sample from a normal distribution
result <- ksgof(x)
print(result)
# Example with non-normal data:
y <- rexp(100) # Generate a sample from an exponential distribution
result <- ksgof(y)
print(result)
Calculate the Quantile of the Cramer-von Mises Goodness-of-Fit Statistic
Description
This function calculates the quantile of the Cramer-von Mises goodness-of-fit statistic using the 'uniroot' function to find the root of the given function.
Usage
qCvMgof(X, p)
Arguments
X |
A numeric vector containing the sample data. |
p |
A numeric value representing the desired quantile probability. |
Value
root |
The quantile value corresponding to the given probability. |
Examples
# Example usage:
set.seed(123)
X <- rnorm(100) # Generate a sample from a normal distribution
p <- 0.95 # Desired quantile probability
result <- qCvMgof(X, p)
print(result)
Perform a Simple Cramer-von Mises Goodness-of-Fit Test
Description
This function performs a simple Cramer-von Mises goodness-of-fit test to assess whether a given sample comes from a uniform distribution. The test statistic and p-value are calculated based on the sorted sample data.
Usage
simpleCvMgof(X)
Arguments
X |
A numeric vector containing the sample data. |
Value
statistic |
The value of the Cramer-von Mises test statistic. |
pvalue |
The p-value for the test. |
statname |
The name of the test statistic. |
Examples
# Example usage:
set.seed(123)
X <- runif(100) # Generate a sample from a uniform distribution
result <- simpleCvMgof(X)
print(result)
# Example with non-uniform data:
Y <- rnorm(100) # Generate a sample from a normal distribution
result <- simpleCvMgof(Y)
print(result)