Title: | Density Ratio Permutation Test |
Version: | 1.1 |
Description: | Implementation of the Density Ratio Permutation Test for testing the goodness-of-fit of a hypothesised ratio of two densities, as described in Bordino and Berrett (2025) <doi:10.48550/arXiv.2505.24529>. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
LinkingTo: | Rcpp |
Imports: | Rcpp, BiasedUrn, rootSolve, future, future.apply, Rdpack (≥ 2.6) |
RdMacros: | Rdpack |
Suggests: | testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2025-07-02 16:26:52 UTC; laburd |
Author: | Alberto Bordino |
Maintainer: | Alberto Bordino <alberto.bordino@warwick.ac.uk> |
Repository: | CRAN |
Date/Publication: | 2025-07-07 09:10:05 UTC |
A function implementing the Density Ratio Permutation Test based on an estimate of the shifted-MMD.
Description
A function that implements the DRPT based on the U-statistic (12) defined in Bordino and Berrett (2025). An estimator of the shifted-MMD with kernel k(\cdot, \cdot), as defined in Section 3.2 of the paper, is computed using the function shiftedMMD, which is provided in the package.
Usage
DRPT(X, Y, r, kernel, H = 99, S = 50)
Arguments
X |
A numeric vector containing the first sample. |
Y |
A numeric vector containing the second sample. |
r |
A function specifying the hypothesised density ratio. |
kernel |
A function defining the kernel to be used for the U-statistic. |
H |
An integer specifying the number of permutations to use. Defaults to 99. |
S |
An integer specifying the number of steps of the Markov chain defined in Algorithm 2 in Bordino and Berrett (2025). Defaults to 50. |
Value
The p-value of the DRPT as defined in (2) in Bordino and Berrett (2025).
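As an illustration of how a permutation p-value is typically formed, the sketch below assumes the usual form (1 + number of permuted statistics at least as large as the observed one) / (H + 1). Whether this matches (2) in the paper term by term is an assumption, and the helper name perm_pvalue is hypothetical rather than part of the package.

```r
# Hypothetical helper, not exported by the package.
# Assumed form of the permutation p-value:
#   (1 + #{h : T_h >= T_0}) / (H + 1)
perm_pvalue <- function(t_obs, t_perm) {
  stopifnot(is.numeric(t_obs), length(t_obs) == 1L, is.numeric(t_perm))
  H <- length(t_perm)                      # number of permuted statistics
  (1 + sum(t_perm >= t_obs)) / (H + 1)
}

# One of the four permuted values is at least 5, so the result is 2/5.
p_example <- perm_pvalue(5, c(1, 2, 3, 6))

# With H = 19 the smallest value this form can return is 1/20.
p_min <- perm_pvalue(10, rep(0, 19))
```

Under this form the p-value can never fall below 1/(H + 1), which is why H = 19 is the smallest choice that can reject at the 5% level.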
References
Bordino A, Berrett TB (2025). “Density Ratio Permutation Tests with connections to distributional shifts and conditional two-sample testing.” arXiv:2505.24529, https://arxiv.org/abs/2505.24529.
Examples
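# Sample sizes and dimension (d is read by gaussian.kernel below)
n = 50; m = 50; d = 2
# Hypothesised density ratio
r = function(x,y) {
return(4*x*y)
}
# Gaussian kernel with bandwidth lambda
gaussian.kernel = function(x, y, lambda = 1){
return(lambda^(-d) * exp(-sum(((x - y) ^ 2) / (lambda ^ 2))))
}
# First sample: uniform coordinates; second sample: Beta-distributed coordinates
X = as.matrix(cbind(runif(n, 0, 1), runif(n, 0, 1)))
Y = as.matrix(cbind(rbeta(m, 0.5, 0.3), rbeta(m, 0.5, 0.4)))
# H = 19 permutations with S = 10 Markov chain steps
DRPT(X,Y, r, gaussian.kernel, H=19, S=10)
# H = 9 permutations with the default S = 50
DRPT(X,Y, r, gaussian.kernel, H=9)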
A function implementing the discrete version of the DRPT for discrete data with finite support.
Description
A function that implements the discrete version of the DRPT for discrete data with finite support as defined in Section 2.1 in Bordino and Berrett (2025).
Usage
discrete.DRPT(X, Y, r, H = 99, type = "V")
Arguments
X |
A numeric vector containing the first sample. |
Y |
A numeric vector containing the second sample. |
r |
A numeric vector of positive values specifying the hypothesised density ratio in the discrete setting. |
H |
An integer specifying the number of permutations to use. Defaults to 99. |
type |
A character string indicating the test statistic to use. See the Details section for more information.
Defaults to "V". |
Details
Counts for the permuted samples are drawn using rMFNCHypergeo from the package BiasedUrn.
When type="U", the test statistic is the U-statistic (12); when type="V", it is the V-statistic (11); when type="D", it is the test statistic (56) in Appendix B of the paper.
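To make the role of rMFNCHypergeo concrete, the following is a minimal sketch of one permuted draw of category counts. It assumes that r is the ratio of the second sample's density to the first's, so that, given the pooled counts, the counts of the second sample follow a multivariate Fisher noncentral hypergeometric distribution with odds r. How discrete.DRPT calls rMFNCHypergeo internally is not stated here, so the argument choices below are an assumption; the draw is skipped if BiasedUrn is not installed.

```r
set.seed(1)
support <- 0:3
X <- sample(support, 100, replace = TRUE)
Y <- sample(support, 100, replace = TRUE)
r <- c(1, 3, 3, 10)

# Counts over the full support, so absent categories are recorded as zero.
NX <- as.integer(table(factor(X, levels = support)))
NY <- as.integer(table(factor(Y, levels = support)))
pooled <- NX + NY          # pooled counts per category
m <- sum(NY)               # size of the second sample

if (requireNamespace("BiasedUrn", quietly = TRUE)) {
  # Assumption: draw the second sample's counts with odds r;
  # the first sample's counts are whatever remains of the pooled counts.
  NY_perm <- as.vector(BiasedUrn::rMFNCHypergeo(1, pooled, m, r))
  NX_perm <- pooled - NY_perm
  stopifnot(sum(NY_perm) == m, all(NY_perm <= pooled))
}
```

Drawing counts directly avoids permuting individual observations, which is sufficient when the test statistic depends on the data only through the category counts.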
Value
The p-value of the DRPT as defined in (2) in Bordino and Berrett (2025).
References
Bordino A, Berrett TB (2025). “Density Ratio Permutation Tests with connections to distributional shifts and conditional two-sample testing.” arXiv:2505.24529, https://arxiv.org/abs/2505.24529.
Examples
n = 100; m = n
X = sample(0:3, n, prob = c(1/8, 1/8, 3/8, 3/8), replace = TRUE)
Y = sample(0:3, m, prob = c(1/43, 3/43, 16/43, 23/43), replace = TRUE)
r = c(1, 3, 3, 10)
discrete.DRPT(X,Y,r,H=19)
discrete.DRPT(X,Y,r, type = "U", H=19)
discrete.DRPT(X,Y,r, type = "D", H=19)
Compute test statistics for the DRPT in discrete settings.
Description
Computes the test statistics introduced in Bordino and Berrett (2025) for settings where the data support is discrete and finite.
Usage
discreteT(NX, NY, r, n, m, type = "V")
Arguments
NX |
A vector of counts for the first sample.
This corresponds to the sequence |
NY |
A vector of counts for the second sample.
This corresponds to the sequence |
r |
A numeric vector of positive values specifying the hypothesised density ratio in the discrete setting. |
n |
The size of the first sample. |
m |
The size of the second sample. |
type |
A character string indicating which test statistic to compute.
One of "U", "V" or "D"; see Details. Defaults to "V". |
Details
When type = "U", the U-statistic (12) is calculated.
When type = "V", the V-statistic (11) is computed.
When type = "D", the test statistic (56) from Appendix B is returned.
Value
A numeric value representing the computed test statistic.
References
Bordino A, Berrett TB (2025). “Density Ratio Permutation Tests with connections to distributional shifts and conditional two-sample testing.” arXiv:2505.24529, https://arxiv.org/abs/2505.24529.
Examples
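n = 100; m = n
# X is uniform on {0, 1, 2, 3}; the pmf of Y is proportional to r times the pmf of X
X = sample(0:3, n, prob = c(1/4, 1/4, 1/4, 1/4), replace = TRUE)
Y = sample(0:3, m, prob = c(1/17, 3/17, 3/17, 10/17), replace = TRUE)
r = c(1, 3, 3, 10)
# Tabulate over the full support so that a category absent from a sample gets a zero count
NX = table(factor(X, levels = 0:3))
NY = table(factor(Y, levels = 0:3))
discreteT(NX, NY, r, sum(NX), sum(NY), type = "V")
discreteT(NX, NY, r, sum(NX), sum(NY), type = "D")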
A function computing an estimate of the shifted-MMD.
Description
A function computing the U-statistic (12). This serves as an estimator of the shifted-MMD defined in Section 3.2 of Bordino and Berrett (2025).
Usage
shiftedMMD(X, Y, r, kernel)
Arguments
X |
A numeric vector containing the first sample. |
Y |
A numeric vector containing the second sample. |
r |
A function specifying the hypothesised density ratio. |
kernel |
A function defining the kernel to be used for the U-statistic. |
Value
The value of the U-statistic (12).
References
Bordino A, Berrett TB (2025). “Density Ratio Permutation Tests with connections to distributional shifts and conditional two-sample testing.” arXiv:2505.24529, https://arxiv.org/abs/2505.24529.
Examples
n = 250; m = 250; d = 2
r = function(x,y) {
return(4*x*y)
}
gaussian.kernel = function(x, y, lambda = 1){
return(lambda^(-d) * exp(-sum(((x - y) ^ 2) / (lambda ^ 2))))
}
X = as.matrix(cbind(runif(n, 0, 1), runif(n, 0, 1)))
Y = as.matrix(cbind(rbeta(m, 0.5, 0.3), rbeta(m, 0.5, 0.4)))
shiftedMMD(X,Y, r, gaussian.kernel)
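The Gaussian kernel in the example above reads d from the global environment, which silently changes the kernel if d is later reassigned. One possible alternative, sketched here, is a small factory that captures d and lambda in a closure. The name make_gaussian_kernel is hypothetical, and the sketch assumes the kernel is called with two points, kernel(x, y), and returns a scalar.

```r
# Hypothetical helper: returns a Gaussian kernel with d and lambda fixed.
make_gaussian_kernel <- function(d, lambda = 1) {
  stopifnot(d >= 1, lambda > 0)
  function(x, y) {
    # Same formula as gaussian.kernel in the example above
    lambda^(-d) * exp(-sum((x - y)^2) / lambda^2)
  }
}

k2 <- make_gaussian_kernel(d = 2, lambda = 1)
k_same <- k2(c(0, 0), c(0, 0))   # identical points: exp(0) = 1
k_diag <- k2(c(0, 0), c(1, 1))   # squared distance 2: exp(-2)
```

The resulting k2 could then be passed wherever the examples pass gaussian.kernel, under the calling convention assumed above.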
A function implementing the star-sampler for the DRPT.
Description
A function implementing Algorithm 2 in Bordino and Berrett (2025).
Usage
starSampler(X, Y, r, H = 99, S = 50)
Arguments
X |
A numeric vector containing the first sample. |
Y |
A numeric vector containing the second sample. |
r |
A function specifying the hypothesised density ratio. |
H |
An integer specifying the number of permutations to use. Defaults to 99. |
S |
An integer specifying the number of steps of the Markov chain defined in Algorithm 2 in Bordino and Berrett (2025). Defaults to 50. |
Value
A list of H+1 rearrangements of the whole sample. The first element of the list is the original dataset. The other H elements are permutations of the original dataset, generated using Algorithm 2 in the paper.
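A list of this shape can be consumed by evaluating a statistic on every element and separating the first (observed) value from the rest. The sketch below uses a mock list rather than starSampler itself: the permutations are uniform row shuffles, not draws from Algorithm 2, and it assumes each element is a matrix whose first n rows play the role of the first sample. Both the layout and the toy statistic are assumptions made for illustration.

```r
set.seed(1)
n <- 5; m <- 5; H <- 3
Z <- matrix(runif((n + m) * 2), ncol = 2)   # pooled sample, one row per observation

# Mock output: original dataset first, then H uniform row permutations.
# These are NOT generated by Algorithm 2; they only mimic the list shape.
mock <- c(list(Z),
          lapply(seq_len(H), function(h) Z[sample(nrow(Z)), , drop = FALSE]))

# Toy statistic: squared distance between the two group means.
toy_stat <- function(Zp) {
  first  <- Zp[seq_len(n), , drop = FALSE]
  second <- Zp[n + seq_len(m), , drop = FALSE]
  sum((colMeans(first) - colMeans(second))^2)
}

stats  <- vapply(mock, toy_stat, numeric(1))
t_obs  <- stats[1]    # statistic on the original dataset
t_perm <- stats[-1]   # statistics on the H rearrangements
```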
References
Bordino A, Berrett TB (2025). “Density Ratio Permutation Tests with connections to distributional shifts and conditional two-sample testing.” arXiv:2505.24529, https://arxiv.org/abs/2505.24529.
Examples
n = 250; m = n
r = function(x,y) {
return(4*x*y)
}
X = as.matrix(cbind(runif(n, 0, 1), runif(n, 0, 1)))
Y = as.matrix(cbind(rbeta(m, 0.5, 0.3), rbeta(m, 0.5, 0.4)))
starSampler(X, Y, r, H = 3, S = 20)