| Title: | Robust Graph-Based Two-Sample Test | 
| Version: | 0.1 | 
| Description: | Useful tools for determining whether two samples are from the same distribution. Utilizes a robust method to address the problematic structure of the similarity graph constructed from high-dimensional data. The method is provided in Yichuan Bai and Lynna Chu (2023) <doi:10.48550/arXiv.2307.12325>. | 
| License: | MIT + file LICENSE | 
| Encoding: | UTF-8 | 
| RoxygenNote: | 7.2.3 | 
| Imports: | ade4, stats | 
| Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) | 
| Config/testthat/edition: | 3 | 
| VignetteBuilder: | knitr | 
| Depends: | R (≥ 3.0.1) | 
| LazyData: | true | 
| NeedsCompilation: | no | 
| Packaged: | 2023-08-11 21:07:22 UTC; tutu | 
| Author: | Yichuan Bai [aut, cre], Lynna Chu [aut] | 
| Maintainer: | Yichuan Bai <ycbai@iastate.edu> | 
| Repository: | CRAN | 
| Date/Publication: | 2023-08-14 11:40:02 UTC | 
Get the approximate test statistic and p-value based on asymptotic theory using the robust generalized edge-count test
Description
Computes the approximate test statistic and p-value, based on asymptotic theory, for the robust generalized edge-count test.
Usage
asy_gen(asy_res, R1_test, R2_test)
Arguments
| asy_res | analytic expressions of expectations, variances and covariances | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
Value
A list containing the following components:
| test_statistic | the asymptotic test statistic using robust generalized graph-based test. | 
| p_value | the asymptotic p-value using robust generalized graph-based test. | 
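As a rough illustration of how these inputs could combine, the sketch below assumes the statistic takes the usual generalized edge-count form: a quadratic form in the centered within-sample counts, compared against a chi-squared distribution with 2 degrees of freedom. The function name `gen_stat_sketch` and the numeric inputs are hypothetical; only the component names `mu1`, `mu2`, `sig11`, `sig22`, and `sig12` are taken from the `theo_mu_sig` return values documented in this manual. It is not the package's implementation.

```r
# Sketch only: assumes a quadratic-form statistic with a chi-squared(2) reference.
gen_stat_sketch <- function(asy_res, R1_test, R2_test) {
  # Centered within-sample counts
  d <- c(R1_test - asy_res$mu1, R2_test - asy_res$mu2)
  # 2 x 2 covariance matrix of (R1, R2)
  Sigma <- matrix(c(asy_res$sig11, asy_res$sig12,
                    asy_res$sig12, asy_res$sig22), nrow = 2)
  stat <- sum(d * solve(Sigma, d))
  list(test_statistic = stat,
       p_value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Made-up moments and counts, for illustration only
res <- list(mu1 = 10, mu2 = 8, sig11 = 4, sig22 = 3, sig12 = 1)
out <- gen_stat_sketch(res, R1_test = 13, R2_test = 10)
```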
Get the approximate test statistic and p-value based on asymptotic theory using the robust max-type edge-count test
Description
Computes the approximate test statistic and p-value, based on asymptotic theory, for the robust max-type edge-count test.
Usage
asy_max(asy_res, R1_test, R2_test, n1, n2)
Arguments
| asy_res | analytic expressions of expectations, variances and covariances | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
| n1 | number of observations in sample 1 | 
| n2 | number of observations in sample 2 | 
Value
A list containing the following components:
| test_statistic | the asymptotic test statistic using robust max-type graph-based test. | 
| p_value | the asymptotic p-value using robust max-type graph-based test. | 
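The sketch below shows one way a max-type statistic could be formed from these inputs. It assumes the common max-type construction: the maximum of a standardized weighted sum of the within-sample counts and the absolute standardized difference of the counts, with the two components treated as asymptotically independent standard normals. The function name `max_stat_sketch`, the combination weights, the p-value formula, and the numbers are all assumptions made for illustration and may differ from what `asy_max` does.

```r
# Sketch only: assumed max-type form, not the package's implementation.
max_stat_sketch <- function(asy_res, R1_test, R2_test, n1, n2) {
  N <- n1 + n2
  q <- (n2 - 1) / (N - 2)   # assumed weight on the sample 1 count
  p <- (n1 - 1) / (N - 2)   # assumed weight on the sample 2 count
  # Standardized weighted sum of the within-sample counts
  z_w <- (q * (R1_test - asy_res$mu1) + p * (R2_test - asy_res$mu2)) /
    sqrt(q^2 * asy_res$sig11 + p^2 * asy_res$sig22 + 2 * p * q * asy_res$sig12)
  # Standardized difference of the within-sample counts
  z_d <- ((R1_test - R2_test) - (asy_res$mu1 - asy_res$mu2)) /
    sqrt(asy_res$sig11 + asy_res$sig22 - 2 * asy_res$sig12)
  m <- max(z_w, abs(z_d))
  # P(max <= m) = pnorm(m) * (2 * pnorm(m) - 1) under the independence assumption
  list(test_statistic = m,
       p_value = 1 - pnorm(m) * (2 * pnorm(m) - 1))
}

# Made-up moments and counts, for illustration only
res <- list(mu1 = 10, mu2 = 8, sig11 = 4, sig22 = 3, sig12 = 1)
out <- max_stat_sketch(res, R1_test = 13, R2_test = 10, n1 = 5, n2 = 5)
```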
Get the approximate test statistic and p-value based on asymptotic theory using the robust weighted edge-count test
Description
Computes the approximate test statistic and p-value, based on asymptotic theory, for the robust weighted edge-count test.
Usage
asy_wei(asy_res, R1_test, R2_test, n1, n2)
Arguments
| asy_res | analytic expressions of expectations, variances and covariances | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
| n1 | number of observations in sample 1 | 
| n2 | number of observations in sample 2 | 
Value
A list containing the following components:
| test_statistic | the asymptotic test statistic using robust weighted graph-based test. | 
| p_value | the asymptotic p-value using robust weighted graph-based test. | 
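To make the role of `n1` and `n2` concrete, the following sketch assumes the usual weighted edge-count form: the two within-sample counts are combined with weights proportional to the other sample's size, standardized, and compared against a standard normal distribution (one-sided). The function name `wei_stat_sketch`, the combination weights, and the numbers are assumptions for illustration, not the package's code.

```r
# Sketch only: assumed weighted edge-count form with a one-sided normal p-value.
wei_stat_sketch <- function(asy_res, R1_test, R2_test, n1, n2) {
  N <- n1 + n2
  q <- (n2 - 1) / (N - 2)   # assumed weight on the sample 1 count
  p <- (n1 - 1) / (N - 2)   # assumed weight on the sample 2 count
  r_w <- q * R1_test + p * R2_test
  mean_w <- q * asy_res$mu1 + p * asy_res$mu2
  var_w <- q^2 * asy_res$sig11 + p^2 * asy_res$sig22 + 2 * p * q * asy_res$sig12
  z <- (r_w - mean_w) / sqrt(var_w)
  list(test_statistic = z, p_value = pnorm(z, lower.tail = FALSE))
}

# Made-up moments and counts, for illustration only
res <- list(mu1 = 10, mu2 = 8, sig11 = 4, sig22 = 3, sig12 = 1)
out <- wei_stat_sketch(res, R1_test = 13, R2_test = 10, n1 = 5, n2 = 5)
```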
Example
Description
This example contains a dataset, the labels of the observations in the dataset, the distance matrix of the dataset using L2 distance, and the edge matrix generated by 5-MST.
Usage
example0
Format
An object of class list of length 4.
Details
- data
- pooled dataset of two samples drawn from two different t-distributions. 
- label
- label of the observations. 'sample 1' denotes the observations in sample 1. 'sample 2' denotes the observations in sample 2. 
- distance
- the distance matrix of the pooled dataset using L2 distance. 
- edge
- edge matrix generated by 5-MST. 
Get distance matrix
Description
This function returns the distance matrix using L2 distance.
Usage
getdis(y)
Arguments
| y | the pooled dataset | 
Value
A distance matrix based on the L2 distance.
Examples
data(example0)
data = as.matrix(example0$data)     # pooled dataset
getdis(data)
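For readers who want to see what an L2 (Euclidean) distance matrix looks like without loading the package, here is a small base-R sketch using `stats::dist`. The toy points are made up, and the return format of `getdis` may differ from the plain matrix produced here.

```r
# Three toy observations in two dimensions (made-up values)
pts <- rbind(c(0, 0), c(3, 4), c(6, 8))

# Pairwise Euclidean (L2) distances as a full symmetric matrix
l2_dis <- as.matrix(dist(pts, method = "euclidean"))
```

Entry `[i, j]` holds the distance between observations `i` and `j`, and the diagonal is zero.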
Construct k-MST
Description
Constructs a k-MST (k-minimum spanning tree) similarity graph from the data or from a distance matrix.
Usage
kmst(y = NULL, dis = NULL, k = 1)
Arguments
| y | data | 
| dis | distance matrix | 
| k | parameter in K-MST, with default 1 | 
Value
An edge matrix representing a similarity graph. Each row represents an edge and records the indices of two ends of an edge in two columns.
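The sketch below illustrates the edge-matrix format with a small base-R construction. It assumes the common definition of a k-MST as the union of the 1st through k-th minimum spanning trees, where each later tree avoids edges already used. The helper names `mst_edges` and `kmst_sketch` are hypothetical, the code assumes k is small enough that an unused spanning tree still exists, and the package's own `kmst` may be implemented differently.

```r
# Prim's algorithm on a full distance matrix; returns an (n - 1) x 2 edge matrix.
mst_edges <- function(dis) {
  n <- nrow(dis)
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  edges <- matrix(0L, nrow = n - 1, ncol = 2)
  for (step in seq_len(n - 1)) {
    # Distances from nodes inside the tree to nodes outside it
    sub <- dis[in_tree, !in_tree, drop = FALSE]
    pos <- arrayInd(which.min(sub), dim(sub))
    from <- which(in_tree)[pos[1]]
    to <- which(!in_tree)[pos[2]]
    edges[step, ] <- c(from, to)
    in_tree[to] <- TRUE
  }
  edges
}

# Assumed k-MST: stack k spanning trees, blocking edges that were already used.
kmst_sketch <- function(dis, k = 1) {
  dis <- as.matrix(dis)
  all_edges <- NULL
  for (i in seq_len(k)) {
    e <- mst_edges(dis)
    all_edges <- rbind(all_edges, e)
    dis[e] <- Inf                       # block each used edge ...
    dis[e[, 2:1, drop = FALSE]] <- Inf  # ... in both directions
  }
  all_edges
}

# Four points on a line (made-up data)
toy_dis <- as.matrix(dist(c(0, 1, 2, 3)))
E1 <- kmst_sketch(toy_dis, k = 1)
E2 <- kmst_sketch(toy_dis, k = 2)
```

Each row of the result is one edge, with the indices of its two end nodes in the two columns.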
Get lists of permuted weighted within-sample edge-counts and between-sample edge-counts
Description
Computes lists of permuted weighted within-sample and between-sample edge-counts.
Usage
permu_edge(n_per, E, n1, n2, wei, progress_bar = FALSE)
Arguments
| n_per | number of permutations. | 
| E | an edge matrix representing a similarity graph. Each row represents an edge and records the indices of two ends of an edge in two columns. The indices of observations in sample 1 are from 1 to n1 and indices of observations in sample 2 are from 1+n1 to n1+n2. | 
| n1 | number of observations in sample 1. | 
| n2 | number of observations in sample 2. | 
| wei | a vector of weights of each edge. | 
| progress_bar | a logical (TRUE or FALSE) indicating whether to print a progress bar for the permutations. | 
Value
| R1 | the permuted weighted within-sample edge-counts for sample 1. | 
| R2 | the permuted weighted within-sample edge-counts for sample 2. | 
| R | the permuted weighted between-sample edge-counts. | 
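The permutation step can be sketched in base R as follows: sample labels are reshuffled while the graph and edge weights stay fixed, and the weighted counts are recomputed for each shuffle. The function name `permute_counts_sketch` and the toy graph are hypothetical, and this sketch returns numeric vectors, which may not match the exact return structure of `permu_edge`.

```r
# Sketch only: relabel nodes at random and recompute weighted edge-counts.
permute_counts_sketch <- function(n_per, E, n1, n2, wei) {
  N <- n1 + n2
  R1 <- R2 <- R <- numeric(n_per)
  for (b in seq_len(n_per)) {
    # Randomly pick which n1 of the N nodes are relabelled as sample 1
    is1 <- rep(FALSE, N)
    is1[sample(N, n1)] <- TRUE
    a <- is1[E[, 1]]
    z <- is1[E[, 2]]
    R1[b] <- sum(wei[a & z])     # both ends in sample 1
    R2[b] <- sum(wei[!a & !z])   # both ends in sample 2
    R[b] <- sum(wei[a != z])     # ends in different samples
  }
  list(R1 = R1, R2 = R2, R = R)
}

set.seed(1)
E <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 5))  # made-up graph
wei <- c(0.5, 0.5, 1, 0.25, 0.75)                         # made-up weights
perm <- permute_counts_sketch(200, E, n1 = 2, n2 = 3, wei = wei)
```

Because every edge falls into exactly one of the three groups, the three counts sum to the total edge weight in every permutation.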
Get the test statistic and p-value based on permutation using the robust generalized edge-count test
Description
Computes the permutation-based test statistic and p-value for the robust generalized edge-count test.
Usage
permu_gen(R1_list, R2_list, R1_test, R2_test, n_per)
Arguments
| R1_list | list of permuted weighted within-sample edge-counts of sample 1 | 
| R2_list | list of permuted weighted within-sample edge-counts of sample 2 | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
| n_per | number of permutations | 
Value
The p-value based on permutation distribution using robust generalized graph-based test.
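A permutation p-value of this kind is typically the share of permuted statistics that are at least as extreme as the observed one. The sketch below shows that calculation in its simplest form; the function name `perm_pvalue_sketch` and the numbers are made up, and the package may use a different convention (for example, a different tie rule or a finite-sample correction).

```r
# Sketch only: proportion of permuted statistics >= the observed statistic.
perm_pvalue_sketch <- function(stat_perm, stat_obs) {
  mean(stat_perm >= stat_obs)
}

# Made-up permuted statistics and observed value
stat_perm <- c(0.2, 1.5, 3.1, 0.7, 2.4, 0.9, 4.0, 1.1, 0.3, 2.0)
p <- perm_pvalue_sketch(stat_perm, stat_obs = 2.2)
```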
Get the test statistic and p-value based on permutation using the robust max-type edge-count test
Description
Computes the permutation-based test statistic and p-value for the robust max-type edge-count test.
Usage
permu_max(R1_list, R2_list, R1_test, R2_test, n1, n2, n_per)
Arguments
| R1_list | list of permuted weighted within-sample edge-counts of sample 1 | 
| R2_list | list of permuted weighted within-sample edge-counts of sample 2 | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
| n1 | number of observations in sample 1 | 
| n2 | number of observations in sample 2 | 
| n_per | number of permutations | 
Value
The p-value based on permutation distribution using robust max-type graph-based test.
Get the test statistic and p-value based on permutation using the robust weighted edge-count test
Description
Computes the permutation-based test statistic and p-value for the robust weighted edge-count test.
Usage
permu_wei(R1_list, R2_list, R1_test, R2_test, n1, n2, n_per)
Arguments
| R1_list | list of permuted weighted within-sample edge-counts of sample 1 | 
| R2_list | list of permuted weighted within-sample edge-counts of sample 2 | 
| R1_test | weighted within-sample edge-counts of sample 1 | 
| R2_test | weighted within-sample edge-counts of sample 2 | 
| n1 | number of observations in sample 1 | 
| n2 | number of observations in sample 2 | 
| n_per | number of permutations | 
Value
The p-value based on permutation distribution using robust weighted graph-based test.
Robust graph-based two-sample test
Description
Performs the robust graph-based two-sample test.
Usage
rg.test(data.X, data.Y, dis = NULL, E = NULL, n1, n2, k = 5, weigh.fun, perm.num = 0, 
test.type = list("ori", "gen", "wei", "max"), progress_bar = FALSE)
Arguments
| data.X | a numeric matrix for observations in sample 1. | 
| data.Y | a numeric matrix for observations in sample 2. | 
| dis | a distance matrix of the pooled dataset of sample 1 and sample 2. The indices of observations in sample 1 are from 1 to n1 and indices of observations in sample 2 are from 1+n1 to n1+n2 in the pooled dataset. | 
| E | an edge matrix representing a similarity graph. Each row represents an edge and records the indices of two ends of an edge in two columns. The indices of observations in sample 1 are from 1 to n1 and indices of observations in sample 2 are from 1+n1 to n1+n2. | 
| n1 | number of observations in sample 1. | 
| n2 | number of observations in sample 2. | 
| k | parameter in K-MST, with default 5. | 
| weigh.fun | weight function that returns the weight of each edge as a function of the node degrees of its two ends. | 
| perm.num | number of permutations used to calculate the p-value (default 0, as shown in the usage above). Use 0 to get only the approximate p-value based on asymptotic theory. | 
| test.type | type of graph-based test. This must be a list containing elements chosen from "ori", "gen", "wei", and "max", with default 'list("ori", "gen", "wei", "max")'. "ori" refers to the robust original edge-count test, "gen" to the robust generalized edge-count test, "wei" to the robust weighted edge-count test, and "max" to the robust max-type edge-count test. | 
| progress_bar | a logical (TRUE or FALSE) indicating whether to print a progress bar for the permutations. | 
Details
The input should be one of the following:
- datasets of the two samples; 
- the distance matrix of the pooled dataset; 
- the edge matrix generated from a similarity graph. 
Typical usages are:
rg.test(data.X = data.X, data.Y = data.Y, n1 = n1, n2 = n2, weigh.fun = weigh.fun, ...)
rg.test(dis = dis, n1 = n1, n2 = n2, weigh.fun = weigh.fun, ...)
rg.test(E = E, n1 = n1, n2 = n2, weigh.fun = weigh.fun, ...)
If the data matrices or the distance matrix are used, the similarity graph is generated using K-MST.
Value
A list containing the following components:
| asy.ori.statistic | the asymptotic test statistic using robust original graph-based test. | 
| asy.ori.pval | the asymptotic p-value using robust original graph-based test. | 
| asy.gen.statistic | the asymptotic test statistic using robust generalized graph-based test. | 
| asy.gen.pval | the asymptotic p-value using robust generalized graph-based test. | 
| asy.wei.statistic | the asymptotic test statistic using robust weighted graph-based test. | 
| asy.wei.pval | the asymptotic p-value using robust weighted graph-based test. | 
| asy.max.statistic | the asymptotic test statistic using robust max-type graph-based test. | 
| asy.max.pval | the asymptotic p-value using robust max-type graph-based test. | 
| perm.ori.pval | the p-value based on permutation using robust original graph-based test. | 
| perm.gen.pval | the p-value based on permutation using robust generalized graph-based test. | 
| perm.wei.pval | the p-value based on permutation using robust weighted graph-based test. | 
| perm.max.pval | the p-value based on permutation using robust max-type graph-based test. | 
Examples
## Simulated from Student's t-distribution. 
## Observations for the two samples are from different distributions.
data(example0)
data = as.matrix(example0$data)     # pooled dataset
label = example0$label              # label of observations
s1 = data[label == 'sample 1', ]    # sample 1
s2 = data[label == 'sample 2', ]    # sample 2
num1 = nrow(s1)                     # number of observations in sample 1
num2 = nrow(s2)                     # number of observations in sample 2
## Graph-based two sample test using data as input
rg.test(data.X = s1, data.Y = s2, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 0)
## Graph-based two sample test using distance matrix as input
dist = example0$distance
rg.test(dis = dist, n1 = num1, n2 = num2, k = 5, weigh.fun = weiMax, perm.num = 0)
## Graph-based two sample test using edge matrix of the similarity graph as input
E = example0$edge
rg.test(E = E, n1 = num1, n2 = num2, weigh.fun = weiMax, perm.num = 0)
Get analytic expressions of expectations, variances and covariances
Description
Computes the analytic expressions of the expectations, variances and covariances of the weighted edge-counts.
Usage
theo_mu_sig(E, n1, n2, weights)
Arguments
| E | an edge matrix representing a similarity graph. Each row represents an edge and records the indices of two ends of an edge in two columns. The indices of observations in sample 1 are from 1 to n1 and indices of observations in sample 2 are from 1+n1 to n1+n2. | 
| n1 | number of observations in sample 1 | 
| n2 | number of observations in sample 2 | 
| weights | weights assigned to each edge | 
Value
| mu | the expectation of the between-sample edge-count. | 
| mu1 | the expectation of the within-sample edge-count for sample 1. | 
| mu2 | the expectation of the within-sample edge-count for sample 2. | 
| sig | the variance of the between-sample edge-count. | 
| sig11 | the variance of the within-sample edge-count for sample 1. | 
| sig22 | the variance of the within-sample edge-count for sample 2. | 
| sig12 | the covariance of the within-sample edge-counts. | 
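The expectations can be illustrated with a short calculation under the permutation null, where sample labels are assigned at random and the edge weights do not depend on the labels. An edge then has both ends in sample 1 with probability n1(n1-1)/(N(N-1)), both in sample 2 with probability n2(n2-1)/(N(N-1)), and one end in each with probability 2 n1 n2/(N(N-1)), where N = n1 + n2. The sketch covers only the three means; the variances and the covariance are omitted. The function name `null_means_sketch` and the toy inputs are hypothetical.

```r
# Sketch only: permutation-null means of the weighted edge-counts.
null_means_sketch <- function(E, n1, n2, weights) {
  stopifnot(nrow(E) == length(weights))  # one weight per edge
  N <- n1 + n2
  total <- sum(weights)
  p11 <- n1 * (n1 - 1) / (N * (N - 1))   # both ends in sample 1
  p22 <- n2 * (n2 - 1) / (N * (N - 1))   # both ends in sample 2
  p12 <- 2 * n1 * n2 / (N * (N - 1))     # one end in each sample
  list(mu = total * p12, mu1 = total * p11, mu2 = total * p22)
}

E <- rbind(c(1, 2), c(2, 3), c(3, 4), c(4, 5), c(1, 5))  # made-up graph
w <- c(0.5, 0.5, 1, 0.25, 0.75)                           # made-up weights
m <- null_means_sketch(E, n1 = 2, n2 = 3, weights = w)
```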
Weight function
Description
This weight function returns the inverse of the arithmetic average of the node degrees of an edge.
Usage
weiArith(a, b)
Arguments
| a | node degree of one end of an edge | 
| b | node degree of the other end of an edge | 
Value
The edge weight, computed as the inverse of the arithmetic average of the node degrees of the two ends of the edge.
Examples
# For an edge where one end has a node degree of 5
# and the other end has a node degree of 6
weiArith(6, 5)
Weight function
Description
This weight function returns the inverse of the geometric average of the node degrees of an edge.
Usage
weiGeo(a, b)
Arguments
| a | node degree of one end of an edge | 
| b | node degree of the other end of an edge | 
Value
The edge weight, computed as the inverse of the geometric average of the node degrees of the two ends of the edge.
Examples
# For an edge where one end has a node degree of 5
# and the other end has a node degree of 6
weiGeo(6, 5)
Weight function
Description
This weight function returns the inverse of the max node degree of an edge.
Usage
weiMax(a, b)
Arguments
| a | node degree of one end of an edge | 
| b | node degree of the other end of an edge | 
Value
The edge weight, computed as the inverse of the larger of the node degrees of the two ends of the edge.
Examples
# For an edge where one end has a node degree of 5
# and the other end has a node degree of 6
weiMax(6, 5)
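Read literally, the descriptions of the three weight functions translate to the one-liners below. They are written here under hypothetical names (`w_arith`, `w_geo`, `w_max`) so as not to suggest they are the package's source, and the second half shows one assumed way to turn an edge matrix into per-edge weights by first tabulating node degrees.

```r
# Sketches of the three weights as described: inverse of the arithmetic
# average, of the geometric average, and of the maximum of the two degrees.
w_arith <- function(a, b) 1 / ((a + b) / 2)
w_geo <- function(a, b) 1 / sqrt(a * b)
w_max <- function(a, b) 1 / pmax(a, b)

# Made-up star-shaped graph on 4 nodes: node 2 is joined to nodes 1, 3 and 4
E <- rbind(c(1, 2), c(2, 3), c(2, 4))

# Node degrees: how many edges touch each node
deg <- tabulate(as.vector(E), nbins = 4)

# One weight per edge, from the degrees of its two ends
wei <- w_max(deg[E[, 1]], deg[E[, 2]])
```

In this toy graph every edge touches the hub of degree 3, so each edge is down-weighted to 1/3 under the max-based weight.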
Get weighted within-sample edge-counts and between-sample edge-counts
Description
Computes the weighted within-sample and between-sample edge-counts.
Usage
weighted_R1R2(E, n1, wei)
Arguments
| E | an edge matrix representing a similarity graph. Each row represents an edge and records the indices of two ends of an edge in two columns. The indices of observations in sample 1 are from 1 to n1 and indices of observations in sample 2 are from 1+n1 to n1+n2. | 
| n1 | number of observations in sample 1. | 
| wei | a vector of weights of each edge. | 
Value
| R1 | the weighted within-sample edge-count for sample 1. | 
| R2 | the weighted within-sample edge-count for sample 2. | 
| R | the weighted between-sample edge-count. |
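Given the indexing convention for `E` (sample 1 occupies indices 1 to n1), the three counts can be sketched in a few lines of base R: an edge contributes its weight to R1 if both ends are in sample 1, to R2 if both are in sample 2, and to R otherwise. The function name `weighted_counts_sketch` and the toy inputs are hypothetical; this is an illustration of the definitions, not the package's code.

```r
# Sketch only: split the total edge weight into R1, R2 and R.
weighted_counts_sketch <- function(E, n1, wei) {
  in1 <- E <= n1                     # TRUE where an end node is in sample 1
  both1 <- in1[, 1] & in1[, 2]       # both ends in sample 1
  both2 <- !in1[, 1] & !in1[, 2]     # both ends in sample 2
  list(R1 = sum(wei[both1]),
       R2 = sum(wei[both2]),
       R = sum(wei[!both1 & !both2]))
}

# Made-up path graph on 4 nodes, with nodes 1 and 2 in sample 1
E <- rbind(c(1, 2), c(2, 3), c(3, 4))
wei <- c(0.5, 1, 0.25)
counts <- weighted_counts_sketch(E, n1 = 2, wei = wei)
```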