Type: | Package |
Title: | LP Nonparametric High Dimensional K-Sample Comparison |
Version: | 2.1 |
Date: | 2020-05-31 |
Author: | Subhadeep Mukhopadhyay, Kaijun Wang |
Maintainer: | Kaijun Wang <kaijunwang.19@gmail.com> |
Description: | LP nonparametric high-dimensional K-sample comparison method that includes (i) confirmatory test, (ii) exploratory analysis, and (iii) options to output a data-driven LP-transformed matrix for classification. The primary reference is Mukhopadhyay, S. and Wang, K. (2020, Biometrika); <doi:10.48550/arXiv.1810.01724>. |
Depends: | R (≥ 2.10), apcluster, igraph, mclust, LPGraph |
License: | GPL-2 |
NeedsCompilation: | no |
Packaged: | 2020-06-01 17:24:18 UTC; AquinasUnit |
Repository: | CRAN |
Date/Publication: | 2020-06-02 00:40:12 UTC |
LP Nonparametric High Dimensional K-Sample Comparison
Description
This package performs high-dimensional K-sample comparison using the graph-based LP nonparametric (GLP) method.
Author(s)
Mukhopadhyay, S. and Wang, K.
Maintainer: Kaijun Wang <kaijunwang.19@gmail.com>
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. (2017+), "Unified Statistical Theory of Spectral Graph Analysis".
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
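A function to perform the K-sample test using the GLP algorithm
Description
This function performs GLP multivariate K-sample learning: a confirmatory test of whether the K samples share a common distribution, together with component-wise exploratory results.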
Usage
GLP(X,y,m.max=4,components=NULL,alpha=0.05,c.poly=0.5,clust.alg='kmeans',perm=0,
combine.criterion='pvalue',multiple.comparison=TRUE,
compress.algorithm=FALSE,nbasis=8, return.LPT=FALSE,return.clust=FALSE)
Arguments
X |
An n-by-d data matrix, with observations as rows and variables as columns. |
y |
A length-n vector of class labels. |
m.max |
An integer, maximum order of LP component to investigate, default: 4. |
components |
A vector specifying which components to test. If not NULL, only the listed components are tested and the m.max setting is ignored. |
alpha |
Numeric, significance level, default: 0.05. |
c.poly |
Numeric, parameter for polynomial kernel, default: 0.5. |
perm |
Number of permutations for approximating p-value, set to 0 to use asymptotic p-value. |
combine.criterion |
How to obtain the overall test result from the component-wise results, default: 'pvalue'. 'pvalue' combines the p-values of the individual components with Fisher's method; 'kernel' computes an overall kernel |
multiple.comparison |
Logical; set to TRUE (default) to adjust for multiple comparisons when determining which components are significant. |
compress.algorithm |
Logical; whether to use the smooth compression of the Laplacian spectra when testing the null hypothesis, default: FALSE. Recommended for large |
nbasis |
Number of bases used for the approximation when compress.algorithm = TRUE, default: 8. |
clust.alg |
Character, name of the clustering algorithm, default: 'kmeans'. |
return.LPT |
Logical, whether to return the data-driven covariate matrix, default: FALSE. |
return.clust |
Logical, whether to return the class labels assigned by graph community detection, default: FALSE. |
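The combine.criterion = 'pvalue' option combines the component-wise p-values with Fisher's method. A minimal base-R sketch of that combination follows, assuming the component p-values are independent under the null hypothesis: the statistic -2 * sum(log(p_j)) over m components is referred to a chi-square distribution with 2m degrees of freedom. The helper fisher_combine and the input p-values are hypothetical, and the Bonferroni rule at the end is only an assumed stand-in for whatever adjustment GLP applies when multiple.comparison = TRUE. This is not the package's internal code.

```r
# Hypothetical helper: Fisher's method for combining component-wise p-values.
# Assumes the p-values are independent under the null hypothesis.
fisher_combine <- function(pvals) {
  stat <- -2 * sum(log(pvals))   # Fisher's combined statistic
  df <- 2 * length(pvals)        # chi-square degrees of freedom
  list(statistic = stat,
       pvalue = pchisq(stat, df = df, lower.tail = FALSE))
}

# Hypothetical p-values for m = 4 eLP components
pvals <- c(0.001, 0.20, 0.70, 0.12)
fisher_combine(pvals)$pvalue   # well below 0.05 for these inputs

# Assumed multiple-comparison rule (Bonferroni); GLP's actual adjustment may differ
alpha <- 0.05
which(p.adjust(pvals, method = "bonferroni") < alpha)   # component 1 only
```

With a single component, Fisher's statistic reduces to -2 * log(p) on 2 degrees of freedom, so the combined p-value equals p itself, which is a quick sanity check on the helper.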
Value
A list containing the following items:
GLP |
Overall GLP statistic. |
pval |
Overall p-value. |
table |
The GLP component table indicating the significance of each component. |
components |
Significant eLP components for the data set. |
LPT |
(optional) Matrix of data-driven covariates. |
clust |
(optional) Class labels assigned by graph community detection. |
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Wang, K. (2020), "Towards a Unified Statistical Theory of Spectral Graph Analysis", arXiv:1901.07090.
Examples
##1. Multivariate normal distribution with only a mean difference:
##generate data, n1=n2=10, dimension 25
X1<-matrix(rnorm(250,mean=0,sd=1),10,25)
X2<-matrix(rnorm(250,mean=0.5,sd=1),10,25)
y<-c(rep(1,10),rep(2,10))
X<-rbind(X1,X2)
##GLP test:
locdiff.test<-GLP(X,y,m.max=4)
## Not run:
##2.Leukemia data example
data(leukemia)
leukemia.test<-GLP(leukemia$X,leukemia$class,components=1:4)
##confirmatory results:
leukemia.test$GLP # overall statistic
#[1] 0.2092378
leukemia.test$pval # overall p-value
#[1] 0.0001038647
##exploratory outputs:
leukemia.test$table # rows as shown in Table 3 of reference
# component comp.GLP pvalue
#[1,] 1 0.209237826 0.0001038647
#[2,] 2 0.022145514 0.2066876581
#[3,] 3 0.002025545 0.7025436476
#[4,] 4 0.033361702 0.1211769396
## End(Not run)
Function to find LP-comeans
Description
The function computes the LP comeans between x and y.
Usage
LP.comean(x, y, perm=0)
Arguments
x |
vector, observations of a univariate random variable |
y |
vector, observations of another univariate random variable |
perm |
Number of permutations for approximating p-value, set to 0 to use asymptotic p-value. |
Value
A list containing:
LPINFOR |
The test statistic based on the LP comeans (sum of squares of the LP comeans) |
p.val |
Test p-value |
LP.matrix |
LP comean matrix |
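The LP comean matrix can be sketched in base R, assuming it collects the empirical cross-moments mean(T_j(x) * T_k(y)) between the eLP components of x and of y, with LPINFOR taken as their sum of squares. The helpers elp_basis and lp_comean_sketch, the order m (LP.comean itself takes no order argument), and the chi-square reference distribution with m^2 degrees of freedom for n * LPINFOR are all assumptions for illustration. The permutation branch shows one standard way a perm > 0 option can approximate the p-value. This is not the package's implementation.

```r
# Hypothetical helper: orthonormal eLP basis of order 1..m
# (mid-distribution transform, then Gram-Schmidt of its powers via QR).
elp_basis <- function(x, m) {
  n <- length(x)
  Fmid <- ecdf(x)(x) - 0.5 * ave(rep(1, n), x, FUN = length) / n
  T1 <- (Fmid - mean(Fmid)) / sqrt(mean((Fmid - mean(Fmid))^2))
  qrM <- qr(cbind(1, outer(T1, seq_len(m), `^`)))
  Q <- qr.Q(qrM) %*% diag(sign(diag(qr.R(qrM))))
  Q[, -1, drop = FALSE] * sqrt(n)
}

# Hypothetical sketch of LP comeans and a sum-of-squares independence test
lp_comean_sketch <- function(x, y, m = 2, perm = 0) {
  n <- length(x)
  Tx <- elp_basis(x, m)
  Ty <- elp_basis(y, m)
  LP <- crossprod(Tx, Ty) / n   # LP[j, k] = mean of T_j(x) * T_k(y)
  stat <- sum(LP^2)             # sum of squared comeans
  if (perm > 0) {
    # Permuting the rows of Ty equals recomputing the basis of a permuted y,
    # because the basis depends only on the values of y.
    null <- replicate(perm,
      sum((crossprod(Tx, Ty[sample.int(n), , drop = FALSE]) / n)^2))
    p <- (1 + sum(null >= stat)) / (perm + 1)
  } else {
    # Assumed asymptotic reference: n * stat ~ chi-square with m^2 df
    p <- pchisq(n * stat, df = m^2, lower.tail = FALSE)
  }
  list(LPINFOR = stat, p.val = p, LP.matrix = LP)
}

set.seed(2)
x <- rnorm(100)
y <- x + rnorm(100, sd = 0.5)
lp_comean_sketch(x, y, m = 2)$p.val   # expected to be very small (strong dependence)
lp_comean_sketch(x, rnorm(100), m = 2, perm = 199)$p.val
```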
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Parzen, E. and Mukhopadhyay, S. (2012), "Modeling, Dependence, Classification, United Statistical Science, Many Cultures".
Examples
#example: LP-comean for two simple vectors:
y<-c(1,2,3,4,5)
z<-c(0,-1,-1,3,4)
comeanYZ=LP.comean(y,z)
#sum-of-squares statistic of the LP comeans:
comeanYZ$LPINFOR
#p-value:
comeanYZ$p.val
#comean matrix:
comeanYZ$LP.matrix
eLP Transformation
Description
Empirical LP (eLP) transformation of the data
Usage
LPT(x, k)
LP.Poly(x, m)
Arguments
x |
A column vector of the data |
k |
An integer, order of LP component for transformation |
m |
An integer, maximum order of LP component for transformation |
Details
Given a vector of data x, the LPT(x,k) function computes the eLP component of order k for x, while the LP.Poly(x,m) function computes all components up to order m.
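For intuition, here is a minimal base-R sketch of an eLP basis following the LP construction of Mukhopadhyay and Parzen (2014): the first component is the standardized mid-distribution transform F_mid(x) = F(x) - 0.5 p(x), and higher-order components come from Gram-Schmidt orthonormalization (done here through a QR decomposition) of its powers under the empirical measure. The function name elp_basis is hypothetical, and the package's LPT and LP.Poly may differ in normalization or numerical details; this sketch is not their source code. It needs x to have more than m distinct values.

```r
# Hypothetical sketch of an empirical LP (eLP) basis; not the package's LP.Poly.
elp_basis <- function(x, m) {
  n <- length(x)
  Fx <- ecdf(x)(x)                            # empirical CDF at each point
  px <- ave(rep(1, n), x, FUN = length) / n   # empirical mass p(x), handles ties
  Fmid <- Fx - 0.5 * px                       # mid-distribution transform
  # first component: mid-distribution standardized to mean 0, variance 1
  T1 <- (Fmid - mean(Fmid)) / sqrt(mean((Fmid - mean(Fmid))^2))
  # constant column plus powers of T1, orthonormalized by QR (Gram-Schmidt)
  M <- cbind(1, outer(T1, seq_len(m), `^`))
  qrM <- qr(M)
  Q <- qr.Q(qrM) %*% diag(sign(diag(qr.R(qrM))))  # signs as in Gram-Schmidt
  # drop the constant column; rescale so each column has empirical variance 1
  Q[, -1, drop = FALSE] * sqrt(n)
}

set.seed(1)
x <- runif(10)
Tx <- elp_basis(x, 4)   # 10-by-4 matrix; column k plays the role of LPT(x, k)
round(crossprod(Tx) / length(x), 10)   # (approximately) the 4-by-4 identity
```

Orthonormality under the empirical measure is the property that makes the components usable as uncorrelated, unit-variance score functions.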
Value
A vector containing the k-th order component of the eLP transformation of x (LPT), or a matrix whose columns are the 1st- to m-th order components of the eLP transformation of x (LP.Poly).
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
Examples
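##example: eLP components of 10 uniform draws
x<-runif(10)
#first-order component:
LPT(x,1)
#components 1 to 4 as a matrix:
LP.Poly(x,4)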
Similarity matrix based on eLP basis and polynomial kernel
Description
Given a data matrix X and an eLP order k, this function generates the similarity matrix W for graph analysis.
Usage
W.Gen(X, k, c.poly = 0.5)
Arguments
X |
An n-by-d data matrix, with observations as rows and variables as columns. |
k |
An integer, order of LP component |
c.poly |
Numeric, parameter for the polynomial kernel, default: 0.5. |
Value
A list containing W, the n-by-n similarity matrix generated from the k-th order eLP transformation of X.
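A possible reading of W.Gen is sketched below: each column of X is replaced by its k-th order eLP component, and pairs of rows are compared with a polynomial kernel. The kernel form (c.poly + <z_i, z_j>)^degree, the degree argument, the helper names elp_component and w_gen_sketch, and the absence of any further scaling are assumptions; the package may define the kernel differently. Only the returned list element W mirrors the documented usage W.Gen(x,1)$W.

```r
# Hypothetical helper: k-th order eLP component of a numeric vector x,
# built from the mid-distribution transform and Gram-Schmidt via QR.
elp_component <- function(x, k) {
  n <- length(x)
  Fmid <- ecdf(x)(x) - 0.5 * ave(rep(1, n), x, FUN = length) / n
  T1 <- (Fmid - mean(Fmid)) / sqrt(mean((Fmid - mean(Fmid))^2))
  qrM <- qr(cbind(1, outer(T1, seq_len(k), `^`)))
  Q <- qr.Q(qrM) %*% diag(sign(diag(qr.R(qrM))))
  Q[, k + 1] * sqrt(n)
}

# Hypothetical sketch of an eLP / polynomial-kernel similarity matrix.
# The kernel (c.poly + <z_i, z_j>)^degree with degree = 2 is an assumption.
w_gen_sketch <- function(X, k, c.poly = 0.5, degree = 2) {
  Z <- apply(X, 2, elp_component, k = k)   # n-by-d matrix of eLP scores
  W <- (c.poly + tcrossprod(Z))^degree     # n-by-n kernel similarities
  list(W = W)
}

# Same toy layout as the example: 6 observations on 3 features
x <- rbind(matrix(runif(9), 3, 3), matrix(runif(9) + 1, 3, 3))
simmat <- w_gen_sketch(x, 1)$W
dim(simmat)   # 6 6
```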
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Examples
#example: 6 observations on 3 features:
x<-rbind(matrix(runif(9),3,3),matrix(runif(9)+1,3,3))
#LP similarity matrix:
simmat<-W.Gen(x,1)$W
image(simmat)
Leukemia cancer gene expression data
Description
Gene expression data for two classes, acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML), with n = 72 observations on d = 7128 genes.
Usage
data("leukemia")
Format
A list containing the following items:
class: a vector of class labels
X: a 72-by-7128 matrix of gene expressions, one row per observation
Source
http://statweb.stanford.edu/~ckirby/brad/LSI/datasets-and-programs/datasets.html
Examples
data(leukemia)