| Type: | Package | 
| Title: | LP Nonparametric High Dimensional K-Sample Comparison | 
| Version: | 2.1 | 
| Date: | 2020-05-31 | 
| Author: | Subhadeep Mukhopadhyay, Kaijun Wang | 
| Maintainer: | Kaijun Wang <kaijunwang.19@gmail.com> | 
| Description: | LP nonparametric high-dimensional K-sample comparison method that includes (i) confirmatory test, (ii) exploratory analysis, and (iii) options to output a data-driven LP-transformed matrix for classification. The primary reference is Mukhopadhyay, S. and Wang, K. (2020, Biometrika); <doi:10.48550/arXiv.1810.01724>. | 
| Depends: | R (≥ 2.10), apcluster, igraph, mclust, LPGraph | 
| License: | GPL-2 | 
| NeedsCompilation: | no | 
| Packaged: | 2020-06-01 17:24:18 UTC; AquinasUnit | 
| Repository: | CRAN | 
| Date/Publication: | 2020-06-02 00:40:12 UTC | 
LP Nonparametric High Dimensional K-Sample Comparison
Description
This package performs high-dimensional K-sample comparison using the graph-based LP nonparametric (GLP) method.
Author(s)
Mukhopadhyay, S. and Wang, K.
Maintainer: Kaijun Wang <kaijunwang.19@gmail.com>
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. (2017+), "Unified Statistical Theory of Spectral Graph Analysis".
Mukhopadhyay, S. and Parzen, E. (2014), "LP Approach to Statistical Modeling", arXiv:1405.2601.
A function to perform K-sample test using GLP algorithm
Description
This function performs the GLP multivariate K-sample learning.
Usage
GLP(X,y,m.max=4,components=NULL,alpha=0.05,c.poly=0.5,clust.alg='kmeans',perm=0,
	combine.criterion='pvalue',multiple.comparison=TRUE,
	compress.algorithm=FALSE,nbasis=8, return.LPT=FALSE,return.clust=FALSE)
Arguments
| X | An n-by-d matrix of observations (n samples, d features). | 
| y | A length-n vector of class labels. | 
| m.max | An integer, maximum order of LP component to investigate, default: 4. | 
| components | A vector specifying which components to test. If provided with any value other than NULL, the test will only examine the components mentioned in this argument, ignoring the m.max settings. | 
| alpha | Numeric, significance level α, default: 0.05. | 
| c.poly | Numeric, parameter for polynomial kernel, default: 0.5. | 
| perm | Number of permutations used to approximate the p-value; set to 0 (the default) to use the asymptotic p-value. | 
| combine.criterion | How to obtain the overall testing result based on the component-wise results; 'pvalue' uses Fisher's method to combine the p-values from each component; 'kernel' computes an overall kernel  | 
| multiple.comparison | Set to TRUE to use adjustment for multiple comparisons when determining which components are significant. | 
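| compress.algorithm | Logical, whether to use the smooth compression of Laplacian spectra for testing the null hypothesis. Recommended for large n. Default: FALSE. | 
| nbasis | Number of bases used for approximation when compress.algorithm=TRUE, default: 8. | 
| clust.alg | Character, clustering algorithm to use, default: 'kmeans'. | 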
| return.LPT | logical, whether or not to return the data driven covariate matrix, default: FALSE. | 
| return.clust | logical, whether or not to return the class labels assigned by graph community detection, default: FALSE. | 
Value
A list containing the following items:
| GLP | Overall GLP statistic. | 
| pval | Overall p-value. | 
| table | The GLP component table indicating the significance of each component. | 
| components | Significant eLP components for the data set. | 
| LPT | (optional) matrix of data driven covariates. | 
| clust | (optional) class labels assigned by graph community detection. | 
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Wang, K. (2020), "Towards a unified statistical theory of spectral graph analysis", arXiv:1901.07090.
Examples
  ##1.multivariate normal distribution with only mean difference:
  ##generate data, n1=n2=10, dimension 25
   X1<-matrix(rnorm(250,mean=0,sd=1),10,25)
   X2<-matrix(rnorm(250,mean=0.5,sd=1),10,25)
   y<-c(rep(1,10),rep(2,10))
   X<-rbind(X1,X2)
  ##GLP test:
   locdiff.test<-GLP(X,y,m.max=4)
  ## Not run: 
  ##2.Leukemia data example
   data(leukemia)
   attach(leukemia)
   leukemia.test<-GLP(X,class,components=1:4)
  ##confirmatory results:
   leukemia.test$GLP  # overall statistic
   #[1] 0.2092378
   leukemia.test$pval # overall p-value
   #[1] 0.0001038647
  ##exploratory outputs:
   leukemia.test$table  # rows as shown in Table 3 of reference
   #     component    comp.GLP       pvalue
   #[1,]         1 0.209237826 0.0001038647
   #[2,]         2 0.022145514 0.2066876581
   #[3,]         3 0.002025545 0.7025436476
   #[4,]         4 0.033361702 0.1211769396
  
## End(Not run)
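The combine.criterion='pvalue' option is described above as combining the component-wise p-values with Fisher's method. The following is a minimal base-R sketch of that standard method, assuming independent p-values. The helper name `fisher_combine` and the input p-values are hypothetical and chosen only for illustration; this is not the package's internal code, which may for instance restrict the combination to selected components.

```r
# Fisher's method: under the null, -2 * sum(log(p)) follows a
# chi-squared distribution with 2 * k degrees of freedom (k p-values).
# fisher_combine() is a hypothetical helper, not part of the package.
fisher_combine <- function(p) {
  stopifnot(all(p > 0 & p <= 1))
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical component-wise p-values (not taken from any data set)
p_comp <- c(0.01, 0.20, 0.70, 0.12)
fisher_combine(p_comp)
```

With a single p-value the combined result equals that p-value, which is a quick sanity check on the formula.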
Function to find LP-comeans
Description
The function computes the LP comeans between x and y.
Usage
LP.comean(x, y, perm=0)
Arguments
| x | vector, observations of a univariate random variable | 
| y | vector, observations of another univariate random variable | 
| perm | Number of permutations used to approximate the p-value; set to 0 (the default) to use the asymptotic p-value. | 
Value
A list containing:
| LPINFOR | The test statistics based on LP comeans | 
| p.val | Test p-value | 
| LP.matrix | LP comean matrix | 
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Parzen, E. and Mukhopadhyay, S. (2012) "Modeling, Dependence, Classification, United Statistical Science, Many Cultures".
Examples
#example: LP-comean for two simple vectors:
 y<-c(1,2,3,4,5)
 z<-c(0,-1,-1,3,4)
 comeanYZ=LP.comean(y,z)
#sum-of-squares statistic of the LP comeans:
 comeanYZ$LPINFOR
#p-value:
 comeanYZ$p.val
#comean matrix:
 comeanYZ$LP.matrix
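The perm argument requests a permutation approximation to the p-value. The generic idea can be sketched in base R as follows. Both `perm_pvalue` and the stand-in statistic `sq_spearman` are hypothetical names introduced for illustration; the squared Spearman correlation is used here only as a simple dependence statistic and is not the LP comean statistic computed by the package.

```r
# Permutation p-value for a generic dependence statistic.
# perm_pvalue() and sq_spearman() are hypothetical helpers.
perm_pvalue <- function(x, y, stat, B = 199) {
  obs <- stat(x, y)
  # Permuting y breaks any association with x while keeping both marginals
  perm_stats <- replicate(B, stat(x, sample(y)))
  # Add-one correction keeps the estimate strictly positive
  (1 + sum(perm_stats >= obs)) / (B + 1)
}

# Stand-in statistic: squared Spearman rank correlation
sq_spearman <- function(x, y) cor(x, y, method = "spearman")^2

set.seed(1)
x <- 1:20
y <- x^2   # perfectly monotone in x
perm_pvalue(x, y, sq_spearman)
```

The smallest attainable value is 1/(B+1), so a larger B gives a finer resolution at the cost of more computation.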
eLP Transformation
Description
Empirical LP Transformation on the data
Usage
LPT(x, k);
LP.Poly(x, m);
Arguments
| x | A column vector of the data | 
| k | An integer, order of LP component for transformation | 
| m | An integer, maximum order of LP component for transformation | 
Details
Given a data vector x, LPT(x,k) computes the eLP component of order k for x, while LP.Poly(x,m) computes all components of orders 1 through m.
Value
LPT returns a vector containing the k-th order component of the eLP transformation of x.
LP.Poly returns a matrix whose columns are the 1st through m-th order components of the eLP transformation of x.
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Mukhopadhyay, S. and Parzen, E. (2014) "LP Approach to Statistical Modeling", arXiv:1405.2601.
Examples
##
 x<-runif(10)
 LPT(x,1)
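For intuition, the construction described in the cited LP papers (a mid-distribution transform followed by Gram-Schmidt orthonormalization of its powers) can be sketched in base R. This is an illustration under that assumption, not the package's code: `elp_sketch` is a hypothetical name, it needs m to be smaller than the number of distinct values in x, and the output of LPT and LP.Poly may differ from it in scaling or in the handling of ties.

```r
# Sketch of an empirical LP-type basis; elp_sketch() is hypothetical.
elp_sketch <- function(x, m) {
  n <- length(x)
  # Mid-distribution transform F(x) - 0.5 * p(x); with average ranks
  # this equals (rank - 0.5) / n
  u <- (rank(x, ties.method = "average") - 0.5) / n
  t1 <- u - mean(u)
  # Columns 1, t1, t1^2, ..., t1^m
  powers <- outer(t1, 0:m, "^")
  # QR performs the orthogonalization step; drop the constant column and
  # rescale so that each remaining column has mean square 1
  Q <- qr.Q(qr(powers))[, -1, drop = FALSE] * sqrt(n)
  # QR leaves the sign arbitrary: align each column with its power
  sgn <- sign(colSums(Q * powers[, -1, drop = FALSE]))
  sweep(Q, 2, sgn, "*")
}

set.seed(1)
x <- runif(10)
B <- elp_sketch(x, 3)
round(colMeans(B), 10)        # columns are centered
round(crossprod(B) / 10, 10)  # and orthonormal under the empirical measure
```

Because the basis depends on x only through its ranks, it is unchanged by any strictly increasing transformation of the data.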
Similarity matrix based on eLP basis and polynomial kernel
Description
Given a data matrix X and eLP order k, this function generates the similarity matrix W for graph analysis.
Usage
W.Gen(X, k, c.poly = 0.5)
Arguments
| X | An n-by-d matrix of observations. | 
| k | An integer, order of LP component | 
| c.poly | Numeric, parameter for polynomial kernel | 
Value
An n-by-n similarity matrix generated from the k-th order eLP transformation of X.
Author(s)
Mukhopadhyay, S. and Wang, K.
References
Mukhopadhyay, S. and Wang, K. (2020), "A Nonparametric Approach to High-dimensional K-sample Comparison Problem", arXiv:1810.01724.
Examples
#example: 6 observations on 3 features:
 x<-rbind(matrix(runif(9),3,3),matrix(runif(9)+1,3,3))
#LP similarity matrix:
 simmat<-W.Gen(x,1)$W
 image(simmat)
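To illustrate how a polynomial kernel turns transformed features into an n-by-n similarity matrix, here is a minimal base-R sketch. The helper `poly_kernel_sketch`, the choice of degree 2, the absence of any normalization, and the rank-based stand-in for the first-order transform are all assumptions made for this example; the exact kernel used inside W.Gen is not stated in this manual and may differ.

```r
# poly_kernel_sketch() is a hypothetical helper, not part of the package.
poly_kernel_sketch <- function(Z, c.poly = 0.5, degree = 2) {
  # tcrossprod(Z) holds all pairwise inner products between rows of Z
  (c.poly + tcrossprod(Z))^degree
}

set.seed(1)
x <- rbind(matrix(runif(9), 3, 3), matrix(runif(9) + 1, 3, 3))
# Stand-in for a first-order transform: centered, scaled mid-ranks per column
Z <- apply(x, 2, function(col) {
  u <- (rank(col) - 0.5) / length(col)
  (u - mean(u)) / sqrt(mean((u - mean(u))^2))
})
W <- poly_kernel_sketch(Z)
dim(W)          # one row and one column per observation
isSymmetric(W)
```

Observations from the same group tend to have inner products of the same sign across features, so their kernel entries are typically larger than those between groups.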
Leukemia cancer gene expression data
Description
Gene expression data for two classes: Acute lymphoblastic leukemia (ALL) and Acute myeloid leukemia (AML), over n=72 observations, and d=7128 genes.
Usage
data("leukemia")
Format
A list containing the following items:
- class: a vector of class labels
- X: a 72 by 7128 matrix of gene expressions, one row per observation
Source
http://statweb.stanford.edu/~ckirby/brad/LSI/datasets-and-programs/datasets.html
Examples
data(leukemia)