Version: | 0.2-0 |
Date: | 2024-08-02 |
Title: | Fast Algorithms for Large Scale Generalized Distance Weighted Discrimination |
Depends: | R (≥ 3.5.0), Matrix |
Imports: | methods, stats |
Description: | Solving large scale distance weighted discrimination. The main algorithm is a symmetric Gauss-Seidel based alternating direction method of multipliers (ADMM) method. See Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018) <doi:10.48550/arXiv.1604.05473> for more details. |
License: | GPL-2 |
URL: | https://arxiv.org/pdf/1604.05473 |
NeedsCompilation: | no |
Packaged: | 2024-08-18 14:24:20 UTC; lotus |
Author: | Xin-Yee Lam [aut, cre], J.S. Marron [aut], Defeng Sun [aut], Kim-Chuan Toh [aut] |
Maintainer: | Xin-Yee Lam <matlxy@nus.edu.sg> |
Repository: | CRAN |
Date/Publication: | 2024-08-19 12:00:20 UTC |
Fast Algorithms for Large Scale Generalized Distance Weighted Discrimination
Description
Solving large scale distance weighted discrimination. The main algorithm is a symmetric Gauss-Seidel based alternating direction method of multipliers (ADMM) method.
Details
The package DWDLargeR
contains two main functions:
penaltyParameter
genDWD
Author(s)
Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh
References
Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018)
“Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473
Internal DWDLargeR functions
Description
Internal DWDLargeR functions.
Usage
sigma_update(iter)
linsysolve(R,r)
smw(R,Z,ZT,yunit,schurmat,schurvec,normy,const,r)
fnorm(x)
polyRootsNewton(c,q,sigma,x0)
psqmr(ff,b,L,x0,tol,maxit)
precondfun(L,r)
vecMultiply(ff,x)
Details
These are not intended for use by users.
Author(s)
Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh
References
Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018)
“Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473
Solve the generalized distance weighted discrimination (DWD) model.
Description
Solve the generalized DWD model by using a symmetric Gauss-Seidel based alternating direction method of multipliers (ADMM) method.
Usage
genDWD(X,y,C,expon, tol = 1e-5, maxIter = 2000, method = 1, printDetails = 0,
rmzeroFea = 1, scaleFea = 1)
Arguments
X |
A |
y |
A vector of length |
C |
A number representing the penalty parameter for the generalized DWD model. |
expon |
A positive number representing the exponent |
tol |
The stopping tolerance for the algorithm. (Default = 1e-5) |
maxIter |
Maximum iteration allowed for the algorithm. (Default = 2000) |
method |
Method for solving generalized DWD model. The default is set to be 1 for the highly efficient sGS-ADMM algorithm. User can also select |
printDetails |
Switch for printing details of the algorithm. Default is set to be 0 (not printing). |
rmzeroFea |
Switch for removing zero features in the data matrix. Default is set to be 1 (removing zero features). |
scaleFea |
Switch for scaling features in the data matrix. This is to make the features having roughly similar magnitude. Default is set to be 1 (scaling features). |
Details
This is a symmetric Gauss-Seidel based alternating method of multipliers (sGS-ADMM) algorithm for solving the generalized DWD model of the following formulation:
\min \sum_i \theta_q (r_i) + C e^T x_i
subject to the constraints
Z^T w + \beta y + \xi - r = 0, ||w||<=1, \xi>=0,
where Z = X diag(y)
, e
is a given positive vector such that ||e||_\infty = 1
, and \theta_q
is a function defined by \theta_q(t) = 1/t^q
if t>0
and \theta_q(t)=\infty
if t<=0
.
Value
A list consists of the result from the algorithm.
w |
The unit normal of hyperplane that distinguishes the two classes. |
beta |
The distance of the hyperplane to the origin ( |
xi |
A slack variable of length |
r |
The residual |
alpha |
Dual variable of the generalized DWD model. |
info |
A list consists of the information from the algorithm. |
runhist |
A list consists of the run history throughout the iterations. |
Author(s)
Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh
References
Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018)
“Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473
Examples
# load the data
data("mushrooms")
# calculate the best penalty parameter
C = penaltyParameter(mushrooms$X,mushrooms$y,expon=1)
# solve the generalized DWD model
result = genDWD(mushrooms$X,mushrooms$y,C,expon=1)
Classification data from Audobon Society Field Guide (1981).
Description
This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. This latter class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; no rule like “leaflets three, let it be” for Poisonous Oak and Ivy.
Usage
data(mushrooms)
Format
List containing a 112x8124 matrix of 8124 training samples with 112 features; and a vector of length 8124 training labels.
Source
The data could be downloaded from the UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Mushroom
References
Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
Compute the penalty parameter for the model.
Description
Find the best penalty parameter C
for the generalized distance weighted discrimination (DWD) model.
Usage
penaltyParameter(X,y,expon,rmzeroFea = 1, scaleFea = 1)
Arguments
X |
A |
y |
A vector of length |
expon |
A positive number representing the exponent |
rmzeroFea |
Switch for removing zero features in the data matrix. Default is set to be 1 (removing zero features). |
scaleFea |
Switch for scaling features in the data matrix. This is to make the features having roughly similar magnitude. Default is set to be 1 (scaling features). |
Details
The best parameter is empirically found to be inversely proportional to the typical distance between different samples raised to the power of (expon+1
).
It is also dependent on the sample size n
and feature dimension d
.
Value
A number which represents the best penalty parameter for the generalized DWD model.
Author(s)
Xin-Yee Lam, J.S. Marron, Defeng Sun, and Kim-Chuan Toh
References
Lam, X.Y., Marron, J.S., Sun, D.F., and Toh, K.C. (2018)
“Fast algorithms for large scale generalized distance weighted discrimination", Journal of Computational and Graphical Statistics, forthcoming.
https://arxiv.org/abs/1604.05473
Examples
# load the data
data("mushrooms")
# calculate the best penalty parameter
C = penaltyParameter(mushrooms$X,mushrooms$y,expon=1)