Help for package ISR

Title:

The Iterated Score Regression-Based Estimation

Date:

2025-05-16

Version:

2025.5.16

Description:

We use the ISR to handle with PCA-based missing data with high correlation, and the DISR to handle with distributed PCA-based missing data. The philosophy of the package is described in Guo G. (2024) <doi:10.1080/03610918.2022.2091779>.

Encoding:

UTF-8

License:

MIT + file LICENSE

LazyData:

true

RoxygenNote:

7.3.2

Imports:

MASS, stats

Suggests:

testthat (≥ 3.0.0)

Config/testthat/edition:

Depends:

R (≥ 3.5.0)

NeedsCompilation:

Packaged:

2025-05-16 09:21:11 UTC; Administrator

Author:

Guangbao Guo

[aut, cre], Haoyue Song [aut], Lixing Zhu [aut]

Maintainer:

Guangbao Guo <ggb11111111@163.com>

Repository:

CRAN

Date/Publication:

2025-05-16 09:40:20 UTC

CKD

Description

chronic kidney disease

Usage

data("CKD")

Format

The format is: num [1:400, 1:18] 48 7 62 48 51 60 68 24 52 53 ... - attr(*, "dimnames")=List of 2 ..$ : NULL ..$ : chr [1:18] "age" "bp" "sg" "al" ...

Details

There are 1010 missing values in the data set, accounting for 14.03 percent.

Source

Dr.P.Soundarapandian.M.D.,D.M (Senior Consultant Nephrologist), Apollo Hospitals, Managiri, Madurai Main Road, Karaikudi, Tamilnadu, Indi

References

Polat, H., Danaei-Mehr, H., and Cetin, A. (2017). Diagnosis of chronic kidney disease based on support vector machine by feature selection methods. Journal of Medical Systems, 41(4), 1-11.

Examples

data(CKD)
## maybe str(CKD) ; plot(CKD) ...

Caculate the estimator with the DISR method

Description

Caculate the estimator with the DISR method

Usage

DISR(data, data0, real = TRUE, example = FALSE, D)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example

D

is the number of nodes

Value

XDISR

is the estimator on the DISR method

MSEDISR

is the MSE value of the DISR method

MAEDISR

is the MAE value of the DISR method

REDISR

is the RE value of the DISR method

GCVDISR

is the GCV value of the DISR method

timeDISR

is the time cost of the DISR method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 DISR(data=data,data0=data0,real=FALSE,example=FALSE,D=2)

HCV

Description

Hepatitis C virus

Usage

data("HCV")

Format

The format is: num [1:615, 1:13] 1 1 1 1 1 1 1 1 1 1 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:615] "1" "2" "3" "4" ... ..$ : chr [1:13] "Category" "Age" "Sex" "ALB" ...

Details

There are 31 missing values in the data set, accounting for 0.39 percent.

Source

UCI repository

References

Lichtinghagen, R., Pietsch, D., Bantel, H., Manns, M., Brand, K. and Bahr, Matthias. (2013). The Enhanced Liver Fibrosis (ELF) Score: Normal Values, Influence Factors and Proposed Cut-Off Values.. Journal of hepatology. 59. 236-242.

Examples

data(HCV)
## maybe str(HCV) ; plot(HCV) ...

Caculate the estimator with the ISR method

Description

Caculate the estimator with the ISR method

Usage

ISR(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example.

Value

XISR

is the estimator on the ISR method

MSEISR

is the MSE value of the ISR method

MAEISR

is the MAE value of the ISR method

REISR

is the RE value of the ISR method

GCVISR

is the GCV value of the ISR method

timeISR

is the time cost of the ISR method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 ISR(data=data,data0=data0,real=FALSE,example=FALSE)

Caculate the estimator on the MMLPCA method

Description

Caculate the estimator on the MMLPCA method

Usage

MMLPCA(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example.

Value

XMMLPCA

is the estimator on the MMLPCA method

MSEMMLPCA

is the MSE value of the MMLPCA method

MAEMMLPCA

is the MAE value of the MMLPCA method

REMMLPCA

is the RE value of the MMLPCA method

GCVMMLPCA

is the GCV value of the MMLPCA method

timeMMLPCA

is the time cost of the MMLPCA method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 MMLPCA(data=data,data0=data0,real=FALSE,example=FALSE)

Caculate the estimator on the MNIPALS method

Description

Caculate the estimator on the MNIPALS method

Usage

MNIPALS(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example.

Value

XMNIPALS

is the estimator on the MNIPALS method

MSEMNIPALS

is the MSE value of the MNIPALS method

MAEMNIPALS

is the MAE value of the MNIPALS method

REMNIPALS

is the RE value of the MNIPALS method

GCVMNIPALS

is the GCV value of the MNIPALS method

timeMNIPALS

is the time cost of the MNIPALS method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 MNIPALS(data=data,data0=data0,real=FALSE,example=FALSE)

Caculate the estimator on the MRPCA method

Description

Caculate the estimator on the MRPCA method

Usage

MRPCA(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example

Value

XMRPCA

is the estimator on the MRPCA method

MSEMRPCA

is the MSE value of the MRPCA method

MAEMRPCA

is the MAE value of the MRPCA method

REMRPCA

is the RE value of the MRPCA method

GCVMRPCA

is the GCV value of the MRPCA method

timeMRPCA

is the time cost of the MRPCA method

Examples

 library(MASS)
 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 MRPCA(data=data,data0=data0,real=FALSE,example=FALSE)

Caculate the estimator on the Mean method

Description

Caculate the estimator on the Mean method

Usage

Mean(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example.

Value

XMean

is the estimator on the Mean method

MSEMean

is the MSE value of the Mean method

MAEMean

is the MAE value of the Mean method

REMean

is the RE value of the Mean method

GCVMean

is the GCV value of the Mean method

timeMean

is the time cost of the Mean method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 Mean(data=data,data0=data0,real=FALSE,example=FALSE)

PM2.5

Description

Beijing PM2.5

Usage

data("PM2.5")

Format

The format is: num [1:43824, 1:12] 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:43824] "1" "2" "3" "4" ... ..$ : chr [1:12] "year" "month" "day" "hour" ...

Details

It records 43824 daily measurements on 12 variables and there are 2067 missing values on 2067 measurements, accounting for 0.00393.

Source

UCI repository

References

X. Liang, T. Zou, B. Guo, S. Li, H. Zhang, S. Zhang, H. Huang, and S. Chen. Assessing Beijing's PM2.5 pollution: severity, weather impact, APEC and winter heating. Proceedings of the Royal Society A, 471(2182):1–20, 2015.

Examples

data(PM2.5)
## maybe str(PM2.5) ; plot(PM2.5) ...

Caculate the estimator on the SR method

Description

Caculate the estimator on the SR method

Usage

SR(data, data0, real = TRUE, example = FALSE)

Arguments

data

is the orignal data set

data0

is the missing data set

real

is to judge whether the data set is a real missing data set

example

is to judge whether the data set is a simulation example.

Value

XSR

is the estimator on the SR method

MSESR

is the MSE value of the SR method

MAESR

is the MAE value of the SR method

RESR

is the RE value of the SR method

GCVSR

is the GCV value of the SR method

Examples

 library(MASS)
 n=100;p=10;per=0.1
 X0=data=matrix(mvrnorm(n*p,0,1),n,p)
 m=round(per*n*p,digits=0)
 mr=sample(1:(n*p),m,replace=FALSE)
 X0[mr]=NA;data0=X0
 SR(data=data,data0=data0,real=FALSE,example=FALSE)

orange

Description

orange

Usage

data("orange")

Format

The format is: num [1:12, 1:8] 4.79 4.58 4.71 6.58 NA ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:12] "1" "2" "3" "4" ... ..$ : chr [1:8] "Color.intensity" "Odor.intensity" "Attack.intensity" "Sweet" ...

Details

There are 19 missing values in the data set, accounting for 19.79 percent.

Source

http://factominer.free.fr/missMDA/index.html

References

Josse J, Husson F (2016). missMDA: A Package for Handling Missing Values in Multivariate Data Analysis. Journal of Statistical Software, 70(1), 1–31.

Examples

data(orange)
## maybe str(orange) ; plot(orange) ...

ozone

Description

ozone

Usage

data("ozone")

Format

A data frame with 112 observations on the following 11 variables.

maxO3: a numeric vector
T9: a numeric vector
T12: a numeric vector
T15: a numeric vector
Ne9: a numeric vector
Ne12: a numeric vector
Ne15: a numeric vector
Vx9: a numeric vector
Vx12: a numeric vector
Vx15: a numeric vector
maxO3v: a numeric vector

Details

There are 115 missing values in it, accounting for 9.96 percent.

Source

http://factominer.free.fr/missMDA/index.html

References

Audigier, V., Husson, F., and Josse, J. (2014). A principal components method to impute missing values for mixed data. Advances in Data Analysis and Classification, 10(1), 5-26.

Examples

data(ozone)
## maybe str(ozone) ; plot(ozone) ...

review

Description

Travel reviews

Usage

data("review")

Format

The format is: num [1:980, 1:10] 0.93 1.02 1.22 0.45 0.51 0.99 0.9 0.74 1.12 0.7 ... - attr(*, "dimnames")=List of 2 ..$ : chr [1:980] "User_1" "User_2" "User_3" "User_4" ... ..$ : chr [1:10] "Category_1" "Category_2" "Category_3" "Category_4" ...

Details

980 travelers' reviews of 10 different types of travel facilities in East Asia

Source

UCI repository

References

Renjith, S., Sreekumar, A., and Jathavedan, M. (2018). Evaluation of partitioning clustering algorithms for processing social media data in tourism domain. 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS), 127-131.

Examples

 
data(review) 
## maybe str(review) ; plot(review) ...