Type: | Package |
Title: | Simultaneous Semi-Parametric Estimation of Clustering and Regression |
Version: | 1.1.0 |
Maintainer: | Matthieu Marbac <matthieu.marbac-lourdelle@ensai.fr> |
Description: | Parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available. Parametric and semi-parametric approaches described in Marbac et al. (2020) <doi:10.48550/arXiv.2012.14159> are implemented. |
Imports: | Rcpp, parallel, ALDqr, ald, quantreg, VGAM |
LinkingTo: | Rcpp, RcppArmadillo |
ByteCompile: | true |
URL: | https://arxiv.org/abs/2012.14159 |
LazyLoad: | yes |
Author: | Matthieu Marbac [aut, cre, cph], Mohammed Sedki [aut], Christophe Biernacki [aut], Vincent Vandewalle [aut] |
Collate: | 'cluspred.R' 'RcppExports.R' 'Singleblock_algo_NP.R' 'Singleblock_algo_Param.R' 'Singleblock_prediction.R' 'tool.R' 'TwoSteps_algo.R' 'Twosteps_computelogPDFwithZ.R' 'TwoSteps_Mstep.R' |
License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] |
Encoding: | UTF-8 |
Depends: | R (≥ 3.5) |
RoxygenNote: | 7.1.0 |
NeedsCompilation: | yes |
Packaged: | 2021-12-02 13:17:15 UTC; matt |
Repository: | CRAN |
Date/Publication: | 2021-12-02 13:50:09 UTC |
ClusPred.
Description
Parameter estimation of regression models with fixed group effects, when the group variable is missing while group-related variables are available.
Details
Package: | ClusPred |
Type: | Package |
Version: | 1.1.0 |
Date: | 2021-12-01 |
License: | GPL-3 |
LazyLoad: | yes |
References
Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.
Function used for clustering and fitting the regression model
Description
Estimation of the group-variable Z based on covariates X and estimation of the parameters of the regression of Y on (U, Z)
Usage
cluspred(
y,
x,
u = NULL,
K = 2,
model.reg = "mean",
tau = 0.5,
simultaneous = TRUE,
np = TRUE,
nbinit = 20,
nbCPU = 1,
tol = 0.01,
band = (length(y)^(-1/5)),
seed = 134
)
Arguments
y |
numeric vector of the traget variable (must be numerical) |
x |
matrix used for clustering (can contain numerical and factors) |
u |
matrix of the covariates used for regression (can contain numerical and factors) |
K |
number of clusters |
model.reg |
indicates the type of the loss ("mean", "quantile", "expectile", "logcosh", "huber"). Only the losses "mean" and "quantile" are implemented if simultaneous=FALSE or np=FALSE |
tau |
specifies the level for the loss (quantile, expectile or huber) |
simultaneous |
oolean indicating whether the clustering and the regression are performed simultaneously (TRUE) or not (FALSE) |
np |
boolean indicating whether nonparameteric model is used (TRUE) or not (FALSE) |
nbinit |
number of random initializations |
nbCPU |
number of CPU only used for linux |
tol |
to specify the stopping rule |
band |
bandwidth selection |
seed |
value of the seed (used for drawing the starting points) |
Value
cluspred returns a list containing the model parameters (param), the posterior probabilities of cluster memberships (tik), the partition (zhat) and the (smoothed) loglikelihood)
References
Simultaneous semi-parametric estimation of clustering and regression, Matthieu Marbac and Mohammed Sedki and Christophe Biernacki and Vincent Vandewalle (2020) <arXiv:2012.14159>.
Examples
require(ClusPred)
# data loading
data(simdata)
# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat
# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat
Prediction (clustering and target variable)
Description
Prediction for new observations
Usage
predictboth(x, u = NULL, result, np = FALSE)
Arguments
x |
covariates used for clustering |
u |
covariates of the regression (can be null) |
result |
results provided by function cluspred |
np |
boolean indicating whether nonparametric estimation is used (TRUE) or not (FALSE) |
Value
predictboth returns a list containing the predicted cluster membership (zhat) and the predicted value of the target variable (yhat).
Examples
require(ClusPred)
# data loading
data(simdata)
# mean regression with two latent groups in parametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
np=FALSE, nbCPU = 1, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# loglikelihood
res$loglike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = FALSE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat
# median regression with two latent groups in nonparametric framework and two covariates
res <- cluspred(simdata$y, simdata$x, simdata$u, K=2,
model.reg = "quantile", tau = 0.5, nbinit = 10)
# coefficient of the regression
res$param$beta
# proportions of the latent groups
res$param$pi
# posterior probability of the group memberships
head(res$tik)
# partition
res$zhat
# smoothed loglikelihood
res$logSmoothlike
# prediction (for possible new observations)
pred <- predictboth(simdata$x, simdata$u, res, np = TRUE)
# predicted cluster membreships
pred$zhat
# predicted value of the target variable
pred$yhat
Simulated data
Description
simulated data used for the pacakge examples.
Examples
data(simdata)