Type: | Package |
Title: | Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms |
Version: | 3.0.1 |
Date: | 2024-9-27 |
Maintainer: | Jacob Seedorff <jacob-seedorff@uiowa.edu> |
URL: | https://github.com/JacobSeedorff21/BranchGLM |
BugReports: | https://github.com/JacobSeedorff21/BranchGLM/issues |
Description: | Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'. |
License: | Apache License (≥ 2) |
Depends: | R (≥ 3.3.0) |
Imports: | Rcpp (≥ 1.0.7), methods, stats, graphics |
LinkingTo: | Rcpp, RcppArmadillo, BH |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
NeedsCompilation: | yes |
Packaged: | 2024-09-28 03:37:04 UTC; Jacob |
Author: | Jacob Seedorff [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2024-09-28 04:00:02 UTC |
BranchGLM: Efficient Best Subset Selection for GLMs via Branch and Bound Algorithms
Description
Performs efficient and scalable glm best subset selection using a novel implementation of a branch and bound algorithm. To speed up the model fitting process, a range of optimization methods are implemented in 'RcppArmadillo'. Parallel computation is available using 'OpenMP'.
Author(s)
Maintainer: Jacob Seedorff jacob-seedorff@uiowa.edu
References
Seedorff J, Cavanaugh JE. Assessing Variable Importance for Best Subset Selection. Entropy. 2024; 26(9):801. doi:10.3390/e26090801
See Also
Useful links:
Report bugs at https://github.com/JacobSeedorff21/BranchGLM/issues
Examples
# Using iris data to demonstrate package usage
Data <- iris
# Fitting linear regression model
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
Fit
# Doing branch and bound best subset selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE, bestmodels = 10)
VS
## Plotting results
plot(VS, ptype = "variables")
Fits GLMs
Description
Fits generalized linear models (GLMs) via RcppArmadillo with the ability to perform some computation in parallel with OpenMP.
Usage
BranchGLM(
formula,
data = NULL,
family,
link,
offset = NULL,
method = "Fisher",
grads = 10,
parallel = FALSE,
nthreads = 8,
tol = 1e-06,
maxit = NULL,
init = NULL,
fit = TRUE,
contrasts = NULL,
keepData = TRUE,
keepY = TRUE
)
BranchGLM.fit(
x,
y,
family,
link,
offset = NULL,
method = "Fisher",
grads = 10,
parallel = FALSE,
nthreads = 8,
init = NULL,
maxit = NULL,
tol = 1e-06
)
Arguments
formula |
a formula for the model. |
data |
an optional data.frame, list or environment (or object coercible by
as.data.frame to a data.frame), containing the variables in formula.
Neither a matrix nor an array will be accepted. If not found in |
family |
the distribution used to model the data, one of "gaussian",
"gamma", "binomial", or "poisson". A |
link |
the link used to link the mean structure to the linear predictors. One of
"identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log". The accepted
links depend on the specified family, see more in details. This only needs to be
supplied if a string is supplied for |
offset |
the offset vector, by default the zero vector is used. |
method |
one of "Fisher", "BFGS", or "LBFGS". BFGS and L-BFGS are quasi-newton methods which are typically faster than Fisher's scoring when there are many covariates (at least 50). |
grads |
a positive integer to denote the number of gradients used to
approximate the inverse information with, only for |
parallel |
a logical value to indicate if parallelization should be used. |
nthreads |
a positive integer to denote the number of threads used with OpenMP,
only used if |
tol |
a positive number to denote the tolerance used to determine model convergence. |
maxit |
a positive integer to denote the maximum number of iterations performed. The default for Fisher's scoring is 50 and for the other methods the default is 200. |
init |
a numeric vector of initial values for the betas, if not specified then they are automatically selected via linear regression with the transformation specified by the link function. This is ignored for linear regression models. |
fit |
a logical value to indicate whether to fit the model or not. |
contrasts |
see |
keepData |
a logical value to indicate whether or not to store a copy of
data and the design matrix, the default is TRUE. If this is FALSE, then the
results from this cannot be used inside of |
keepY |
a logical value to indicate whether or not to store a copy of y,
the default is TRUE. If this is FALSE, then the binomial GLM helper functions
may not work and this cannot be used inside of |
x |
design matrix used for the fit, must be numeric. |
y |
outcome vector, must be numeric. |
Details
Fitting
Can use BFGS, L-BFGS, or Fisher's scoring to fit the GLM. BFGS and L-BFGS are
typically faster than Fisher's scoring when there are at least 50 covariates
and Fisher's scoring is typically best when there are fewer than 50 covariates.
This function does not currently support the use of weights. In the special
case of gaussian regression with identity link the method
argument is ignored
and the normal equations are solved directly.
The models are fit in C++ by using Rcpp and RcppArmadillo. In order to help
convergence, each of the methods makes use of a backtracking line-search using
the strong Wolfe conditions to find an adequate step size. There are
three conditions used to determine convergence, the first is whether there is a
sufficient decrease in the negative log-likelihood, the second is whether
the l2-norm of the score is sufficiently small, and the last condition is
whether the change in each of the beta coefficients is sufficiently
small. The tol
argument controls all of these criteria. If the algorithm fails to
converge, then iterations
will be -1.
All observations with any missing values are removed before model fitting.
BranchGLM.fit
can be faster than calling BranchGLM
if the
x matrix and y vector are already available, but doesn't return as much information.
The object returned by BranchGLM.fit
is not of class BranchGLM
, so
all of the methods for BranchGLM
objects such as predict
or
VariableSelection
cannot be used.
Dispersion Parameter
The dispersion parameter for binomial and Poisson regression is always fixed to be 1. For gaussian and gamma regression, the MLE of the dispersion parameter is used for the calculation of the log-likelihood and the Pearson estimator of the dispersion parameter is used for the calculation of standard errors for the coefficient estimates.
Families and Links
The binomial family accepts "cloglog", "log", "logit", and "probit" as possible link functions. The gamma and gaussian families accept "identity", "inverse", "log", and "sqrt" as possible link functions. The Poisson family accepts "identity", "log", and "sqrt" as possible link functions.
Value
BranchGLM
returns a BranchGLM
object which is a list with the following components
coefficients |
a data.frame with the coefficient estimates, SEs, Wald test statistics, and p-values |
iterations |
number of iterations it took the algorithm to converge, if the algorithm failed to converge then this is -1 |
dispersion |
a vector of length 2 with the MLE of the dispersion parameter first and the Pearson estimator of the dispersion parameter second |
logLik |
the log-likelihood of the fitted model |
vcov |
the variance-covariance matrix of the fitted model |
resDev |
the residual deviance of the fitted model |
AIC |
the AIC of the fitted model |
preds |
predictions from the fitted model |
linpreds |
linear predictors from the fitted model |
residuals |
a numeric vector with the Pearson residuals |
variance |
a numeric vector with the variance evaluated at the final coefficient estimates |
tol |
tolerance used to fit the model |
maxit |
maximum number of iterations used to fit the model |
formula |
formula used to fit the model |
method |
iterative method used to fit the model |
grads |
number of gradients used to approximate inverse information for L-BFGS |
y |
y vector used in the model, not included if |
x |
design matrix used to fit the model, not included if |
rownames |
rownames taken from the design matrix |
offset |
offset vector in the model, not included if |
fulloffset |
supplied offset vector, not included if |
data |
original |
mf |
the model frame, not included if |
numobs |
number of observations in the design matrix |
names |
names of the predictor variables |
yname |
name of y variable |
parallel |
whether parallelization was employed to speed up model fitting process |
missing |
number of missing values removed from the original dataset |
link |
link function used to model the data |
family |
family used to model the data |
ylevel |
the levels of y, only included for binomial glms |
xlev |
the levels of the factors in the dataset |
terms |
the terms object used |
BranchGLM.fit
returns a list with the following components
coefficients |
a data.frame with the coefficients estimates, SEs, Wald test statistics, and p-values |
iterations |
number of iterations it took the algorithm to converge, if the algorithm failed to converge then this is -1 |
dispersion |
a vector of length 2 with the MLE of the dispersion parameter first and the Pearson estimator of the dispersion parameter second |
logLik |
the log-likelihood of the fitted model |
vcov |
the variance-covariance matrix of the fitted model |
resDev |
the residual deviance of the fitted model |
AIC |
the AIC of the fitted model |
preds |
predictions from the fitted model |
linpreds |
linear predictors from the fitted model |
residuals |
a numeric vector with the Pearson residuals |
variance |
a numeric vector with the variance evaluated at the final coefficient estimates |
tol |
tolerance used to fit the model |
maxit |
maximum number of iterations used to fit the model |
References
McCullagh, P., & Nelder, J. A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall.
See Also
predict.BranchGLM, coef.BranchGLM, VariableSelection, confint.BranchGLM, logLik.BranchGLM
Examples
Data <- iris
# Linear regression
## Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
## Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gaussian", link = "identity")
# Gamma regression
## Using BranchGLM
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log")
### init
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log",
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL)
### method
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log",
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL, method = "LBFGS")
### offset
BranchGLM(Sepal.Length ~ ., data = Data, family = "gamma", link = "log",
init = rep(0, 6), maxit = 50, tol = 1e-6, contrasts = NULL,
offset = Data$Sepal.Width)
## Using BranchGLM.fit
x <- model.matrix(Sepal.Length ~ ., data = Data)
y <- Data$Sepal.Length
BranchGLM.fit(x, y, family = "gamma", link = "log", init = rep(0, 6),
maxit = 50, tol = 1e-6, offset = Data$Sepal.Width)
Defunct functions in BranchGLM
Description
These functions are defunct and no longer available.
Usage
fit.summary.BranchGLMVS(...)
fit.BranchGLMVS(...)
Details
Defunct functions are: fit.BranchGLMVS and fit.summary.BranchGLMVS
Internal BranchGLM Functions
Description
Internal BranchGLM Functions.
Details
These are not intended for use by users, these are Rcpp functions that do not check the arguments, so improper usage may result in R crashing.
Cindex/AUC
Description
Calculates the c-index/AUC.
Usage
Cindex(object, ...)
AUC(object, ...)
## S3 method for class 'numeric'
Cindex(object, y, ...)
## S3 method for class 'BranchGLM'
Cindex(object, ...)
## S3 method for class 'BranchGLMROC'
Cindex(object, ...)
Arguments
object |
a |
... |
further arguments passed to other methods. |
y |
Observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector. |
Details
Uses trapezoidal rule to calculate AUC when given a BranchGLMROC object and uses Mann-Whitney U to calculate it otherwise. The trapezoidal rule method is less accurate, so the two methods may give different results.
Value
A number corresponding to the c-index/AUC.
Examples
Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
Cindex(Fit)
AUC(Fit)
Plotting Multiple ROC Curves
Description
Plotting Multiple ROC Curves
Usage
MultipleROCCurves(
...,
legendpos = "bottomright",
title = "ROC Curves",
colors = NULL,
names = NULL,
lty = 1,
lwd = 1
)
Arguments
... |
any number of |
legendpos |
a keyword to describe where to place the legend, such as "bottomright". The default is "bottomright" |
title |
title for the plot. |
colors |
vector of colors to be used on the ROC curves. |
names |
vector of names used to create a legend for the ROC curves. |
lty |
vector of linetypes used to create the ROC curves or a single linetype to be used for all ROC curves. |
lwd |
vector of linewidths used to create the ROC curves or a single linewidth to be used for all ROC curves. |
Value
This only produces a plot, nothing is returned.
Examples
Data <- ToothGrowth
### Logistic ROC
LogisticFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
LogisticROC <- ROC(LogisticFit)
### Probit ROC
ProbitFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "probit")
ProbitROC <- ROC(ProbitFit)
### Cloglog ROC
CloglogFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "cloglog")
CloglogROC <- ROC(CloglogFit)
### Plotting ROC curves
MultipleROCCurves(LogisticROC, ProbitROC, CloglogROC,
names = c("Logistic ROC", "Probit ROC", "Cloglog ROC"))
ROC Curve
Description
Creates an ROC curve.
Usage
ROC(object, ...)
## S3 method for class 'numeric'
ROC(object, y, ...)
## S3 method for class 'BranchGLM'
ROC(object, ...)
Arguments
object |
a |
... |
further arguments passed to other methods. |
y |
observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector. |
Value
A BranchGLMROC
object which can be plotted with plot()
. The AUC can also
be calculated using AUC()
.
Examples
Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
MyROC <- ROC(Fit)
plot(MyROC)
Confusion Matrix
Description
Creates a confusion matrix and calculates related measures.
Usage
Table(object, ...)
## S3 method for class 'numeric'
Table(object, y, cutoff = 0.5, ...)
## S3 method for class 'BranchGLM'
Table(object, cutoff = 0.5, ...)
Arguments
object |
a |
... |
further arguments passed to other methods. |
y |
observed values, can be a numeric vector of 0s and 1s, a two-level factor vector, or a logical vector. |
cutoff |
cutoff for predicted values, the default is 0.5. |
Value
A BranchGLMTable
object which is a list with the following components
table |
a matrix corresponding to the confusion matrix |
accuracy |
a number corresponding to the accuracy |
sensitivity |
a number corresponding to the sensitivity |
specificity |
a number corresponding to the specificity |
PPV |
a number corresponding to the positive predictive value |
levels |
a vector corresponding to the levels of the response variable |
Examples
Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
Table(Fit)
Computes Exact or Approximate L0-penalization based Variable Importance for GLMs
Description
Gets exact or approximate L0-penalization based variable importance values for generalized linear models (GLMs). More details about what the variable importance values are can be found in the details section.
Usage
VariableImportance(
object,
VIMethod = "simultaneous",
parallel = FALSE,
nthreads = 8,
showprogress = TRUE
)
Arguments
object |
an object of class |
VIMethod |
one of "separate" or "simultaneous" to denote the method used to find
the variable importance values. This is ignored if the type of variable selection
employed in |
parallel |
a logical value to indicate if parallelization should be used. |
nthreads |
number of threads used with OpenMP, only used if |
showprogress |
a logical value to indicate whether or not to show progress updates. |
Details
Note that variable importance values can only be found for sets of variables that are not kept through the model selection process. More details about the variable importance values will be made available in an upcoming paper.
When a branch and bound algorithm is used in object
, then the exact variable
importance values are computed. When a heuristic method is used, then approximate
variable importance values are computed based on the specified heuristic method.
Value
A BranchGLMVI
object which is a list with the following components
results |
a data.frame with the variable importance values and degrees of freedom |
metric |
metric used to select the best models |
numchecked |
number of models fit |
VS |
the supplied |
with |
a numeric matrix with the best models that include each set of variables |
withmetrics |
a numeric vector with the metric values for the best models with each set of variables |
without |
a numeric matrix with the best models that exclude each set of variables |
withoutmetrics |
a numeric vector with the metric values for the best models without each set of variables |
References
Seedorff J, Cavanaugh JE. Assessing Variable Importance for Best Subset Selection. Entropy. 2024; 26(9):801. doi:10.3390/e26090801
See Also
VariableImportance.boot, barplot.BranchGLMVI
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI
# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)
par(mar = oldmar)
Performs Parametric Bootstrap for Modified Variable Importance
Description
Performs a version of the parametric bootstrap to create an approximate null distribution for the modified variable importance values in order to get approximate p-values.
Usage
VariableImportance.boot(object, ...)
## S3 method for class 'BranchGLMVS'
VariableImportance.boot(
object,
nboot = 100,
parallel = FALSE,
nthreads = 8,
showprogress = TRUE,
...
)
## S3 method for class 'BranchGLMVI'
VariableImportance.boot(
object,
nboot = 100,
parallel = FALSE,
nthreads = 8,
showprogress = TRUE,
...
)
Arguments
object |
a |
... |
further arguments to VariableImportance when |
nboot |
the number of bootstrap replications to perform. |
parallel |
a logical value to indicate if parallelization should be used. |
nthreads |
number of threads used with OpenMP, only used if |
showprogress |
a logical value to indicate if a progress bar should be displayed. |
Details
This performs a version of the parametric bootstrap with the modified variable importance values to generate approximate p-values for the sets of variables. We are currently working on a paper that describes this function in further detail.
Value
a BranchGLMVI.boot
object which is a list with the following components
summary |
a data.frame with the observed modified variable importance values and approximate p-values |
results |
a numeric matrix with the modified variable importance values for each set of bootstrap replications |
pvals |
a numeric vector with the approximate p-values based on modified variable importance |
nboot |
the number of bootstrap replications performed |
metric |
the metric used to calculate the modified variable importance values |
VI |
the supplied |
References
Seedorff J, Cavanaugh JE. Assessing Variable Importance for Best Subset Selection. Entropy. 2024; 26(9):801. doi:10.3390/e26090801
See Also
hist.BranchGLMVI.boot, boxplot.BranchGLMVI.boot, VariableImportance
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress = FALSE)
myBoot
# Plotting histogram of results for second set of variables
hist(myBoot, which = 2)
# Plotting boxplots of results
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)
par(mar = oldmar)
Variable Selection for GLMs
Description
Performs forward selection, multiple different variants of backward elimination, and efficient best subset variable selection with information criterion for generalized linear models (GLMs). Best subset selection is performed with branch and bound algorithms to greatly speed up the process and backward elimination can be performed with bounding algorithms to speed it up.
Usage
VariableSelection(object, ...)
## S3 method for class 'formula'
VariableSelection(
object,
data = NULL,
family,
link,
offset = NULL,
method = "Fisher",
type = "switch branch and bound",
metric = "AIC",
bestmodels = NULL,
cutoff = NULL,
keep = NULL,
keepintercept = TRUE,
maxsize = NULL,
grads = 10,
parallel = FALSE,
nthreads = 8,
tol = 1e-06,
maxit = NULL,
contrasts = NULL,
showprogress = TRUE,
...
)
## S3 method for class 'BranchGLM'
VariableSelection(
object,
type = "switch branch and bound",
metric = "AIC",
bestmodels = NULL,
cutoff = NULL,
keep = NULL,
keepintercept = TRUE,
maxsize = NULL,
parallel = FALSE,
nthreads = 8,
showprogress = TRUE,
...
)
Arguments
object |
a formula or a |
... |
further arguments. |
data |
a data.frame, list or environment (or object coercible by
as.data.frame to a data.frame), containing the variables in formula.
Neither a matrix nor an array will be accepted. If not found in |
family |
the distribution used to model the data, one of "gaussian",
"gamma", "binomial", or "poisson". A |
link |
the link used to link the mean structure to the linear predictors. One of
"identity", "logit", "probit", "cloglog", "sqrt", "inverse", or "log". This only needs to be
supplied if a string is supplied for |
offset |
the offset vector, by default the zero vector is used. |
method |
one of "Fisher", "BFGS", or "LBFGS". Fisher's scoring is recommended for forward selection and the branch and bound algorithms since they will typically fit many models with a small number of covariates. |
type |
one of "forward", "backward", "fast backward", "double backward", "fast double backward", "branch and bound", "backward branch and bound", or "switch branch and bound" to indicate the type of variable selection to perform. The default value is "switch branch and bound". See more about these algorithms in details |
metric |
the metric used to choose the best models, the default is "AIC", but "BIC" and "HQIC" are also available. AIC is the Akaike information criterion, BIC is the Bayesian information criterion, and HQIC is the Hannan-Quinn information criterion. |
bestmodels |
a positive integer to indicate the number of the best models to find according to the chosen metric or NULL. If this is NULL, then cutoff is used instead. This is only used for the branch and bound algorithms. |
cutoff |
a non-negative number which indicates that the function should return all models that have a metric value within cutoff of the best metric value or NULL. Only one of this or bestmodels should be specified and when both are NULL a cutoff of 0 is used. This is only used for the branch and bound algorithms. |
keep |
a character vector of names to denote variables that must be in the models. |
keepintercept |
a logical value to indicate whether to keep the intercept in all models, only used if an intercept is included in the formula. |
maxsize |
a positive integer to denote the maximum number of variables to
consider in a single model, the default is the total number of variables.
This number adds onto any variables specified in keep. This argument only works
for |
grads |
a positive integer to denote the number of gradients used to
approximate the inverse information with, only for |
parallel |
a logical value to indicate if parallelization should be used. |
nthreads |
a positive integer to denote the number of threads used with OpenMP,
only used if |
tol |
a positive number to denote the tolerance used to determine model convergence. |
maxit |
a positive integer to denote the maximum number of iterations performed. The default for Fisher's scoring is 50 and for the other methods the default is 200. |
contrasts |
see |
showprogress |
a logical value to indicate whether to show progress updates for branch and bound algorithms. |
Details
Variable Selection Details
The supplied formula or the formula from the fitted model is treated as the upper model. The variables specified in keep along with an intercept (if included in formula and keepintercept = TRUE) is the lower model. Factor variables are either kept in their entirety or entirely removed and interaction terms are properly handled. All observations that have any missing values in the upper model are removed.
Stepwise Methods
There are 5 different stepwise variable selection algorithms that are available. These are forward selection, backward elimination, fast backward elimination, double backward elimination, and fast double backward elimination. All of these are heuristic algorithms, so the best model found by them may not be the optimal model.
Backward Elimination
Fast backward elimination should give the same results as backward elimination, but it makes use of the bounding techniques used by the branch and bound algorithms to make it faster. Fast backward elimination can give slightly different results than backward elimination if the GLM solver has difficulties fitting some of the larger models.
Double Backward Elimination
Double backward elimination and fast double backward elimination are a variant of backward elimination where up to 2 variables can be removed in one step instead of just 1. This typically results in higher quality models, but can also be much slower. The bounding algorithm used in fast double backward elimination makes it much faster.
Branch and Bound Methods
The branch and bound algorithm is an efficient algorithm used to find the optimal models. The backward branch and bound algorithm is very similar to the branch and bound algorithm, except it tends to be faster when the best models contain most of the variables. The switch branch and bound algorithm is a combination of the two algorithms and is typically the fastest of the 3 branch and bound algorithms. All of the branch and bound algorithms are guaranteed to find the optimal models (up to numerical precision).
GLM Fitting
Fisher's scoring is recommended for branch and bound selection and forward selection. L-BFGS may be faster for the backward elimination and double backward elimination algorithms, especially when there are many variables.
Value
A BranchGLMVS
object which is a list with the following components
initmodel |
the |
numchecked |
number of models fit |
names |
character vector of the names of the predictor variables |
order |
the order the variables were added to the model or removed from the model, this is only included for the stepwise algorithms |
type |
type of variable selection employed |
optType |
whether the type specified used a heuristic or exact algorithm |
metric |
metric used to select the best models |
bestmodels |
numeric matrix used to describe the best models for the branch and bound algorithms or a numeric matrix describing the models along the path taken for stepwise algorithms |
bestmetrics |
numeric vector with the best metrics found in the search for the branch and bound algorithms or a numeric vector with the metric values along the path taken for stepwise algorithms |
beta |
numeric matrix of beta coefficients for the models in bestmodels |
cutoff |
the cutoff that was used, this is set to -1 if bestmodels was used instead or if a stepwise algorithm was used |
keep |
vector of which variables were kept through the selection process |
keepintercept |
a boolean value denoting whether to keep the intercept through the selection process or not |
See Also
plot.BranchGLMVS, coef.BranchGLMVS, predict.BranchGLMVS, summary.BranchGLMVS
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian",
link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
bestmodels = 10, showprogress = FALSE)
VS
## Plotting the BIC of the best models
plot(VS, type = "b")
## Getting the coefficients of the best model according to BIC
FinalModel <- coef(VS, which = 1)
FinalModel
# Now doing it in parallel (although it isn't necessary for this dataset)
parVS <- VariableSelection(Fit, type = "branch and bound", parallel = TRUE,
metric = "BIC", bestmodels = 10, showprogress = FALSE)
## Getting the coefficients of the best model according to BIC
FinalModel <- coef(parVS, which = 1)
FinalModel
# Using a formula
formVS <- VariableSelection(Sepal.Length ~ ., data = Data, family = "gaussian",
link = "identity", metric = "BIC", type = "branch and bound", bestmodels = 10,
showprogress = FALSE)
## Getting the coefficients of the best model according to BIC
FinalModel <- coef(formVS, which = 1)
FinalModel
# Using the keep argument
keepVS <- VariableSelection(Fit, type = "branch and bound",
keep = c("Species", "Petal.Width"), metric = "BIC", bestmodels = 4,
showprogress = FALSE)
keepVS
## Getting the coefficients from the fourth best model according to BIC when
## keeping Petal.Width and Species in every model
FinalModel <- coef(keepVS, which = 4)
FinalModel
# Treating categorical variable beta parameters separately
## This function automatically groups together parameters from a categorical variable
## to avoid this, you need to create the indicator variables yourself
x <- model.matrix(Sepal.Length ~ ., data = iris)
Sepal.Length <- iris$Sepal.Length
Data <- cbind.data.frame(Sepal.Length, x[, -1])
VSCat <- VariableSelection(Sepal.Length ~ ., data = Data, family = "gaussian",
link = "identity", metric = "BIC", bestmodels = 10, showprogress = FALSE)
VSCat
## Plotting results
plot(VSCat, cex.names = 0.75)
Bar Plot Method for BranchGLMVI Objects
Description
Creates a bar plot with the L0-penalization based variable importance values.
Usage
## S3 method for class 'BranchGLMVI'
barplot(
height,
modified = FALSE,
horiz = TRUE,
decreasing = FALSE,
which = "all",
las = ifelse(horiz, 1, 2),
main = NULL,
lab = NULL,
...
)
Arguments
height |
a |
modified |
a logical value indicating if the modified variable importance values should be plotted. |
horiz |
a logical value to indicate whether bars should be horizontal. |
decreasing |
a logical value to indicate whether variables should be sorted in decreasing order. Can use NA if no ordering is desired. |
which |
which variable importance values to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables. |
las |
the style of axis labels, see par for more details. |
main |
the overall title for the plot. |
lab |
the title for the axis corresponding to the variable importance values. |
... |
further arguments passed to barplot.default. |
Value
This only produces a plot, nothing is returned.
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI
# Plotting variable importance
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
barplot(VI)
par(mar = oldmar)
Box Plot Method for BranchGLMVI.boot Objects
Description
Creates box-and-whisker plots of approximate null distributions for the modified variable importance values.
Usage
## S3 method for class 'BranchGLMVI.boot'
boxplot(
x,
which = "all",
linecol = "red",
linelwd = 2,
horizontal = TRUE,
lim = NULL,
show.names = TRUE,
lab = "Modified Variable Importance",
main = NULL,
las = ifelse(horizontal, 1, 2),
...
)
Arguments
x |
a |
which |
which approximate null distributions to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables. The default is to create box-and-whisker plots for each set of variables that are not kept in each model. |
linecol |
the color of the line which indicates the observed modified variable importance values. |
linelwd |
the width of the line which indicates the observed modified variable importance values. |
horizontal |
a logical value indicating if the boxplots should be horizontal. |
lim |
a numeric vector of length 2, giving the coordinates range. |
show.names |
set to TRUE or FALSE to override the defaults on whether an axis label is printed for each group. |
lab |
a label for the axis corresponding to the modified variable importance values. |
main |
a main title for the plot. |
las |
the style of axis labels, see more at par. |
... |
further arguments passed to boxplot.default. |
Value
This only produces a plot, nothing is returned.
See Also
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress = FALSE)
# Plotting boxplots of selected sets of variables
oldmar <- par("mar")
par(mar = c(4, 6, 3, 1) + 0.1)
boxplot(myBoot, las = 1)
par(mar = oldmar)
# Plotting boxplots of selected sets of variables
boxplot(myBoot, las = 1, cex.axis = 0.55)
Extract Coefficients from BranchGLM Objects
Description
Extracts beta coefficients from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
coef(object, type = "estimates", ...)
Arguments
object |
a |
type |
"estimates" to only return the coefficient estimates or "all" to return the estimates along with SEs, test statistics, and p-values. |
... |
further arguments passed to or from other methods. |
Value
A named vector with the corresponding coefficient estimates or a data.frame with the coefficient estimates along with SEs, test statistics, and p-values.
Examples
Data <- iris
# Linear regression model
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
Fit
# Getting only coefficient estimates
coef(Fit, type = "estimates")
# Getting coefficient estimates along with SEs, tests, and p-values
coef(Fit, type = "all")
Extract Coefficients from BranchGLMVS or summary.BranchGLMVS Objects
Description
Extracts beta coefficients from BranchGLMVS or summary.BranchGLMVS objects.
Usage
## S3 method for class 'BranchGLMVS'
coef(object, which = 1, ...)
## S3 method for class 'summary.BranchGLMVS'
coef(object, which = 1, ...)
Arguments
object |
a |
which |
a numeric vector of indices or "all" to indicate which models to get coefficients from, the default is 1 which is used for the best model. For the branch and bound algorithms the number k is used for the kth best model and for the stepwise algorithms the number k is used for the model that is k - 1 steps away from the final model. |
... |
ignored. |
Value
A numeric matrix with the corresponding coefficient estimates.
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data,
family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
bestmodels = 10, showprogress = FALSE)
## Getting coefficients from best model
coef(VS, which = 1)
## Getting coefficients from all best models
coef(VS, which = "all")
Likelihood Ratio Confidence Intervals for Beta Coefficients for BranchGLM Objects
Description
Finds profile likelihood ratio confidence intervals for beta coefficients with the ability to calculate the intervals in parallel.
Usage
## S3 method for class 'BranchGLM'
confint(object, parm, level = 0.95, parallel = FALSE, nthreads = 8, ...)
Arguments
object |
a |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
parallel |
a logical value to indicate if parallelization should be used. |
nthreads |
a positive integer to denote the number of threads used with OpenMP,
only used if |
... |
further arguments passed from other methods. |
Details
Endpoints of the confidence intervals that couldn't be found by the algorithm are filled in with NA. When there is a lot of multicollinearity in the data the algorithm may have problems finding many of the intervals.
Value
An object of class BranchGLMCIs
which is a list with the following components.
CIs |
a numeric matrix with the confidence intervals |
level |
the supplied level |
MLE |
a numeric vector of the MLEs of the coefficients |
See Also
Examples
Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
### Getting confidence intervals
CIs <- confint(mymodel, level = 0.95)
CIs
### Plotting CIs
plot(CIs, mary = 7, cex.y = 0.9)
Extract the Deviance
Description
Extracts the deviance from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
deviance(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
A single number denoting the deviance.
Extract AIC from BranchGLM Objects
Description
Computes the (generalized) Akaike An Information Criterion for BranchGLM objects.
Usage
## S3 method for class 'BranchGLM'
extractAIC(fit, scale, k = 2, ...)
Arguments
fit |
a |
scale |
ignored. |
k |
a non-negative number specifying the ‘weight’ of the equivalent degrees of freedom (= edf) part in the AIC formula. |
... |
further arguments passed to or from other methods. |
Value
A numeric vector of length 2, with first and second elements giving
edf |
the ‘equivalent degrees of freedom’ for |
AIC |
the (generalized) Akaike Information Criterion for |
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
extractAIC(Fit)
Extract Family from BranchGLM Objects
Description
Extracts family from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
family(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
An object of class family
.
Note
Only the family
and link
components of family
objects are used by the
BranchGLM
package. The other components of the family
object may not
be the same as what is used in the fitting process for BranchGLM
.
Extract Model Formula from BranchGLM Objects
Description
Extracts model formula from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
formula(x, ...)
Arguments
x |
a |
... |
further arguments passed to or from other methods. |
Value
A formula representing the model used to obtain x
.
Histogram Method for BranchGLMVI.boot Objects
Description
Creates histograms of approximate null distributions for the modified variable importance values.
Usage
## S3 method for class 'BranchGLMVI.boot'
hist(
x,
which = "all",
linecol = "red",
linelwd = 2,
xlim = NULL,
xlab = "Modified Variable Importance",
main = NULL,
...
)
Arguments
x |
a |
which |
which approximate null distributions to plot, can use a numeric vector of indices, a character vector of names, or "all" for all variables. The default is to create histograms for each set of variables that are not kept in each model. |
linecol |
the color of the line which indicates the observed modified variable importance values. |
linelwd |
the width of the line which indicates the observed modified variable importance values. |
xlim |
a numeric vector of length 2, giving the x coordinates range. |
xlab |
a label for the x axis. |
main |
a main title for the plot. |
... |
further arguments passed to hist.default. |
Value
This only produces a plot, nothing is returned.
See Also
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting approximate null distributions
set.seed(40174)
myBoot <- VariableImportance.boot(VS, showprogress = FALSE)
# Plotting histograms of second set of variables
hist(myBoot, which = 2)
# Plotting histograms of third set of variables
hist(myBoot, which = 3, linecol = "blue", linelwd = 5)
Extract Log-Likelihood from BranchGLM Objects
Description
Extracts log-likelihood from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
logLik(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
An object of class logLik
which is a number corresponding to
the log-likelihood with the following attributes: "df" (degrees of freedom)
and "nobs" (number of observations).
Extract Model Frame from a BranchGLM Object
Description
Extracts model frame from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
model.frame(formula, ...)
Arguments
formula |
a |
... |
further arguments passed to or from other methods. |
Value
A data.frame used for the fitted model.
Extract Number of Observations from BranchGLM Objects
Description
Extracts number of observations from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
nobs(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
A single number indicating the number of observations used to fit the model.
Plot Method for BranchGLM Objects
Description
Creates a plot to help visualize fitted values from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
plot(x, ...)
Arguments
x |
a |
... |
further arguments passed to plot.default. |
Value
This only produces a plot, nothing is returned.
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
plot(Fit)
Plot Method for BranchGLMCIs Objects
Description
Creates a plot to visualize confidence intervals from BranchGLMCIs objects.
Usage
## S3 method for class 'BranchGLMCIs'
plot(x, which = "all", mary = 5, ...)
Arguments
x |
a |
which |
which intervals to plot, can use a numeric vector of indices, a character vector of names of desired variables, or "all" to plot all intervals. |
mary |
a numeric value used to determine how large to make margin of y-axis. If variable names are cut-off, consider increasing this from the default value of 5. |
... |
further arguments passed to plotCI. |
Value
This only produces a plot, nothing is returned.
See Also
Examples
Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
### Getting confidence intervals
CIs <- confint(mymodel, level = 0.95)
CIs
### Plotting CIs
plot(CIs, mary = 7, cex.y = 0.9)
Plot Method for BranchGLMROC Objects
Description
This plots a ROC curve.
Usage
## S3 method for class 'BranchGLMROC'
plot(x, xlab = "1 - Specificity", ylab = "Sensitivity", type = "l", ...)
Arguments
x |
a |
xlab |
label for the x-axis. |
ylab |
label for the y-axis. |
type |
what type of plot to draw, see more details at plot.default. |
... |
further arguments passed to plot.default. |
Value
This only produces a plot, nothing is returned.
Examples
Data <- ToothGrowth
Fit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
MyROC <- ROC(Fit)
plot(MyROC)
Plot Method for summary.BranchGLMVS and BranchGLMVS Objects
Description
Creates plots to help visualize variable selection results from BranchGLMVS or summary.BranchGLMVS objects.
Usage
## S3 method for class 'BranchGLMVS'
plot(
x,
ptype = "both",
marnames = 7,
addLines = TRUE,
type = "b",
horiz = FALSE,
cex.names = 1,
cex.lab = 1,
cex.axis = 1,
cex.legend = 1,
cols = c("deepskyblue", "indianred", "forestgreen"),
...
)
## S3 method for class 'summary.BranchGLMVS'
plot(
x,
ptype = "both",
marnames = 7,
addLines = TRUE,
type = "b",
horiz = FALSE,
cex.names = 1,
cex.lab = 1,
cex.axis = 1,
cex.legend = 1,
cols = c("deepskyblue", "indianred", "forestgreen"),
...
)
Arguments
x |
a |
ptype |
the type of plot to produce, look at details for more explanation. |
marnames |
a numeric value used to determine how large to make margin of axis with variable names, this is only for the "variables" plot. If variable names are cut-off, consider increasing this from the default value of 7. |
addLines |
a logical value to indicate whether or not to add black lines to separate the models for the "variables" plot. This is typically useful for smaller amounts of models, but can be annoying if there are many models. |
type |
what type of plot to draw for the "metrics" plot, see more details at plot.default. |
horiz |
a logical value to indicate whether models should be displayed horizontally for the "variables" plot. |
cex.names |
how big to make variable names in the "variables" plot. |
cex.lab |
how big to make axis labels. |
cex.axis |
how big to make axis annotation. |
cex.legend |
how big to make legend labels. |
cols |
the colors used to create the "variables" plot. Should be a character vector of length 3, the first color will be used for included variables, the second color will be used for excluded variables, and the third color will be used for kept variables. |
... |
further arguments passed to plot.default for the "metrics" plot and image.default for the "variables" plot. |
Details
The different values for ptype are as follows
"metrics" for a plot that displays the metric values ordered by rank for the branch and bound algorithms or a plot which displays the metric values in the path taken by the stepwise algorithms
"variables" for a plot that displays which variables are in each of the top models for the branch and bound algorithms or a plot which displays the path taken by the stepwise algorithms
"both" for both plots
If there are so many models that the "variables" plot appears to be entirely black, then set addLines to FALSE.
Value
This only produces plots, nothing is returned.
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC", bestmodels = 10,
showprogress = FALSE)
VS
## Getting summary of the process
Summ <- summary(VS)
Summ
## Plotting the BIC of best models
plot(Summ, type = "b", ptype = "metrics")
## Plotting the BIC of the best models
plot(Summ, ptype = "variables")
### Alternative colors
plot(Summ, ptype = "variables",
cols = c("yellowgreen", "purple1", "grey50"))
### Smaller text size for names
plot(Summ, ptype = "variables", cex.names = 0.75)
Plot Confidence Intervals
Description
Creates a plot to display confidence intervals.
Usage
plotCI(
CIs,
points = NULL,
ylab = "",
ylas = 2,
cex.y = 1,
decreasing = FALSE,
...
)
Arguments
CIs |
a numeric matrix of confidence intervals, must have exactly 2 columns. The variable names displayed in the plot are taken from the column names. |
points |
points to be plotted in the middle of the CIs, typically means or medians. The default is to plot the midpoints of the intervals. |
ylab |
a label for the y-axis. |
ylas |
the style of the y-axis label, see more about this at |
cex.y |
font size used for variable names on y-axis. |
decreasing |
a logical value indicating if confidence intervals should be displayed in decreasing or increasing order according to points. Can use NA if no ordering is desired. |
... |
further arguments passed to plot.default. |
Value
This only produces a plot, nothing is returned.
Examples
Data <- iris
### Fitting linear regression model
mymodel <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
### Getting confidence intervals
CIs <- confint.default(mymodel, level = 0.95)
xlim <- c(min(CIs), max(CIs))
### Plotting CIs
par(mar = c(5, 7, 3, 1) + 0.1)
plotCI(CIs, main = "95% Confidence Intervals", xlim = xlim, cex.y = 0.9,
xlab = "Beta Coefficients")
abline(v = 0)
Predict Method for BranchGLM Objects
Description
Obtains predictions from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
predict(
object,
newdata = NULL,
offset = NULL,
type = "response",
na.action = na.pass,
...
)
Arguments
object |
a |
newdata |
a data.frame, if not specified then the data the model was fit on is used. |
offset |
a numeric vector containing the offset variable, this is ignored if newdata is not supplied. |
type |
one of "linpreds" which is on the scale of the linear predictors or "response" which is on the scale of the response variable. If not specified, then "response" is used. |
na.action |
a function which indicates what should happen when the data
contains NAs. The default is |
... |
further arguments passed to or from other methods. |
Value
A numeric vector of predictions.
Examples
Data <- airquality
# Example without offset
Fit <- BranchGLM(Temp ~ ., data = Data, family = "gaussian", link = "identity")
## Using default na.action
predict(Fit)
## Using na.omit
predict(Fit, na.action = na.omit)
## Using new data
predict(Fit, newdata = Data[1:20, ], na.action = na.pass)
# Using offset
FitOffset <- BranchGLM(Temp ~ . - Day, data = Data, family = "gaussian",
link = "identity", offset = Data$Day * -0.1)
## Getting predictions for data used to fit model
### Don't need to supply offset vector
predict(FitOffset)
## Getting predictions for new dataset
### Need to include new offset vector since we are
### getting predictions for new dataset
predict(FitOffset, newdata = Data[1:20, ], offset = Data$Day[1:20] * -0.1)
Predict Method for BranchGLMVS or summary.BranchGLMVS Objects
Description
Obtains predictions from BranchGLMVS or summary.BranchGLMVS objects.
Usage
## S3 method for class 'BranchGLMVS'
predict(object, which = 1, ...)
## S3 method for class 'summary.BranchGLMVS'
predict(object, which = 1, ...)
Arguments
object |
a |
which |
a positive integer to indicate which model to get predictions from, the default is 1 which is used for the best model. For the branch and bound algorithms the number k is used for the kth best model and for the stepwise algorithms the number k is used for the model that is k - 1 steps away from the final model. |
... |
further arguments passed to predict.BranchGLM. |
Value
A numeric vector of predictions.
See Also
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data,
family = "gamma", link = "log")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
bestmodels = 10, showprogress = FALSE)
## Getting predictions from best model
predict(VS, which = 1)
## Getting linear predictors from 5th best model
predict(VS, which = 5, type = "linpreds")
Print Method for BranchGLM Objects
Description
Print method for BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
print(x, coefdigits = 4, digits = 4, ...)
Arguments
x |
a |
coefdigits |
number of digits to display for coefficients table. |
digits |
number of significant digits to display for information after table. |
... |
further arguments passed to or from other methods. |
Value
The supplied BranchGLM
object.
Print Method for BranchGLMCIs Objects
Description
Print method for BranchGLMCIs objects.
Usage
## S3 method for class 'BranchGLMCIs'
print(x, digits = 4, ...)
Arguments
x |
a |
digits |
number of significant digits to display. |
... |
further arguments passed from other methods. |
Value
The supplied BranchGLMCIs
object.
Print Method for BranchGLMROC Objects
Description
Print method for BranchGLMROC objects.
Usage
## S3 method for class 'BranchGLMROC'
print(x, ...)
Arguments
x |
a |
... |
further arguments passed to other methods. |
Value
The supplied BranchGLMROC
object.
Print Method for BranchGLMTable Objects
Description
Print method for BranchGLMTable objects.
Usage
## S3 method for class 'BranchGLMTable'
print(x, digits = 4, ...)
Arguments
x |
a |
digits |
number of digits to display. |
... |
further arguments passed to other methods. |
Value
The supplied BranchGLMTable
object.
Print Method for BranchGLMVI Objects
Description
Print method for BranchGLMVI objects.
Usage
## S3 method for class 'BranchGLMVI'
print(x, digits = 3, ...)
Arguments
x |
a |
digits |
number of significant digits to display. |
... |
further arguments passed to other methods. |
Value
The supplied BranchGLMVI
object.
Print Method for BranchGLMVI.boot Objects
Description
Print method for BranchGLMVI.boot objects.
Usage
## S3 method for class 'BranchGLMVI.boot'
print(x, digits = 3, ...)
Arguments
x |
a |
digits |
number of significant digits to display. |
... |
further arguments passed to other methods. |
Value
The supplied BranchGLMVI.boot
object.
Print Method for BranchGLMVS Objects
Description
Print method for BranchGLMVS objects.
Usage
## S3 method for class 'BranchGLMVS'
print(x, digits = 2, ...)
Arguments
x |
a |
digits |
number of digits to display. |
... |
further arguments passed to other methods. |
Value
The supplied BranchGLMVS
object.
Print Method for summary.BranchGLMVS Objects
Description
Print method for summary.BranchGLMVS objects.
Usage
## S3 method for class 'summary.BranchGLMVS'
print(x, digits = 2, ...)
Arguments
x |
a |
digits |
number of digits to display. |
... |
further arguments passed to other methods. |
Value
The supplied summary.BranchGLMVS
object.
Extract the Pearson Residuals from BranchGLM Objects
Description
Extracts the Pearson residuals from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
residuals(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
A vector of the Pearson residuals from the supplied BranchGLM
object.
Extract Square Root of the Dispersion Parameter Estimates
Description
Extracts the square root of the dispersion parameter estimates from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
sigma(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
A numeric vector of length 2 with first and second elements giving
mle |
the MLE of the dispersion parameter |
pearson |
the Pearson estimator of the dispersion parameter |
Note
The dispersion parameter for binomial and Poisson regression is always fixed to be 1. The MLE of the dispersion parameter is used in the calculation of the log-likelihood while the Pearson estimator of the dispersion parameter is used to calculate standard errors for the coefficient estimates.
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
sigma(Fit)
Summary Method for BranchGLMVS Objects
Description
Summary method for BranchGLMVS objects.
Usage
## S3 method for class 'BranchGLMVS'
summary(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
An object of class summary.BranchGLMVS
which is a list with the
following components
results |
a data.frame which has the metric values for the best models along with the sets of variables included in each model |
VS |
the supplied |
formulas |
a list containing the formulas of the best models |
metric |
the metric used to perform variable selection |
See Also
plot.summary.BranchGLMVS, coef.summary.BranchGLMVS, predict.summary.BranchGLMVS
Examples
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
bestmodels = 10, showprogress = FALSE)
VS
## Getting summary of the process
Summ <- summary(VS)
Summ
## Plotting the BIC of the best models
plot(Summ, type = "b")
## Plotting the variables in the best models
plot(Summ, ptype = "variables")
## Getting coefficients
coef(Summ)
Extract covariance matrix from BranchGLM Objects
Description
Extracts covariance matrix of beta coefficients from BranchGLM
objects.
Usage
## S3 method for class 'BranchGLM'
vcov(object, ...)
Arguments
object |
a |
... |
further arguments passed to or from other methods. |
Value
A numeric matrix which is the covariance matrix of the beta coefficients.