Help for package ClassificationEnsembles

Title:

Automatically Builds 20 Classification Models

Version:

0.5.0

Description:

Automatically builds 20 classification models from data. The package returns 26 plots, 5 tables and a summary report. It first builds 12 individual classification models and records the error (RMSE) and predictions for each. That data is used to create an ensemble, which is then modeled using 8 methods. The process is repeated as many times as the user requests. The mean of the results is presented in a summary table. The package returns the confusion matrices for all 20 models, tables of the correlation of the numeric data, the results of the variance inflation process, the head of the ensemble and the head of the data frame.

License:

MIT + file LICENSE

Depends:

C50, car, caret, corrplot, doParallel, dplyr, e1071, ggplot2, gt, ipred, MachineShop, magrittr, parallel, pls, purrr, R (≥ 2.10), randomForest, ranger, reactable, reactablefmtr, scales, tidyr, tree

Encoding:

UTF-8

RoxygenNote:

7.3.2

LazyData:

true

Suggests:

knitr, rmarkdown

VignetteBuilder:

knitr

URL:

https://github.com/InfiniteCuriosity/ClassificationEnsembles

BugReports:

https://github.com/InfiniteCuriosity/ClassificationEnsembles/issues

NeedsCompilation:

Packaged:

2025-03-30 22:25:56 UTC; russellconte

Author:

Russ Conte [aut, cre, cph]

Maintainer:

Russ Conte <russconte@mac.com>

Repository:

CRAN

Date/Publication:

2025-04-01 16:10:05 UTC

Carseats data

Description

This is the Carseats data as shown in the ISLR package.

Usage

Carseats

Format

Carseats A simulated data set with 400 observations and 11 columns

Sales: Unit sales (in thousands) at each location
CompPrice: Price charged by competitor at each location
Income: Community income level (in thousands of dollars)
Advertising: Local advertising budget for company at each location (in thousands of dollars)
Population: Population size in region (in thousands)
Price: Price company charges for car seats at each site
ShelveLoc: A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site
Age: Average age of the local population
Urban: A factor with levels No and Yes to indicate whether the store is in an urban or rural location
US: A factor with levels No and Yes to indicate whether the store is in the US or not

Source

ISLR data set, https://www.rdocumentation.org/packages/ISLR/versions/1.4/topics/Carseats

Classification: function to perform classification analysis and return results to the user.

Description

Performs a classification analysis on the supplied data and returns the results to the user.

Usage

Classification(
  data,
  colnum,
  numresamples,
  predict_on_new_data = c("Y", "N"),
  remove_VIF_above,
  scale_all_numeric_predictors_in_data,
  how_to_handle_strings = c(0("No strings"), 1("Strings as factors")),
  save_all_trained_models = c("Y", "N"),
  save_all_plots,
  use_parallel = c("Y", "N"),
  train_amount,
  test_amount,
  validation_amount
)

Arguments

data

a data set that includes classification data. For example, the Carseats data in the ISLR package

colnum

the column number of the target variable. For example, in the Carseats data this is column 7, ShelveLoc, which has three values: Good, Medium and Bad

numresamples

the number of times to resample the analysis

predict_on_new_data

Gives the user the opportunity to use the trained models to predict on new data that was not used in training

remove_VIF_above

Removes columns with Variance Inflation Factors above the level chosen by the user

scale_all_numeric_predictors_in_data

Scales all numeric predictors in the original data

how_to_handle_strings

Converts strings to factor levels

save_all_trained_models

Gives the user the option to save all trained models in the Environment

save_all_plots

Saves all plots in the user's chosen format

use_parallel

"Y" or "N" for parallel processing

train_amount

set the amount for the training data

test_amount

set the amount for the testing data

validation_amount

Set the amount for the validation data
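
The remove_VIF_above argument described above filters predictors by their variance inflation factor. The following is a minimal base R sketch of that idea, assuming the usual definition VIF_j = 1 / (1 - R_j^2). The helper names compute_vif and drop_high_vif and the toy data are hypothetical, and this is not necessarily how the package implements the step.

```r
# VIF of each numeric predictor: regress it on all other predictors
# and compute 1 / (1 - R^2) from that fit.
compute_vif <- function(predictors) {
  vapply(names(predictors), function(name) {
    others <- setdiff(names(predictors), name)
    fit <- lm(reformulate(others, response = name), data = predictors)
    1 / (1 - summary(fit)$r.squared)
  }, numeric(1))
}

# Hypothetical helper: drop the predictor with the largest VIF, one at a
# time, until every remaining VIF is at or below the threshold.
drop_high_vif <- function(predictors, threshold) {
  while (ncol(predictors) > 1) {
    vif <- compute_vif(predictors)
    if (max(vif) <= threshold) break
    worst <- names(vif)[which.max(vif)]
    predictors <- predictors[, names(predictors) != worst, drop = FALSE]
  }
  predictors
}

# Toy data (an assumption for illustration): x3 is almost x1 + x2,
# so the three columns are strongly collinear.
set.seed(42)
x1 <- rnorm(200)
x2 <- rnorm(200)
x3 <- x1 + x2 + rnorm(200, sd = 0.05)
toy <- data.frame(x1, x2, x3)

vif_before <- compute_vif(toy)
kept <- drop_high_vif(toy, threshold = 5)
vif_after <- compute_vif(kept)
```

Here the threshold of 5 plays the role of the value a user would pass as remove_VIF_above; lower thresholds remove more columns.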

Value

a full analysis, including data visualizations, statistical summaries, and a full report on the results of the 20 models on the data
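
A call on the bundled Carseats data might look like the sketch below. Only the argument names and colnum = 7 come from this page; every other value (for example "N" for the yes/no options, a VIF threshold of 5, and a 0.60/0.20/0.20 split) is an illustrative assumption rather than a documented default. The call is guarded so that it only runs in an interactive session with the package installed, since the analysis can take a while.

```r
# Illustrative argument values; apart from colnum = 7 (ShelveLoc) these
# are assumptions, not defaults taken from the package documentation.
example_args <- list(
  colnum = 7,
  numresamples = 2,
  predict_on_new_data = "N",
  remove_VIF_above = 5,
  scale_all_numeric_predictors_in_data = "N",
  how_to_handle_strings = 1,
  save_all_trained_models = "N",
  save_all_plots = "N",
  use_parallel = "N",
  train_amount = 0.60,
  test_amount = 0.20,
  validation_amount = 0.20
)

# Run only interactively and only if the package is available.
if (interactive() && requireNamespace("ClassificationEnsembles", quietly = TRUE)) {
  library(ClassificationEnsembles)
  results <- do.call(Classification, c(list(data = Carseats), example_args))
}
```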

Maternal Health Risk

Description

Data were collected from hospitals, community clinics and maternal health care centers in rural areas of Bangladesh through an IoT-based risk monitoring system.

Usage

Maternal_Health_Risk

Format

Maternal_Health_Risk A data set with the variables Age, Systolic Blood Pressure (SystolicBP), Diastolic Blood Pressure (DiastolicBP), Blood Sugar (BS), Body Temperature (BodyTemp), HeartRate and RiskLevel. These are significant risk factors for maternal mortality, which is one of the main concerns of the UN Sustainable Development Goals (SDGs).

Age: Age in years of the woman during pregnancy.
SystolicBP: Upper value of Blood Pressure in mmHg, another significant attribute during pregnancy.
DiastolicBP: Lower value of Blood Pressure in mmHg, another significant attribute during pregnancy.
BS: Blood glucose level, expressed as a molar concentration
BodyTemp: Body temperature in Fahrenheit
HeartRate: A normal resting heart rate
RiskLevel: Predicted Risk Intensity Level during pregnancy considering the previous attributes.

Dry Beans small

Description

This is a stratified sample of the full dry beans data set. It contains about 6 percent of the full data set (813 of 13,611 rows).
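
To illustrate what a stratified reduction means, here is a minimal base R sketch that samples the same fraction from every class. It uses the built-in iris data as a stand-in, and the helper stratified_sample is hypothetical; it is not the procedure that was used to create dry_beans_small.

```r
# Hypothetical helper: keep roughly `fraction` of the rows of each class,
# and at least one row per class that is present in the data.
stratified_sample <- function(data, class_col, fraction) {
  groups <- split(seq_len(nrow(data)), data[[class_col]])
  groups <- groups[lengths(groups) > 0]  # ignore unused factor levels
  keep <- unlist(lapply(groups, function(idx) {
    n_keep <- max(1, round(length(idx) * fraction))
    idx[sample.int(length(idx), n_keep)]
  }), use.names = FALSE)
  data[sort(keep), , drop = FALSE]
}

set.seed(1)
small <- stratified_sample(iris, "Species", fraction = 0.1)
class_counts <- table(small$Species)
```

Because each class is sampled separately, the class proportions of the reduced data match those of the full data up to rounding.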

Usage

dry_beans_small

Format

dry_beans_small A reduced version with 813 rows and 17 columns of the full data set available on UCI: https://archive.ics.uci.edu/dataset/602/dry+bean+dataset

Area: The area of a bean zone and the number of pixels within its boundaries
Perimeter: Bean circumference is defined as the length of its border
MajorAxisLength: The distance between the ends of the longest line that can be drawn from a bean
MinorAxisLength: The longest line that can be drawn from the bean while standing perpendicular to the main axis
AspectRatio: Defines the relationship between MajorAxisLength and MinorAxisLength
Eccentricity: Eccentricity of the ellipse having the same moments as the region
ConvexArea: Number of pixels in the smallest convex polygon that can contain the area of a bean seed
EquivDiameter: Equivalent diameter: The diameter of a circle having the same area as a bean seed area
Extent: The ratio of the pixels in the bounding box to the bean area
Solidity: Also known as convexity. The ratio of the pixels in the convex shell to those found in beans.
Roundness: Calculated with the following formula: (4 * pi * A) / P^2, where A is the Area and P is the Perimeter
Compactness: Measures the roundness of an object
ShapeFactor1: Continuous value
ShapeFactor2: Continuous value
ShapeFactor3: Continuous value
ShapeFactor4: Continuous value
Class: The bean variety (Seker, Barbunya, Bombay, Cali, Dermason, Horoz and Sira)

Source

https://archive.ics.uci.edu/dataset/602/dry+bean+dataset
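
The Roundness formula listed above can be checked on simple shapes. The sketch below is an illustration only: the function name roundness and the test shapes are assumptions, and it does not read the package data.

```r
# Roundness as defined in the variable list: (4 * pi * Area) / Perimeter^2.
roundness <- function(area, perimeter) {
  (4 * pi * area) / perimeter^2
}

# A perfect circle of radius 10: area pi * r^2, perimeter 2 * pi * r.
r <- 10
circle_roundness <- roundness(pi * r^2, 2 * pi * r)

# A square with side 10: area side^2, perimeter 4 * side.
side <- 10
square_roundness <- roundness(side^2, 4 * side)
```

A circle gives a roundness of 1, the maximum, while the square gives pi / 4, so values closer to 1 indicate rounder beans.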