First, a word of caution. The examples shown in this section are meant to simply show what the functions do and not what the best model is. For a specific use case, please perform the necessary model checks, post-hoc analyses, and/or choose predictor variables and model types as appropriate based on domain knowledge.
With this in mind, let us look at how we can perform modeling tasks
using manymodelr.
multi_model_1
This is one of the core functions of the package.
multi_model_1 aims to allow model fitting, prediction, and
reporting with a single function. The multi part of the
function’s name reflects the fact that we can fit several model types
with one function. An example follows next.
For the purposes of this report, we use the simple yields dataset that ships with the package.
library(manymodelr)
set.seed(520)
# Load a simple dataset with a binary target.
# Here `normal` is a fictional target that indicates whether an
# observation meets some criterion.
data("yields", package = "manymodelr")
# Keep roughly 60% of the rows for training and hold out the rest
train_idx <- caret::createDataPartition(yields$normal, p = 0.6, list = FALSE)
valid_set <- yields[-train_idx, ]
train_set <- yields[train_idx, ]
# 5-fold cross-validation on the training set
ctrl <- caret::trainControl(method = "cv", number = 5)
# Fit k-nearest neighbours and a decision tree, compared on accuracy
m <- multi_model_1(train_set, "normal", ".", c("knn", "rpart"),
                   "Accuracy", ctrl, new_data = valid_set)
The above returns a list containing metrics, predictions, and a model summary. These can be extracted as shown below.
m$metric
#> # A tibble: 1 × 2
#>   knn_accuracy rpart_accuracy
#>          <dbl>          <dbl>
#> 1        0.872           0.68
head(m$predictions)
#> # A tibble: 6 × 2
#>   knn   rpart
#>   <chr> <chr>
#> 1 Yes   Yes  
#> 2 No    Yes  
#> 3 No    No   
#> 4 No    Yes  
#> 5 No    No   
#> 6 Yes   Yes
multi_model_2
This is similar to multi_model_1 with one difference: it
does not use metrics such as RMSE, accuracy and the like. This function
is useful if one would like to fit and predict with “simpler models” like
generalized linear models or linear models. Let’s take a look:
# fit a linear model and get predictions
lin_model <- multi_model_2(mtcars[1:16,],mtcars[17:32,],"mpg","wt","lm")
lin_model[c("predicted", "mpg")]
#>                     predicted  mpg
#> Mazda RX4            10.17314 21.0
#> Mazda RX4 Wag        24.32264 21.0
#> Datsun 710           26.95458 22.8
#> Hornet 4 Drive       25.96479 21.4
#> Hornet Sportabout    23.13039 18.7
#> Valiant              18.38390 18.1
#> Duster 360           18.76632 14.3
#> Merc 240D            16.94420 24.4
#> Merc 230             16.92171 22.8
#> Merc 280             25.51488 19.2
#> Merc 280C            24.59258 17.8
#> Merc 450SE           27.41348 16.4
#> Merc 450SL           19.95856 17.3
#> Merc 450SLC          21.75818 15.2
#> Cadillac Fleetwood   18.15895 10.4
#> Lincoln Continental  21.71319 10.4
From the above, we see that wt alone may not be a great
predictor for mpg. We can fit a multiple linear regression model with
other predictors. If, say, disp and drat are
important too, we add those to the model.
multi_lin <- multi_model_2(mtcars[1:16, ], mtcars[17:32,],"mpg", "wt + disp + drat","lm")
multi_lin[,c("predicted", "mpg")]
#>                     predicted  mpg
#> Mazda RX4            10.43041 21.0
#> Mazda RX4 Wag        24.39765 21.0
#> Datsun 710           25.56629 22.8
#> Hornet 4 Drive       25.38957 21.4
#> Hornet Sportabout    23.15234 18.7
#> Valiant              17.36908 18.1
#> Duster 360           17.67102 14.3
#> Merc 240D            15.59802 24.4
#> Merc 230             14.96161 22.8
#> Merc 280             25.05592 19.2
#> Merc 280C            23.66222 17.8
#> Merc 450SE           25.95326 16.4
#> Merc 450SL           17.05637 17.3
#> Merc 450SLC          21.97756 15.2
#> Cadillac Fleetwood   17.22593 10.4
#> Lincoln Continental  22.17872 10.4
fit_model
This function allows us to fit any kind of model without necessarily returning predictions.
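As a rough mental model only, fitting a model from string inputs can be sketched in base R. The helper `fit_from_strings` below is hypothetical and is not part of manymodelr; it assumes the response and predictors are pasted into a formula and that the model type names a fitting function such as `lm` or `glm`. How the package does this internally may differ.

```r
# Hypothetical helper (not part of manymodelr): fit a model from strings.
fit_from_strings <- function(df, yname, xname, modeltype, ...) {
  # Build a formula such as mpg ~ wt from the strings "mpg" and "wt"
  model_formula <- stats::as.formula(paste(yname, "~", xname))
  # Look up the fitting function by name, e.g. "lm" or "glm"
  model_fun <- match.fun(modeltype)
  # Fit the model on the supplied data frame
  model_fun(model_formula, data = df, ...)
}

sketch_lm <- fit_from_strings(mtcars, "mpg", "wt", "lm")
round(coef(sketch_lm), 3)
```

For `lm` on mtcars with mpg and wt, this sketch gives an intercept of about 37.285 and a slope of about -5.344, the same coefficients that the fit_model call prints.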
lm_model <- fit_model(mtcars,"mpg","wt","lm")
lm_model
#> 
#> Call:
#> lm(formula = mpg ~ wt, data = use_df)
#> 
#> Coefficients:
#> (Intercept)           wt  
#>      37.285       -5.344
fit_models
This is similar to fit_model, with the added ability to fit
many models with many predictors at once. A simple linear model for
each of two response variables, for instance:
models<-fit_models(df=yields,yname=c("height", "weight"),xname="yield",
                   modeltype="glm") 
These models can then be used as needed, for example to add residuals and predictions from them to the data:
res_residuals <- lapply(models[[1]], add_model_residuals,yields)
res_predictions <- lapply(models[[1]], add_model_predictions, yields, yields)
# Get height predictions for the model height ~ yield 
head(res_predictions[[1]])
#>   normal    height     weight    yield predicted
#> 1    Yes 0.2849090 0.13442312 520.2837 0.5028866
#> 2     No 0.2427826 0.37484971 504.4754 0.4943626
#> 3    Yes 0.2579432 0.47134828 515.6463 0.5003860
#> 4     No 0.5175604 0.50143592 522.2247 0.5039331
#> 5    Yes 0.4026023 0.47171755 502.6406 0.4933732
#> 6     No 0.9789886 0.04191937 509.4663 0.4970537
To drop non-numeric columns from the analysis,
set drop_non_numeric to TRUE as follows.
The same option is available in fit_model above.
m_models<-fit_models(df=yields,yname=c("height","weight"),
           xname=".",modeltype=c("lm","glm"), drop_non_numeric = TRUE)
m_models[[1]]
#> [[1]]
#> 
#> Call:
#> lm(formula = height ~ ., data = use_df)
#> 
#> Coefficients:
#> (Intercept)       weight        yield  
#>   0.2176942   -0.2185572    0.0006712  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> lm(formula = weight ~ ., data = use_df)
#> 
#> Coefficients:
#> (Intercept)       height        yield  
#>   0.0112753   -0.1463926    0.0006827
One can generate a very simple model report using
report_model as follows:
report_model(m_models[[2]][[1]])
#>              Type      Estimate      P_Value Exp_Estimate  Effect
#> 1 Estimated Score  0.2176942039 5.487736e-01    1.2432068    1.24
#> 2          weight -0.2185572088 1.252007e-08    0.8036775 -19.63%
#> 3           yield  0.0006711689 3.369693e-01    1.0006714  +0.07%
To extract information about a given model, we can use
extract_model_info as follows.
extract_model_info(lm_model, "r2")
#> [1] 0.7528328
To extract the adjusted R squared:
extract_model_info(lm_model, "adj_r2")
#> [1] 0.7445939
For the p value:
extract_model_info(lm_model, "p_value")
#>  (Intercept)           wt 
#> 8.241799e-19 1.293959e-10
To extract multiple attributes:
extract_model_info(lm_model,c("p_value","response","call","predictors"))
#> $p_value
#>  (Intercept)           wt 
#> 8.241799e-19 1.293959e-10 
#> 
#> $response
#> [1] "mpg"
#> 
#> $call
#> lm(formula = mpg ~ wt, data = use_df)
#> 
#> $predictors
#> [1] "wt"
This is not restricted to linear models but will work for most model
types. See help(extract_model_info) to see currently
supported model types.
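
For readers who want to cross-check these values, the same quantities can be pulled from a plain `lm` fit with base R. This is only a sketch of equivalent base R calls for a linear model, not a description of how extract_model_info works internally; the object names `base_lm` and `base_summary` are introduced here for illustration.

```r
# Same model as lm_model in the text, fitted directly with base R
base_lm <- lm(mpg ~ wt, data = mtcars)
base_summary <- summary(base_lm)

# R squared and adjusted R squared
r2 <- base_summary$r.squared
adj_r2 <- base_summary$adj.r.squared

# Coefficient p values, taken from the summary's coefficient table
p_values <- coef(base_summary)[, "Pr(>|t|)"]

# Response and predictor names, recovered from the model formula and terms
response <- all.vars(formula(base_lm))[1]
predictors <- attr(terms(base_lm), "term.labels")

list(r2 = r2, adj_r2 = adj_r2, p_value = p_values,
     response = response, predictors = predictors)
```

For this model the values should agree with those printed above, for example an R squared of about 0.753 and an adjusted R squared of about 0.745.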