in which formula describes the model to be fitted; data is a data frame containing the variables in the model; mtry
is the number of variables randomly sampled; ntree is the number of decision trees; na.action specifies the action to be taken if NAs are found.
A fitness function relating a given mtry value to the OOB error is constructed, the AFSA is used to find the optimal mtry value, and the QC model is constructed with the optimal mtry value.
To avoid correlation among the different trees, RF increases the diversity of the trees by growing them from different bootstrap samples created by a procedure called bagging (bagging = mtry = number of predictors).
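The bootstrap step described above can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: `bootstrap_sample`, the data set size of 100 rows, the five trees, and the seed are all hypothetical and do not come from any of the quoted sources. Each tree gets its own sample drawn with replacement, and the rows never drawn form that tree's out-of-bag (OOB) set.

```python
import random

def bootstrap_sample(n_rows, rng):
    """Draw n_rows row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    # Rows that were never drawn are "out of bag" for this tree.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(0)  # arbitrary seed, only for repeatability
n_rows = 100            # hypothetical data set size
samples = [bootstrap_sample(n_rows, rng) for _ in range(5)]  # one sample per tree

for in_bag, oob in samples:
    # sample size, number of distinct rows in the bag, number of OOB rows
    print(len(in_bag), len(set(in_bag)), len(oob))
```

Because every tree sees a different resample, the trees differ even when all predictors are available at every split, which is the bagging case the excerpt equates with mtry = number of predictors.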
A grid search with 10-fold cross-validation is used to determine the best ntree and mtry. The optimal (ntree, mtry) pair is (1050,10).
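A grid search of this kind might look like the following sketch. It is not the authors' code: `k_fold_indices`, `grid_search`, and `toy_cv_error` are hypothetical names, the 200-row data set is invented, and `toy_cv_error` is a synthetic placeholder (no model is fitted) whose minimum is placed by construction at the reported pair purely so the search has something to find. The ntree and mtry ranges follow the search ranges given for CForest in the parameter table.

```python
import itertools

def k_fold_indices(n_rows, k):
    """Split row indices 0..n_rows-1 into k disjoint, nearly equal folds."""
    return [list(range(i, n_rows, k)) for i in range(k)]

def grid_search(cv_error, ntree_values, mtry_values, folds):
    """Return the (ntree, mtry) pair with the lowest mean error over the folds."""
    best_pair, best_err = None, float("inf")
    for ntree, mtry in itertools.product(ntree_values, mtry_values):
        errs = [cv_error(ntree, mtry, held_out) for held_out in folds]
        mean_err = sum(errs) / len(errs)
        if mean_err < best_err:
            best_pair, best_err = (ntree, mtry), mean_err
    return best_pair, best_err

def toy_cv_error(ntree, mtry, held_out):
    # Placeholder only: a synthetic error surface, NOT a real model fit.
    # A real version would train on all rows except held_out and score on held_out.
    return abs(ntree - 1050) / 2000 + abs(mtry - 10) / 91

folds = k_fold_indices(n_rows=200, k=10)  # hypothetical 200-row data set
best_pair, best_err = grid_search(
    toy_cv_error,
    range(50, 2001, 50),  # ntree: 50:2000 in steps of 50
    range(1, 92),         # mtry: 1:91 in steps of 1
    folds,
)
print(best_pair)
```

In practice `toy_cv_error` would be swapped for a function that fits the forest on nine folds and returns the error on the tenth.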
Classifier      Parameters   Step size in search   Search range   Optimal value
KNN             K            1                     1:20           7
SVM             C            1                     1:500          25
                g            0.000001              10^-6:1        0.000012
Random forest   ntree        50                    50:2000        1000
                mtry         1                     1:91           91
CForest         ntree        50                    50:2000        1050
                mtry         1                     1:91           10
XGBoost         eta          0.1                   0.1:1          0.5
                max_depth    1                     1:10           4

Table 5: Classification performance of different classifiers.
(1) No predictor selection and no tuning (R default values for mtry and sampsize).
Internal effects refer to the bootstrapping and predictor selection procedure (mtry) implemented within Random Forest; external effects refer to the sample attribution to cross-validation groups.
Soto and Raul Llanes Toro (TEC), Omar Ortiz (MTRY), Daniel Deeke (MOR), Javier Lozano (PUM), Jaime Ruiz (CA).
* GUADALAJARA: Bought "Gonzo" Gonzalez (PUM); loaned out Jair Garcia (MTRY), Israel Lopez (PUM), Gustavo Napoles (ATLAN), Antonio Torres Serrin (SAN).
Random forest contains several tuning parameters, some of which control internal random processes: number of randomly selected predictors used to fit each tree ("mtry"), minimum node size ("nodesize"), size of the bootstrap sample ("sampsize"), and number of trees fitted ("ntree").
(4) No predictor selection, but tuning of mtry. mtry is suggested as a potentially sensitive parameter by Breiman and Cutler and thus is used for regular tuning [38, 55, 56].
Trees are split into many nodes using random subsets of variables (mtry), and the default mtry value is the square root of the total number of variables.
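As a worked illustration of the square-root rule, the sketch below computes a default mtry and draws one random subset of candidate variables for a split. The helper names `default_mtry` and `candidate_variables` are hypothetical, rounding the square root down is an assumption about how the non-integer case is handled, and p = 91 is used only as an example predictor count.

```python
import math
import random

def default_mtry(n_variables):
    """Square-root rule, rounded down, but never fewer than one variable."""
    return max(1, math.isqrt(n_variables))

def candidate_variables(n_variables, mtry, rng):
    """Pick mtry distinct variable indices to consider at a single split."""
    return rng.sample(range(n_variables), mtry)

p = 91                  # example predictor count (assumption)
mtry = default_mtry(p)
rng = random.Random(0)  # arbitrary seed, only for repeatability

print(mtry)  # 9, because 9*9 = 81 <= 91 < 100 = 10*10
print(candidate_variables(p, mtry, rng))
```

A fresh subset is drawn at every split, so different nodes of the same tree generally consider different variables.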