Monday, July 6, 2015

Extensive evaluation of different classifiers


TL;DR: Random forests are the one classifier family that both references rank at or near the top.

From the abstract of the first reference, "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?":

"We  evaluate 179  classifiers arising from 17  families (discriminant  analysis,  Bayesian, neural networks, support vector machines, decision trees, rule-based classifiers, boosting, bagging, stacking, random forests and other ensembles, generalized linear models, nearest-neighbors, partial least squares and principal component regression, logistic and multinomial regression, multiple adaptive regression splines and other methods). We use 121 data sets, which represent the whole UCI database (excluding the large-scale problems) and other own real problems, in order to achieve significant  conclusions  about  the  classifier  behavior,  not  dependent  on  the  data  set  collection. The classifiers most likely to be the bests are the random forest(RF) versions."

From the second reference, "An Empirical Comparison of Supervised Learning Algorithms":
"With excellent performance on all eight metrics, calibrated boosted trees were the best learning algorithm overall. Random forests are close second, followed by uncalibrated bagged trees, calibrated SVMs, and uncalibrated neural nets. The models that performed poorest were naive bayes, logistic regression, decision trees, and boosted stumps. Although some methods clearly perform better or worse than other methods on average, there is significant variability across the problems and metrics. Even the best models sometimes perform poorly, and models with poor average performance occasionally perform exceptionally well."

Both references perform an extensive evaluation of classifiers across many datasets and several performance metrics, so their agreement on random forests is not an artifact of a single benchmark.
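
For a feel of how such a comparison is set up, here is a minimal sketch with scikit-learn (again my own example, far smaller than either study): three classifier families scored by cross-validated accuracy on two small built-in datasets. The model choices and hyperparameters are illustrative assumptions, not the papers' configurations.

```python
from sklearn.datasets import load_breast_cancer, load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

classifiers = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm (rbf)": SVC(random_state=0),
    "logistic regression": LogisticRegression(max_iter=5000),
}
datasets = {
    "breast_cancer": load_breast_cancer(return_X_y=True),
    "wine": load_wine(return_X_y=True),
}

# 5-fold cross-validated accuracy for every (dataset, classifier) pair;
# the real studies sweep far more models, datasets, and metrics.
for ds_name, (X, y) in datasets.items():
    for clf_name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
        print(f"{ds_name:14s} {clf_name:20s} mean accuracy={scores.mean():.3f}")
```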

References:
Fernández-Delgado et al., "Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?", Journal of Machine Learning Research, 2014.
Caruana and Niculescu-Mizil, "An Empirical Comparison of Supervised Learning Algorithms", ICML 2006.