Tags: Bagging, Classification, Decision Tree, Ensemble Technique, Machine Learning, Random Forest

January 10, 2021

What is the difference between Bagging and Random Forest? Why is Random Forest used more commonly than Bagging?

Random Forest works like bagging, with two key differences. First, not all features (independent variables) are available to each model: Random Forest also samples a random subset of columns. Second, Random Forest works only with Decision Trees, whereas in bagging any algorithm can be used as the base learner. In bagging the subsets differ from the original data only in the rows they contain, but in Random Forest the subsets differ from the original data both in rows and in columns. This makes Random Forest an even more optimized way of ensembling Decision Trees than bagging. The sketch below shows the contrast in code.
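Here is a minimal sketch of that contrast, assuming scikit-learn is available (the synthetic dataset and every parameter value below are illustrative choices of mine, not part of the original post). A `BaggingClassifier` over decision trees bootstraps rows only, while a `RandomForestClassifier` also samples a random subset of features:

```python
# Illustrative sketch: bagging resamples rows only, while Random Forest
# additionally samples a random subset of features (columns).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# A made-up dataset purely for demonstration.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Bagging: each tree sees a bootstrap sample of rows, but ALL 20 features.
# (The keyword is `estimator` in scikit-learn >= 1.2; older versions call
# it `base_estimator`.)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100,
    random_state=0,
)

# Random Forest: bootstrap rows AND a random feature subset at each split
# (sqrt(20) ~ 4 candidate features by default for classification).
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("Bagging:      ", cross_val_score(bagging, X, y, cv=5).mean())
print("Random Forest:", cross_val_score(forest, X, y, cv=5).mean())
```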

Since all the subsets in bagging share the same columns, the models built from them are correlated. Let me elaborate: say you have a dataset with p independent variables, and one of them, say P5, is quite strong. In the bagging approach this variable will be present in every subset and will therefore dominate the learning of every model, yielding a collection of correlated models that all make similar errors. Random Forest checks this tendency by randomly selecting a subset of columns, so P5 is available to only a few models, and during aggregation the bias it causes is averaged out. This facilitates better learning and more accurate predictions, as the sketch below demonstrates.
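To see this decorrelation concretely, here is a rough sketch, again assuming scikit-learn (the planted dominant feature and all parameters are my own illustration): it counts which feature each tree splits on at its root. With `max_features=None` every tree sees all columns, mimicking bagging, so the dominant feature should win almost every root; with the Random Forest default `"sqrt"` the root splits vary across trees:

```python
# Illustrative sketch: with one dominant feature, trees that see all
# columns tend to split on it at the root (correlated trees); restricting
# max_features breaks this pattern.
from collections import Counter

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# One informative feature among 10; with shuffle=False the informative
# signal sits in the first column (index 0).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=1, n_redundant=0,
    n_clusters_per_class=1, shuffle=False, random_state=0,
)

for max_features in (None, "sqrt"):  # None = bagging-like, "sqrt" = RF default
    forest = RandomForestClassifier(
        n_estimators=100, max_features=max_features, random_state=0,
    ).fit(X, y)
    # Which feature does each tree use at its root node?
    roots = Counter(tree.tree_.feature[0] for tree in forest.estimators_)
    print(f"max_features={max_features!r}: root-split counts {dict(roots)}")
```

On a run like this, `max_features=None` concentrates nearly all root splits on the dominant column, while `"sqrt"` spreads them out, which is exactly the averaging-out effect described above.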

Therefore, Random Forest is preferred over bagging when applying an ensemble technique to Decision Trees.

by: Monis Khan
