January 8, 2021
Cross validation is of three types:
- Hold Out: Here you split your original data into training and test (hold-out) sets. The model is trained on the training set, and overfitting is then checked on the test set. The disadvantage of this method is that, with smaller datasets, an unrepresentative randomly selected test set passes its bias on to the evaluation of the model as well. Thus the overfitting estimates provided by this method have high variance, depending on the manner in which the data was split. The bias is relatively low in this case.
- Leave One Out: This takes the K Fold method (described next) to its extreme, with k equal to the number of observations. The overfitting estimate provided by this method is good, but it requires huge computation power, since one model is trained per observation. Its bias is low, because every model is trained on almost the entire dataset; for the same reason, the fitted models are highly correlated with one another, so the variance associated with this method is high.
- K Fold: This is typically performed on top of the 'Hold Out' method. The training set is divided into k subsets; the model is trained on (k-1) subsets and tested on the remaining one. This process is repeated k times, once per subset, and the final value is the average over the k iterations. It is essentially the 'Hold Out' method repeated k times on the training set. Since the training sets overlap, this method has moderate bias; since they overlap less than in LOO, the fitted models are less correlated and the variance is lower than the LOO method's. A minimal sketch of all three strategies follows this list.
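To make the three strategies concrete, here is a minimal sketch using scikit-learn. The iris dataset, the logistic regression model, the 80/20 split, and k=5 are illustrative assumptions, not part of the post itself.

```python
# Sketch of Hold Out, K Fold, and Leave One Out cross validation.
# Dataset, model, and split sizes are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    KFold, LeaveOneOut, cross_val_score, train_test_split
)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Hold Out: a single random train/test split; the estimate depends
# heavily on which rows land in the hold-out set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model.fit(X_train, y_train)
print("Hold Out accuracy:", model.score(X_test, y_test))

# K Fold (k=5) on the training set: train on 4 folds, validate on the
# remaining fold, repeat 5 times, and average the scores.
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
kfold_scores = cross_val_score(model, X_train, y_train, cv=kfold)
print("5-Fold mean accuracy:", kfold_scores.mean())

# Leave One Out: k equals the number of observations, so one model is
# trained per row; low bias, but computationally expensive.
loo_scores = cross_val_score(model, X_train, y_train, cv=LeaveOneOut())
print("LOO mean accuracy:", loo_scores.mean())
```

Note that both K Fold and LOO run on the training portion only, matching the post's point that K Fold is performed on top of the hold-out split, with the test set kept aside for a final check.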
by: Monis Khan