January 8, 2021
Cross validation essentially means checking the prediction accuracy of your model on unseen data, i.e. data the model was not trained on. It is performed to make sure that the model does not overfit. If the prediction accuracy of your model on the training data and on the unseen data is similar, your model is good to go. If the prediction accuracy on unseen data is significantly lower than on the training data, your model is overfitting.
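To make this comparison concrete, here is a minimal sketch in plain Python (standard library only). It assumes a toy 1-nearest-neighbour classifier and synthetic, deliberately noisy data; the helper names `hold_out_split`, `predict_1nn` and `accuracy` are hypothetical and not part of any library. A 1-nearest-neighbour model memorises its training set, so its training accuracy is perfect while its accuracy on unseen data is noticeably lower, which is exactly the overfitting signal described above.

```python
import random


def hold_out_split(data, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and split it into training and hold-out sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Toy model (hypothetical): label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def accuracy(train, evaluated):
    """Fraction of points in `evaluated` labelled correctly by a model built on `train`."""
    correct = sum(1 for x, y in evaluated if predict_1nn(train, x) == y)
    return correct / len(evaluated)


# Synthetic data (an assumption for illustration): label is 1 when x > 0.5,
# with about 20% of labels flipped at random to add noise.
rng = random.Random(42)
data = []
for _ in range(200):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

train, test = hold_out_split(data)
train_acc = accuracy(train, train)  # each training point is its own nearest neighbour
test_acc = accuracy(train, test)    # accuracy on data the model never saw
print(f"training accuracy: {train_acc:.2f}")
print(f"hold-out accuracy: {test_acc:.2f}")
```

In real projects, scikit-learn's `sklearn.model_selection.train_test_split` performs the same kind of shuffled split.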
Three common types of cross validation are:
- Hold Out: Here you split your original data into a training set and a test (hold-out) set. The model is trained on the training set, and overfitting is then checked on the test set. The disadvantage of this method is that, with smaller datasets, a randomly selected test set may happen to be unrepresentative; the training set is then skewed as well, and that bias carries over to the model. As a result, the overfitting estimates provided by this method can have high variance depending on how the data was split.
- K Fold: This is performed in addition to the ‘Hold Out’ method. Here the training set is divided into k subsets (folds); the model is trained on k-1 of them and tested on the remaining one. This process is repeated k times, so that every subset is used once as the test set, and the final value is the average of the k results. In effect, this repeats the ‘Hold Out’ method k times on the training set. The disadvantage of this method is that the model has to be trained k times, which makes it computationally intensive.
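- Leave One Out: This takes the K Fold method to its extreme, where k equals the number of observations, so each test set contains a single observation. The overfitting estimate provided by this method is good, but because the model must be trained once per observation, it requires a lot of computation.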
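The K Fold and Leave One Out procedures can be sketched in the same toy setting. This is a minimal illustration under stated assumptions, not a reference implementation: `k_fold_indices` and `cross_val_accuracy` are hypothetical helpers, the model is again a toy 1-nearest-neighbour classifier, and `data` stands in for the training set left after a hold-out split. Leave One Out is simply the case `k = len(data)`, which makes its cost visible: one training run per observation.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs that split range(n) into k folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of observations")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # shuffle once so the folds are random
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def predict_1nn(train, x):
    """Toy model (hypothetical): label of the closest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def cross_val_accuracy(data, k, seed=0):
    """Average test accuracy over the k folds, plus the number of training runs."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k, seed):
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        correct = sum(1 for x, y in test if predict_1nn(train, x) == y)
        scores.append(correct / len(test))
    return sum(scores) / len(scores), len(scores)


# Synthetic training set (an assumption for illustration): label is 1 when
# x > 0.5, with about 20% of labels flipped to add noise.
rng = random.Random(7)
data = []
for _ in range(60):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.2:
        y = 1 - y
    data.append((x, y))

kfold_acc, kfold_runs = cross_val_accuracy(data, k=5)
loo_acc, loo_runs = cross_val_accuracy(data, k=len(data))  # Leave One Out
print(f"5-fold accuracy: {kfold_acc:.2f} ({kfold_runs} training runs)")
print(f"Leave One Out accuracy: {loo_acc:.2f} ({loo_runs} training runs)")
```

For real projects, scikit-learn offers `KFold`, `LeaveOneOut` and `cross_val_score` in `sklearn.model_selection`.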
by: Monis Khan