December 24, 2020

How to improve generalization performance?

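Methods to improve generalization performance differ between equation-based algorithms and neural-network-based algorithms. In this post we limit ourselves to equation-based algorithms, i.e. conventional machine learning. The following methods can be used to reduce overfitting and improve generalization: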

Train with more data: Easier said than done, but this method can improve model generalization significantly. If you can get more data in an economically viable way (for example, by purchasing it), you should do so. You can also leverage similar datasets from public repositories. Note, however, that research has shown that beyond a certain point, adding more data to conventional machine learning models won't help with generalization.
Feature Selection: Remove irrelevant variables from the training data. They don't help explain the variance of the dependent variable and only add unnecessary information (noise) that the model may end up fitting.
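Early Stopping: Limit the number of training iterations. Generalization and the number of iterations have a roughly parabolic (inverted-U) relationship: generalization improves with more iterations up to a certain point, and beyond that point it degrades as the model starts to fit noise in the training data. In practice, monitor performance on a held-out validation set and stop when it no longer improves.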
Regularization: Regularization penalizes (or constrains) the size of the coefficients of the independent variables, which keeps the model simpler and thus curbs its tendency to overfit.
Ensembling: Apply this last to get the most benefit out of it; it can help improve generalization when all other methods fail. Choose the ensembling technique according to the problem at hand. Bagging helps with complex models, because averaging many models reduces variance, while boosting helps with simpler models, because it sequentially reduces bias.
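The feature selection step can be sketched as a simple filter: score each feature by the absolute value of its Pearson correlation with the target and keep those above a cut-off. This is a minimal sketch, assuming numeric features and a roughly linear relationship with the target. The helper names `pearson` and `select_features`, the 0.3 threshold and the toy columns are all hypothetical. Correlation filters can also miss non-linear or interaction effects.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information about the target
    return cov / (sx * sy)


def select_features(columns, target, threshold=0.3):
    """Keep features whose |correlation| with the target is at least `threshold`.

    The 0.3 default is a hypothetical cut-off chosen for illustration.
    """
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) >= threshold]


# Hypothetical toy data: "noise" is built to be nearly uncorrelated with the target.
columns = {
    "size": [1.0, 2.0, 3.0, 4.0, 5.0],
    "rooms": [1.0, 1.0, 2.0, 2.0, 3.0],
    "noise": [0.1, -0.2, 0.2, -0.2, 0.1],
}
target = [2.1, 3.9, 6.2, 8.0, 9.9]
print(select_features(columns, target))
```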
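Early stopping is usually implemented with a "patience" rule: track the validation loss after every iteration and stop once it has failed to improve for a fixed number of iterations. The sketch below assumes that a lower loss is better. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the validation-loss sequence are hypothetical illustrations, not measured results or any particular library's API.

```python
class EarlyStopping:
    """Patience-based early stopping on a validation loss (lower is better)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # iterations to wait after the last improvement
        self.min_delta = min_delta    # smallest decrease that counts as an improvement
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one iteration's validation loss; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses: falling at first, then rising once the model overfits.
val_losses = [0.90, 0.70, 0.55, 0.48, 0.46, 0.47, 0.49, 0.52, 0.56]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch, stopper.best_loss)
```

In a real training loop you would also keep a copy of the model from `best_epoch` and restore it after stopping, since the iterations spent waiting out the patience window have already started to overfit.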
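To see how regularization restrains the coefficients, take ridge (L2) regression with a single feature and no intercept. Minimizing sum((y - beta*x)^2) + lambda * beta^2 gives beta = sum(x*y) / (sum(x^2) + lambda), so a larger penalty lambda pulls the coefficient toward zero. The sketch assumes this single-feature setting; `ridge_coefficient` and the data points are hypothetical. Lasso (L1) regularization behaves differently and can set coefficients exactly to zero.

```python
def ridge_coefficient(xs, ys, lam):
    """Closed-form ridge estimate for y ~ beta * x (single feature, no intercept).

    Minimizes sum((y - beta*x)^2) + lam * beta^2. Setting the derivative to zero
    gives beta = sum(x*y) / (sum(x^2) + lam); lam = 0 is ordinary least squares.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data lying close to y = 2x.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [-4.1, -1.8, 0.2, 2.1, 3.9]
for lam in (0.0, 1.0, 10.0, 100.0):
    # The coefficient shrinks toward zero as the penalty grows.
    print(lam, round(ridge_coefficient(xs, ys, lam), 3))
```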
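Bagging (bootstrap aggregating) can be sketched in a few lines: train several copies of a high-variance model on bootstrap resamples of the training data and average their predictions. This minimal sketch assumes a regression task and uses a 1-nearest-neighbour regressor as a stand-in for a "complex" model. The helpers `fit_1nn` and `bagging`, the number of models (25) and the toy data are hypothetical choices for illustration; boosting is not shown here.

```python
import random


def fit_1nn(points):
    """A deliberately high-variance base learner: 1-nearest-neighbour regression."""
    def predict(x):
        nearest = min(points, key=lambda p: abs(p[0] - x))
        return nearest[1]
    return predict


def bagging(points, n_models=25, seed=0):
    """Fit each model on a bootstrap resample (drawn with replacement, same size
    as the data), then predict with the average of all models' predictions."""
    rng = random.Random(seed)  # seeded so the ensemble is reproducible
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_1nn(sample))

    def predict(x):
        return sum(m(x) for m in models) / len(models)
    return predict


# Hypothetical (x, y) training pairs lying roughly on y = x.
data = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]
single = fit_1nn(data)
bagged = bagging(data)
print(single(2.4), round(bagged(2.4), 3))
```

Each bagged prediction is a mean of observed targets, so the ensemble's output tends to be smoother than the single nearest-neighbour model's step function; that variance reduction is why bagging suits complex models.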
