How to measure generalization performance?

There are three methods, used at different stages of the model life cycle, to measure the generalization performance of a model.

  1. Cross Validation: This is usually performed during the data preprocessing and EDA phase.
  2. Test-Train split: Although this is one of the first steps in model creation, we use it to assess generalization performance during the algorithm-selection phase (a sketch of the first two checks follows this list).
  3. Accuracy Monitoring: This is done after model deployment. It checks whether new/real-time data resembles the data on which the model was built. In other words, this step tries to measure whether the model generalizes to new data in the same way it generalized to the test data.
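
To make the first two checks concrete, here is a minimal sketch using scikit-learn; the synthetic dataset, the linear model, and all parameter values are illustrative assumptions rather than part of the original workflow.

```python
# A minimal sketch of checks 1 and 2 using scikit-learn. The synthetic
# dataset, the linear model, and every parameter value are assumptions
# made for illustration only.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

# Check 1, cross validation: k-fold r-squared scores estimate how
# performance varies across different subsets of the data.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("Cross-validation r-squared per fold:", cv_scores)

# Check 2, test-train split: hold out a test set, fit on the training
# portion, and score the model on both portions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
model = LinearRegression().fit(X_train, y_train)
print("Train r-squared:", model.score(X_train, y_train))
print("Test r-squared:", model.score(X_test, y_test))
```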

What do we mean by generalization performance?

In machine learning, ‘generalization performance’ is a measure of how accurately a model predicts on test data, i.e. unseen data. Put another way, it is a measure of how well the curve described by the machine learning model fits test/new data.

When you feed data to a machine learning model, it learns the underlying patterns that describe the relationships among the data points of the given dataset. Some of these are general patterns, while others are inherent to the data points of the training dataset. The model's ability to distinguish the general patterns in the training dataset from noise is termed generalization performance.

One of the main reasons for doing the test-train split is to measure the generalization performance of the model. If the accuracy metric, say the r-squared statistic, is nearly the same for both training and test data, then the model has high generalization performance, as the snippet below sketches.
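
As a sketch of that comparison, the snippet below fits a model on the training portion and reports the r-squared statistic on both splits; the dataset and the 0.05 tolerance are illustrative assumptions, not standard values.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
train_r2 = model.score(X_train, y_train)  # r-squared on seen data
test_r2 = model.score(X_test, y_test)     # r-squared on unseen data

# The 0.05 tolerance is an illustrative choice, not a standard threshold.
if abs(train_r2 - test_r2) < 0.05:
    print(f"train={train_r2:.3f}, test={test_r2:.3f}: high generalization performance")
else:
    print(f"train={train_r2:.3f}, test={test_r2:.3f}: scores diverge, investigate")
```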

When is a machine learning model said to suffer from generalization failure?

In machine learning, ‘generalization failure’ is synonymous with model overfitting. As described above, a model learns both general patterns and patterns inherent to the training dataset alone. When the model fails to distinguish the general patterns in the training dataset from noise, it is said to be suffering from generalization failure.

One of the main reasons for doing the test-train split is to identify whether the model is suffering from generalization failure. When a model performs exceedingly well on training data in comparison to test data, it is a clear sign that the model has failed to generalize. In other words, generalization failure occurs when the model suffers from high variance, as the sketch below illustrates.
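
This failure mode can be reproduced with a deliberately high-variance model; the unconstrained decision tree and the synthetic dataset below are illustrative assumptions chosen to make the train/test gap obvious.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit, so the tree can memorize the training data, noise included.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

print("Train r-squared:", tree.score(X_train, y_train))  # near 1.0: memorized
print("Test r-squared:", tree.score(X_test, y_test))     # noticeably lower: generalization failure
```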