December 29, 2020
Multiclass logistic regression covers problems where the target variable has more than two categories. The logistic regression algorithm is designed for binary classification, so applying it to a multiclass problem needs a workaround: we restructure the data into a set of binary problems. The following two methods can be employed:
- One vs Rest approach: If there are n classes, we create n copies of the original dataset. Each copy has two classes: one taken from the original dataset and one created by clubbing the rest of the classes together. For example, let’s say we had three classes (mango, apple & banana) in the original dataset. Then the copies would have the classes mango & other fruits, apple & other fruits, and banana & other fruits. We apply the logistic regression algorithm to each copy and collate the probability score for each fruit. The fruit with the highest probability score for a given row is selected, i.e. if the scores of mango, apple & banana are [0.3, 0.4, 0.3] respectively, then apple would be chosen.
- One vs One approach: If there are n classes, we create n(n-1)/2 datasets by splitting the original dataset. Each dataset contains only the rows belonging to two of the original classes. Applying our fruit example to this approach, we get 3 datasets with the following classes: apple & banana, banana & mango, mango & apple. We apply logistic regression to each dataset; each model votes for the class it scores higher, and the class that receives the maximum number of votes is selected. Using our example, let’s say that the scores for the three datasets were [apple 0.6, banana 0.4], [banana 0.3, mango 0.7] and [mango 0.7, apple 0.3]. Apple gets one vote, mango gets two and banana gets none, so mango would be chosen.
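The two decision rules above can be sketched in a few lines of plain Python. This is only a minimal illustration of how the scores are combined, not a full training pipeline: the function names `one_vs_rest_predict` and `one_vs_one_predict` are hypothetical, and the pairwise scores are assumed values chosen for this sketch so that one fruit clearly wins the vote.

```python
from collections import Counter


def one_vs_rest_predict(scores):
    """Return the class whose 'class vs. rest' model gave the highest score."""
    return max(scores, key=scores.get)


def one_vs_one_predict(pairwise_scores):
    """Each pairwise model votes for its higher-scoring class; most votes wins."""
    votes = Counter()
    for pair in pairwise_scores:
        votes[max(pair, key=pair.get)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, dict(votes)


# One vs Rest: one score per fruit, each from its own binary model.
ovr_scores = {"mango": 0.3, "apple": 0.4, "banana": 0.3}
ovr_winner = one_vs_rest_predict(ovr_scores)

# One vs One: one pair of scores per binary model (assumed values).
ovo_scores = [
    {"apple": 0.6, "banana": 0.4},
    {"banana": 0.3, "mango": 0.7},
    {"mango": 0.7, "apple": 0.3},
]
ovo_winner, ovo_votes = one_vs_one_predict(ovo_scores)

print(ovr_winner)             # apple
print(ovo_winner, ovo_votes)  # mango {'apple': 1, 'mango': 2}
```

Note that One vs One can end in a tie, for instance when each class wins exactly one pairing. This sketch does not handle that case; a tie-break rule, such as comparing the summed scores of the tied classes, would be needed.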
by : Monis Khan