Today we are here to enlighten you on the topic of overfitting and underfitting in machine learning. We have received a lot of requests on this topic, so we are presenting this article. Before we start, you should know what these two terms mean. Overfitting means the model depends too heavily on its training data.
In contrast, underfitting means the model fails to capture the patterns in its training data. Neither condition is good for the outcome of a machine learning project. Here we will discuss in detail the problems these conditions cause, and we will also talk about methods to deal with them. So let’s start.
What is Overfitting?
Let’s understand overfitting in detail. Overfitting is a modeling error that occurs when a function fits too closely to a limited set of data points. It happens due to overtraining of the model. There are two main reasons for overfitting, and we would like to share them with you.
- Overfitting can happen when noise in the data gets prioritized during training.
- Using too little data for the complexity of the model can cause overfitting.
Now let’s make the topic clear with an example. Say we are using machine learning to train a model to screen resumes for interviews. There are 10,000 resumes, and we train the model on those resumes and their outcomes. Then we try the model on the original dataset, and its predictions are 99% accurate.
But the problem occurs when we run the model on a new set of resumes. This data is unseen by the model, and the accuracy drops to only 50%. This is known as overfitting: the model is unable to generalize well from our training data to unseen data.
Why is Overfitting Bad?
Overfitting limits the model’s capacity to generalize. Past a certain complexity, the model starts memorizing its training examples instead of learning the underlying patterns, so its performance on new data stops improving and can even get worse.
How to Detect Overfitting?
Hope we have made the topic clear to you. Now we would like to explain how to detect overfitting. To detect it, split the initial dataset into two parts: a training dataset and a test dataset. Train the model with the training dataset and then evaluate it on the test dataset.
If your model shows a similar level of accuracy on both, it is likely free from overfitting. If the model does much better on the training set than on the test set, then it is likely overfitting.
For example, if your model shows 95% accuracy on the training set but only 55% accuracy on the test set, you should take action against overfitting.
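To make the check concrete, here is a minimal sketch in Python with scikit-learn. The synthetic dataset and the decision-tree classifier are our own illustrative assumptions, not part of the resume example above.

```python
# Minimal sketch of detecting overfitting via a train/test split.
# The dataset and model below are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A synthetic dataset standing in for real data (e.g. the resumes).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Split the initial dataset into training and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# An unconstrained decision tree can memorize its training set.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print("Training accuracy:", model.score(X_train, y_train))  # typically 1.00
print("Test accuracy:    ", model.score(X_test, y_test))    # noticeably lower
```

A large gap between the two scores is the warning sign described above; a small gap suggests the model generalizes.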
Prevention of Overfitting
We would like to share some simple yet useful solutions for overfitting. Let’s go through them.
- Use a simpler model. You can do this by reducing the number of parameters or by choosing a less complex model class. For example, use a linear model instead of a high-degree polynomial model.
- You can train the model with more training data.
- Reduce noise in the data by removing outliers or dealing with missing values.
- Use cross-validation, as it is a powerful preventive measure against overfitting. You use your initial training data to generate multiple mini train–test splits, and then use those splits to tune your model. This process is known as cross-validation (see the sketch after this list).
- Regularize your model. Regularization refers to a whole family of techniques for artificially forcing your model to be simpler; the sketch after this list combines it with cross-validation.
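Here is a brief sketch of the last two measures, using scikit-learn; the synthetic data and the choice of ridge regression are illustrative assumptions on our part.

```python
# Sketch of cross-validation combined with L2 (ridge) regularization.
# Dataset and model choices are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression data with some noise.
X, y = make_regression(n_samples=200, n_features=30, noise=10.0,
                       random_state=0)

# Ridge penalizes large weights, forcing the model to stay simpler.
model = Ridge(alpha=1.0)

# cross_val_score builds 5 mini train-test splits and scores each one.
scores = cross_val_score(model, X, y, cv=5)
print("Per-fold R^2 scores:", scores)
print("Mean score:", scores.mean())
```

If the per-fold scores are consistently good, the model is not merely memorizing one lucky split; tuning alpha trades fit against simplicity.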
What is Underfitting?
Underfitting is the opposite of overfitting. A machine learning model is said to underfit when it is so simple that it is unable to capture the underlying trend of the data accurately. When underfitting is present, the model or algorithm does not fit the data well enough.
Underfitting usually happens when you use too little data to build the model. It may also happen when you try to fit a linear model to non-linear data. In these cases, the model’s rules are too simple and rigid to describe the data.
Thus, the model will make a lot of wrong predictions. In a nutshell, underfitting denotes a high-bias, low-variance situation. Hope we have made this topic clear to you; the toy sketch below shows the idea in action. After that, let’s take a quick look at how underfitting affects machine learning.
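As a toy sketch of this definition (the data-generating process below is our own assumption), here is a straight line fitted to clearly quadratic data:

```python
# Toy sketch of underfitting: a straight line fitted to curved data.
# The quadratic data below is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.3, size=200)  # true trend is x^2

line = LinearRegression().fit(X, y)
for x in (-2.0, 0.0, 2.0):
    pred = line.predict([[x]])[0]
    print(f"x = {x:+.1f}   true trend = {x**2:4.1f}   prediction = {pred:5.2f}")

# The line predicts roughly the same value everywhere: it is too rigid
# to follow the curve, i.e. it underfits with high bias.
```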
Why is Underfitting Bad?
Underfitting hurts the accuracy of a machine learning model just as overfitting does; it differs only in how. An underfit model is highly biased, so it fails to give a generalized outcome. This happens when the model is too simple for the task or is trained with an inadequate amount of data.
How to Detect Underfitting?
We are going to share the simplest way to detect underfitting. Split the initial dataset into two parts: a training dataset and a test dataset. Then train your model with the training dataset.
After training, test the model with the test dataset. If the model performs poorly on the training data itself, and therefore on the test data as well, your model is suffering from underfitting. Unlike an overfit model, an underfit model cannot even fit the data it has already seen.
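A minimal sketch of this check (the dataset and model are illustrative assumptions): a linear classifier on concentric-circle data that no straight line can separate.

```python
# Sketch of detecting underfitting: both training and test scores are low.
# make_circles produces two concentric classes; a linear model cannot
# separate them, so it underfits by construction.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=500, noise=0.1, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)
print("Training accuracy:", model.score(X_train, y_train))  # near 0.5
print("Test accuracy:    ", model.score(X_test, y_test))    # also near 0.5

# Both scores hover near chance level: the model underfits. Contrast this
# with overfitting, where the training score is high but the test score low.
```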
How to Reduce Underfitting?
Here we will share some effective methods to reduce underfitting. These are:
- Increase your model complexity.
- Increase the number of features by performing feature engineering, as in the sketch after this list.
- Remove noise from the data.
- You can increase the number of epochs or the duration of training to get better results.
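To illustrate the first two points, here is a sketch of how adding polynomial features lifts an underfit linear model; the data and the degree-2 choice are illustrative assumptions.

```python
# Sketch of reducing underfitting by increasing model complexity:
# polynomial feature engineering lets a linear model fit a curve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)  # quadratic trend

# Too simple: a plain linear model underfits the quadratic trend.
simple = LinearRegression().fit(X, y)
print("Linear model R^2:    ", round(simple.score(X, y), 2))  # near 0

# More complex: degree-2 polynomial features capture the trend.
richer = make_pipeline(PolynomialFeatures(degree=2),
                       LinearRegression()).fit(X, y)
print("Polynomial model R^2:", round(richer.score(X, y), 2))  # much higher
```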
Both overfitting and underfitting damage model accuracy, only in different ways. In this article, we have covered the essential information you should have about this topic. To avoid both situations, you must balance the complexity of your model against the amount and quality of your training data.