What is Underfitting in Machine Learning?
Underfitting describes a model that does not capture the underlying relationship in the dataset on which it’s trained. For example, consider a linear regression model trained on a dataset that exhibits a polynomial relationship between the input and output variables. Unless polynomial features are added, such a model can never adequately capture this relationship. It will underfit, performing poorly on the training set and failing to generalize to unseen data. Underfitting is more common than you may think, especially in business contexts where labeled training data is sparse.
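Below is a minimal sketch of this example in Python with NumPy. It assumes a synthetic quadratic dataset; the coefficients, noise level, and sample sizes are arbitrary choices for illustration. A straight-line fit leaves a large error on both the training and the test split. Adding the x² feature brings both errors down to roughly the noise variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a quadratic relationship: y = 2x^2 - x + noise (assumed for illustration)
x_train = rng.uniform(-3, 3, size=200)
y_train = 2 * x_train**2 - x_train + rng.normal(0, 1.0, size=200)
x_test = rng.uniform(-3, 3, size=200)
y_test = 2 * x_test**2 - x_test + rng.normal(0, 1.0, size=200)

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

def fit_and_score(degree):
    # degree=1 is ordinary linear regression on x alone;
    # degree=2 adds the x^2 feature the data actually needs.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    return train_err, test_err

linear_train, linear_test = fit_and_score(1)
quad_train, quad_test = fit_and_score(2)
print(f"linear:    train MSE {linear_train:.2f}, test MSE {linear_test:.2f}")
print(f"quadratic: train MSE {quad_train:.2f}, test MSE {quad_test:.2f}")
```

The underfitting signature is that the linear model is bad on the training data too, not only on the test data.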
Why is my Model Underfitting?
Complex models such as neural networks may underfit if they are not trained for long enough or are trained with poorly chosen hyperparameters. Some models may also underfit when they are given too few training samples. In that case, there may be too much uncertainty in the training data for the model to discern an underlying relationship between inputs and outputs. By far the most common cause of underfitting, however, is too much bias. For example, linear regression biases the model toward learning linear relationships, so a linear regression model will underfit a non-linear dataset. Similarly, under-parameterized models, i.e. those with few parameters, tend to be biased toward simplistic relationships that do not capture the complexity present in real-world datasets.
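The first cause, not training for long enough, can be illustrated with a hedged sketch. It uses plain full-batch gradient descent on a linear model whose form does match the data. Any underfitting here therefore comes from the training procedure rather than from the model class. The data, the learning rate of 0.05, and the step counts are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Data that a linear model CAN represent: y = 3x + 0.5 + noise (assumed)
X = rng.uniform(-1, 1, size=(100, 1))
y = 3.0 * X[:, 0] + 0.5 + rng.normal(0, 0.1, size=100)

def train_linear_gd(X, y, lr, steps):
    """Full-batch gradient descent on mean squared error for y ≈ X @ w + b."""
    w = np.zeros(X.shape[1])
    b = 0.0
    n = len(y)
    for _ in range(steps):
        residual = X @ w + b - y
        w -= lr * (2.0 / n) * (X.T @ residual)
        b -= lr * (2.0 / n) * residual.sum()
    return w, b

def mse(X, y, w, b):
    return float(np.mean((X @ w + b - y) ** 2))

losses = {}
for steps in (3, 1000):
    w, b = train_linear_gd(X, y, lr=0.05, steps=steps)
    losses[steps] = mse(X, y, w, b)
    print(f"{steps:>5} steps: train MSE {losses[steps]:.4f}")
```

After only 3 steps, the weights have barely moved from zero, so the training error stays high. After 1000 steps, it approaches the noise variance.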
What Happens if Your Model Experiences Underfitting?
Specifying what will happen if you push an underfit model to production is simple: your model will not perform well. It will produce incorrect predictions that disappoint customers or lead to unwise business decisions based on inaccurate information. Addressing underfitting is therefore crucial from a business perspective. From a technical standpoint, an underfit model exhibits high bias and low variance. In layman’s terms, it generates reliably inaccurate predictions; reliability is desirable, but inaccuracy is not. At the same time, when addressing underfitting it’s important not to go too far in the other direction and cause your model to overfit. This leads us to a concept called the bias-variance tradeoff.
Overfitting vs. Underfitting: The Bias-Variance Tradeoff
Overfitting is the counterpoint to underfitting: the two concepts are diametrically opposed. Tuning a model away from underfitting pushes it closer toward overfitting, and vice versa. This idea is neatly encapsulated in a principle referred to as the bias-variance tradeoff. A classic result shows that a model’s expected squared error can be broken down into a sum of three components: squared bias, variance, and irreducible noise in the data. Bias describes the amount of underfitting in the model, the extent to which the model is inherently incapable of modeling a given dataset. Variance, on the other hand, describes the component of error due to overfitting. Overfit models are too tightly tied to a single training dataset. They may perform remarkably well on the training set, perhaps even achieving 100% predictive accuracy, but they will not generalize to unseen data. Often this happens because the model has too much flexibility. An over-parameterized neural network may exactly fit the noise in a training set, but doing so will cause it to fail spectacularly on unseen data drawn from the same distribution. Designing a model that performs well is therefore a balancing act: trading off the bias and variance components of error so that neither becomes overwhelming.
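The split of expected squared error into squared bias, variance, and irreducible noise can be checked empirically. The sketch below is a Monte Carlo estimate under assumed conditions:

- a sine-shaped true function;
- fixed inputs;
- Gaussian noise with standard deviation 0.3;
- polynomial models of increasing degree fitted with `np.polyfit`.

Each repeat draws a fresh noisy training set. Bias² measures how far the average fitted model lies from the true function. Variance measures how much individual fits scatter around that average.

```python
import numpy as np

rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(2 * np.pi * x)

x = np.linspace(0, 1, 20)   # fixed inputs; only the noise is redrawn
noise_sd = 0.3              # irreducible noise: sigma^2 = 0.09
n_repeats = 500

def bias_variance(degree):
    preds = np.empty((n_repeats, x.size))
    for r in range(n_repeats):
        # A fresh noisy training set from the same data-generating process
        y = true_f(x) + rng.normal(0, noise_sd, x.size)
        preds[r] = np.polyval(np.polyfit(x, y, deg=degree), x)
    mean_pred = preds.mean(axis=0)                         # the "average" model
    bias_sq = float(np.mean((mean_pred - true_f(x)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))           # spread across training sets
    return bias_sq, variance

results = {d: bias_variance(d) for d in (1, 3, 9)}
for d, (b2, v) in results.items():
    print(f"degree {d}: bias^2 {b2:.4f}, variance {v:.4f}, "
          f"bias^2 + variance + noise {b2 + v + noise_sd**2:.4f}")
```

In this setup, the degree-1 model should show the largest bias² and the smallest variance, and the degree-9 model the reverse. The exact numbers depend on the random seed.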
How to Address Underfit Models
There are a few ways to address underfit models. The first step is usually to take a closer look at your training data and the modeling assumptions that you are making. Is your model complex enough to capture the underlying relationships in the data? A simple way to test this is to add more parameters to your model, or more complex features such as polynomial combinations of existing features, and then retrain the model. Does accuracy on held-out data increase? If so, your original model was likely too limited in its predictive power. You can then either add more features/parameters or choose a different model entirely.
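The test described above can be sketched as a loop over feature complexity. The sketch assumes a hypothetical dataset with a cubic relationship and uses NumPy’s `np.vander` to build polynomial features. The split sizes and the range of degrees tried are illustrative choices. It scores validation error rather than training error, because training error will almost always drop as features are added.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical dataset with a cubic relationship between x and y
x = rng.uniform(-2, 2, size=300)
y = x**3 - 2 * x + rng.normal(0, 0.5, size=300)
x_train, y_train = x[:200], y[:200]
x_val, y_val = x[200:], y[200:]

def poly_features(x, degree):
    # Columns [1, x, x^2, ..., x^degree]: the "more complex features" step
    return np.vander(x, degree + 1, increasing=True)

val_mse = {}
for degree in range(1, 7):
    A = poly_features(x_train, degree)
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)   # least-squares fit
    pred = poly_features(x_val, degree) @ coef
    val_mse[degree] = float(np.mean((y_val - pred) ** 2))
    print(f"degree {degree}: validation MSE {val_mse[degree]:.3f}")

best = min(val_mse, key=val_mse.get)
print("lowest validation MSE at degree", best)
```

Here, the linear and quadratic models should both show clearly higher validation error than degree 3, the signature of a model that was too simple. Beyond degree 3, the gains should level off.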
If the high bias is not due to modeling assumptions, then it may be due to insufficient training data. There are a number of approaches you can take to get around this. First, you could simply gather more training data. You might scrape the web, pay crowdsourced workers to label data you already have, or pay third-party data providers to license their proprietary datasets. Alternatively, you can use techniques such as transfer learning: take models that have already been pre-trained on large datasets and fine-tune them on your own, unique data. In doing so, you can use the prior assumptions baked into the pre-trained model while still allowing it to specialize to your own data.
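A toy linear analogue can make the fine-tuning idea concrete. This is not how deep-learning frameworks implement transfer learning; it is a simplified sketch built on several assumptions:

- hypothetical "pre-trained" weights `w_source` from a related task;
- a target task whose true weights lie close to `w_source`;
- only 8 labeled target examples for 20 parameters.

Both models are fitted with a closed-form penalized least-squares solve. The penalty pulls the fine-tuned model toward `w_source` and the from-scratch model toward zero.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 20

# Hypothetical "pre-trained" weights learned on a large, related source task
w_source = rng.normal(0, 1, d)
# Assumption: the target task is similar, but not identical, to the source task
w_target = w_source + rng.normal(0, 0.1, d)

# Only 8 labeled target examples: fewer samples than parameters
X = rng.normal(0, 1, (8, d))
y = X @ w_target + rng.normal(0, 0.1, 8)

def ridge_toward(X, y, w_prior, lam):
    """Minimize ||X w - y||^2 + lam * ||w - w_prior||^2 in closed form."""
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y + lam * w_prior)

w_scratch = ridge_toward(X, y, np.zeros(d), lam=1.0)   # no prior knowledge
w_finetuned = ridge_toward(X, y, w_source, lam=1.0)    # anchored to the source weights

err_scratch = float(np.linalg.norm(w_scratch - w_target))
err_finetuned = float(np.linalg.norm(w_finetuned - w_target))
print(f"weight error from scratch:    {err_scratch:.3f}")
print(f"weight error after fine-tune: {err_finetuned:.3f}")
```

With fewer samples than parameters, the target data says nothing about most directions in weight space. The from-scratch fit leaves those directions at zero, whereas the fine-tuned fit falls back on the source weights. Penalizing distance from `w_source` in this way is equivalent to placing a Gaussian prior centered at `w_source`.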
Finally, if neither of these approaches helps, make sure that you are training your model properly. You might need to let it train for longer, or run a hyperparameter sweep to discover better hyperparameter configurations. Model training is often as much an art as a science, and it can take some experimentation to figure out how best to fit the model to your data. If your model is Bayesian, you can also try working with a different prior, since the prior encodes your predefined assumptions about the data. Ultimately, addressing underfitting takes time and patience, but it is well within reach for any capable data scientist.
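A hyperparameter sweep can be as simple as a grid search scored on held-out data. The sketch below assumes a synthetic linear dataset and a hypothetical `train_gd` helper (full-batch gradient descent without an intercept, for brevity). It sweeps the learning rate and the number of epochs with `itertools.product`. A real sweep would typically cover more hyperparameters and might use cross-validation instead of a single split.

```python
import itertools
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(0, 1, (400, 5))
true_w = np.array([1.5, -2.0, 0.5, 3.0, -1.0])   # assumed ground truth
y = X @ true_w + rng.normal(0, 0.5, 400)
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

def train_gd(X, y, lr, epochs):
    """Hypothetical helper: full-batch gradient descent on mean squared error."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = (2.0 / len(y)) * (X.T @ (X @ w - y))
        w -= lr * grad
    return w

# Grid over learning rate x number of epochs, scored by validation MSE
scores = {}
for lr, epochs in itertools.product([0.001, 0.01, 0.1], [10, 100, 1000]):
    w = train_gd(X_train, y_train, lr, epochs)
    scores[(lr, epochs)] = float(np.mean((X_val @ w - y_val) ** 2))

best = min(scores, key=scores.get)
print("best (lr, epochs):", best, "validation MSE:", round(scores[best], 3))
print("worst (lr, epochs):", max(scores, key=scores.get))
```

Configurations with a tiny learning rate and few epochs barely move the weights away from zero. That is exactly the "not trained long enough" form of underfitting, and the sweep exposes it.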