# Model Metrics

Predibase offers a variety of out-of-the-box metrics to measure and analyze model performance. This page will explain these metrics in more detail for those who may be unfamiliar.

## Introduction

Test set metrics provide a way to evaluate and compare models, identify areas for improvement, and determine when a model is ready for deployment.

Metrics are important in machine learning for several reasons:

- **Model evaluation**: Test set metrics provide a way to evaluate the performance of a machine learning model on new, unseen data. By using metrics like accuracy, precision, recall, F1 score, and others, we can determine how well the model generalizes to new data, and whether it is overfitting or underfitting.
- **Model comparison**: Test set metrics allow us to compare different models and select the best one for our problem. By comparing metrics like mean squared error, mean absolute error, R2, or others, we can determine which model is the most accurate, reliable, and generalizable.
- **Model improvement**: Test set metrics help us identify areas where the model needs improvement. If the model is underperforming on certain metrics, we can use this information to refine the model, add or remove features, or adjust the hyperparameters.
- **Model deployment**: Test set metrics are critical for determining when a model is ready for deployment. If the model's performance is poor on the test set, it is unlikely to perform well on real-world data and should not be deployed.

Different metrics are used to understand the performance of a model based on the type of ML task.

## Classification

### Introduction

Classification is the process of identifying the category to which a new observation belongs, based on a set of training data. It is a popular task in machine learning, with many applications such as image recognition, spam filtering, and fraud detection. When evaluating classification models, it is essential to use the right metrics that provide insights into the model's performance. In this section, we will discuss the most common classification metrics in machine learning: accuracy, precision, recall, F1 score, and ROC-AUC.

### Accuracy

Accuracy is one of the most popular metrics used to evaluate classification models. It measures the proportion of correctly classified observations out of the total number of observations. In other words, it answers the question: "What is the percentage of correctly classified instances?" While accuracy is a useful metric, it can be misleading when the classes are imbalanced. For example, if we have 95% of observations in class A and only 5% in class B, a model that always predicts class A will have 95% accuracy, even though it is not useful for identifying class B. Therefore, we need to use additional metrics to evaluate the model's performance.
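
The imbalanced-class pitfall described above can be sketched in a few lines of Python (the labels here are made up purely for illustration):

```python
# A toy imbalanced dataset: 95 observations of class "A", 5 of class "B".
y_true = ["A"] * 95 + ["B"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["A"] * 100

# Accuracy: proportion of correctly classified observations.
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(f"Accuracy: {accuracy:.0%}")  # 95%, yet class B is never identified
```

Despite the high accuracy, this "model" never finds a single class B instance, which is why precision and recall (below) are needed alongside accuracy.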

### Precision

Precision measures the proportion of true positive predictions out of all positive predictions made by the model. In other words, it answers the question: "Of all the instances the model predicted as positive, how many were correct?" Precision is an essential metric when we want to minimize false positives. False positives occur when the model predicts an instance as positive when it is actually negative. For example, in a medical diagnosis scenario, we want to minimize the number of false positive predictions to avoid unnecessary treatments.

### Recall

Recall measures the proportion of true positive predictions out of all positive instances in the dataset. In other words, it answers the question: "Of all the positive instances in the dataset, how many did the model correctly predict as positive?" Recall is an essential metric when we want to minimize false negatives. False negatives occur when the model predicts an instance as negative when it is actually positive. For example, in a fraud detection scenario, we want to minimize the number of false negative predictions to avoid missing fraudulent transactions.
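
Precision and recall both follow directly from the counts of true positives, false positives, and false negatives. A minimal sketch in Python, using made-up binary labels (1 = positive):

```python
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]  # actual labels (illustrative)
y_pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # model predictions (illustrative)

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

precision = tp / (tp + fp)  # of all predicted positives, how many were correct
recall = tp / (tp + fn)     # of all actual positives, how many were found

print(f"precision={precision:.2f}, recall={recall:.2f}")
```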

### F1 Score

The F1 score is a metric that combines precision and recall into a single value. It is the harmonic mean of precision and recall and ranges from 0 to 1. A high F1 score indicates that the model has both high precision and high recall. The F1 score is useful when we want to balance both precision and recall, and when the classes are imbalanced.
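
As a quick sketch, the harmonic mean behind the F1 score can be computed directly; note how a single low component drags the score down, which is exactly why F1 rewards a balance of precision and recall:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f"{f1_score(0.9, 0.9):.2f}")  # 0.90 -- both high, F1 high
print(f"{f1_score(0.9, 0.1):.2f}")  # 0.18 -- one low value drags F1 down
```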

### ROC-AUC

ROC-AUC is a metric commonly used for binary classification problems. ROC stands for Receiver Operating Characteristic, and AUC stands for Area Under the Curve. ROC-AUC measures the trade-off between the true positive rate (TPR) and the false positive rate (FPR) of a model. The TPR is the proportion of true positive predictions out of all positive instances in the dataset, while the FPR is the proportion of false positive predictions out of all negative instances in the dataset. The ROC curve is a plot of TPR against FPR for different threshold values. The AUC represents the area under the ROC curve, which ranges from 0.5 (random guessing) to 1 (a perfect classifier). A higher AUC indicates a better performance of the model. ROC-AUC is a useful metric when the classes are imbalanced, and when we want to evaluate the model's performance across different threshold values.
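
ROC-AUC can equivalently be read as the probability that a randomly chosen positive instance receives a higher score than a randomly chosen negative one. A small sketch of this rank-based view, with illustrative scores and labels:

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison: the fraction of (positive, negative)
    pairs where the positive instance is scored higher (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1, 0, 1]               # illustrative labels
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]  # illustrative model scores
print(round(roc_auc(y_true, scores), 3))  # 0.889: 8 of 9 pairs ordered correctly
```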

### Conclusion

Classification metrics are essential to evaluate the performance of classification models. While accuracy is a useful metric, it can be misleading when the classes are imbalanced. Precision and recall are essential metrics to minimize false positives and false negatives, respectively; the F1 score balances the two, and ROC-AUC summarizes performance across classification thresholds. When evaluating a classification model, it is important to consider these metrics together to obtain a comprehensive understanding of the model's performance.

## Regression

Regression is a task in machine learning that involves predicting a continuous numerical value for a given input. It is used in various applications, such as stock price prediction, sales prediction, and weather prediction.

Let’s consider a simple regression problem where we want to predict the prices of houses in a certain area. Here are 10 observations of actual house prices vs the predicted prices of houses made by our model:

True House Prices | Predicted House Prices |
---|---|
200,000 | 180,000 |
300,000 | 320,000 |
250,000 | 260,000 |
350,000 | 375,000 |
400,000 | 390,000 |
450,000 | 460,000 |
250,000 | 260,000 |
150,000 | 145,000 |
100,000 | 110,000 |
200,000 | 220,000 |

We will use these values to calculate and understand regression metrics in Predibase.

Here are a few general guidelines to help determine whether a regression metric value is good or bad using Mean Squared Error (MSE) as an example:

- **Baseline comparison**: A good starting point is to compare the model's MSE to that of a simple baseline, such as a model that always predicts the mean of the target variable. If the model's MSE is not meaningfully lower than the baseline's, the model is underperforming.
- **Problem type**: The interpretation of MSE also depends on the type of regression problem. For example, in a highly predictable problem, an MSE of 10,000 might be considered good, while in a highly unpredictable problem, an MSE of 50,000 might be considered good.
- **Problem domain**: The interpretation of MSE also depends on the domain of the problem. For example, in a financial forecasting problem, an MSE of 10,000 might be considered high, while in a weather forecasting problem, an MSE of 50,000 might be considered low.
- **Data distribution of target**: The interpretation of MSE also depends on the distribution of the target variable. For example, in a problem where the target variable has a high variance, an MSE of 50,000 might be considered good, while in a problem where the target variable has a low variance, an MSE of 50,000 might be considered high.

### Loss

**Explanation**

Loss is a measure of the difference between the predicted values and the actual values in a regression problem. The lower the loss, the better the performance of the model. The goal of a regression model is to minimize the loss in order to achieve the most accurate predictions.

The expected value range of loss is typically between 0 and ∞, with 0 being a perfect prediction and higher values indicating worse predictions.

**Example**

To calculate the loss, we first find the difference between the predicted and actual values for each observation (predicted house price - actual house price):

True House Prices | Predicted House Prices | Differences |
---|---|---|
200,000 | 180,000 | -20,000 |
300,000 | 320,000 | 20,000 |
250,000 | 260,000 | 10,000 |
350,000 | 375,000 | 25,000 |
400,000 | 390,000 | -10,000 |
450,000 | 460,000 | 10,000 |
250,000 | 260,000 | 10,000 |
150,000 | 145,000 | -5,000 |
100,000 | 110,000 | 10,000 |
200,000 | 220,000 | 20,000 |

There are several ways to calculate the loss, but one of the most commonly used methods is mean squared error (MSE), which is the average of the squared differences. In Predibase, the loss is equivalent to the MSE, which is explained below.

### Mean Squared Error (MSE)

**Explanation**

MSE is the average of the squared differences between the predicted values and the actual values. It is expressed as the sum of the squared differences divided by the total number of observations.

The expected value range of MSE is between 0 and ∞, with 0 being a perfect prediction and higher values indicating worse predictions.

**Example**

To calculate the MSE, we first find the difference between the predicted and actual values for each observation (predicted house price - actual house price). Then, we square these differences.

True House Prices | Predicted House Prices | Differences | Squared Differences |
---|---|---|---|
200,000 | 180,000 | -20,000 | 400,000,000 |
300,000 | 320,000 | 20,000 | 400,000,000 |
250,000 | 260,000 | 10,000 | 100,000,000 |
350,000 | 375,000 | 25,000 | 625,000,000 |
400,000 | 390,000 | -10,000 | 100,000,000 |
450,000 | 460,000 | 10,000 | 100,000,000 |
250,000 | 260,000 | 10,000 | 100,000,000 |
150,000 | 145,000 | -5,000 | 25,000,000 |
100,000 | 110,000 | 10,000 | 100,000,000 |
200,000 | 220,000 | 20,000 | 400,000,000 |

Finally, we take the average of the squared differences:

**MSE** = `(400,000,000 + 400,000,000 + 100,000,000 + 625,000,000 + 100,000,000 + 100,000,000 + 100,000,000 + 25,000,000 + 100,000,000 + 400,000,000) / 10 = 235,000,000`

So, in this example, the MSE is 235,000,000, which is the average squared difference between the predicted and actual values. Because the differences are squared, the MSE is expressed in squared units (here, squared dollars), which is why the value can look so large. A lower MSE value represents a better-fit model, while a higher MSE value represents a poorer fit.
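
The same calculation can be done in a few lines of Python, using the house prices from the table above:

```python
true = [200_000, 300_000, 250_000, 350_000, 400_000,
        450_000, 250_000, 150_000, 100_000, 200_000]
pred = [180_000, 320_000, 260_000, 375_000, 390_000,
        460_000, 260_000, 145_000, 110_000, 220_000]

# MSE: average of squared differences between predictions and true values.
mse = sum((p - t) ** 2 for t, p in zip(true, pred)) / len(true)
print(f"MSE = {mse:,.0f}")  # MSE = 235,000,000
```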

### Mean Absolute Error (MAE)

**Explanation**

MAE is the average of the absolute differences between the predicted values and the actual values. It is expressed as the sum of the absolute differences divided by the total number of observations.

The expected value range of MAE is between 0 and ∞, with 0 being a perfect prediction and higher values indicating worse predictions.

**Example**

To calculate the MAE, we first find the absolute difference between the predicted and actual values for each observation using the housing prices table above:

True House Prices | Predicted House Prices | Absolute Differences |
---|---|---|
200,000 | 180,000 | 20,000 |
300,000 | 320,000 | 20,000 |
250,000 | 260,000 | 10,000 |
350,000 | 375,000 | 25,000 |
400,000 | 390,000 | 10,000 |
450,000 | 460,000 | 10,000 |
250,000 | 260,000 | 10,000 |
150,000 | 145,000 | 5,000 |
100,000 | 110,000 | 10,000 |
200,000 | 220,000 | 20,000 |

Next, we sum the absolute differences and divide by the total number of observations:

**MAE** = `(20,000 + 20,000 + 10,000 + 25,000 + 10,000 + 10,000 + 10,000 + 5,000 + 10,000 + 20,000) / 10 = 14,000`

So, in this example, the MAE is 14,000, which means that the average absolute difference between the predicted and actual values is 14,000. This provides a measure of the accuracy of the model, with a lower MAE indicating a better fit between the predicted and actual values.
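
The MAE calculation above can be sketched in Python with the same house-price data:

```python
true = [200_000, 300_000, 250_000, 350_000, 400_000,
        450_000, 250_000, 150_000, 100_000, 200_000]
pred = [180_000, 320_000, 260_000, 375_000, 390_000,
        460_000, 260_000, 145_000, 110_000, 220_000]

# MAE: average of absolute differences between predictions and true values.
mae = sum(abs(p - t) for t, p in zip(true, pred)) / len(true)
print(f"MAE = {mae:,.0f}")  # MAE = 14,000
```

Unlike MSE, the MAE stays in the original units (dollars), which often makes it the easier number to explain to stakeholders.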

### R2

**Explanation**

**R2 is a measure of the goodness of fit of a regression model. It typically ranges between 0 and 1, with 1 being a perfect fit and 0 indicating that the model explains none of the variance.** A high R2 value means that the regression model is explaining a large amount of the variance in the data, while a low R2 value means that the model is not explaining much of the variance in the data.

On occasion, R2 values can be negative as well (there is no lower bound). This happens when the regression model performs worse than a simple model that always predicts the mean, which can occur when the model is overfitting the data and making predictions that are far from the actual values. In that case, the model is likely not a good fit for the data and may need to be refined, or a different model may need to be tried.

**Example**

- The model's predictions are used to calculate the residuals, i.e., the differences between the predicted values and the actual values.
- These residuals are then squared (just as when we compute MSE).

True House Prices | Predicted House Prices | Residuals | Squared Residuals |
---|---|---|---|
200,000 | 180,000 | -20,000 | 400,000,000 |
300,000 | 320,000 | 20,000 | 400,000,000 |
250,000 | 260,000 | 10,000 | 100,000,000 |
350,000 | 375,000 | 25,000 | 625,000,000 |
400,000 | 390,000 | -10,000 | 100,000,000 |
450,000 | 460,000 | 10,000 | 100,000,000 |
250,000 | 260,000 | 10,000 | 100,000,000 |
150,000 | 145,000 | -5,000 | 25,000,000 |
100,000 | 110,000 | 10,000 | 100,000,000 |
200,000 | 220,000 | 20,000 | 400,000,000 |

**Sum of the squared residuals** = `400,000,000 + 400,000,000 + 100,000,000 + 625,000,000 + 100,000,000 + 100,000,000 + 100,000,000 + 25,000,000 + 100,000,000 + 400,000,000 = 2,350,000,000`

Next, we calculate the mean of the true house prices.

**Mean True House Price**=`(200,000 + 300,000 + 250,000 + 350,000 + 400,000 + 450,000 + 250,000 + 150,000 + 100,000 + 200,000) / 10 = 265,000`

Next, we calculate the difference between each actual value and the mean of the actual values, and square these differences:

True House Prices | Actual Value - Mean | Squared (Actual Value - Mean) |
---|---|---|
200,000 | -65,000 | 4,225,000,000 |
300,000 | 35,000 | 1,225,000,000 |
250,000 | -15,000 | 225,000,000 |
350,000 | 85,000 | 7,225,000,000 |
400,000 | 135,000 | 18,225,000,000 |
450,000 | 185,000 | 34,225,000,000 |
250,000 | -15,000 | 225,000,000 |
150,000 | -115,000 | 13,225,000,000 |
100,000 | -165,000 | 27,225,000,000 |
200,000 | -65,000 | 4,225,000,000 |

**Sum of squared differences:**`4,225,000,000 + 1,225,000,000 + 225,000,000 + 7,225,000,000 + 18,225,000,000 + 34,225,000,000 + 225,000,000 + 13,225,000,000 + 27,225,000,000 + 4,225,000,000 = 110,250,000,000`

Finally, divide the sum of the squared residuals by the sum of the squared differences and subtract the result from 1 to calculate R2:

**Sum of Squared Residuals:** `2,350,000,000`

**Sum of Squared Differences:** `110,250,000,000`

**R2** = `1 - (2,350,000,000 / 110,250,000,000) ≈ 0.98`

The R2 score is about 0.98, which is close to 1. This indicates that the model is explaining almost all of the variability in the target variable (house prices).
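
The full R2 calculation can be sketched in Python with the same house-price data:

```python
true = [200_000, 300_000, 250_000, 350_000, 400_000,
        450_000, 250_000, 150_000, 100_000, 200_000]
pred = [180_000, 320_000, 260_000, 375_000, 390_000,
        460_000, 260_000, 145_000, 110_000, 220_000]

mean_true = sum(true) / len(true)                        # 265,000
ss_res = sum((t - p) ** 2 for t, p in zip(true, pred))   # sum of squared residuals
ss_tot = sum((t - mean_true) ** 2 for t in true)         # sum of squared differences from mean
r2 = 1 - ss_res / ss_tot
print(f"R2 = {r2:.4f}")  # R2 = 0.9787
```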

### Root Mean Squared Error (RMSE)

**Explanation**

RMSE is the square root of the mean squared error, and it provides a measure of the average magnitude of the differences between the predicted values and the actual values.

The expected value range of RMSE is between 0 and ∞, with 0 being a perfect prediction and higher values indicating worse predictions.

**Example**

We calculated the MSE for this example to be 235,000,000.

Since RMSE is just the square root of the MSE, we can compute it as follows:

**RMSE** = `√(235,000,000) ≈ 15,329.71`

Because taking the square root undoes the squaring, RMSE is in the same units as the target variable. This indicates that the predicted house prices deviate from the actual house prices by about $15,330 on average.
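
In Python, RMSE is one extra line on top of the MSE calculation:

```python
import math

true = [200_000, 300_000, 250_000, 350_000, 400_000,
        450_000, 250_000, 150_000, 100_000, 200_000]
pred = [180_000, 320_000, 260_000, 375_000, 390_000,
        460_000, 260_000, 145_000, 110_000, 220_000]

mse = sum((p - t) ** 2 for t, p in zip(true, pred)) / len(true)
rmse = math.sqrt(mse)  # back in the original units (dollars)
print(f"RMSE = {rmse:,.2f}")  # RMSE = 15,329.71
```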

### Root Mean Squared Percentage Error (RMSPE)

**Explanation**

RMSPE is the root mean squared percentage error, which is a measure of the accuracy of the model in terms of the percentage error between the predicted values and the actual values.

The expected value range of RMSPE is between 0 and ∞ (expressed as a percentage), with 0% being a perfect prediction and higher values indicating worse predictions. Because the error is measured relative to each actual value, RMSPE can exceed 100% for very poor models.

**Example**

The formula for RMSPE is:

**RMSPE** = `√( (1/n) * ∑ ((actual_value - predicted_value) / actual_value)^2 )`

where n is the number of data points and actual_value and predicted_value are the actual and predicted house prices, respectively.

- Calculate the absolute percentage residuals, i.e., `|actual_value - predicted_value| / actual_value`. This captures the percentage difference between the predicted value and the actual value.
- Square these percentage residuals:

True House Prices | Predicted House Prices | Absolute Percentage Residuals | Squared Percentage Residuals (in squared percentage points) |
---|---|---|---|
200,000 | 180,000 | 10.00% | 100.00 |
300,000 | 320,000 | ~6.67% | 44.44 |
250,000 | 260,000 | 4.00% | 16.00 |
350,000 | 375,000 | ~7.14% | 51.02 |
400,000 | 390,000 | 2.50% | 6.25 |
450,000 | 460,000 | ~2.22% | 4.94 |
250,000 | 260,000 | 4.00% | 16.00 |
150,000 | 145,000 | ~3.33% | 11.11 |
100,000 | 110,000 | 10.00% | 100.00 |
200,000 | 220,000 | 10.00% | 100.00 |

Next, we sum the squared percentage residuals and divide the result by the number of data points to get the mean squared percentage error (MSPE), in squared percentage points:

**MSPE** = `(100.00 + 44.44 + 16.00 + 51.02 + 6.25 + 4.94 + 16.00 + 11.11 + 100.00 + 100.00) / 10 = 44.98`

Finally, we take the square root of the MSPE to get the RMSPE:

**RMSPE** = `√44.98 ≈ 6.71%`

The RMSPE is about 6.71%, which is the average deviation of the predicted house prices from the actual house prices in percentage terms. A lower RMSPE value indicates that the model is making more accurate predictions.
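
The RMSPE calculation can be sketched in Python with the same house-price data, working in fractional (rather than percentage-point) units:

```python
import math

true = [200_000, 300_000, 250_000, 350_000, 400_000,
        450_000, 250_000, 150_000, 100_000, 200_000]
pred = [180_000, 320_000, 260_000, 375_000, 390_000,
        460_000, 260_000, 145_000, 110_000, 220_000]

# Mean of the squared percentage residuals, then the square root.
mspe = sum(((t - p) / t) ** 2 for t, p in zip(true, pred)) / len(true)
rmspe = math.sqrt(mspe)
print(f"RMSPE = {rmspe:.2%}")  # RMSPE = 6.71%
```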