Mean absolute percentage error
The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of prediction accuracy of a forecasting method in statistics. It usually expresses the accuracy as a ratio defined by the formula:

$$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$

where $A_t$ is the actual value and $F_t$ is the forecast value. Their difference is divided by the actual value $A_t$. The absolute value of this ratio is summed for every forecasted point in time and divided by the number of fitted points $n$.
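For illustration, a minimal Python sketch of this formula (using NumPy; the function name and example numbers are our own, not from any cited source):

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (see the Issues section below).
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Example: actuals and forecasts for four periods
print(mape([100, 120, 80, 90], [110, 115, 70, 95]))  # ≈ 8.06 (%)
```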
MAPE in regression problems
Mean absolute percentage error is commonly used as a loss function for regression problems and in model evaluation, because of its very intuitive interpretation in terms of relative error.
Definition
Consider a standard regression setting in which the data are fully described by a random pair $Z = (X, Y)$ with values in $\mathbb{R}^{p} \times \mathbb{R}$, and $n$ i.i.d. copies $(X_1, Y_1), \ldots, (X_n, Y_n)$ of $Z$. Regression models aim at finding a good model for the pair, that is a measurable function $g$ from $\mathbb{R}^{p}$ to $\mathbb{R}$ such that $g(X)$ is close to $Y$.
In the classical regression setting, the closeness of $g(X)$ to $Y$ is measured via the $L_2$ risk, also called the mean squared error (MSE). In the MAPE regression context,[1] the closeness of $g(X)$ to $Y$ is measured via the MAPE, and the aim of MAPE regressions is to find a model $g_{\mathrm{MAPE}}$ such that:

$$g_{\mathrm{MAPE}} \in \arg\min_{g \in \mathcal{G}} \mathbb{E}\left[ \left| \frac{g(X) - Y}{Y} \right| \right]$$

where $\mathcal{G}$ is the class of models considered (e.g. linear models).
In practice
In practice, $g_{\mathrm{MAPE}}$ can be estimated by the empirical risk minimization strategy, leading to

$$\widehat{g}_{\mathrm{MAPE}} \in \arg\min_{g \in \mathcal{G}} \frac{1}{n} \sum_{i=1}^{n} \left| \frac{g(X_i) - Y_i}{Y_i} \right|$$

From a practical point of view, the use of the MAPE as a quality function for regression models is equivalent to doing weighted mean absolute error (MAE) regression, also known as quantile regression. This property is trivial since

$$\left| \frac{g(X_i) - Y_i}{Y_i} \right| = \frac{1}{|Y_i|} \left| g(X_i) - Y_i \right| = w_i \left| g(X_i) - Y_i \right| \quad \text{with } w_i = \frac{1}{|Y_i|}.$$

As a consequence, the use of the MAPE is very easy in practice, for example using existing libraries for quantile regression that allow weights; a minimal example is sketched below.
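A minimal sketch of this equivalence, assuming scikit-learn's QuantileRegressor (which accepts per-sample weights); the synthetic data and variable names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(200, 1))
y = 3.0 * X[:, 0] + 5.0 + rng.normal(0.0, 1.0, size=200)  # strictly positive targets

# MAPE regression as weighted median regression: weight each sample by 1/|y_i|
model = QuantileRegressor(quantile=0.5, alpha=0.0, solver="highs")
model.fit(X, y, sample_weight=1.0 / np.abs(y))

pred = model.predict(X)
print("in-sample MAPE: %.2f%%" % (100 * np.mean(np.abs((y - pred) / y))))
```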
Consistency
The use of the MAPE as a loss function for regression analysis is feasible both from a practical and a theoretical point of view, since the existence of an optimal model and the consistency of the empirical risk minimization can be proved.[1]
WMAPE
WMAPE (sometimes spelled wMAPE) stands for weighted mean absolute percentage error.[2] It is a measure used to evaluate the performance of regression or forecasting models. It is a variant of MAPE in which the absolute percent errors are combined as a weighted arithmetic mean. Most commonly the absolute percent errors are weighted by the actuals (e.g. in case of sales forecasting, errors are weighted by sales volume).[3] Effectively, this overcomes the 'infinite error' issue.[4] Its formula is:[4]

$$\mathrm{wMAPE} = \frac{\displaystyle\sum_{i=1}^{n} w_i \left| \frac{A_i - F_i}{A_i} \right|}{\displaystyle\sum_{i=1}^{n} w_i} = \frac{\displaystyle\sum_{i=1}^{n} |A_i| \left| \frac{A_i - F_i}{A_i} \right|}{\displaystyle\sum_{i=1}^{n} |A_i|}$$

where $w_i$ is the weight (here the actual value $A_i$), $A_i$ is the actual value and $F_i$ is the forecast or prediction. However, this effectively simplifies to a much simpler formula:

$$\mathrm{wMAPE} = \frac{\displaystyle\sum_{i=1}^{n} |A_i - F_i|}{\displaystyle\sum_{i=1}^{n} |A_i|}$$

Confusingly, sometimes when people refer to wMAPE they are talking about a different measure in which the numerator and denominator of the wMAPE formula above are weighted again by another set of custom weights $w_i$. Perhaps it would be more accurate to call this the double-weighted MAPE (wwMAPE). Its formula is:

$$\mathrm{wwMAPE} = \frac{\displaystyle\sum_{i=1}^{n} w_i |A_i - F_i|}{\displaystyle\sum_{i=1}^{n} w_i |A_i|}$$
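A minimal sketch of these two formulas (using NumPy; the function names and example numbers are our own, not from the cited sources):

```python
import numpy as np

def wmape(actual, forecast):
    """wMAPE with weights equal to the actuals: sum |A - F| / sum |A|."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.sum(np.abs(actual - forecast)) / np.sum(np.abs(actual))

def wwmape(actual, forecast, weights):
    """Double-weighted MAPE: numerator and denominator reweighted by custom weights."""
    actual, forecast, weights = map(np.asarray, (actual, forecast, weights))
    return np.sum(weights * np.abs(actual - forecast)) / np.sum(weights * np.abs(actual))

actual = np.array([0.0, 100.0, 200.0])       # a zero actual is fine for wMAPE
forecast = np.array([5.0, 110.0, 180.0])
print(wmape(actual, forecast))               # 35 / 300 ≈ 0.117
print(wwmape(actual, forecast, np.ones(3)))  # equal custom weights reduce to wMAPE
```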
Issues
Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application,[5] and there are many studies on shortcomings and misleading results from MAPE.[6][7]
- It cannot be used if there are zero or close-to-zero values (which sometimes happens, for example in demand data) because there would be a division by zero or values of MAPE tending to infinity.[8]
- For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
- MAPE puts a heavier penalty on negative errors (where $A_t < F_t$) than on positive errors.[9] As a consequence, when MAPE is used to compare the accuracy of prediction methods it is biased in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the logarithm of the accuracy ratio (the ratio of the predicted to actual value), given by $\log\left(\frac{F_t}{A_t}\right)$. This approach leads to superior statistical properties and also leads to predictions which can be interpreted in terms of the geometric mean.[5]
- It is often assumed that minimizing the MAPE yields a forecast of the median. But, for example, a log-normal distribution has a median of $e^{\mu}$, whereas its MAPE is minimized at $e^{\mu - \sigma^{2}}$; a numerical check is sketched after this list.
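A rough Monte Carlo check of this last point (the sample size, grid and parameter values below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 0.0, 0.7
y = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

# Expected MAPE of a constant forecast m, evaluated on a coarse grid
grid = np.linspace(0.2, 2.0, 181)
mape_curve = [np.mean(np.abs(y - m) / y) for m in grid]
best = grid[int(np.argmin(mape_curve))]

print("median:         %.3f (theory e^mu       = %.3f)" % (np.median(y), np.exp(mu)))
print("MAPE-optimal m: %.3f (theory e^(mu-s^2) = %.3f)" % (best, np.exp(mu - sigma**2)))
```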
To overcome these issues with MAPE, some other measures have been proposed in the literature:
- Mean Absolute Scaled Error (MASE)
- Symmetric Mean Absolute Percentage Error (sMAPE)
- Mean Directional Accuracy (MDA)
- Mean Arctangent Absolute Percentage Error (MAAPE): MAAPE can be considered a slope as an angle, while MAPE is a slope as a ratio.[7]
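As an illustration of the last alternative, a minimal sketch of MAAPE based on Kim and Kim's arctangent of the absolute percentage error (the function name and example numbers are our own):

```python
import numpy as np

def maape(actual, forecast):
    """Mean arctangent absolute percentage error: each term is bounded by pi/2."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return np.mean(np.arctan(np.abs((actual - forecast) / actual)))

actual = np.array([0.01, 100.0, 200.0])   # near-zero actual blows up MAPE ...
forecast = np.array([5.0, 110.0, 180.0])
print(np.mean(np.abs((actual - forecast) / actual)))  # MAPE explodes (≈ 166.4)
print(maape(actual, forecast))                        # ... but MAAPE stays bounded (≈ 0.59)
```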
See also
- Least absolute deviations
- Mean absolute error
- Mean percentage error
- Symmetric mean absolute percentage error
External links
- Mean Absolute Percentage Error for Regression Models
- Mean Absolute Percentage Error (MAPE)
- Errors on percentage errors - variants of MAPE
- Mean Arctangent Absolute Percentage Error (MAAPE)
References
1. de Myttenaere, A.; Golden, B.; Le Grand, B.; Rossi, F. (2016). "Mean absolute percentage error for regression models". Neurocomputing.
2. Template:Cite web
3. Template:Cite web
4. Template:Cite web
5. Tofallis, C. (2015). "A better measure of relative prediction accuracy for model selection and model estimation". Journal of the Operational Research Society, 66(8):1352–1362.
6. Hyndman, R. J.; Koehler, A. B. (2006). "Another look at measures of forecast accuracy". International Journal of Forecasting, 22(4):679–688. doi:10.1016/j.ijforecast.2006.03.001.
7. Kim, S.; Kim, H. (2016). "A new metric of absolute percentage error for intermittent demand forecasts". International Journal of Forecasting, 32(3):669–679. doi:10.1016/j.ijforecast.2015.12.003.
8. Template:Cite journal
9. Makridakis, S. (1993). "Accuracy measures: theoretical and practical concerns". International Journal of Forecasting, 9(4):527–529. doi:10.1016/0169-2070(93)90079-3.