Many observing the takeover by machine learning in the world of data science are starting to caution against the misuse and overuse of deep learning algorithms. From concerns around the “black box” nature of deep learning making models hard to interpret or explain, to the large amounts of data required for deep learning algorithms to be effective, to skeptical suggestions that deep learning approaches may have hit their limits, the original hype over deep learning that fuelled the recent mass interest in AI is giving way to a trough of disillusionment over the practice.

At Insite.ai, we are convinced that only by utilising deep learning approaches in forecasting will we see a paradigm shift in the accuracy of demand forecasting. This is the first of two articles explaining our position. In this post I summarise deep learning and give two examples of how machine learning, including deep learning, was recently used to beat traditional forecasting methods. In the second part I list the reasons why we believe that deep learning is poised to dominate forecasting applications from here on.

In neural networks (computerised networks composed of nodes intended to mimic the human brain’s neural structure), data passes from an input to an output through a series of transformations, one per layer. When there are many layers between the input and the output, the neural network is said to be deep. When such a network is trained, so that its parameters are adjusted according to the quality of its previous outputs (i.e. it learns from its errors) until the output is optimised, deep learning occurs.
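To make the idea of layered transformations and learning from previous outputs concrete, here is a minimal sketch in plain Python. It is a toy, not how production deep learning is done: the layer sizes, learning rate, target function and the finite-difference gradient (used here instead of backpropagation to keep the code short) are all my own illustrative choices.

```python
import math
import random

def init_layers(sizes, rng):
    """Create one (weights, biases) pair per layer; sizes lists the node counts."""
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers

def forward(x, layers):
    """Transform input x layer by layer: tanh on hidden layers, linear output."""
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        x = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return x

def mse(layers, data):
    """Mean squared error of the network's outputs over (input, target) pairs."""
    return sum((forward(x, layers)[0] - y) ** 2 for x, y in data) / len(data)

def train_step(layers, data, lr=0.02, eps=1e-5):
    """One gradient-descent step, estimating each gradient by finite differences."""
    base = mse(layers, data)
    updates = []
    for weights, biases in layers:
        for row in weights + [biases]:
            for j in range(len(row)):
                row[j] += eps
                grad = (mse(layers, data) - base) / eps
                row[j] -= eps
                updates.append((row, j, grad))
    # Apply all updates only after every gradient has been measured.
    for row, j, grad in updates:
        row[j] -= lr * grad
    return base

rng = random.Random(0)
# Toy task (an assumption for illustration): learn y = sin(x) on a small grid.
data = [([x / 4], math.sin(x / 4)) for x in range(-8, 9)]
layers = init_layers([1, 4, 4, 1], rng)  # two hidden layers of four nodes each
initial_loss = mse(layers, data)
for _ in range(200):
    train_step(layers, data)
final_loss = mse(layers, data)
print(initial_loss, final_loss)
```

Each call to `train_step` measures how good the current outputs are and nudges every weight in the direction that reduces the error, which is the "learning" part of deep learning.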

Forecasting is an area that is particularly ripe for improvement through deep learning models. Historically (right up to the last 18 months), major forecasting competitions have thrown up the somewhat surprising finding that complicated models aren’t necessarily more accurate than simple ones, particularly when forecasting a general set of time series. Until last year this was the established wisdom in the M-competitions, the world’s most renowned forecasting competitions (M for Spyros Makridakis, their organiser and a godfather of the forecasting industry generally). In the most recent previous competition, the M3 competition in 2000, a basic benchmark that combined three common exponential smoothing methods (Comb) outperformed all but one of the more complicated entrants, and that lone exception only just beat Comb.
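For readers unfamiliar with Comb, the sketch below shows the general idea: forecast with simple exponential smoothing, Holt's linear trend method and a damped-trend variant, then take the plain average of the three. The smoothing parameters here are fixed values I picked for illustration (in practice they would be fitted to each series), and the function names are my own.

```python
def ses(y, h, alpha=0.3):
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h

def holt(y, h, alpha=0.3, beta=0.1, phi=1.0):
    """Holt's linear trend method; phi < 1 damps the trend. Needs len(y) >= 2."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    forecasts, damp = [], 0.0
    for k in range(1, h + 1):
        damp += phi ** k  # phi + phi^2 + ... + phi^k (equals k when phi = 1)
        forecasts.append(level + damp * trend)
    return forecasts

def comb(y, h):
    """Average the three exponential smoothing forecasts step by step."""
    parts = [ses(y, h), holt(y, h), holt(y, h, phi=0.9)]
    return [sum(vals) / len(parts) for vals in zip(*parts)]

# Made-up series, purely for illustration.
print(comb([112, 118, 132, 129, 121, 135, 148, 148], h=3))
```

The point of the benchmark is its simplicity: nothing here looks at more than one series at a time, and there is no learned structure beyond a handful of smoothing parameters.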

Time and again, econometricians and forecasting practitioners had to head home with their tails between their legs, knowing that their complicated forecasting methods were inferior to basic ones.

This all changed in 2018 with the fourth M-competition, aptly named the M4 competition. In it, seventeen models outperformed the Comb benchmark, with most making use of machine learning in some way. The top two models beat Comb by 9.4% and 6.6% respectively, a large margin. Both used machine learning to do so: the winner with a neural network, the runner-up with gradient boosting.

The second-best-performing model was developed by Rob Hyndman of Monash University, an expert in forecasting (Hyndman edited the International Journal of Forecasting between 2005 and 2018), along with Pablo Montero-Manso and George Athanasopoulos. Hyndman’s model took nine reasonably common time series methods (all available through the R forecast package) and combined them to generate a forecast. A machine learning technique known as gradient boosting (which builds an ensemble of decision trees) was used to calculate an optimised weighting for each of the nine models for each time series. By using gradient boosting, Hyndman was able to leverage peripheral features of the data that aren’t used by traditional time series methods. He was also able to predict which time series models worked well for each time series, and use this knowledge to optimise the weighting each model received.

In outline, the method works as follows:

- Split each time series into a training period and a test period.
- Fit each of the nine time series models to the training period and generate forecasts for the test period, for each of the 100,000 time series.
- Calculate a set of features of each time series over the training period (for example length of the series, strength of trend, strength of seasonality, lumpiness, and other features relating to the autocorrelation and partial autocorrelation functions of the series).
- Calculate the forecast loss of each model on each series by comparing its test-period forecasts with the actual values (the loss in this case is an average of the mean absolute scaled error and the symmetric mean absolute percentage error).
- Train the gradient boosting model to map the features of each time series to a set of weights over the nine models, chosen to minimise the loss of the weighted combination.
- To make new predictions, calculate the features of the time series at the prediction date, use the trained gradient boosting model to generate a weight for each model, and combine the nine models’ forecasts for the prediction period using those weights.
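The loss and weighting steps above can be sketched in a few lines of Python. This is a simplified illustration, not the competition code: the gradient boosting model itself is left out and represented only by the raw per-model `scores` it would output, the softmax used to turn scores into weights is my assumption about a reasonable choice, and sMAPE is expressed as a fraction rather than a percentage so the two error terms are on a comparable scale.

```python
import math

def smape(actual, forecast):
    """Symmetric mean absolute percentage error, as a fraction (nonzero data assumed)."""
    return 2.0 / len(actual) * sum(
        abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)
    )

def mase(actual, forecast, train, m=1):
    """Mean absolute scaled error: MAE divided by the in-sample naive-forecast MAE."""
    scale = sum(abs(train[i] - train[i - m]) for i in range(m, len(train))) / (len(train) - m)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual) / scale

def combined_loss(actual, forecast, train, m=1):
    """Plain average of the two error measures, following the description above."""
    return 0.5 * (smape(actual, forecast) + mase(actual, forecast, train, m))

def softmax(scores):
    """Turn raw scores into positive weights that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine(forecasts, scores):
    """Weighted average of the models' forecasts (one forecast list per model)."""
    weights = softmax(scores)
    return [sum(w * f for w, f in zip(weights, step)) for step in zip(*forecasts)]

def expected_loss(losses, scores):
    """Quantity to minimise for one series: the weight-averaged loss of its models."""
    return sum(w * l for w, l in zip(softmax(scores), losses))

# Made-up numbers for two hypothetical models on one series.
train, actual = [10, 12, 11, 13, 12, 14], [13, 15]
forecasts = [[12, 14], [14, 17]]
losses = [combined_loss(actual, f, train) for f in forecasts]
scores = [0.8, -0.3]  # stand-in for the trained model's output
print(combine(forecasts, scores), expected_loss(losses, scores))
```

Training then amounts to choosing scores, as a function of the series features, that make `expected_loss` small across all series: models that tend to forecast a given kind of series well end up with larger weights.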

The winner of the competition, Slawek Smyl of Uber Labs, leaned even more heavily on neural networks in his solution, developing a hybrid model that combined an exponential smoothing model (in this case the Holt-Winters method with multiplicative seasonality) with a recurrent neural network. Per Smyl:

The hybrid approach allowed for cross-learning across time series in extracting time series features (particularly seasonality). Ultimately, this method far outperformed the benchmark. In part two I provide eight reasons why we believe that deep-learning-based forecasting methods will become the de facto standard for serious forecasting problems.
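To give a flavour of the exponential smoothing half of such a hybrid, the sketch below tracks a level and multiplicative seasonal factors, and uses them to normalise a series into the kind of scale-free input a recurrent network could learn from across many series. It is not Smyl's implementation: the network is omitted entirely, the trend term is left out, the smoothing parameters are fixed rather than learned, and all function names are my own.

```python
def smooth_level_and_season(y, period, alpha=0.3, gamma=0.2):
    """Holt-Winters style level and multiplicative seasonality (no trend term)."""
    first_mean = sum(y[:period]) / period
    season = [v / first_mean for v in y[:period]]  # initial seasonal factors
    level = first_mean
    levels = []
    for t, obs in enumerate(y):
        s = season[t]  # factor that applies at time t
        new_level = alpha * obs / s + (1 - alpha) * level
        # Updated factor for the same position one cycle later (index t + period).
        season.append(gamma * obs / new_level + (1 - gamma) * s)
        level = new_level
        levels.append(level)
    return levels, season

def normalise(y, levels, season):
    """Remove level and seasonality, leaving values near 1 for the network."""
    return [obs / (lvl * season[t]) for t, (obs, lvl) in enumerate(zip(y, levels))]

def rescale(normalised_forecast, levels, season, n):
    """Put normalised forecasts back on the original scale (horizon <= period)."""
    return [v * levels[-1] * season[n + k] for k, v in enumerate(normalised_forecast)]

# Made-up quarterly-style series with a repeating three-step pattern.
y = [8.0, 10.0, 12.0] * 3
levels, season = smooth_level_and_season(y, period=3)
inputs = normalise(y, levels, season)
# A network would map `inputs` to normalised forecasts; here we use a flat 1.0.
print(rescale([1.0, 1.0, 1.0], levels, season, len(y)))
```

Because every series is reduced to values around 1, one network can be trained on all of them at once, while the per-series level and seasonal factors carry the information that is specific to each series.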
