Hockey Fans
ARIMA Models

The erratic movements in the time series plot seen in the preliminary analysis section suggest modelling the data with ARIMA models. The absence of any trend or seasonality in that plot also makes an ARIMA model a logical choice. The ad hoc ARIMA method requires a stationary mean by definition. The autocorrelation function (ACF) plot above is dominated by large positive autocorrelations, which suggests a non-stationary mean.

Achieving a Stationary Mean Model

Several differencing techniques were examined in order to obtain a stationary mean, including:

› 1st order ($y_t - y_{t-1}$)
› 2nd order ($y_t - 2y_{t-1} + y_{t-2}$)

Achieving a constant variance in the above plot posed a challenge. Several transformations of the original variable (goals per game) were tried in order to eliminate the irregular variance before differencing: the log of goals per game, the square root of goals per game, and goals per game raised to the power of 0.25. After examining the resulting time series graphs, it was concluded that first-order differencing without any transformation best represented a stationary mean model.

Determining an appropriate ARIMA model

To determine an appropriate ARIMA model, it is necessary to examine the ACF and PACF plots of the differenced series (i.e. $y_t - y_{t-1}$). Based on these plots, it is not immediately clear which model is most appropriate for this data. The possibilities include an ARIMA model with differencing of 1 and a moving average component of order 4 (MA(4)), or an ARIMA model with differencing of 1 and an autoregressive component of order 4 (AR(4)). Each of these models requires a cut-off of the correlation (i.e. spikes below the confidence lines) after lag 4 on one plot (the ACF for an MA(4) model, the PACF for an AR(4) model) and exponential decay on the other, as observed in the above plots.
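The differencing and autocorrelation checks described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the plots in this report: the `goals` values are invented, the helper names `difference` and `sample_acf` are hypothetical, and the confidence limits use the common approximation of plus or minus 2 divided by the square root of the series length.

```python
from math import sqrt


def difference(series, order=1):
    """Apply first differencing `order` times.

    order=1 gives y_t - y_{t-1}; order=2 gives y_t - 2*y_{t-1} + y_{t-2}.
    """
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Return the sample autocorrelations r_1 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


# Hypothetical goals-per-game values, invented for illustration only.
goals = [5.9, 6.1, 6.4, 6.2, 6.8, 7.1, 6.9, 7.3, 7.0, 6.6]

d1 = difference(goals, order=1)
acf = sample_acf(d1, max_lag=4)

# Approximate 95% confidence limits for a white-noise series (an assumption).
bound = 2 / sqrt(len(d1))
for lag, r in enumerate(acf, start=1):
    flag = "outside limits" if abs(r) > bound else "inside limits"
    print(f"lag {lag}: r = {r:.3f} ({flag})")
```

A spike that stays outside the limits at one lag while the rest fall inside is the kind of cut-off pattern the model identification step looks for.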
The two models were fitted to the data and criteria measuring goodness of fit were examined (see Appendix D-2). The p-value tests the null hypothesis that a coefficient is zero, i.e. that the term does not belong in the model. For the AR(4) and MA(4) models the p-values are 0.8% and 2.6% respectively, so both terms are significant at the 5% level. The sum of squared errors (SSE) is the sum of the squared differences between the fitted model and the actual data. Lastly, the Akaike information criterion (AIC) and the Schwarz-Bayesian criterion (SBC) both measure goodness of fit while accounting for model complexity. The AR(4) model seems to be the best ARIMA model based on these criteria.

The output in Appendix D-2 suggests that the constant should not be included in the model, based on its p-value of 0.5288 (52.88%). Hence, the final model, based on the analysis, is

$y_t = y_{t-1} + \phi_4 (y_{t-4} - y_{t-5}) + e_t$.

The model validation procedure, located in Appendix D-3, shows that this model is acceptable. The scatterplot of residuals versus predicted values shows no evidence of non-constant variance. The error ACF and PACF plots show no autocorrelation pattern. Finally, the Q-Q plot shows that the residuals resemble a normal distribution. Using the estimated parameters we have

$y_t = y_{t-1} + 0.38982 (y_{t-4} - y_{t-5}) + e_t$.
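To make the fitted equation concrete, the following minimal sketch computes point forecasts from it. Only the coefficient 0.38982 comes from the analysis above. The function names and the five-season `history` are hypothetical, and future error terms are replaced by their expected value of zero.

```python
PHI_4 = 0.38982  # lag-4 coefficient estimated in the text


def one_step_forecast(history, phi4=PHI_4):
    """Forecast the next value as y_{t-1} + phi4 * (y_{t-4} - y_{t-5})."""
    if len(history) < 5:
        raise ValueError("need at least five past observations")
    return history[-1] + phi4 * (history[-4] - history[-5])


def forecast(history, steps, phi4=PHI_4):
    """Iterate the one-step forecast, feeding each forecast back in."""
    path = list(history)
    for _ in range(steps):
        path.append(one_step_forecast(path, phi4))
    return path[len(history):]


# Hypothetical goals-per-game values, invented for illustration only.
history = [6.0, 6.5, 6.2, 6.4, 6.3]
print(forecast(history, steps=3))
```

Setting `phi4=0` reduces every forecast to the last observed value, which is the simpler random walk form of the model.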
Based on this model, the sequence plot below was created, showing upper and lower confidence limits as well as predictions to the year 2005.

Interpreting the ARIMA (4,1,0) Model

While the ARIMA (4,1,0) model has the best theoretical fit (the lowest standard error, AIC, and SBC values), it is fairly difficult to interpret logically. There does not seem to be any reasonable explanation for a correlation between changes in average goals per game that are four seasons apart. A possible explanation could simply be random noise. Although goals per season is one of the longest data series in the NHL, it is still not long enough to reasonably rule out the possibility of overfitting the random errors. Since the 4th lag in the ACF plot was only slightly above the confidence limits, we can reasonably suggest that this spike was due only to random noise. If this were true, the lag-4 term would likely hinder our ability to make future predictions. Therefore, we suggest fitting an ARIMA (0,1,0) model, and the results are below.
Thus, the ARIMA (0,1,0) model makes more intuitive sense and has only a slightly worse theoretical fit than the ARIMA (4,1,0) model. We will therefore use the ARIMA (0,1,0) model to forecast average goals per game in future seasons. This model is $y_t = y_{t-1} + e_t$.
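Under the ARIMA (0,1,0) model every point forecast equals the last observed value, while the forecast uncertainty grows with the horizon. The sketch below illustrates this under stated assumptions: the errors are taken to be independent with constant variance, the error standard deviation is estimated from the first differences without a constant, and the 95% limits use the normal multiplier 1.96. The function name and the `history` values are hypothetical.

```python
from math import sqrt


def random_walk_forecast(history, steps, z=1.96):
    """Return (forecast, lower, upper) for horizons 1..steps under y_t = y_{t-1} + e_t."""
    diffs = [b - a for a, b in zip(history, history[1:])]
    # No constant in the model, so the errors are assumed to have mean zero.
    sigma = sqrt(sum(d * d for d in diffs) / len(diffs))
    last = history[-1]
    result = []
    for h in range(1, steps + 1):
        half_width = z * sigma * sqrt(h)  # h-step error variance is h * sigma^2
        result.append((last, last - half_width, last + half_width))
    return result


# Hypothetical goals-per-game values, invented for illustration only.
history = [6.0, 6.5, 6.2, 6.4, 6.3]
for h, (point, lower, upper) in enumerate(random_walk_forecast(history, 4), start=1):
    print(f"h={h}: forecast {point:.2f}, limits ({lower:.2f}, {upper:.2f})")
```

The widening limits show why long-range forecasts from this model carry little information beyond the most recent season's value.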