Journal of Forecasting. Whittemore School of Business and Economics, The University of New Hampshire, USA

Stage II - training                           NN2 GRP1    NN3 GRP2    NN4 GRP3

Number of data sets                           150         150         150
Training tolerance                            0.5         0.5         0.5
Number of good classifications achieved       110         109         111
Percentage of good classifications achieved   74%         73%         74%
Learning rate used                            0.9, 0.016  0.9, 0.1    0.9, 0.1
Number of hidden neurons                      11          11          11

Table VI. Testing results for stage II networks (NN2, NN3, and NN4)

Stage II - testing                            NN2          NN3          NN4

Size of data set                              20, 20, 20   20, 20, 20   20, 20, 20
Number of good classifications achieved       9, 11, 9     7, 12, 9     8, 10, 12
Percentage of good classifications achieved   45, 55, 45%  35, 60, 45%  40, 50, 60%
Testing tolerance                             0.5          0.5          0.5

representing the correct selection group gets a value of at least 0.5, while at the same time the other two output neurons (representing the remaining two groups) get values less than 0.5. Violation of either or both of these conditions results in a bad classification. We believe this criterion provides a reasonable level of accuracy in distinguishing a good classification from a bad one.
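The criterion above can be sketched as a small check. This is a minimal sketch; the function name and activation values are illustrative, not from the paper:

```python
def is_good_classification(outputs, correct_idx, tolerance=0.5):
    """Return True only if the neuron for the correct group reaches the
    tolerance while both remaining group neurons stay below it."""
    if outputs[correct_idx] < tolerance:
        return False
    return all(o < tolerance
               for i, o in enumerate(outputs) if i != correct_idx)

# Both conditions met: good classification
print(is_good_classification([0.8, 0.2, 0.3], correct_idx=0))  # True
# A wrong group neuron also fires: bad classification
print(is_good_classification([0.8, 0.6, 0.3], correct_idx=0))  # False
```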

Table VI presents the testing results for the stage II networks. The average testing accuracies achieved (at a testing tolerance of 0.5) are 48% for NN2, 47% for NN3, and 50% for NN4, giving a grand average test accuracy of 48%.
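The quoted averages follow directly from the per-group percentages in Table VI; a quick arithmetic check:

```python
# Per-group testing accuracies (%) for each network, from Table VI
accuracies = {"NN2": [45, 55, 45], "NN3": [35, 60, 45], "NN4": [40, 50, 60]}

per_network = {nn: round(sum(v) / len(v)) for nn, v in accuracies.items()}
grand_avg = round(sum(sum(v) for v in accuracies.values()) / 9)

print(per_network)  # {'NN2': 48, 'NN3': 47, 'NN4': 50}
print(grand_avg)    # 48
```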

While the results shown in Table VI reflect the accuracy of the stage II networks, additional statistical tests are necessary to evaluate the performance of the neural network approach in the model selection process. To measure the accuracy of the neural networks statistically, this research uses a method similar to the one adopted by Hill, O'Connor and Remus (1996). Using MAPE as the measure of forecast accuracy, four randomly selected groups of data sets, each of size 20, are used to evaluate the performance of the stage I and stage II networks. For each data set in each group, the best of the nine forecasting methods is identified on the basis of MAPE, and the results of the stage I and stage II networks are obtained. Paired t-tests are then conducted on the MAPE values for the best methods and the network-selected methods for the four data sets (Iman and Conover, 1983).

In order to verify whether a paired t-test is appropriate for comparing the MAPE values, a test for normality is conducted on the differences in the MAPE values for each time series in the four data sets (Shapiro, 1990). The Shapiro-Wilk W tests conducted suggest that three of the four data sets are normal (test statistics: 0.895, 0.937, and 0.930), verifying that paired t-tests are indeed appropriate for testing equality of MAPE values for the best methods and the network-selected methods for three of the four data sets.
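A normality check of this kind can be sketched with SciPy's Shapiro-Wilk test. This assumes SciPy is available; the difference values below are randomly generated, not the paper's data:

```python
import random
from scipy.stats import shapiro

random.seed(0)
# Synthetic per-series differences in MAPE (best minus network-selected)
# for one group of 20 series
diffs = [random.gauss(-1.0, 2.0) for _ in range(20)]

w_stat, p_value = shapiro(diffs)
# A W statistic near 1 with p > 0.05 is consistent with normally
# distributed differences, supporting the use of a paired t-test
```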

Table VII gives the results of the paired t-tests conducted to test equality of MAPE for the best forecasting methods and the neural network selected methods for the three groups of data. The results indicate that for two of the three test data sets, there is no significant difference (at the 0.001 level) in the mean values of MAPE for the best methods and the network-selected methods.

Table VII. Means (and standard deviations) of MAPE for the best methods and the neural network selected methods

Data set   Size (n)   Mean of MAPEs for   Mean of MAPEs for     Results of paired t-test
                      best methods        NN-selected methods

1          20         13.69 (15.36)       14.67 (15.80)         Significant at 0.05
2          20         15.48 (18.18)       16.91 (18.59)         No significance at 0.001
3          20         10.75 (11.96)       11.69 (13.37)         No significance at 0.001