
In order to test the impact of the data distribution and the preprocessing method on the predictive performance of the classifiers, we generated, in addition to the real dataset (REAL), four new datasets with different distributions: uniform (UNIF), normal (NORM), logistic (LOG) and Laplace (LAP). We chose these four distributions because they roughly correspond to the four kurtosis values −1, 0, 1 and 3, respectively. We estimated the distributions’ parameters from the means and variances of the telecom dataset ratios. Regarding standardization, three approaches were taken: the first was to keep the data unstandardized (‘no preprocessing’ (PR1)); the second was to normalize the data to zero mean and unit standard deviation (‘normalization’ (PR2)); the third was to divide the data by the maximum of the absolute values (‘maximum of absolute values’ (PR3)). We used these three preprocessing approaches to cope gradually with the comparability issues of the input variables raised in Section 4. We obtained 15 datasets in total, one for each distribution–preprocessing method combination: (REAL, PR1), (REAL, PR2), (REAL, PR3), (UNIF, PR1), . . . , (LAP, PR3).
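A minimal sketch of how the 15 datasets could be generated is given below, assuming NumPy and the standard mean/variance parameterizations of the four distributions; the authors’ exact generation code is not given, so `synth_column` and `preprocess` are illustrative helper names and the `real` array is a hypothetical placeholder for the telecom ratio matrix.

```python
import numpy as np

def synth_column(mean, var, dist, size, rng):
    """Draw one synthetic ratio column with the requested mean and variance."""
    if dist == "UNIF":                      # excess kurtosis ~ -1.2
        half = np.sqrt(3.0 * var)
        return rng.uniform(mean - half, mean + half, size)
    if dist == "NORM":                      # excess kurtosis 0
        return rng.normal(mean, np.sqrt(var), size)
    if dist == "LOG":                       # excess kurtosis ~ 1.2
        return rng.logistic(mean, np.sqrt(3.0 * var) / np.pi, size)
    if dist == "LAP":                       # excess kurtosis 3
        return rng.laplace(mean, np.sqrt(var / 2.0), size)
    raise ValueError(dist)

def preprocess(X, method):
    """PR1: none; PR2: zero mean / unit std; PR3: divide by max absolute value."""
    if method == "PR1":
        return X
    if method == "PR2":
        return (X - X.mean(axis=0)) / X.std(axis=0)
    if method == "PR3":
        return X / np.abs(X).max(axis=0)
    raise ValueError(method)

rng = np.random.default_rng(0)
real = rng.normal(size=(1000, 6))           # placeholder for the real telecom ratios
datasets = {("REAL", pr): preprocess(real, pr) for pr in ("PR1", "PR2", "PR3")}
for dist in ("UNIF", "NORM", "LOG", "LAP"):
    X = np.column_stack([synth_column(real[:, j].mean(), real[:, j].var(),
                                      dist, real.shape[0], rng)
                         for j in range(real.shape[1])])
    for pr in ("PR1", "PR2", "PR3"):
        datasets[(dist, pr)] = preprocess(X, pr)
# len(datasets) == 15: one entry per distribution-preprocessing combination
```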

6.  EXPERIMENTS

For each of the 15 datasets obtained we applied the following methodological steps (a sketch of the overall experimental loop follows the list):

1.  For the RT-based ANN we repeated the procedure described in Section 3.3 30 times, obtaining four vectors (of 30 elements each) of accuracy rates for each RT mechanism type (RT1, RT2, RT3): RT_VEC_ACRTRe, a vector of effective training accuracy rates; RT_VEC_ACRVAL, a vector of validation accuracy rates; RT_VEC_ACRTR, a vector of total training (effective training plus validation) accuracy rates; and RT_VEC_ACRTS, a vector of test accuracy rates. Correspondingly, we obtained four vectors with the mean-square errors RT_VEC_MSETRe, RT_VEC_MSEVAL, RT_VEC_MSETR, and RT_VEC_MSETS. The total time needed for RT-based training was approximately 675 h = 1.5 (h/experiment) × 30 (experiments) × 15 (input datasets).

2.  For the GA-based ANN we applied the procedure described in Section 3.4 10 times for each type of crossover (one-point, GAO; multipoint, GAM; arithmetic, GAA; uniform, GAU). The other GA parameters used were as follows: Ngen = 1000, PS = 20, Nelite = 3, max_split = 5, Pc = 0.8, Pm = 0.01 and max_lim = 1. We obtained two vectors (of 10 elements each) for each type of crossover operator: a vector of training accuracy rates (GA_VEC_ACRTR) and a vector of test accuracy rates (GA_VEC_ACRTS) and, correspondingly, two vectors with mean-square errors, i.e. GA_VEC_MSETR and GA_VEC_MSETS. The total time needed for GA-based training was approximately 1200 h = 8 (h/experiment) × 10 (experiments) × 15 (input datasets).

3.  We used statistical tests to compare the result vectors of the two training mechanisms in order to validate our hypotheses.
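The following sketch outlines how steps 1–3 could be organized in code. It assumes hypothetical `train_rt` and `train_ga` functions that wrap the RT- and GA-based training procedures of Sections 3.3 and 3.4 and return per-run accuracy rates and mean-square errors as dictionaries; only the test-accuracy vectors are collected here for brevity.

```python
import numpy as np

def run_experiments(datasets, train_rt, train_ga):
    """Sketch of steps 1-3: collect per-dataset result vectors for later testing."""
    ga_params = dict(Ngen=1000, PS=20, Nelite=3, max_split=5,
                     Pc=0.8, Pm=0.01, max_lim=1)
    results = {}
    for key, X in datasets.items():
        # Step 1: 30 RT runs per mechanism type -> 30-element result vectors
        rt_runs = {m: [train_rt(X, m) for _ in range(30)]
                   for m in ("RT1", "RT2", "RT3")}
        rt_vec_acrts = {m: np.array([r["ACRTS"] for r in runs])
                        for m, runs in rt_runs.items()}
        # Step 2: 10 GA runs per crossover operator -> 10-element result vectors
        ga_runs = {c: [train_ga(X, c, **ga_params) for _ in range(10)]
                   for c in ("GAO", "GAM", "GAA", "GAU")}
        ga_vec_acrts = {c: np.array([r["ACRTS"] for r in runs])
                        for c, runs in ga_runs.items()}
        results[key] = (rt_vec_acrts, ga_vec_acrts)
    return results   # Step 3: these vectors feed the statistical comparisons
```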

The following experiments differ in two respects: the hypothesis they try to validate and/or the type of statistical test used (non-parametric versus parametric).

6.1.  Experiment 1

In the first experiment we try to validate the first hypothesis using non-parametric tests (Siegel and Castellan, 1988). We used the real dataset (the original telecom data) without preprocessing (the first preprocessing approach, PR1). After separating the data into training (90%) and test (10%) sets, we generated the ANN architecture. Then, in order to refine our solution, we applied the two training mechanisms (RT-based ANN and GA-based ANN). We followed the methodological steps described above and statistically compared the result vectors of the two training mechanisms in order to validate our first hypothesis (Tables IV and V). We used the Mann–Whitney–Wilcoxon and Kolmogorov–Smirnov non-parametric tests to avoid the distributional assumptions of parametric tests.
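As an illustration, the two non-parametric tests can be run with SciPy on a pair of test-accuracy vectors, for example RT_VEC_ACRTS and GA_VEC_ACRTS for the (REAL, PR1) dataset; the `results` dictionary used here is the hypothetical output of the experiment-loop sketch above, and the choice of RT1 and GAU is arbitrary.

```python
from scipy.stats import mannwhitneyu, ks_2samp

# Hypothetical 30-element RT and 10-element GA test-accuracy vectors
# for the (REAL, PR1) dataset, taken from the sketch above.
rt_acrts = results[("REAL", "PR1")][0]["RT1"]
ga_acrts = results[("REAL", "PR1")][1]["GAU"]

u_stat, u_p = mannwhitneyu(rt_acrts, ga_acrts, alternative="two-sided")
ks_stat, ks_p = ks_2samp(rt_acrts, ga_acrts)

print(f"Mann-Whitney-Wilcoxon: U = {u_stat:.1f}, p = {u_p:.4f}")
print(f"Kolmogorov-Smirnov:    D = {ks_stat:.3f}, p = {ks_p:.4f}")
# A p-value below the chosen significance level (e.g. 0.05) indicates that
# the two accuracy-rate distributions differ significantly.
```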