
The final step in constructing the new population is to reduce it to 20 chromosomes. From PS″ we select the 20 best chromosomes in terms of ACRTR, subject to the condition that no chromosome may have more than max_lim duplicates. If fewer than 20 chromosomes satisfy this condition, we use the mutation operator to generate additional ones.
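For concreteness, the following Python sketch illustrates one way such a reduction step could be implemented. The fitness and mutate callables, the rounding-based duplicate test, and the default parameter values are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def reduce_population(candidates, fitness, mutate, pop_size=20, max_lim=2):
    """Keep the best `pop_size` chromosomes by training-set accuracy (ACRTR),
    allowing at most `max_lim` duplicates of any chromosome; if too few
    qualify, fill the remaining slots with mutated copies of the survivors."""
    # Rank candidate chromosomes from best to worst fitness
    ranked = sorted(candidates, key=fitness, reverse=True)

    survivors, counts = [], {}
    for chrom in ranked:
        key = tuple(np.round(chrom, 6))        # crude duplicate-detection key
        if counts.get(key, 0) < max_lim:       # enforce the duplicate limit
            survivors.append(chrom)
            counts[key] = counts.get(key, 0) + 1
        if len(survivors) == pop_size:
            return survivors

    # Too few chromosomes satisfied the condition: mutate survivors to fill up
    while len(survivors) < pop_size:
        parent = survivors[np.random.randint(len(survivors))]
        survivors.append(mutate(parent))
    return survivors
```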

In summary, excluding the crossover, the parameters of our GA models are as follows: number of generations Ngen, population size PS, number of elite chromosomes Nelite, maximum number of splitting points max_split in the case of multipoint crossover, probability of crossover Pc, probability of mutation Pm, and maximum number of duplicates per chromosome max_lim. There were around 1000 generations (Ngen = 1000), which took approximately 2 h to complete for each GA-based refining mechanism. Just as we had four RT mechanisms, we had four GAs, this time differentiated by the type of crossover operator used. Consequently, we needed 2 × 4 = 8 h per experiment to run all four GAs.
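Collected in one place, the parameter set could look like the following sketch. Only Ngen = 1000 and the population size of 20 are stated in this excerpt; the remaining values are placeholders, since the exact settings are reported elsewhere in the paper.

```python
# Hypothetical GA parameter set; values marked "placeholder" are not from the text.
ga_params = {
    "Ngen":      1000,   # number of generations (~2 h per GA run)
    "PS":        20,     # population size after the reduction step
    "Nelite":    2,      # elite chromosomes copied unchanged (placeholder)
    "max_split": 3,      # max splitting points, multipoint crossover (placeholder)
    "Pc":        0.8,    # crossover probability (placeholder)
    "Pm":        0.05,   # mutation probability (placeholder)
    "max_lim":   2,      # max duplicates per chromosome (placeholder)
}
```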

4.  RESEARCH QUESTIONS AND DERIVED HYPOTHESES

The main advantages of neural approaches to classification over traditional approaches are that ANNs are free of any distributional assumptions, are universal approximators, have no problems with intercorrelated data, and provide a mapping from the inputs to the outputs without any a priori knowledge about the functional form (function approximation capability). The most popular ANN learning technique in the literature is BP, which is ‘an approximate steepest descent algorithm’ (Hagan et al., 1996) for feedforward neural networks. BP has several limitations, the most important being its scalability: as the size of the training problem increases, the training time increases non-linearly (Pendharkar and Rodger, 2004). When basic BP is applied to a practical problem, training may take a relatively long time (Hagan et al., 1996). Other limitations include difficulties arising from the training data itself, the handling of outliers, and reduced generalization power due to a large solution space. The last limitation may stem from the fact that the BP algorithm tends to get stuck quickly in a local optimum, which means that the algorithm depends strongly on the initial starting values. As described in Section 3.2, many techniques have been proposed to decrease the learning time of BP and to escape shallow local minima. SCG was used for ANN training throughout this study.
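To make the steepest-descent character of basic BP concrete, the sketch below shows a single training epoch for a one-hidden-layer sigmoid network. It is a minimal illustration with a fixed learning rate lr, not the SCG algorithm actually used in this study, and all names and shapes are our own assumptions.

```python
import numpy as np

def bp_epoch(W1, b1, W2, b2, X, y, lr=0.1):
    """One epoch of basic backpropagation (steepest descent) for a
    single-hidden-layer feedforward network with sigmoid units."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))

    # Forward pass
    H = sig(X @ W1 + b1)        # hidden activations, shape (n, hidden)
    out = sig(H @ W2 + b2)      # network outputs, shape (n, 1)

    # Backward pass: gradients of the squared error w.r.t. weights
    err = out - y
    d_out = err * out * (1 - out)
    d_hid = (d_out @ W2.T) * H * (1 - H)

    # Steepest-descent update: step against the gradient with fixed rate lr
    W2 -= lr * H.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)
    return W1, b1, W2, b2
```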

The difference between BP/BP-variants and GA-based ANN training techniques is that BP starts from one solution and tries to improve it using an error minimization technique, whereas GAs start with a population of solutions and, through initialization, reproduction and recombination, try to reach a solution. GAs also have a hill-climbing capability, which arises from the convex combination (arithmetic crossover operator) of two parents lying on opposite sides of a hill. Moreover, the risk of settling into a local optimum is avoided by the GA because it creates new solutions by altering some elements of existing ones (mutation operator), thereby widening the search space.
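The two operators mentioned above can be sketched as follows for real-valued chromosomes (e.g. ANN weight vectors). The mutation probability pm and the Gaussian perturbation scale are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def arithmetic_crossover(p1, p2, rng=np.random):
    """Convex combination of two parent weight vectors: the child lies on the
    segment between them, which is what lets the GA 'climb' a hill when the
    parents sit on its opposite slopes."""
    alpha = rng.uniform(0.0, 1.0)
    return alpha * p1 + (1.0 - alpha) * p2

def mutate(chrom, pm=0.05, scale=0.5, rng=np.random):
    """Perturb randomly chosen genes, widening the search space and reducing
    the risk of the whole population collapsing into one local optimum."""
    mask = rng.uniform(size=chrom.shape) < pm
    return chrom + mask * rng.normal(0.0, scale, size=chrom.shape)
```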

We test two training mechanisms: one based on a traditional gradient-descent technique improved by an RT procedure and the other based on GAs. Moreover, we analyse the influence of the crossover operator on the predictive performance of GAs.

A crucial step in ANN training is the preprocessing of the input data. Preprocessing can be performed in two ways. One way is to apply the preprocessing technique to each individual input variable, which preserves the dimensionality of the input dataset. The other is to apply a transformation to the whole input dataset at once, possibly obtaining a different dimensionality. This second way is applied when the dimension of the input vector is large, the variables are intercorrelated, and we want to reduce the dimensionality of the data and decorrelate the inputs. The first way of preprocessing deals with two comparability issues regarding the input variables. First, each variable has to have the same importance in the training process; for that, we could scale all variables so that they always fall within a specified range. Second, all variables should have the same dispersion, so that no variable's dispersion has a disproportionate impact on ANN training. In our study we use three preprocessing approaches: no preprocessing, which does not take into consideration any of the comparability concerns; division by the maximum absolute value, which handles the first comparability issue; and normalization, which addresses both comparability issues. In this study we test whether the choice of the preprocessing approach for individual variables has any impact on the predictive performance of the ANN.
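The sketch below illustrates the three per-variable preprocessing approaches on an input matrix X (rows are observations, columns are variables). The use of a z-score for 'normalization' is our assumption, as the exact formula is not given in this excerpt.

```python
import numpy as np

def preprocess(X, method="normalize"):
    """Per-variable preprocessing of the input matrix X.
    'none'      : raw inputs, ignoring both comparability concerns;
    'maxabs'    : divide each variable by its maximum absolute value,
                  putting all variables on a comparable range;
    'normalize' : z-score each variable, equalizing both range and dispersion."""
    if method == "none":
        return X
    if method == "maxabs":
        return X / np.max(np.abs(X), axis=0)
    if method == "normalize":
        return (X - X.mean(axis=0)) / X.std(axis=0)
    raise ValueError(f"unknown preprocessing method: {method}")
```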