After performing the three experiments, we obtained the best ANN architecture and the set of final weights (the solution) corresponding to this architecture. In the next two sections we present two training mechanisms used to refine this solution.
Once we determine the ANN architecture (with its corresponding set of weights), the next step is to train the network. The first training mechanism is an RT-based one (Nastac and Costea, 2004), briefly described next:
• Start with a network whose initial set of weights comes from the previous step (determining the ANN architecture); this is the reference network.
• Perform L runs to improve the ANN classification accuracy. After each run we save the best set of weights (the solution) in terms of classification accuracy. Each run consists of the following steps (a code sketch follows this list):
1. Reduce the weights of the current best network with successive values of the scaling factor γ (γ = 0.1, 0.2, . . . , 0.9).
2. Retrain the ANN from each set of scaled weights and obtain nine accuracy rates.
3. Choose the best of the nine resulting networks in terms of classification accuracy.
4. Compare the accuracy rate of this network with that of the current best network and keep the better one as the current best network for the next run.
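The control flow of one RT refinement can be sketched as follows. This is only an illustration of the loop above, not the authors' implementation: train, accuracy and scale_weights are hypothetical callables standing in for the actual retraining routine, the evaluation of the classification accuracy rate, and the weight-scaling step.

import copy

GAMMAS = [g / 10 for g in range(1, 10)]  # scaling factors γ = 0.1, ..., 0.9

def retraining(best_net, train, accuracy, scale_weights, n_runs=5):
    """Refine an already-trained reference network by repeated weight scaling.

    `train(net)` retrains the network in place, `accuracy(net)` returns its
    classification accuracy rate, and `scale_weights(net, gamma)` multiplies
    every weight by gamma. All three are assumed, hypothetical helpers.
    """
    best_acc = accuracy(best_net)
    for _ in range(n_runs):                       # the L runs
        candidates = []
        for gamma in GAMMAS:                      # step 1: scale the weights
            net = copy.deepcopy(best_net)
            scale_weights(net, gamma)
            train(net)                            # step 2: retrain
            candidates.append((accuracy(net), net))
        run_acc, run_net = max(candidates, key=lambda c: c[0])  # step 3
        if run_acc > best_acc:                    # step 4: keep the better net
            best_acc, best_net = run_acc, run_net
    return best_net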
Depending on how the training set TR is split into the effective training set TRe and the validation set VAL, we have three types of RT mechanism: RT1, where TRe and VAL are common to all L runs; RT2, where TRe and VAL differ between runs but are the same for all nine scaled-weight retrainings within a run (step 2 above); and RT3, where TRe and VAL are distinct for every retraining. The three splitting variants are sketched below.

We have four types of accuracy rate: the training accuracy rate ACR_TRe, the validation accuracy rate ACR_VAL, the total training (effective training plus validation) accuracy rate ACR_TR, and the test accuracy rate ACR_TS. Correspondingly, we calculate four mean-square errors: MSE_TRe, MSE_VAL, MSE_TR and MSE_TS. In total, five runs (L = 5) were conducted, resulting in 5 × 9 = 45 new trainings for each type of RT mechanism. Each RT mechanism needs approximately 30 min to complete, so running all three RT mechanisms takes 0.5 × 3 = 1.5 h per experiment.
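A minimal sketch of the splitting logic behind RT1, RT2 and RT3. The random split routine and the validation fraction are assumptions chosen for illustration, not values taken from the paper; what matters is how often a fresh split is drawn.

import random

def split_tr(tr, val_fraction=0.2, seed=None):
    """Randomly split TR into an effective training set TRe and a
    validation set VAL. `val_fraction` is an assumed parameter."""
    rng = random.Random(seed)
    idx = list(range(len(tr)))
    rng.shuffle(idx)
    cut = int(len(tr) * (1 - val_fraction))
    tre = [tr[i] for i in idx[:cut]]
    val = [tr[i] for i in idx[cut:]]
    return tre, val

# RT1: call split_tr once; the same TRe/VAL serve all L runs.
# RT2: call split_tr once per run; the split is reused for that
#      run's nine scaled-weight retrainings.
# RT3: call split_tr before every single retraining.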
The second ANN training mechanism used to refine the solution is based on the principle of natural evolution. A population of solutions is maintained and, through initialization, selection and reproduction mechanisms, progressively better solutions are reached.
Unlike traditional gradient-descent training mechanisms, GA-based ANN training starts with a population of solutions. A solution is the set of ANN weights after training, represented as a vector. All solutions (chromosomes) compete with each other to enter the new population; they are evaluated based on the objective function.
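The encoding can be illustrated as follows: a chromosome is the network's weight matrices flattened into one vector, and fitness is obtained by decoding the vector back into weight matrices and scoring the resulting network. The `evaluate` callable is a hypothetical stand-in for the objective function; the helper names are ours, not the paper's.

import numpy as np

def encode(weight_matrices):
    """Flatten a network's weight matrices into a single chromosome vector."""
    return np.concatenate([w.ravel() for w in weight_matrices])

def decode(chromosome, shapes):
    """Rebuild the weight matrices from a chromosome, given their shapes."""
    matrices, pos = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        matrices.append(chromosome[pos:pos + size].reshape(shape))
        pos += size
    return matrices

def fitness(chromosome, shapes, evaluate):
    """Objective value of a solution; `evaluate` is an assumed function
    that scores a network built from the decoded weight matrices."""
    return evaluate(decode(chromosome, shapes))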
Initialization and Fitness Evaluation
The population size is a parameter of our models. It was set to PS = 20 for three reasons: (1) Dorsey and Mayer (1995) suggest that this value is adequate for problems of any complexity; (2) the population size grows as new chromosomes are added by both the crossover operator (PS′ > PS) and the mutation operator (PS″ > PS′ > PS); after the new population is evaluated, we resize it to the initial size by keeping the best PS chromosomes in terms of ACR_TR and discarding the others (see the sketch below); (3) as the population size increases, the running time of our GA-based algorithms becomes prohibitively long. Even with a small initial population of 20 chromosomes, one run of the GA-based refining mechanism (1000 generations) takes up to 2 h; multiplied by 600, this gives a total of 1200 h for training all GA-based ANNs. For details, see Section 6.
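The resizing step in reason (2) amounts to truncating the enlarged population back to PS survivors. A minimal sketch, assuming the fitness values represent ACR_TR and that higher is better:

def resize_population(population, fitnesses, ps=20):
    """After crossover and mutation the population has grown
    (PS'' > PS' > PS); keep only the `ps` best chromosomes by
    total training accuracy ACR_TR and discard the rest."""
    ranked = sorted(zip(fitnesses, population),
                    key=lambda pair: pair[0], reverse=True)
    survivors = ranked[:ps]
    fitnesses, population = zip(*survivors)
    return list(population), list(fitnesses)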