Intelligent Systems in Accounting, Finance and Management: Assessing predictive performance of ANN-based classifiers, page 2

However, after examining the paper by Alander (1995), which contains 1760 references (from 1987 until 2003) on combining GAs and ANNs, we found no report in the literature that analyses the influence of data distribution, preprocessing method, training mechanism and their combinations on the classification performance of ANNs. For example, it is not known: (1) which preprocessing method is most suitable for a given distribution when training ANNs; (2) which refining mechanism for training ANNs is most suitable in terms of prediction accuracy when the data distribution is known; (3) which combination of preprocessing method and training technique is best when the distribution of the input dataset is already known; (4) which crossover operator is best for learning the connection weights of an ANN when the preprocessing method and input data distribution are known; and (5) how much it matters, for the RT-based mechanism, at which point in the RT mechanism structure the data are split into effective training and validation sets.

The focus of our study is to address the questions posed above. The paper is organized as follows. In Section 2 we review the literature on classification models, emphasizing ANN-based models for classification. Next, we introduce our model for assessing a company’s financial performance in Section 3. Research questions and derived hypotheses are formulated in Section 4. The datasets used, with descriptive statistics, are presented in Section 5. In Section 6 we show the results of our experiments, and finally, the conclusions and directions for future research are discussed in Section 7.

2.  LITERATURE REVIEW

The problem of financial performance classification has been tackled in the literature for nearly 40 years. The taxonomy of classification models is based on the type of algorithm used (Pendharkar, 2002). First, statistical techniques were deployed: univariate statistics for the prediction of failures (introduced by Beaver (1966)), multivariate analysis (Altman, 1968), linear discriminant analysis (LDA), introduced by Fisher (1936), who first applied it to Anderson’s iris data set (Anderson, 1935), multivariate discriminant analysis (Edmister, 1972; Jones, 1987), and probit and logit models (Hamer, 1983; Zavgren, 1985; Rudolfer et al., 1999). The next step in solving the classification problem was the establishment of induction techniques. Some of the most popular of these are the recursive partitioning algorithm (RPA) (Frydman et al., 1985), CART (Breiman et al., 1984) and ID3-C4.5-C5.0 (Quinlan, 1993a,b). Costea and co-workers (Costea et al., 2002; Costea and Eklund, 2003) applied and compared two of the above classifiers, namely multinomial logistic regression and Quinlan’s C5.0 decision tree. The two classifiers performed similarly in terms of accuracy rates and outperformed the self-organising map (SOM)[1] classification (Kohonen, 1997). Financial performance classification was no exception among the financial application areas of neural networks in the early 1980s. ANNs were used extensively in financial applications, with the emphasis on bankruptcy prediction. A comprehensive study of ANNs for failure prediction can be found in O’Leary (1998), who examines 15 related papers with respect to a number of characteristics: what data were used, what types of ANN model, what software, what kind of network architecture, etc. Table I presents a sample of studies that compared different classification techniques, together with their results.

Table I. Sample of pattern classification studies

| Reference | Tasks | Techniques | Results |
| --- | --- | --- | --- |
| Marais et al. (1984) | Modelling commercial bank loan classifications | Probit, RPA | RPA is not significantly better, especially when data do not include nominal variables |
| Schütze et al. (1995) | Document routing problem | Relevance feedback, LDA, logistic regression, ANN | Complex learning algorithms (LDA, logistic regression, ANN) outperformed a weak learning algorithm (relevance feedback) |
| Jeng et al. (1997) | Prediction of bankruptcy, biomedical | Fuzzy inductive learning algorithm (FILM), ID3, LDA | Induction systems achieve better results than LDA. FILM slightly outperforms ID |
| Back et al. (1996a, 1997) | Prediction of bankruptcy | LDA, logit, ANN | ANN outperformed the other two methods in terms of accuracy |