
cluster. Usually, the elements of the dataset are assigned to the cluster for which they have the highest membership degree. Despite the additional information provided by this methodology, a problem arises with observations that are difficult to position (uncertain observations) because they obtain similar membership values for two or more clusters. Alcaraz and Costea (2004) introduced a modified version of the FCM algorithm that allocates the uncertain observations by weighting the distances to the clusters’ centres. They compared the modified FCM algorithm with normal FCM and SOM clustering; the modified FCM algorithm outperformed both the normal FCM and the SOM with respect to pattern classification. In this study, normal FCM was chosen for practical implementation reasons. We created a linguistic variable for each financial ratio to help characterize the clusters. Linguistic variables are quantitative fuzzy variables whose states are fuzzy numbers that represent linguistic terms (Klir and Yuan, 1995). Alcaraz and Costea (2004) modelled the seven financial ratios with seven linguistic variables, each taking five linguistic terms: very low (VL), low (L), average (A), high (H), very high (VH). Table II shows the characterization of the seven clusters for the real telecom dataset without preprocessing the data (first preprocessing approach).
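For illustration, the following minimal sketch (our own, not the implementation used in the study) shows how normal FCM computes membership degrees, how each observation is assigned to the cluster with the highest membership, and how uncertain observations with similar top memberships could be flagged; the 0.10 margin is an arbitrary illustrative threshold.

```python
import numpy as np

def fcm(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Minimal fuzzy c-means: returns cluster centres and the membership matrix U (n x c)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)              # memberships of each observation sum to 1
    for _ in range(max_iter):
        Um = U ** m
        centres = (Um.T @ X) / Um.sum(axis=0)[:, None]          # fuzzy-weighted cluster centres
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                      # avoid division by zero
        # standard FCM membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        U_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        if np.max(np.abs(U_new - U)) < tol:
            U = U_new
            break
        U = U_new
    return centres, U

# Illustrative data: 100 firms described by 7 financial ratios (random numbers, not real data)
X = np.random.default_rng(1).random((100, 7))
centres, U = fcm(X, n_clusters=7)
labels = U.argmax(axis=1)                          # assign each firm to its highest-membership cluster
top_two = np.sort(U, axis=1)[:, -2:]
uncertain = (top_two[:, 1] - top_two[:, 0]) < 0.10  # flag observations with similar top memberships
```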

We considered that one linguistic term characterizes a cluster if it represents more than 40% of the total number of observations in that cluster. It seems that one of the ratios, receivables turnover (ReT), lacks discriminatory power across the data except for one cluster. By comparing the clusters we can easily label them (e.g. as good, bad, worst) depending on their linguistic terms.
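As a simple illustration of the 40% rule, the sketch below counts how often each linguistic term occurs within a cluster for one ratio and keeps the terms that cover more than 40% of that cluster’s observations. The function name and the example counts are hypothetical, and the crisp term per observation is a simplification of the fuzzy linguistic variables used in the study.

```python
from collections import Counter

TERMS = ["VL", "L", "A", "H", "VH"]   # very low ... very high

def characterizing_terms(term_labels, threshold=0.40):
    """Return the linguistic term(s) covering more than `threshold` of one
    cluster's observations for a single financial ratio."""
    counts = Counter(term_labels)
    n = len(term_labels)
    return [t for t in TERMS if counts.get(t, 0) / n > threshold]

# Hypothetical cluster: 45% of its observations fall in 'H' for some ratio
example = ["H"] * 45 + ["VH"] * 30 + ["A"] * 25
print(characterizing_terms(example))   # -> ['H']
```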

3.2.  Empirical Procedure for Determining the ANN Architecture

Once the data are ready for training, we need to find a suitable architecture for the ANN. Choosing the number of hidden layers and the number of neurons in each hidden layer is not a straightforward task. These choices depend on ‘input/output vector sizes, size of training and test subsets, and, more importantly, the problem of nonlinearity’ (Basheer and Hajmeer, 2000: 22). It is well known that neural networks are very sensitive to the dimensionality of the dataset (Hagan et al., 1996; Basheer and Hajmeer, 2000; Demuth and Beale, 2001). Basheer and Hajmeer (2000) cite a number of papers that introduce different rules of thumb linking the number of hidden neurons NH to the number of input neurons NI and output neurons NO, or to the number of training samples NTRN. One rule of thumb, proposed by Lachtermacher and Fuller (1995), suggests that for a single-output ANN the number of hidden neurons NH should satisfy 0.11 NTRN ≤ NH(NI + 1) ≤ 0.30 NTRN. Upadhyaya and Eryurek (1992) related the total number of weights Nw to the number of training samples: Nw = NTRN log2(NTRN). Masters (1994) proposed that the number of hidden neurons in the hidden layer should lie in the vicinity of the geometric mean of the numbers of input and output neurons, i.e. NH ≈ √(NI·NO). Taking Basheer and Hajmeer’s (2000: 23) advice that ‘the most popular approach to finding the optimal number of hidden nodes is by trial and error with one of the above rules’, we chose Masters’s rule of thumb as a starting point for developing our ANN architectures.

Concerning the number of hidden layers, we performed a number of experiments with one- and two-hidden-layer ANN architectures to determine the appropriate number of hidden layers. In almost every case an ANN with two hidden layers performed better in terms of training mean-square error. Besides our own experiments with the financial dataset, we based the choice of two hidden layers on the architecture found for the prediction of glass manufacturing process variables reported in Nastac and Costea (2004) and on results previously reported in the literature. Two-hidden-layer ANNs performed better than single-hidden-layer ANNs in the examples from Hartman and Keeler (1991), Lönnblad et al. (1992) and Ohlsson et al. (1994). On the question of one versus two hidden layers, Chester (1990) argues that ‘. . . an MLP with two hidden layers can often yield an accurate approximation with fewer weights than an MLP with one hidden layer’ and that ‘the problem with a single hidden layer is that the neurons interact with each other globally, making it difficult to improve an approximation at one point without worsening it elsewhere’. We did not consider three-hidden-layer architectures owing to the restriction on the ratio of the number of cases to the number of weights.
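As a quick illustration, the following sketch (our own encoding of the cited rules, with made-up problem sizes rather than the study’s actual dataset) computes the candidate hidden-layer sizes and weight counts suggested by the three rules of thumb mentioned above.

```python
import math

def hidden_neuron_rules(n_inputs, n_outputs, n_train):
    """Candidate network sizes from the rules of thumb cited in the text."""
    # Masters (1994): hidden neurons near the geometric mean of input and output counts
    masters_nh = round(math.sqrt(n_inputs * n_outputs))
    # Lachtermacher and Fuller (1995), single-output nets:
    # 0.11*NTRN <= NH*(NI + 1) <= 0.30*NTRN, solved here for NH
    lf_range = (0.11 * n_train / (n_inputs + 1), 0.30 * n_train / (n_inputs + 1))
    # Upadhyaya and Eryurek (1992): total number of weights Nw = NTRN * log2(NTRN)
    total_weights = n_train * math.log2(n_train)
    return masters_nh, lf_range, total_weights

# Illustrative sizes only: 7 input ratios, 7 output clusters, 1000 training samples
print(hidden_neuron_rules(n_inputs=7, n_outputs=7, n_train=1000))
```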