Improving Moving Average Trading Rules with Boosting and Statistical Learning Methods

Although our research centres on combining classical technical trading rules by statistical learning methods, we should emphasize that there have been numerous attempts to improve technical trading rules and to create new ones; Gençay (1999) and Allen and Karjalainen (1999) stand out among them. Gençay (1999) considered new trading rules based on non-parametric models that maximize the total return of an investment strategy, where the optimal choice of nearest neighbours, the optimal number of hidden units in a feedforward network and the optimal size of the training set are determined by cross-validation, minimizing the mean square error. Another well-known paper devoted to finding new technical trading rules is Allen and Karjalainen (1999), which used a genetic algorithm to learn optimal technical trading rules.
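As a rough illustration of the cross-validation step just described, the following sketch (in Python) selects the number of nearest neighbours for a return-forecasting model by minimizing cross-validated mean square error. The simulated data, candidate grid and fold scheme are our own illustrative choices, not those of Gençay (1999):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data: predict the next return from the five previous returns.
returns = rng.normal(0.0, 0.01, size=1000)
X = np.column_stack([returns[i:i + 995] for i in range(5)])
y = returns[5:]

# Choose the number of neighbours that minimizes cross-validated MSE.
best_k, best_mse = None, np.inf
for k in (5, 10, 25, 50, 100):
    mse = -cross_val_score(KNeighborsRegressor(n_neighbors=k), X, y,
                           scoring="neg_mean_squared_error", cv=5).mean()
    if mse < best_mse:
        best_k, best_mse = k, mse

print(f"selected k = {best_k} (CV mean squared error = {best_mse:.6f})")
```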

Finally, the problems of selecting in-sample optimal trading rules were pointed out by Sullivan et al. (1999), who argued that the dangers of data snooping are immense when we select the ‘best’ trading rule. Following Sullivan et al. (1999), if enough trading rules are considered over time, some rules are bound, by pure luck, to produce superior performance even in a very large sample and even if they do not genuinely possess predictive power over asset returns. The effects of such data snooping can therefore be quantified only if one considers the performance of the best trading rule in the context of the full universe of trading rules from which it was conceivably chosen.
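The force of this argument is easy to verify numerically. In the following sketch (in Python; the number of rules, sample size and return distribution are our own illustrative choices), none of the simulated rules has any predictive power, yet the in-sample best one typically looks significant when judged in isolation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Data-snooping illustration: 500 "rules" that emit random long/short signals
# over 1000 days of zero-mean returns. None has genuine predictive power.
n_rules, n_days = 500, 1000
returns = rng.normal(0.0, 0.01, size=n_days)
signals = rng.choice([-1, 1], size=(n_rules, n_days))

# Mean daily return of each rule; pick the in-sample "best" one.
rule_means = (signals * returns).mean(axis=1)
best = rule_means.argmax()
t_stat = rule_means[best] / (returns.std() / np.sqrt(n_days))

# The best of 500 useless rules typically shows t around 3, which looks
# 'significant' unless judged against the full universe of rules tried.
print(f"best rule's mean daily return: {rule_means[best]:.5f}, t = {t_stat:.2f}")
```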

Our research heads in the opposite direction from optimizing technical trading rules: rather than searching for the best individual rule, we study how to combine the existing ones through boosting and model-averaging techniques.

In summary, our paper has a dual purpose. On the one hand, since numerous technical trading rules exist with varying degrees of success, we use statistical learning methods to resolve the disagreement between different trading rules, providing a new rule capable of exploiting the information carried by every rule, whether highly successful or unsuccessful. On the other hand, by combining the predictive information of a wide set of rules we also reduce the data-snooping bias introduced by the arbitrary selection of the parameters in technical trading rules, avoiding the element of subjectivity that this procedure involves.

STATISTICAL LEARNING METHODS

Just as a committee of diverse people tends to make better decisions than each individual alone, an ensemble of diverse yet high-performing models tends to perform better than a single model. Statistical learning methods are algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions (see Hastie et al., 2001). The original statistical method is Bayesian averaging, but more recent algorithms have been developed. In this section we describe the most popular statistical learning methods, namely ‘Boosting’, ‘Bayesian model averaging’ and the ‘Committee method’, which will be used to combine the technical predictions and thus improve the performance of the individual trading rules.
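As a concrete illustration of the (weighted) vote just described, the following minimal sketch (in Python) combines the {−1, +1} signals of several trading rules into a single prediction; the example signals and weights are placeholders, not estimates from our data:

```python
import numpy as np

def weighted_vote(signals: np.ndarray, weights: np.ndarray) -> int:
    """Combined prediction: sign of the weighted sum of {-1, +1} signals."""
    return 1 if np.dot(weights, signals) >= 0 else -1

signals = np.array([+1, -1, +1])        # e.g. three moving-average rules
weights = np.array([0.5, 0.2, 0.3])     # e.g. reflecting past accuracy
print(weighted_vote(signals, weights))  # -> 1 (buy)
```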

The boosting method

Boosting is a general method that attempts to boost the accuracy of any set of categorical classification systems (or predictions in general), and it has become one of the most powerful ideas in learning algorithms. It was introduced by Freund and Schapire (1997). Boosting addresses the general problem of producing a very accurate prediction rule by combining rough and moderately inaccurate predictions.

One of the most popular versions of boosting is the AdaBoost.M1 algorithm, known as ‘Discrete AdaBoost’, due to Freund and Schapire (1997). In order to provide an outline of this boosting algorithm, let us consider a two-class problem where the output variable is coded as {−1, +1}. A classifier h(x) is a function that produces a prediction taking one of the two values {−1, +1}, where x is a set of predictor variables.
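The following sketch (in Python) outlines the Discrete AdaBoost loop in this {−1, +1} setting; the base_learner interface and the variable names are our own conventions, not part of the original algorithm statement. At each round, the observation weights are increased on the cases the current classifier misclassifies, so subsequent classifiers are forced to concentrate on the hard cases, and the final classifier is the sign of the accuracy-weighted vote:

```python
import numpy as np

def discrete_adaboost(X, y, base_learner, n_rounds):
    """Sketch of Discrete AdaBoost (Freund and Schapire, 1997), y in {-1, +1}.

    base_learner(X, y, w) is assumed to return a classifier h with
    h(X) in {-1, +1}, fitted under observation weights w.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                # start from uniform weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        h = base_learner(X, y, w)
        miss = h(X) != y
        err = max(np.dot(w, miss), 1e-10)  # weighted error, clamped above 0
        if err >= 0.5:                     # no better than chance: stop
            break
        alpha = 0.5 * np.log((1.0 - err) / err)
        w *= np.exp(np.where(miss, alpha, -alpha))
        w /= w.sum()                       # keep the weights a distribution
        learners.append(h)
        alphas.append(alpha)

    def ensemble(X_new):
        # Final prediction: sign of the alpha-weighted vote over all rounds.
        votes = sum(a * h(X_new) for a, h in zip(alphas, learners))
        return np.where(votes >= 0, 1, -1)

    return ensemble
```

Note that each round's vote is weighted by α = ½ log((1 − err)/err), so classifiers that are barely better than chance receive weights near zero, while more accurate classifiers dominate the final vote.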