Improving Moving Average Trading Rules with Boosting and Statistical Learning Methods

A weak classifier is one which performs just slightly better than random guessing. Boosting refers to the general problem of producing a very accurate classifier by combining rough and moderately inaccurate weak classifiers. Thus boosting consists of sequentially applying the weak classifiers to repeatedly modified versions of the data in order to produce a sequence of weak predictions $h_t(x)$, $t = 1, 2, \ldots, T$, where $h_t: X \to \{-1, +1\}$ and $X$ represents a training observation set. The predictions are then combined through a weighted majority vote to produce the final prediction.

More specifically, the algorithm takes as an input a training observation set $Z = \{(x_1, y_1), \ldots, (x_M, y_M)\}$, where the predictor variables $x_i \in X$ belong to the same domain and each $y_i$ belongs to $\{-1, +1\}$.

The principal idea of a boosting algorithm is to consider a set of weights $\{w_t(i)\}$, $t = 1, \ldots, T$, $i = 1, \ldots, M$, over the training observations $Z$.

Initially all of the weights are set to $w_1(i) = 1/M$, $i = 1, \ldots, M$, but each of the classifiers $h_{t-1}$ produces a new distribution of weights $w_t(i)$ modifying the individual weights, increasing the weights of the observations that were misclassified and decreasing the weights of those that were classified correctly, so that the weak classifier is forced to focus on the hard examples in the training set, and observations that are difficult to classify correctly receive ever-increasing influence. Each successive classifier is thereby forced to concentrate on those observations that are missed by the previous ones in the sequence.
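As a small numerical illustration of this reweighting idea, the following sketch (toy numbers of our own, using the exponential update that is defined formally below) shows how the weights of two misclassified observations grow relative to the correctly classified ones:

```python
import numpy as np

# Five observations with uniform starting weights and a hypothetical alpha_t = 0.5.
w = np.full(5, 0.2)
y = np.array([ 1, -1,  1,  1, -1])   # true labels
h = np.array([ 1, -1, -1,  1,  1])   # weak classifier's predictions (misses i = 2 and i = 4)
alpha = 0.5

w_new = w * np.exp(-alpha * y * h)   # misclassified observations are multiplied by e^{+alpha}
w_new /= w_new.sum()                 # renormalize so the weights form a distribution
print(np.round(w_new, 3))            # the two misclassified observations now carry more weight
```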

More formally, as Freund and Schapire (1999) show, the AdaBoost algorithm is as follows:

Initialize $w_1(i) = \frac{1}{M}$, $i = 1, \ldots, M$.

For $t = 1, \ldots, T$:

• Fit a classifier $h_t(x)$ to the training data using the weights $w_t(i)$.

• Compute

$$\varepsilon_t = \frac{\sum_{i=1}^{M} w_t(i)\, I\big(y_i \neq h_t(x_i)\big)}{\sum_{i=1}^{M} w_t(i)} \qquad (1)$$

• Compute

$$\alpha_t = \frac{1}{2}\ln\left(\frac{1-\varepsilon_t}{\varepsilon_t}\right) \qquad (2)$$

• Update

$$w_{t+1}(i) = \frac{w_t(i)\,\exp\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \quad i = 1, \ldots, M \qquad (3)$$

where $Z_t$ is a normalization factor (selected so that $w_{t+1}$ will be a distribution).

The final output will be

$$H(x) = \operatorname{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right) \qquad (4)$$
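The following is a minimal NumPy sketch of the algorithm above, using one-variable threshold rules (decision stumps) as the weak classifiers; the stump learner, the function names, and the toy data at the end are illustrative assumptions rather than part of the original study:

```python
import numpy as np

def fit_stump(X, y, w):
    """Return (feature, threshold, polarity) of the stump with the lowest weighted error."""
    best = (0, 0.0, 1, np.inf)                       # feature, threshold, polarity, error
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w * (pred != y))
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best[:3]

def stump_predict(X, stump):
    j, thr, pol = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T=20):
    M = len(y)
    w = np.full(M, 1.0 / M)                          # initialize w_1(i) = 1/M
    stumps, alphas = [], []
    for _ in range(T):
        stump = fit_stump(X, y, w)                   # fit h_t using the weights w_t(i)
        pred = stump_predict(X, stump)
        eps = np.sum(w * (pred != y)) / np.sum(w)    # weighted error, equation (1)
        eps = np.clip(eps, 1e-10, 1 - 1e-10)         # guard against division by zero in the log
        alpha = 0.5 * np.log((1 - eps) / eps)        # equation (2)
        w = w * np.exp(-alpha * y * pred)            # equation (3) ...
        w /= w.sum()                                 # ... with Z_t as the normalizing constant
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)

def adaboost_predict(X, stumps, alphas):
    votes = sum(a * stump_predict(X, s) for a, s in zip(alphas, stumps))
    return np.sign(votes)                            # weighted majority vote, equation (4)

# Toy usage on synthetic data:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
stumps, alphas = adaboost(X, y, T=10)
print((adaboost_predict(X, stumps, alphas) == y).mean())
```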

The Bayesian model-averaging approach and committee methods

Given a set of candidate models, Bayesian model averaging consists of taking a weighted average of the individual predictions, with weights proportional to the posterior probability of each model.

Thus, when we have a set of candidate models $h_t$, $t = 1, \ldots, T$, for our training set $Z = \{(x_1, y_1), \ldots, (x_M, y_M)\}$, it is also possible to use the Bayesian model-averaging approach as a way of improving all the individual predictions, taking a weighted average of the predictions proportional to the rate of success $s_t / \sum_{j=1}^{T} s_j$ of the model $h_t$ on the training set $Z$, where $s_t = \sum_{i=1}^{M} I\big(y_i = h_t(x_i)\big)$ (see Hastie et al., 2001, for a formal demonstration). Therefore, the Bayesian model-averaging approach gives weight to each model depending on how well it fits.
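A rough sketch of this success-rate weighting is given below, under the assumption that the candidate classifiers are already fitted callables returning labels in {−1, +1}; the function names are ours:

```python
import numpy as np

def success_weights(classifiers, X, y):
    """Weights proportional to s_t, the number of correct predictions of h_t on the training set."""
    s = np.array([np.sum(clf(X) == y) for clf in classifiers], dtype=float)
    return s / s.sum()                               # w_t = s_t / sum_j s_j

def weighted_vote(classifiers, weights, X):
    """Sign of the success-rate-weighted average of the individual predictions."""
    votes = sum(w * clf(X) for w, clf in zip(weights, classifiers))
    return np.sign(votes)
```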

On the other hand, we also consider committee methods, which take a simple unweighted average of the predictions from each model, giving equal probability to each model.
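The committee version is the same combination with equal weights, i.e., a simple majority vote (again a sketch with our own function name, under the same assumptions as above):

```python
import numpy as np

def committee_vote(classifiers, X):
    """Unweighted committee: each model contributes with equal weight 1/T."""
    votes = sum(clf(X) for clf in classifiers)
    return np.sign(votes)
```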

The classical combining predictions procedure