absorbing the knowledge learned from newly incoming batches. To maintain a well-controlled learning status during the training stage, the α value should reflect the properties of the real data.

3.3. Online gradient boosting

The OGB algorithm is derived from the batch gradient boosting algorithm of Zhang and Yu (2005). In the batch setting, standard empirical risk minimization (ERM) is performed, and the boosting algorithm can be viewed as online gradient descent or an online Newton step that reaches the minimum of the overall loss $\sum_{t=1}^{T} l_t$. The OGB for a convex hull is described in Algorithm 1.

The OGB algorithm iteratively updates the prediction using the outputs of the weak learners. Its linear combination differs from that of AdaBoost, which takes the weighted sum of the outputs of all weak learners as the final boosted output: the OGB algorithm updates the prediction at each iteration with a weighted sum of the output from the previous weak learner and

Algorithm 2 Online gradient boosting using adaptive linear weak regressor.
1: Initialize Parameters: Set step size parameter $\gamma \in (1/N, 1)$, elimination factor $\alpha \in (0, 1)$, and weak learner activation threshold τ.
2: Initialize Weak Learners: Set the number of weak learners $N$ equal to the number of features. Set shrinkage parameter $\sigma_1^i = 0$ on each base learner, where $i = 1, 2, \ldots, N$. For each $i$, randomly generate initial matrices $R_i$ and $C_i$, and calculate $\beta_i =$
$\alpha X^{T} \cdot wts \cdot X_t$
Update $\sigma^i$

The step size γ is a predefined parameter used to generate a prediction and obtain the loss at each iteration. Besides the matrix $\sigma^i$, which is tuned to adjust the weights of the weak learners, γ scales the linear combination of all weak learners at the most basic level and bounds the prediction results within certain ranges. Different γ values lead to different weight updates on each weak learner and therefore control the stability of the learning process. Whether to update a weak learner upon the arrival of a new data batch is another critical decision, since newly received data containing noisy or redundant information can bias the learning model. The variable τ is a threshold used to test the effectiveness of the expert votes (denoted by wts in Algorithm 2) and to decide whether to initiate the update. Inappropriately low τ values (approaching 1) could lead to long training times and potentially degrade accuracy, while high τ values may result in insufficient training. To address these critical issues, a parameter tuning scheme can help improve learning performance.
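The role of the step size γ and the shrinkage values σ can be illustrated with a small sketch. It assumes the convex-hull form of online gradient boosting, in which the partial prediction after weak learner i is y^i = (1 − σ^i γ) y^(i−1) + γ A^i(x) with y^0 = 0. This recursion, the hypothetical function names `boosted_prediction` and `update_sigma`, and the linearized gradient used for the σ step are illustrative assumptions, not the exact steps of Algorithm 2; the threshold τ and the expert votes wts are not modelled.

```python
from typing import List, Sequence


def boosted_prediction(weak_outputs: Sequence[float],
                       sigma: Sequence[float],
                       gamma: float) -> List[float]:
    """Return the partial predictions y^0, ..., y^N for a single input.

    Assumed convex-hull recursion (illustrative, not the exact Algorithm 2):
        y^0 = 0
        y^i = (1 - sigma^i * gamma) * y^(i-1) + gamma * A^i(x)
    where weak_outputs[i-1] plays the role of A^i(x).
    """
    n = len(weak_outputs)
    if n == 0 or len(sigma) != n:
        raise ValueError("need one shrinkage value per weak learner")
    if not (1.0 / n < gamma < 1.0):
        raise ValueError("gamma must lie in (1/N, 1)")
    partial = [0.0]
    for a_i, s_i in zip(weak_outputs, sigma):
        # Shrink the running prediction by sigma^i * gamma, then add the
        # gamma-scaled output of weak learner i.
        partial.append((1.0 - s_i * gamma) * partial[-1] + gamma * a_i)
    return partial


def update_sigma(sigma_i: float, y_prev: float, dloss_prev: float,
                 gamma: float, lr: float) -> float:
    """One projected gradient step on sigma^i, clipped to [0, 1].

    Assumption (hypothetical): the loss is linearized at y^(i-1), so
    d loss / d sigma^i ~= dloss_prev * (-gamma * y_prev).
    """
    grad = dloss_prev * (-gamma * y_prev)
    return min(1.0, max(0.0, sigma_i - lr * grad))


if __name__ == "__main__":
    outputs = [1.0, 1.0, 1.0, 1.0]  # every weak learner predicts 1.0
    print(boosted_prediction(outputs, [0.0] * 4, 0.5)[-1])  # 2.0
    print(boosted_prediction(outputs, [1.0] * 4, 0.5)[-1])  # 0.9375
```

With N = 4 and γ = 0.5, σ^i = 0 yields a plain γ-scaled sum, so the output can reach γN times the largest weak output, whereas σ^i = 1 turns every step into a convex combination and keeps the output within the range of the weak outputs. Under these assumptions, this is one way γ and σ together keep the boosted prediction within bounded ranges.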
Similar to other machine learning models, effective parameter setting must rely on assumptions about the real dataset to obtain optimal learning performance. A genetic algorithm (GA) is a heuristic method that searches for globally optimal solutions. In a data mining model, a GA works on part of the training data and helps assign appropriate parameter values. Therefore, it is reasonable to extract a small portion (5%–10%) of the training data to search for globally optimal values of the important parameters. The GAOGB model applies the classical GA (Scrucca, 2013) to select optimal parameters. In GAOGB, this GA maintains a population of size $P_0$ and searches for the optimal solution over $G_0$ generations. The population space consists of the possible combinations of α, γ, and τ. At each iteration, a subset of parameter combinations is selected from the initial population based on fitness, which is measured by the test-