3.3. Online gradient boosting

The OGB algorithm is originally transformed from the batch gradient boosting algorithm of Zhang and Yu (2005). A standard empirical risk minimization (ERM) is implemented in the batch setting, which finds that the boosting algorithm can be an online gradient descent or online Newton step to arrive at the minimal point of the overall loss Σ_{t=1}^{T} l(y_t, ŷ_t). The OGB for a convex hull is described in Algorithm 1.

The OGB algorithm performs an iterative update to the prediction from the weak learner. As a different form of linear combination from the AdaBoost algorithm, which takes the weighted sum of the outputs from all weak learners as the final boosted output, the OGB algorithm updates the prediction at each iteration with a weighted sum of the output from the previous weak learner, absorbing the knowledge learned from newly incoming batches.

Algorithm 2 Online gradient boosting using adaptive linear weak regressor.
1: Initialize Parameters: Set step size parameter γ ∈ (1/N, 1), elimination factor α ∈ (0, 1), and weak learner activation threshold τ.
2: Initialize Weak Learners: Set the number of weak learners N equal to the number of features. Set shrinkage parameter σ_1^i = 0 on each base learner, where i = 1, 2, …, N. For each i, randomly generate initial matrices R_i and C_i, and calculate β_i = …
   …
   Update α · X_t^T · wts · X_t and α · X_t^T · …
   end if
   end for
   Update σ_t^i
   …

To maintain a well controlled learning status during the training stage, the α value should refer to the properties of the real data. The step size γ is a predefined parameter for generating a prediction and obtaining the loss at each iteration. Other than the matrix σ_t^i, which is tuned to adjust the weights of the weak learners, γ is proposed to scale the linear combination of all weak learners at the most basic level and to bound the prediction results within certain ranges. The variation of γ values results in differences in the weight updates on each weak learner, and controls the stability of the learning process. Whether to update a weak learner upon the arrival of a new batch of data is another critical decision, since newly received knowledge that contains noisy or redundant information can potentially bias the learning model. The variable τ is a threshold used to test the effectiveness of the expert votes (denoted by wts in Algorithm 2) and to decide whether to initiate the update. An inappropriately low τ value could lead to a long training time and potentially degrade accuracy, while a higher τ value may cause insufficient training. To address these critical issues, a parameter tuning scheme promises to improve the learning performance.
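Taken together, the γ-scaled combination of weak learners and the τ-gated update can be sketched as follows. This is a minimal illustration under assumptions: weak learners are linear models updated by a single SGD step per example, and the squared residual is an assumed stand-in for the wts vote score in Algorithm 2, not the paper's exact rule.

```python
import numpy as np

def ogb_step(W, sigma, x, y, gamma=0.5, tau=0.3, lr=0.05):
    """One online boosting round on a single example (x, y).

    Hedged simplification: W[i] is the i-th linear weak learner, sigma[i]
    its shrinkage weight, gamma scales the linear combination, and tau
    gates whether a learner is updated on the new data (the squared
    residual here is an assumed proxy for the expert vote wts).
    Returns the boosted prediction for x.
    """
    N = len(sigma)
    partial = 0.0
    for i in range(N):
        out_i = float(W[i] @ x)
        # Blend the running prediction with this weak learner's output,
        # scaled by gamma and the learner's shrinkage weight sigma[i].
        partial = (1.0 - gamma) * partial + gamma * sigma[i] * out_i
        residual = y - partial            # negative gradient of squared loss
        if residual ** 2 > tau:           # tau: update only on a strong vote
            W[i] += lr * residual * x     # SGD step on the weak learner
            sigma[i] = np.clip(sigma[i] + lr * residual * out_i, 0.0, 1.0)
    return partial
```

Streaming examples through `ogb_step` drives the squared error down over time while leaving learners untouched on rounds where the vote falls below τ, which is the stability/plasticity trade-off the text describes.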

Similar to other machine learning models, effective parameter setting has to refer to some assumptions about the real dataset in order to guarantee optimal learning performance. The genetic algorithm (GA) is a heuristic method that searches for globally optimal solutions. In a data mining model, the GA operates by integrating some training data and assisting in the appropriate assignment of parameters. Therefore, it is fair to extract a small portion (5%-10%) of the training data to find globally optimal values for the important parameters. The GAOGB model applies the classical genetic algorithm (Scrucca, 2013) for selecting optimal parameters. Applied in GAOGB, the GA of Scrucca (2013) maintains a population of size P0 and searches for the optimal solution over G0 generations. The population space is formulated from the possible combinations of α, γ, and τ. At each iteration, a subset of parameter combinations is selected from the initial population based on fitness, which is measured by the test-
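The GA tuning loop described above might be sketched as follows. This is a generic real-coded GA under assumed selection, crossover, and mutation choices, not the implementation of Scrucca (2013); the fitness function and bounds are hypothetical placeholders (e.g. negative validation error on the held-out 5%-10% slice).

```python
import numpy as np

rng = np.random.default_rng(1)

def ga_tune(fitness, bounds, pop_size=20, generations=15,
            mutation_rate=0.2, elite_frac=0.5):
    """Minimal GA sketch for tuning a parameter vector such as (alpha, gamma, tau).

    `fitness` maps a parameter vector to a score to MAXIMIZE; `bounds` is a
    list of (low, high) per parameter. pop_size and generations play the
    roles of P0 and G0 in the text. All operator choices are assumptions.
    """
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pop = rng.uniform(lo, hi, size=(pop_size, len(bounds)))
    for _ in range(generations):
        scores = np.array([fitness(p) for p in pop])
        order = np.argsort(scores)[::-1]                  # best first
        elite = pop[order[: int(elite_frac * pop_size)]]  # fitness-based selection
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            # Uniform crossover between two elite parents.
            child = np.where(rng.random(len(bounds)) < 0.5, a, b)
            # Random-reset mutation within the parameter bounds.
            mask = rng.random(len(bounds)) < mutation_rate
            child = np.where(mask, rng.uniform(lo, hi), child)
            children.append(child)
        pop = np.vstack([elite, np.array(children)])
    scores = np.array([fitness(p) for p in pop])
    return pop[int(np.argmax(scores))]
```

In a GAOGB-style setting, `fitness` would train the booster with a candidate (α, γ, τ) on the extracted training slice and return the negative test error.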