Liu and Nocedal, "On the limited memory BFGS method for large scale optimization."
Among the algorithms available for training, the software's automatic module selected BFGS (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970), a second-order quasi-Newton method that is very efficient and converges quickly, although considerable memory is required to store the Hessian matrix (Statistica, 2009; Borsato, Pina, Spacino, Scholz, & Androcioli, 2011).
Note that the BFGS optimizer is not based on matrix inversion, as is the case for the Gauss-Newton method and its derivatives.
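To illustrate this point, here is a minimal, self-contained Python sketch of BFGS that maintains the inverse-Hessian approximation directly through rank-two updates, so no matrix is ever inverted or linear system solved (the quadratic test problem and function names are illustrative, not taken from the software described above):

```python
import numpy as np

def bfgs_minimize(f, grad, x0, tol=1e-8, max_iter=100):
    """Minimal BFGS: updates an inverse-Hessian approximation H directly,
    so no matrix inversion is ever performed."""
    n = x0.size
    H = np.eye(n)                  # initial inverse-Hessian approximation
    x, g = x0.astype(float), grad(x0)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                 # quasi-Newton search direction
        t = 1.0                    # backtracking line search (Armijo rule)
        while f(x + t * p) > f(x) + 1e-4 * t * (g @ p):
            t *= 0.5
        s = t * p                  # step taken
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g              # gradient change
        rho = 1.0 / (y @ s)
        I = np.eye(n)
        # BFGS rank-two update of the *inverse* Hessian approximation
        H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
            + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Example: minimize the convex quadratic f(x) = 0.5 x^T A x - b^T x,
# whose exact minimizer solves A x = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x @ A @ x - b @ x
grad = lambda x: A @ x - b
x_star = bfgs_minimize(f, grad, np.zeros(2))
```

On this quadratic the iterate converges to the solution of A x = b without ever forming A⁻¹ explicitly, which is the contrast with Gauss-Newton-type methods drawn above.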
Prediction target    Net. name    Training algorithm
Infection severity   MLP 89-19-2  BFGS 24
Time-to-heal         MLP 89-13-3  BFGS 17
needs storage of an approximation to the full Hessian matrix, but is generally faster than conjugate gradient algorithms.
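This memory trade-off can be seen with SciPy, whose `minimize` routine exposes both full BFGS (`method='BFGS'`, dense n-by-n matrix) and limited-memory L-BFGS (`method='L-BFGS-B'`, which keeps only the last `maxcor` vector pairs); the Rosenbrock test problem below is illustrative, not from any of the studies excerpted here:

```python
import numpy as np
from scipy.optimize import minimize

# Rosenbrock function in n dimensions: a standard smooth test problem
# whose minimizer is the all-ones vector.
def rosen(x):
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

x0 = np.zeros(20)

# Full BFGS: stores a dense n x n inverse-Hessian approximation, O(n^2) memory.
res_bfgs = minimize(rosen, x0, method='BFGS')

# L-BFGS: keeps only the last maxcor = 10 (s, y) vector pairs, O(m * n) memory.
res_lbfgs = minimize(rosen, x0, method='L-BFGS-B', options={'maxcor': 10})

print(res_bfgs.x[:3], res_lbfgs.x[:3])  # both approach the all-ones minimizer
```

For the small n used here the two methods behave similarly; the limited-memory variant matters when n reaches thousands of parameters and the dense matrix no longer fits in memory.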
In practice, the BFGS update is recommended because its numerical performance is better; however, the DFP update was the first secant update to be proposed, and it is therefore of great historical as well as analytical interest [13, 2].
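To make the comparison concrete, the two secant updates of the inverse Hessian can be checked side by side; the short Python sketch below (illustrative, not reproduced from [13] or [2]) verifies that both the DFP and the BFGS update satisfy the secant condition H_{k+1} y = s:

```python
import numpy as np

def dfp_update(H, s, y):
    """DFP update of the inverse-Hessian approximation."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def bfgs_update(H, s, y):
    """BFGS update of the inverse-Hessian approximation."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
           + rho * np.outer(s, s)

# Both are secant updates: the new approximation must map y back to s.
rng = np.random.default_rng(0)
H = np.eye(3)
s = rng.standard_normal(3)
y = s + 0.1 * rng.standard_normal(3)   # keeps y^T s > 0 (curvature condition)

for update in (dfp_update, bfgs_update):
    H_new = update(H, s, y)
    assert np.allclose(H_new @ y, s)   # secant condition H_{k+1} y = s
```

Both formulas pass the check; they differ in how they distribute the rank-two correction, which is where the numerical advantage of BFGS noted above comes from.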
H_{k+1} is precisely the BFGS update in which the approximation of the inverse Hessian is restarted as θ…
In the case of the BFGS algorithm, the goal is the minimization of the error function E(w), for which the first-order necessary condition for optimality is a vanishing gradient, ∇E(w) = 0.
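The optimality condition can be checked numerically: at a minimizer of E(w) the gradient vanishes. The toy error function below is an illustrative assumption, not the actual network error function:

```python
import numpy as np

def num_grad(f, w, h=1e-6):
    """Central-difference estimate of the gradient of f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)
    return g

# Toy error function E(w) with known minimizer w* = (1, -2).
E = lambda w: (w[0] - 1.0) ** 2 + 3.0 * (w[1] + 2.0) ** 2
w_star = np.array([1.0, -2.0])
print(num_grad(E, w_star))  # ≈ [0, 0]: the gradient vanishes at the minimizer
```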
The training performance of the model is observed for seven different algorithms: BFGS quasi-Newton (BFG), Bayesian regularization (BR), scaled conjugate gradient (SCG), Powell-Beale conjugate gradient (CGB), Fletcher-Reeves conjugate gradient (CGF), one-step secant (OSS), and Levenberg-Marquardt (LM) [36-39].
Maximum likelihood estimation of the model parameters is accomplished by the BFGS optimization algorithm (Broyden, 1970; Fletcher, 1970; Goldfarb, 1970; Shanno, 1970), an iterative procedure that terminates when one of the following two conditions is satisfied: (1) the maximum number of iterations is exceeded, or (2) the change in the log-likelihood between two successive iterations is less than a sufficiently small tolerance value.
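The two stopping rules can be sketched as a generic fitting loop; the `step` and `loglik` functions here are hypothetical placeholders, not the actual estimation code:

```python
def fit_until_converged(step, loglik, theta0, max_iter=500, tol=1e-6):
    """Iterate a parameter update until either stopping rule fires:
    (1) the iteration cap is exceeded, or (2) the log-likelihood changes
    by less than `tol` between two successive iterations.
    `step` and `loglik` are placeholders for a model's update rule and
    log-likelihood function."""
    theta = theta0
    ll_prev = loglik(theta)
    for it in range(1, max_iter + 1):
        theta = step(theta)
        ll = loglik(theta)
        if abs(ll - ll_prev) < tol:       # condition (2): tiny change
            return theta, it, 'tolerance'
        ll_prev = ll
    return theta, max_iter, 'max_iter'    # condition (1): cap reached

# Toy example: maximize loglik(t) = -(t - 2)^2 by gradient ascent.
loglik = lambda t: -(t - 2.0) ** 2
step = lambda t: t + 0.1 * (-2.0 * (t - 2.0))   # ascent step, rate 0.1
theta, n_iter, reason = fit_until_converged(step, loglik, theta0=0.0)
```

In this toy run the loop exits by the tolerance rule with theta close to the maximizer 2, long before the iteration cap is reached.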
B_{k+1} using the BFGS formula as an approximation to ∇…
In this work, different gradient-descent variations of the backpropagation algorithm have been considered, such as Levenberg-Marquardt (LM), Scaled Conjugate Gradient (SCG), One Step Secant (OSS), and Quasi-Newton BFGS.