International Journal of Control, Automation, and Systems 2023; 21(11): 3825-3831
Published online November 1, 2023 https://doi.org/10.1007/s12555-022-0947-1
Copyright © The International Journal of Control, Automation, and Systems.
Duk-Sun Shim* and Joseph Shim
Chung-Ang University
Abstract: An optimization algorithm is essential for minimizing loss (or objective) functions in machine learning and deep learning. Optimization algorithms face several challenges, one of which is determining an appropriate learning rate. Generally, a low learning rate leads to slow convergence, whereas a large learning rate causes the loss function to fluctuate around the minimum. As a hyper-parameter, the learning rate is determined in advance of parameter training, which is time-consuming. This paper proposes a modified stochastic gradient descent (mSGD) algorithm that uses a random learning rate. Random candidate learning rates are generated at every iteration, and the one that gives the minimum value of the loss function is chosen. The proposed mSGD algorithm can reduce the time required for determining the learning rate. In fact, the k-point mSGD algorithm can be considered a kind of steepest descent algorithm. In an experiment using the MNIST dataset of handwritten digits, it is demonstrated that the convergence performance of the mSGD algorithm is much better than that of the SGD algorithm and slightly better than that of the AdaGrad and Adam algorithms.
Keywords: Deep learning, machine learning, modified stochastic gradient descent, random learning rate, steepest descent algorithm.
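
The following is a minimal sketch of the k-point random-learning-rate idea described in the abstract: at each iteration, k candidate learning rates are drawn at random, a tentative gradient step is evaluated for each, and the step yielding the smallest mini-batch loss is kept. The toy least-squares problem, the parameter names (k, eta_max), and the uniform sampling range are assumptions made for illustration only, not the authors' exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X w_true + noise (illustrative only).
n_samples, n_features = 1000, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def loss(w, Xb, yb):
    """Mean-squared-error loss on a mini-batch."""
    r = Xb @ w - yb
    return 0.5 * np.mean(r ** 2)

def grad(w, Xb, yb):
    """Gradient of the mini-batch MSE loss."""
    r = Xb @ w - yb
    return Xb.T @ r / len(yb)

def msgd(k=5, eta_max=1.0, batch_size=32, n_iters=500):
    """Sketch of k-point mSGD: among k random learning rates, keep the one
    that minimizes the current mini-batch loss."""
    w = np.zeros(n_features)
    for _ in range(n_iters):
        idx = rng.choice(n_samples, size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        g = grad(w, Xb, yb)

        # Draw k candidate learning rates and form a tentative step for each.
        etas = rng.uniform(0.0, eta_max, size=k)
        candidates = [w - eta * g for eta in etas]
        losses = [loss(c, Xb, yb) for c in candidates]

        # Keep the candidate step with the smallest mini-batch loss.
        w = candidates[int(np.argmin(losses))]
    return w

w_hat = msgd()
print("final full-batch loss:", loss(w_hat, X, y))
```

Because the k candidate steps share the same gradient, the extra cost per iteration is k loss evaluations rather than k gradient evaluations, which is consistent with the abstract's claim that the method reduces the effort of tuning the learning rate in advance.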