International Journal of Control, Automation, and Systems 2025; 23(2): 418-428
https://doi.org/10.1007/s12555-024-0487-y
© The International Journal of Control, Automation, and Systems
Visual localization determines an agent’s precise position and orientation within an environment from visual data. It has become a critical task in robotics, particularly for applications such as autonomous navigation, largely because it can estimate an agent’s pose with inexpensive sensors such as RGB cameras. Recent visual localization methods estimate the agent’s pose through scene coordinate regression. However, these methods regress 2D-3D correspondences over the entire image, even though not all regions provide useful information. To address this issue, we introduce an attention network that selectively targets informative regions of the image. We use this network to identify the highest-scoring features, which improves the feature selection process, and combine the result with edge detection. This integration ensures that the features chosen for the training buffer lie within robust regions, thereby improving 2D-3D correspondence and overall localization performance. We evaluated our approach on an outdoor benchmark dataset, where it outperformed previous methods.
Keywords: Attention network, computer vision, edge detector, scene coordinate regression, visual localization.
Published online February 1, 2025
Nanda Febri Istighfarin and HyungGi Jo*
Jeonbuk National University
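The abstract describes choosing training-buffer pixels by combining high attention scores with edge detection. The sketch below illustrates one plausible way to do this and is not the authors' implementation. It assumes the attention network has already produced a per-pixel score map, and it substitutes a plain Sobel gradient threshold for whatever edge detector the paper uses. The function names, the threshold, and the order of the two steps (keep edge pixels first, then rank them by attention) are all hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[bool]]


def sobel_edge_mask(image: Grid, threshold: float) -> Mask:
    """Mark pixels whose Sobel gradient magnitude exceeds `threshold`.

    Hypothetical helper: Sobel stands in for the paper's edge detector.
    Border pixels stay False because the 3x3 kernel does not fit there.
    """
    h, w = len(image), len(image[0])
    p = image
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]
                  - p[y - 1][x - 1] - 2 * p[y][x - 1] - p[y + 1][x - 1])
            gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]
                  - p[y - 1][x - 1] - 2 * p[y - 1][x] - p[y - 1][x + 1])
            mask[y][x] = (gx * gx + gy * gy) ** 0.5 > threshold
    return mask


def select_training_pixels(attention: Grid, edges: Mask, k: int) -> List[Tuple[int, int]]:
    """Return up to k (row, col) edge pixels with the highest attention scores.

    Hypothetical selection rule: non-edge pixels are discarded first, then the
    remaining candidates are ranked by attention (ties broken by position).
    """
    candidates = [
        (attention[y][x], y, x)
        for y in range(len(attention))
        for x in range(len(attention[0]))
        if edges[y][x]
    ]
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(y, x) for _, y, x in candidates[:k]]
```

For example, on a 5x5 image with a vertical step from 0 to 10 between columns 1 and 2, only interior pixels in columns 1 and 2 are marked as edges. If attention increases to the right, the highest-scoring pixels in columns 3 and 4 are still excluded because they are not on an edge, so the selection falls back to column 2. Filtering before ranking keeps the buffer from being filled with high-attention but textureless pixels; ranking first and filtering afterwards could return fewer than k pixels.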