In this paper, we introduce a novel inversion methodology that combines the benefits of Reinforcement Learning techniques with the advantages of the Epsilon-Greedy method for an expanded exploration of the model space. Among the various Reinforcement Learning approaches, we apply the family of algorithms known as Q-Learning methods. We show that the Temporal Difference algorithm offers an effective iterative approach for finding an optimal solution in geophysical inverse problems. Furthermore, the Epsilon-Greedy method, properly coupled with the Reinforcement Learning workflow, expands the exploration of the model space, minimizing the misfit between observed and predicted responses and mitigating the problem of local minima of the cost function. To prove the feasibility of our methodology, we tested it on synthetic geo-electric data and on a seismic refraction data set available in the public domain.
Citation: Dell'Aversana P (2022) Reinforcement learning in optimization problems: applications to geophysical data inversion. AIMS Geosciences 8(3): 488–502. doi: 10.3934/geosci.2022027
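The abstract's core ingredients, a tabular Q-Learning (Temporal Difference) update combined with Epsilon-Greedy exploration, can be sketched on a toy problem. The following is a minimal illustrative example, not the paper's geophysical inversion code: the 1-D chain environment, the function name `q_learning_chain`, and all parameter values are assumptions chosen only to show the TD update rule and the epsilon-controlled exploration/exploitation trade-off.

```python
import random

def q_learning_chain(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                     epsilon=0.2, seed=42):
    """Tabular Q-Learning with Epsilon-Greedy exploration on a toy
    1-D chain: action 0 moves left, action 1 moves right; reaching the
    last state ends the episode with reward 1, every other step gives 0."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # Q-table: states x actions
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            # Epsilon-Greedy: with probability epsilon explore a random
            # action, otherwise exploit the current best estimate.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            # Temporal Difference (Q-Learning) update:
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

q = q_learning_chain()
# Greedy policy extracted from the learned Q-table (1 = move right)
policy = [0 if qs[0] > qs[1] else 1 for qs in q]
```

In the paper's setting, the "state" would be a candidate model of the subsurface, the "reward" a function of the data misfit, and epsilon would control how often the agent abandons the current best model to explore elsewhere in the model space, which is what limits trapping in local minima of the cost function.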