Research article

Reinforcement learning in optimization problems. Applications to geophysical data inversion

  • Received: 06 May 2022 Revised: 18 July 2022 Accepted: 31 July 2022 Published: 08 August 2022
  • In this paper, we introduce a novel inversion methodology that combines the benefits of Reinforcement Learning techniques with the advantages of the Epsilon-Greedy method for an expanded exploration of the model space. Among the various Reinforcement Learning approaches, we applied the family of algorithms known as Q-Learning methods. We show that the Temporal Difference algorithm offers an effective iterative approach for finding an optimal solution in geophysical inverse problems. Furthermore, the Epsilon-Greedy method, properly coupled with the Reinforcement Learning workflow, expands the exploration of the model space, minimizing the misfit between observed and predicted responses and limiting the problem of local minima of the cost function. To prove the feasibility of our methodology, we tested it on synthetic geo-electric data and on a seismic refraction data set available in the public domain.

    Citation: Paolo Dell'Aversana. Reinforcement learning in optimization problems. Applications to geophysical data inversion[J]. AIMS Geosciences, 2022, 8(3): 488-502. doi: 10.3934/geosci.2022027
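The combination described in the abstract can be illustrated with a minimal sketch: tabular Q-Learning with a Temporal Difference update and Epsilon-Greedy exploration, applied to a discretized one-parameter model space where the reward is the negative misfit between predicted and observed responses. This is not the paper's implementation; the forward model, grid, and hyperparameters below are hypothetical stand-ins chosen only to make the idea concrete.

```python
import random

random.seed(0)

# Hypothetical 1-D forward model standing in for a geophysical simulator:
# it maps a single model parameter m to a predicted response.
def forward(m):
    return m ** 2

m_grid = [i * 0.1 for i in range(51)]   # discretized model space, 0.0 .. 5.0
observed = forward(3.0)                 # synthetic "observed" response

def misfit(s):
    return abs(forward(m_grid[s]) - observed)

n_states = len(m_grid)
moves = (-1, +1)                        # actions: step down / up the grid
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # TD step size, discount, exploration rate

state = 0
best_m, best_fit = m_grid[state], misfit(state)
for _ in range(5000):
    # Epsilon-Greedy action selection: explore with probability epsilon,
    # otherwise exploit the current Q estimates.
    if random.random() < epsilon:
        a = random.randrange(2)
    else:
        a = 0 if Q[state][0] >= Q[state][1] else 1
    nxt = min(max(state + moves[a], 0), n_states - 1)
    r = -misfit(nxt)                    # reward = negative data misfit
    # Temporal-Difference (Q-Learning) update of the state-action value
    Q[state][a] += alpha * (r + gamma * max(Q[nxt]) - Q[state][a])
    if misfit(nxt) < best_fit:
        best_m, best_fit = m_grid[nxt], misfit(nxt)
    state = nxt

print(round(best_m, 1))                 # best-fitting model encountered
```

The random exploration injected by Epsilon-Greedy is what lets the agent escape regions of the model space where the TD-driven greedy policy would otherwise stall, which is the role the abstract attributes to it in limiting local minima of the cost function.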

  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

Metrics

Article views(3584) PDF downloads(382) Cited by(4)

Figures and Tables

Figures(12)  /  Tables(1)
