In order to solve the problem that the scheduling scheme cannot be updated in real time due to the dynamic change of node demand in material emergency dispatching, this article proposes a dynamic attention model based on improved gated recurrent unit. The dynamic codec framework is used to track the change of node demand to update the node information. The improved gated recurrent unit is embedded between codecs to improve the representation ability of the model. By weighted combination of the node information of the previous time, the current time and the initial time, a more representative node embedding is obtained. The results show that compared with the elitism-based immigrants ant colony optimization algorithm, the solution quality of the proposed model was improved by 27.89, 27.94, 28.09 and 28.12% when the problem scale is 10, 20, 50 and 100, respectively, which can effectively deal with the instability caused by the change of node demand, so as to minimize the cost of material distribution.
Citation: Huawei Jiang, Tao Guo, Zhen Yang, Like Zhao. Deep reinforcement learning algorithm for solving material emergency dispatching problem[J]. Mathematical Biosciences and Engineering, 2022, 19(11): 10864-10881. doi: 10.3934/mbe.2022508
In order to solve the problem that the scheduling scheme cannot be updated in real time due to the dynamic change of node demand in material emergency dispatching, this article proposes a dynamic attention model based on improved gated recurrent unit. The dynamic codec framework is used to track the change of node demand to update the node information. The improved gated recurrent unit is embedded between codecs to improve the representation ability of the model. By weighted combination of the node information of the previous time, the current time and the initial time, a more representative node embedding is obtained. The results show that compared with the elitism-based immigrants ant colony optimization algorithm, the solution quality of the proposed model was improved by 27.89, 27.94, 28.09 and 28.12% when the problem scale is 10, 20, 50 and 100, respectively, which can effectively deal with the instability caused by the change of node demand, so as to minimize the cost of material distribution.
[1] | K. Dorling, J. Heinrichs, G. G. Messier, S. Magierowski, Vehicle routing problems for drone delivery, IEEE Trans. Syst. Man Cybern.: Syst., 47 (2017), 70–85. https://doi.org/10.1109/TSMC.2016.2582745 doi: 10.1109/TSMC.2016.2582745 |
[2] | M. Marinelli, A. Colovic, M. Dell'Orco, A novel dynamic programming approach for two-echelon capacitated vehicle routing problem in city logistics with environmental considerations, Transp. Res. Procedia, 30 (2018), 147–156. https://doi.org/10.1016/j.trpro.2018.09.017 doi: 10.1016/j.trpro.2018.09.017 |
[3] | D. Pecin, C. Contardo, G. Desaulniers, E. Uchoa, New enhancements for the exact solution of the vehicle routing problem with time windows, Informs J. Comput., 29 (2017), 377–580. https://doi.org/10.1287/ijoc.2016.0744 doi: 10.1287/ijoc.2016.0744 |
[4] | D. Zhang, S. Cai, F. Ye, Y. W. Si, T. T. Nguyen, A hybrid algorithm for a vehicle routing problem with realistic constraints, Inf. Sci., 394–395 (2017), 167–182. https://doi.org/10.1016/j.ins.2017.02.028 doi: 10.1016/j.ins.2017.02.028 |
[5] | D. M. Pierre, N. Zakaria, Stochastic partially optimized cyclic shift crossover for multiobjective genetic algorithms for the vehicle routing problem with time-windows, Appl. Soft Comput., 52 (2017), 863–876. https://doi.org/10.1016/j.asoc.2016.09.039 doi: 10.1016/j.asoc.2016.09.039 |
[6] | M. Desrochers, J. Desrosiers, M. Solomon, A new optimization algorithm for the vehicle routing problem with time windows, Oper. Res., 40 (1992), 199–415. https://doi.org/10.1287/opre.40.2.342 doi: 10.1287/opre.40.2.342 |
[7] | H. F. Wu, X. Q. Chen, Q. H. Mao, Q. N. Zhang, S. C. Zhang, Improved ant colony algorithm based on natural selection strategy for solving TSP problem, J. Commun., 34 (2013), 165–170. |
[8] | I. Sutskever, O. Vinyals, Q. Le, Sequence to sequence learning with neural networks, preprint, arXiv: 1409.3215. |
[9] | H. J. Dai, E. B. Khalil, Y. Y. Zhang, B. Dilkina, L. Song, Learning combinatorial optimization algorithms over graphs, preprint, arXiv: 1704.01665. |
[10] | Y. Bengio, A. Lodi, A. Prouvost, Machine learning for combinatorial optimization: a methodological tour d'horizon, preprint, arXiv: 1811.06128. |
[11] | M. Deudon, P. Cournut, A. Lacoste, Y. Adulyasak, L. M. Rousseau, Learning heuristics for the TSP by policy gradient, in Integration of Constraint Programming, Artificial Intelligence, and Operations Research, (2018), 170–181. https://doi.org/10.1007/978-3-319-93031-2_12 |
[12] | J. Zhao, M. Mao, X. Zhao, J. Zou, A hybrid of deep reinforcement learning and local search for the vehicle routing problems, IEEE Trans. Intell. Transp. Syst., 22 (2021), 7208–7218. https://doi.org/10.1109/TITS.2020.3003163 doi: 10.1109/TITS.2020.3003163 |
[13] | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, et al., Attention is all you need, preprint, arXiv: 1706.03762. |
[14] | W. Kool, H. van Hoof, M. Welling, Attention, learn to solve routing problems, preprint, arXiv: 1803.08475. |
[15] | R. Fukasawa, H. Longo, J. Lysgaard, M. P. de Aragão, M. Reis, E. Uchoa, et al., Robust branch-and-but-and-price for the capacitated vehicle routing problem, Math. Program., 106 (2006), 491–511. https://doi.org/10.1007/s10107-005-0644-x doi: 10.1007/s10107-005-0644-x |
[16] | A. Ceselli, G. Righini, E. Tresoldi, Vehicle routing problems with different service constraints: a branch-and-cut-and-price algorithm, Networks, 64 (2014), 282–291. https://doi.org/10.1002/net.21584 doi: 10.1002/net.21584 |
[17] | H. Z. Zhang, Q. W. Zhang, L. Ma, Z. Y. Zhang, Y. Liu, A hybrid ant colony optimization algorithm for a multi-objective vehicle routing problem with flexible time windows, Inf. Sci., 490 (2019), 166–190. https://doi.org/10.1016/j.ins.2019.03.070 doi: 10.1016/j.ins.2019.03.070 |
[18] | B. Yao, C. Chen, X. Song, X. Yang, Fresh seafood delivery routing problem using an improved ant colony optimization, Ann. Oper. Res., 273 (2019), 163–186. https://doi.org/10.1007/s10479-017-2531-2 doi: 10.1007/s10479-017-2531-2 |
[19] | D. Cattaruzza, N. Absi, D. Feillet, The multi-trip vehicle routing problem with time windows and release dates, Trans. Sci., 50 (2016), 363–761. https://doi.org/10.1287/trsc.2015.0608 doi: 10.1287/trsc.2015.0608 |
[20] | S. H. Wang, Z. Y. Lu, L. Wei, G. L. Ji, J. Q. Yang, Fitness-scaling adaptive genetic algorithm with local search for solving the multiple depot vehicle routing problem, Inf. Technol. Control, 92 (2016), 601–616. https://doi.org/10.1177/0037549715603481 doi: 10.1177/0037549715603481 |
[21] | J. Hopfield, D. W. Tank, "Neural" computation of decisions in optimization problems, Biol. Cybern., 52 (1985), 141–152. https://doi.org/10.1007/BF00339943 doi: 10.1007/BF00339943 |
[22] | K. A. Smith, Neural networks for combinatorial optimization: a review of more than a decade of research, Oper. Res., 11 (1999), 1–123. https://doi.org/10.1287/ijoc.11.1.15 doi: 10.1287/ijoc.11.1.15 |
[23] | O. Vinyals, M. Fortunato, N. Jaitly, Pointer networks, preprint, arXiv: 1506.03134. |
[24] | I. Bello, H. Pham, Q. V. Le, M. Norouzi, S. Bengio, Neural combinatorial optimization with reinforcement learning, preprint, arXiv: 1611.09940. |
[25] | M. Nazari, A. Oroojlooy, L. V. Snyder, M. Takáč, Reinforcement learning for solving the vehicle routing problem, preprint, arXiv: 1802.04240. |
[26] | J. J. Q. Yu, W. Yu, J. T. Gu, Online vehicle routing with neural combinatorial optimization and deep reinforcement learning, IEEE Trans. Intell. Transp. Syst., 20 (2019), 3806–3817. https://doi.org/10.1109/TITS.2019.2909109 doi: 10.1109/TITS.2019.2909109 |
[27] | H. W. Ma, Y. X. Sheng, W. Xia, A pointer neural network for the vehicle routing problem with task priority and limited resources, Inf. Technol. Control, 49 (2020), 237–248. https://doi.org/10.5755/j01.itc.49.2.24613 doi: 10.5755/j01.itc.49.2.24613 |
[28] | K. Cho, B. V. Merrienboer, C. Gulcehre, D. Bahdanau, F, Bougares, H. Schwenk, et al., Learning phrase representations using RNN Encoder-Decoder for statistical machine translation, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (2014), 1724–1734. https://doi.org/10.3115/v1/D14-1179 |
[29] | M. M. Solomon, VRPTW benchmark problems, 2003. |
[30] | G. A. Bula, C. Prodhon, F. A. Gonzalez, H. M. Afsar, N. Velasco, Variable neighborhood search to solve the vehicle routing problem for hazardous materials transportation, J. Hazard. Mater., 324 (2017), 472–480. https://doi.org/10.1016/j.jhazmat.2016.11.015 doi: 10.1016/j.jhazmat.2016.11.015 |
[31] | M. Mavrovouniotis, S. Yang, Ant algorithms with immigrants schemes for the dynamic vehicle routing problem, Inf. Sci., 294 (2015), 456–477. https://doi.org/10.1016/j.ins.2014.10.002 doi: 10.1016/j.ins.2014.10.002 |