A two-level load-utility balancing model for multi-source smart grids based on multi-agent reinforcement learning algorithm

Linsen Song; Yukai Zhang; Jishen Jia

doi:10.3934/era.2026170

Electronic Research Archive

2026, Volume 34, Issue 6: 3768-3789. doi: 10.3934/era.2026170


Research article


1. School of Artificial Intelligence, Henan Institute of Science and Technology, Xinxiang 453003, China
2. School of Mathematical Sciences, Henan Institute of Science and Technology, Xinxiang 453003, China

Received: 06 February 2026; Revised: 11 April 2026; Accepted: 22 April 2026; Published: 06 May 2026

Abstract

In smart grids, the interactions among diverse participants significantly affect system efficiency and social welfare. Because real-time pricing (RTP) mechanisms based on traditional social welfare maximization do not account for the independent interests of the various entities in the power system, this paper proposes a two-level load-utility balancing model for real-time pricing in smart grids, in which the welfare of the demand side and that of the multi-energy supply side are jointly optimized and balanced. A multi-agent reinforcement learning (MARL) algorithm based on the centralized training and decentralized execution (CTDE) framework is designed for this model, and a corresponding multi-agent electricity market environment is constructed, comprising users, an aggregated power supplier, and a power market scheduling center (PMSC). Each user agent is modeled with a heterogeneous utility function, the supplier agent is modeled with a profit function that coordinates traditional and renewable energy, and the PMSC agent is responsible for real-time pricing and for coordinating the welfare balance across agents. Simulation results demonstrate the effectiveness of the proposed model and algorithm in achieving welfare balance between the supplier and users. Compared with a pricing scheme without a welfare-balancing mechanism, the proposed model reduces the welfare gap between the supplier and users by approximately 46.9%. Compared with the non-dominated sorting genetic algorithm II (NSGA-II), the proposed method achieves a comparable level of total social welfare.
Keywords: smart grid; real-time pricing; social welfare; reinforcement learning
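
To make the quantities named in the abstract concrete, the following is a minimal sketch of how user welfare, supplier profit, and the welfare gap between them might be computed for a single pricing slot. It is not the paper's formulation: the abstract does not give functional forms, so the quadratic user utility, the quadratic conventional-generation cost, the zero-marginal-cost treatment of renewable output, and every name and parameter value here (`User`, `omega`, `alpha`, `supplier_profit`, `a`, `b`, `welfare_gap`, `gap_reduction`) are assumptions borrowed from common practice in the RTP literature. The price is taken as given, standing in for the PMSC agent's decision.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user with a saturating quadratic utility (assumed form)."""
    omega: float  # willingness-to-consume parameter (assumption)
    alpha: float  # saturation parameter (assumption)

    def utility(self, x: float) -> float:
        # Utility grows with consumption x and saturates at x = omega / alpha.
        if x < self.omega / self.alpha:
            return self.omega * x - 0.5 * self.alpha * x * x
        return self.omega ** 2 / (2.0 * self.alpha)

    def welfare(self, x: float, price: float) -> float:
        # User welfare: utility of consumption minus the electricity bill.
        return self.utility(x) - price * x


def supplier_profit(price: float, load: float, renewable: float,
                    a: float = 0.01, b: float = 0.5) -> float:
    # Assumption: renewable output has zero marginal cost; the remaining load
    # is met by conventional generation with cost a*g^2 + b*g.
    conventional = max(load - renewable, 0.0)
    return price * load - (a * conventional ** 2 + b * conventional)


def welfare_gap(users, loads, price: float, renewable: float) -> float:
    # Absolute difference between supplier profit and total user welfare.
    total_load = sum(loads)
    user_welfare = sum(u.welfare(x, price) for u, x in zip(users, loads))
    return abs(supplier_profit(price, total_load, renewable) - user_welfare)


def gap_reduction(gap_baseline: float, gap_balanced: float) -> float:
    # Relative reduction of the gap with respect to a baseline pricing scheme.
    return (gap_baseline - gap_balanced) / gap_baseline


if __name__ == "__main__":
    users = [User(omega=4.0, alpha=0.5), User(omega=6.0, alpha=0.5)]
    gap = welfare_gap(users, loads=[2.0, 4.0], price=1.0, renewable=1.0)
    print(f"welfare gap at price 1.0: {gap:.2f}")
```

Under this reading, the reduction of approximately 46.9% reported in the abstract would correspond to `gap_reduction` evaluated on the gap without and with the welfare-balancing mechanism; whether the paper measures the gap in exactly this way is an assumption.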
Citation: Linsen Song, Yukai Zhang, Jishen Jia. A two-level load-utility balancing model for multi-source smart grids based on multi-agent reinforcement learning algorithm[J]. Electronic Research Archive, 2026, 34(6): 3768-3789. doi: 10.3934/era.2026170



© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)