Research article Special Issues

Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach

  • Received: 29 September 2014; Accepted: 03 February 2015; Published: 02 March 2015
  • In Smart Grid environments, homes equipped with windmills are encouraged to generate energy and sell it back to utilities. Time-of-Use pricing and the introduction of storage devices greatly influence a user's decision on when to sell back energy and how much to sell. A study of sequential decision-making algorithms that can optimize the user's total payoff is therefore necessary. In this paper, reinforcement learning is used to tackle this optimization problem. The problem of determining when to sell back energy is formulated as a Markov decision process, and the model is learned adaptively using Q-learning. Experiments are conducted with varying storage capacities and under periodic energy generation rates with different levels of fluctuation. The results show a notable increase in discounted total rewards from selling back energy with the proposed approach.
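The formulation summarized in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the authors' model: the state (storage level, pricing slot), the action (number of units to sell), the two-tier price schedule `PRICES`, the 0-or-1-unit generation model, and every constant below are assumptions chosen only to make the idea concrete.

```python
import random

# All values below are illustrative assumptions, not taken from the paper.
CAPACITY = 3                     # storage size in energy units
PRICES = [1.0, 1.0, 3.0, 3.0]    # repeating time-of-use price per unit sold
GAMMA = 0.9                      # discount factor
ALPHA = 0.1                      # learning rate
EPSILON = 0.1                    # exploration probability


def step(level, slot, sell, rng):
    """Sell `sell` units at the current price, then store newly generated energy."""
    reward = PRICES[slot] * sell
    generated = rng.choice([0, 1])   # assumed generation: 0 or 1 unit per slot
    # Energy that does not fit into storage is lost.
    next_level = min(CAPACITY, level - sell + generated)
    next_slot = (slot + 1) % len(PRICES)
    return reward, next_level, next_slot


def train(steps=50_000, seed=0):
    """Run epsilon-greedy tabular Q-learning and return the Q-table."""
    rng = random.Random(seed)
    # q[(level, slot)][sell] estimates the discounted return of selling `sell` units.
    q = {(lvl, s): [0.0] * (lvl + 1)
         for lvl in range(CAPACITY + 1) for s in range(len(PRICES))}
    level, slot = 0, 0
    for _ in range(steps):
        actions = q[(level, slot)]
        if rng.random() < EPSILON:
            sell = rng.randrange(len(actions))                        # explore
        else:
            sell = max(range(len(actions)), key=actions.__getitem__)  # exploit
        reward, next_level, next_slot = step(level, slot, sell, rng)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
        target = reward + GAMMA * max(q[(next_level, next_slot)])
        actions[sell] += ALPHA * (target - actions[sell])
        level, slot = next_level, next_slot
    return q


def greedy_policy(q):
    """Map each state to the number of units with the highest learned Q-value."""
    return {state: max(range(len(values)), key=values.__getitem__)
            for state, values in q.items()}
```

Calling `greedy_policy(train())` yields, for each (storage level, pricing slot) pair, how many units the learned policy would sell under these assumed prices. Varying `CAPACITY` or the generation model mirrors, in a very simplified way, the kind of experiment the abstract describes.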

    Citation: Bei Li, Siddharth Gangadhar, Pramode Verma, Samuel Cheng. Maximize Producer Rewards in Distributed Windmill Environments: A Q-Learning Approach. AIMS Energy, 2015, 3(1): 162-172. doi: 10.3934/energy.2015.1.162

  • © 2015 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)