强化学习


2023-12-07 更新

Deep Reinforcement Learning for Community Battery Scheduling under Uncertainties of Load, PV Generation, and Energy Prices

Authors:Jiarong Fan, Hao Wang

In response to the growing uptake of distributed energy resources (DERs), community batteries have emerged as a promising solution to support renewable energy integration, reduce peak load, and enhance grid reliability. This paper presents a deep reinforcement learning (RL) strategy, centered around the soft actor-critic (SAC) algorithm, to schedule a community battery system in the presence of uncertainties, such as solar photovoltaic (PV) generation, local demand, and real-time energy prices. We position the community battery to play a versatile role, in integrating local PV energy, reducing peak load, and exploiting energy price fluctuations for arbitrage, thereby minimizing the system cost. To improve exploration and convergence during RL training, we utilize the noisy network technique. This paper conducts a comparative study of different RL algorithms, including proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms, to evaluate their effectiveness in the community battery scheduling problem. The results demonstrate the potential of RL in addressing community battery scheduling challenges and show that the SAC algorithm achieves the best performance compared to RL and optimization benchmarks.
PDF The 7th IEEE Conference on Energy Internet and Energy System Integration (EI2 2023)

点此查看论文截图

Using Curiosity for an Even Representation of Tasks in Continual Offline Reinforcement Learning

Authors:Pankayaraj Pathmanathan, Natalia Díaz-Rodríguez, Javier Del Ser

In this work, we investigate the means of using curiosity on replay buffers to improve offline multi-task continual reinforcement learning when tasks, which are defined by the non-stationarity in the environment, are non labeled and not evenly exposed to the learner in time. In particular, we investigate the use of curiosity both as a tool for task boundary detection and as a priority metric when it comes to retaining old transition tuples, which we respectively use to propose two different buffers. Firstly, we propose a Hybrid Reservoir Buffer with Task Separation (HRBTS), where curiosity is used to detect task boundaries that are not known due to the task agnostic nature of the problem. Secondly, by using curiosity as a priority metric when it comes to retaining old transition tuples, a Hybrid Curious Buffer (HCB) is proposed. We ultimately show that these buffers, in conjunction with regular reinforcement learning algorithms, can be used to alleviate the catastrophic forgetting issue suffered by the state of the art on replay buffers when the agent’s exposure to tasks is not equal along time. We evaluate catastrophic forgetting and the efficiency of our proposed buffers against the latest works such as the Hybrid Reservoir Buffer (HRB) and the Multi-Time Scale Replay Buffer (MTR) in three different continual reinforcement learning settings. Experiments were done on classical control tasks and Metaworld environment. Experiments show that our proposed replay buffers display better immunity to catastrophic forgetting compared to existing works in most of the settings.
PDF

点此查看论文截图

MACCA: Offline Multi-agent Reinforcement Learning with Causal Credit Assignment

Authors:Ziyan Wang, Yali Du, Yudi Zhang, Meng Fang, Biwei Huang

Offline Multi-agent Reinforcement Learning (MARL) is valuable in scenarios where online interaction is impractical or risky. While independent learning in MARL offers flexibility and scalability, accurately assigning credit to individual agents in offline settings poses challenges due to partial observability and emergent behavior. Directly transferring the online credit assignment method to offline settings results in suboptimal outcomes due to the absence of real-time feedback and intricate agent interactions. Our approach, MACCA, characterizing the generative process as a Dynamic Bayesian Network, captures relationships between environmental variables, states, actions, and rewards. Estimating this model on offline data, MACCA can learn each agent’s contribution by analyzing the causal relationship of their individual rewards, ensuring accurate and interpretable credit assignment. Additionally, the modularity of our approach allows it to seamlessly integrate with various offline MARL methods. Theoretically, we proved that under the setting of the offline dataset, the underlying causal structure and the function for generating the individual rewards of agents are identifiable, which laid the foundation for the correctness of our modeling. Experimentally, we tested MACCA in two environments, including discrete and continuous action settings. The results show that MACCA outperforms SOTA methods and improves performance upon their backbones.
PDF 16 pages, 4 figures

点此查看论文截图

文章作者: 木子已
版权声明: 本博客所有文章除特別声明外,均采用 CC BY 4.0 许可协议。转载请注明来源 木子已 !
  目录