Federated double DQN based multi-energy microgrid energy management strategy considering carbon emissions

Global Energy Interconnection, 2023, Issue 6

Yanhong Yang, Tengfei Ma, Haitao Li, Yiran Liu, Chenghong Tang, Wei Pei

1. Institute of Electrical Engineering, Chinese Academy of Sciences, Beijing 100190, P.R. China

2. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, P.R. China

3. State Grid Electric Power Research Institute, Nanjing 211100, P.R. China

Abstract: Multi-energy microgrids (MEMGs) play an important role in promoting carbon neutrality and achieving sustainable development. This study investigates an effective energy management strategy (EMS) for MEMGs. First, an energy management system model that allows for intra-microgrid energy conversion is developed, and the corresponding Markov decision process (MDP) problem is formulated. Subsequently, an improved double deep Q network (iDDQN) algorithm is proposed to enhance the exploration ability by modifying the calculation of the Q value, and prioritized experience replay (PER) is introduced into the iDDQN to improve the training speed and effectiveness. Finally, taking advantage of the federated learning (FL) and iDDQN algorithms, a federated iDDQN is proposed to design an MEMG energy management strategy that enables each microgrid to share its experience in the form of local neural network (NN) parameters with the federation layer, thus ensuring the privacy and security of data. The simulation results validate the superior performance of the proposed energy management strategy in minimizing the economic costs of the MEMG while reducing CO2 emissions and protecting data privacy.

Keywords: Multi-energy microgrid; Federated learning; Improved double DQN; Energy conversion

0 Introduction

Currently, the energy crisis and environmental pollution have become the main bottlenecks restricting the sustainable development of human society. To accelerate the construction of an ecologically aware civilization, carbon peak and carbon neutrality strategies have been formulated to overcome resource and environmental constraints [1]. To achieve this goal, higher requirements have been proposed for the installed scale of new energy and the utilization efficiency of distributed energy. Microgrids (MGs), which support a large proportion of renewable energy access, have received widespread attention. As small-scale energy systems composed of distributed energy and loads, microgrids are considered ideal platforms for accommodating high penetration rates of renewable energy and can meet diverse future energy needs. In recent years, multi-energy microgrids have achieved the optimization and complementation of multiple energy sources, such as power, gas, and heat, as well as energy exchange between microgrids. This further improves the reliability of the power supply and the penetration of new energy resources, and can play an important role in promoting carbon neutrality and achieving sustainable development [2].

In a multi-energy microgrid system, various complex factors, such as the data scale, uncertainty in user consumption behavior, and randomness of weather conditions, pose significant challenges to operation and management. Therefore, it is crucial to develop effective energy management technologies. Generally, microgrid energy management can be divided into centralized and distributed control methods. The former is based on a centralized energy management center that can provide relevant information to all microgrids. However, owing to increased awareness regarding privacy protection, it is difficult for energy management centers to obtain all the information about the microgrids. Thus, distributed microgrid energy management technology has been extensively studied. The authors in [3] proposed the concept of a multi-microgrid control system and implemented the distributed control of each microgrid using a multi-agent approach. In [4], a multi-agent multi-microgrid system was proposed in which each microgrid collects data locally and optimizes independently. These distributed control schemes are solved using model-based optimization or heuristic algorithms. Optimization algorithms have high computational efficiency but are prone to falling into local optima when dealing with nonlinear, nonconvex, or discontinuous problems. Heuristic algorithms can obtain the corresponding optimal solution under given conditions; however, their computation time is long, and their generalization ability is inadequate.

In recent years, with the development of artificial intelligence (AI) technology, reinforcement learning (RL)-based methods have been proposed to address this problem. These methods learn from the interaction between the microgrid and its operating environment and are widely used to solve microgrid energy management problems. A microgrid energy management method based on multi-agent deep reinforcement learning, in which distributed interactions between agents can be considered, has been proposed to improve the self-sufficiency of microgrid energy management. The authors in [5] adopted an improved deep deterministic policy gradient (DDPG) algorithm to optimize the power generation efficiency of a distributed clean-energy power generation system model. A reinforcement learning framework based on a soft actor-critic (AC) model was employed in [6] to solve the problem of multi-energy flow collaborative optimization in integrated energy systems. The authors in [7] adopted a deep Q network (DQN) to establish an optimized management model for microgrid systems, which manages the energy production, conversion, and storage operations of the electric, thermal, and cooling energy systems and achieves economic scheduling of microgrids.

In the above studies on microgrid energy management based on deep reinforcement learning, training agent models with high generalization requires each microgrid to provide a large amount of local data for model training. However, each microgrid usually belongs to a different entity, which raises data privacy protection and transmission security issues. Therefore, applying federated learning, with its data-privacy protection characteristics, to microgrid energy management provides a feasible technical approach to solving this problem. For example, the authors in [8] proposed federated deep reinforcement learning (FDRL) based on the actor-critic model for microgrid energy management while protecting data privacy. To decouple the dynamic behavior between microgrids, which traditional horizontal federated learning cannot achieve, the authors in [9] proposed a vertically federated actor-critic algorithm that solves the problem of data sharing between different microgrids and enhances their resilience. A blockchain-based federated learning algorithm that predicts the energy production and load demand in microgrids and effectively reduces their operating costs was proposed in [10].

Existing research on microgrid energy management based on federated deep reinforcement learning focuses mainly on electric power alone, without considering the complex energy conversion between the different energy carriers in multi-energy microgrids. Accordingly, this study builds an MDP model for energy management in an integrated microgrid system that includes wind, solar, power, gas, and heat energy. An energy management strategy (EMS) with energy conversion was designed using an improved federated double DQN learning algorithm. The proposed EMS can effectively improve the economic profits of microgrid operations while ensuring policy security and data privacy. The main contributions of this study are as follows.

1) We developed an MEMG system model to facilitate microgrid energy management with energy conversion. A discrete MDP-based energy management model was built with the objective of minimizing the operating cost of the MEMG. Both economic profits and data privacy preservation can be improved using the EMS model.

2) Furthermore, we proposed an improved double DQN algorithm to handle the formulated MDP problem, in which the calculation of the Q value is modified to enhance the exploration ability. Prioritized experience replay was introduced into the iDDQN to improve the training speed, providing a potential solution to the decision-making problem in microgrid energy management.

3) To take advantage of federated learning and the iDDQN algorithm, we designed a microgrid energy management strategy based on the federated iDDQN. To the best of our knowledge, this is the first time that this methodology has been considered in FDRL-based microgrid energy management.

The remainder of this paper is organized as follows. Section 1 describes the microgrid energy management model, which targets economic profits and optimizes the energy management strategy. In Section 2, the improved DDQN algorithm for the MDP model is presented, and a federated iDDQN-based microgrid energy management strategy is proposed. In Section 3, we evaluate the performance of the proposed microgrid energy management strategy using simulations. Finally, the conclusions are presented in Section 4.

1 MEMG energy management model

1.1 System model

Consider the multi-energy microgrid system shown in Fig. 1, in which each microgrid can trade power with the main grid through an energy-trading center (ETC). Each microgrid can actively sell power when its power generation exceeds the demand, or purchase power from the ETC when its generation is insufficient. The MEMG aims at economic profits, and the energy management platform optimizes the EMS based on the power generated by the distributed generation components and the power consumed by the power loads in each microgrid to obtain better profits and reduce carbon dioxide emissions. Each microgrid in the proposed MEMG system consists of distributed generation, energy storage, power load, and heat load components, as well as energy conversion devices. The models for each component are described next.

Distributed generation components refer to a variety of technologies that generate electricity at or near where it will be used, such as wind and solar energy. The application of these clean energy sources has caused large changes in the energy market, and the proportion of traditional fossil fuels is gradually decreasing. The microgrid system constructed in this study includes models of wind turbines and photovoltaic power stations that generate different amounts of power depending on the weather conditions. Instead of a theoretical power generation model, the actual power generation data [11] from a Finnish wind power plant and the generation data [12] from a photovoltaic power plant in Austin, Texas, USA, were used as the outputs of the distributed generation modules.

Fig.1 MEMG system

The energy storage components store the energy generated by the distributed generation components and exchange it with the power load and heat load components during microgrid energy management. The microgrid system described in this study includes battery energy storage components (BESCs) and hydrogen energy storage components (HESCs). The charging and discharging behavior of the BESC is directly controlled by the microgrid, which exchanges energy with the distributed generation and power load components. The HESC, also known as the hydrogen storage tank (HS), is primarily used to store hydrogen gas. As a clean energy source, hydrogen is versatile for both power generation and heating and has great application potential in microgrids. The hydrogen charging and discharging behaviors of the HESC are directly controlled by the microgrid, and energy trading is conducted with the distributed generation, power load, and heat load components.

The dynamic storage capacities of the BESC and HESC at time $t$ are modeled as described in [13], [14]:

$$E_x(t+1)=E_x(t)+\left(\beta_{x,C}\,P_{x,C}(t)-\frac{P_{x,D}(t)}{\beta_{x,D}}\right)\Delta t,\qquad x\in\{e,h2\}$$

where the subscripts $(e, h2)$ denote the BESC and HESC, respectively. $\beta_{e,C}$ and $\beta_{e,D}$ denote the BESC charging and discharging efficiency coefficients, respectively. $\beta_{h2,C}$ and $\beta_{h2,D}$ are the inflow and outflow efficiency coefficients of the HESC, respectively. $P_{e,C}(t)$ and $P_{e,D}(t)$ are the charging and discharging powers of the BESC, and $P_{h2,C}(t)$ and $P_{h2,D}(t)$ are the inflow and outflow hydrogen rates of the HESC. Similarly, let $0\le P_{x,C}(t),P_{x,D}(t)\le P_x^{\max}$ be the charge and discharge power range of the BESC and the hydrogen flow range of the HESC, and let $0\le E_x(t)\le E_x^{\max}$ be the battery capacity range and the capacity range of the HESC. The state of charge of the BESC and the remaining gas volume of the HESC at time $t$ can be expressed as

$$SOC_x(t)=\frac{E_x(t)}{E_x^{\max}},\qquad x\in\{e,h2\}$$
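The storage dynamics above reduce to a few lines of code. The following is a minimal Python sketch of one storage step, assuming the reconstructed update rule; the class and parameter names are illustrative, not from the paper.

```python
class StorageComponent:
    """Generic storage model covering both the BESC and the HESC."""

    def __init__(self, capacity_max, eff_charge, eff_discharge, flow_max):
        self.capacity_max = capacity_max    # E^max (MWh, or kg of H2)
        self.eff_charge = eff_charge        # beta_C, charging/inflow efficiency
        self.eff_discharge = eff_discharge  # beta_D, discharging/outflow efficiency
        self.flow_max = flow_max            # P^max, flow bound per step
        self.level = 0.0                    # current stored energy E(t)

    def step(self, flow, dt=1.0):
        """Apply one charging (flow > 0) or discharging (flow < 0) step."""
        flow = max(-self.flow_max, min(self.flow_max, flow))
        if flow >= 0:
            delta = self.eff_charge * flow * dt          # beta_C * P_C * dt
        else:
            delta = flow * dt / self.eff_discharge       # -P_D * dt / beta_D
        self.level = max(0.0, min(self.capacity_max, self.level + delta))
        return self.level / self.capacity_max            # SOC(t)
```

The same class serves the BESC and the HESC because, per the model, only the efficiency coefficients and the interpretation of the flow (power vs. hydrogen rate) differ.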

With technological development, various new types of power loads are constantly emerging, including directly controllable loads (DCLs), thermostatically controlled loads (TCLs), price-responsive loads (PRLs), and electric vehicle loads. This study mainly models the TCLs and PRLs [15]. TCLs mainly include loads that maintain the temperature within an acceptable range, such as air conditioning systems, water heaters, and refrigerators, and the action of TCL $j$ at time $t$ is defined as

$$u_j(t)=\begin{cases}1, & T_j(t)\le T^{\min}\\ 0, & T_j(t)\ge T^{\max}\\ a_t^{TCL}, & \text{otherwise}\end{cases}$$

where $T_j(t)$ denotes the current ambient temperature, $T^{\max}$ and $T^{\min}$ denote the maximum and minimum temperatures, respectively, and $a_t^{TCL}$ is the action of the TCL, which is set to two gears in this study. The electrical load of TCL $j$ at time $t$ can be modeled as

$$L_j^{TCL}(t)=P^{TCL}\,u_j(t)$$

where $P^{TCL}$ indicates the nominal power of the TCL.
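The switching rule above can be expressed directly in code. A minimal sketch follows, assuming the reconstructed hysteresis logic (forced on below the lower bound, forced off above the upper bound, agent-controlled in between); all names are illustrative.

```python
def tcl_load(temp, temp_min, temp_max, action, p_nominal):
    """Return the electrical load of one TCL for the current step.

    The TCL is forced on below temp_min, forced off above temp_max,
    and otherwise follows the agent's two-gear action (0 = off, 1 = on).
    """
    if temp <= temp_min:
        u = 1
    elif temp >= temp_max:
        u = 0
    else:
        u = action
    return p_nominal * u  # L^TCL(t) = P^TCL * u(t)
```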

Price-responsive loads are loads in a microgrid that cannot be directly controlled and are affected by electricity prices. The power load of PRL $m$ at time $t$, denoted $L_m(t)$, can be modeled as in [15].

Next, the microgrid system is equipped with energy conversion devices (ECDs) to convert between different energy carriers. Three types of ECDs, namely, electrolytic cells, fuel cells, and gas boilers, are configured for energy conversion in the microgrids. Electrolytic cells convert electrical energy into hydrogen energy, fuel cells convert hydrogen energy into electrical and heat energy, and gas boilers convert hydrogen energy or natural gas into heat energy. Natural gas is a backup fuel that satisfies the required heat demand. The energy conversion function is defined to represent the conversion mapping rules of energy from inflow to outflow in the ECD [16] and is used to model the three types of ECDs as follows:

$$\left(P_{FC}(t),\,H_{FC}(t)\right)=\left(\beta_{FC,e}\,v_{FC}(t),\,\beta_{FC,h}\,v_{FC}(t)\right) \tag{6}$$

$$v_{WE}(t)=\beta_{WE}\,P_{WE}(t) \tag{7}$$

$$H_{GB}(t)=\beta_{GB,h2}\,v_{GB,h2}(t)+\beta_{GB,g}\,v_{GB,g}(t) \tag{8}$$

where Eq. (6) represents the energy conversion function of the fuel cell; $v_{FC}(t)$, $P_{FC}(t)$, and $H_{FC}(t)$ are the hydrogen inflow, electrical outflow, and heat outflow of the fuel cell at time $t$, respectively; and $\beta_{FC,e}$ and $\beta_{FC,h}$ are the electrical and heat conversion coefficients of the fuel cell, respectively. Eq. (7) is the energy conversion function of the water electrolyzer, where $P_{WE}(t)$, $v_{WE}(t)$, and $\beta_{WE}$ denote the electrical inflow, hydrogen outflow, and conversion coefficient of the water electrolyzer, respectively. Eq. (8) represents the conversion function of a gas boiler with hydrogen or natural gas inputs, where $v_{GB,h2}(t)$, $v_{GB,g}(t)$, and $H_{GB}(t)$ denote the hydrogen and natural gas inflows and the heat outflow of the gas boiler at time $t$, respectively, and $\beta_{GB,h2}$ and $\beta_{GB,g}$ are the hydrogen and natural gas conversion coefficients of the gas boiler, respectively.
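Under the reconstructed linear conversion functions (6)-(8), each ECD is a multiplication by a conversion coefficient. A minimal sketch follows; the coefficient values are placeholders, not the paper's Table 2 values.

```python
def fuel_cell(h2_in, beta_e=0.6, beta_h=0.3):
    """Hydrogen inflow -> (electrical outflow, heat outflow), Eq. (6)."""
    return beta_e * h2_in, beta_h * h2_in

def water_electrolyzer(p_in, beta_we=0.7):
    """Electrical inflow -> hydrogen outflow, Eq. (7)."""
    return beta_we * p_in

def gas_boiler(h2_in, gas_in, beta_h2=0.9, beta_gas=0.85):
    """Hydrogen and/or natural-gas inflow -> heat outflow, Eq. (8)."""
    return beta_h2 * h2_in + beta_gas * gas_in
```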

To ensure normal operation of the energy network of the microgrid, a balance between energy generation and consumption must be maintained during each period $\Delta t\in[t,t+1]$. Therefore, the balance constraints of the electrical, heat, and hydrogen energies of the microgrid are defined as:

$$P_{ET}(t)+P_W(t)+P_{PV}(t)+P_{e,D}(t)+P_{FC}(t)=P_{load}(t)+P_{e,C}(t)+P_{WE}(t)$$

$$H_{FC}(t)+H_{GB}(t)=H_{load}(t)$$

$$v_{WE}(t)+P_{h2,D}(t)=v_{FC}(t)+v_{GB,h2}(t)+P_{h2,C}(t)$$

where $P_{ET}(t)$ is the electricity traded between the MEMG and the ETC, $P_W(t)$ is the power generated by the wind power modules, $P_{PV}(t)$ is the power generated by the photovoltaic modules, $P_{load}(t)$ is the total power of the electrical load, and $H_{load}(t)$ is the total heat demand during $\Delta t$.
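A quick consistency check of these balances can be coded directly. This is a minimal sketch, assuming the reconstructed constraint forms above; the dictionary keys mirror the symbols in the text and are illustrative.

```python
def check_balances(p, tol=1e-6):
    """Verify the electrical, heat, and hydrogen balances for one step.

    `p` is a dict of per-step quantities keyed by the symbols in the text.
    """
    electrical = (p["P_ET"] + p["P_W"] + p["P_PV"] + p["P_e_D"] + p["P_FC"]
                  - p["P_load"] - p["P_e_C"] - p["P_WE"])
    heat = p["H_FC"] + p["H_GB"] - p["H_load"]
    hydrogen = (p["v_WE"] + p["P_h2_D"]
                - p["v_FC"] - p["v_GB_h2"] - p["P_h2_C"])
    return all(abs(x) < tol for x in (electrical, heat, hydrogen))
```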

1.2 Energy management model

Considering that the MEMG system is composed of $N$ interconnected microgrids, each microgrid is connected to the ETC to allow power trading, and its energy management and optimization can be regarded as a Markov decision process in which each microgrid acts as an agent that observes the state of the microgrid $s_t$ and selects an action $a_t$. If $a_t$ executes electricity trading, the microgrid sends the transaction information to the energy management platform. The platform collects the information, uses transaction rules to design electricity trading strategies, and sends transaction decisions to each microgrid to control the operation of the various components, thereby maximizing the economic profits of microgrid operation. Specifically, the tuples of the MDP model for energy management in MEMGs are defined as follows:

(1) State space: The state of each microgrid is

$$s_t=\left\{\{L_j^{TCL}(t)\}_{j=1}^{N},\,P_W(t),\,P_{PV}(t),\,\rho_{sell}(t),\,\rho_{buy}(t)\right\}$$

where $\{L_j^{TCL}(t)\}_{j=1}^{N}$ is the power set of the $N$ TCLs; $P_W(t)$ is the power generated by the wind power module; $P_{PV}(t)$ is the power generated by the photovoltaic module; $\rho_{sell}(t)$ is the price of electricity sold to the ETC; and $\rho_{buy}(t)$ is the price of electricity purchased from the ETC.

(2) Action space: The microgrid action set includes four priority actions corresponding to the TCLs and five price-level actions corresponding to the price-responsive loads. There are three actions when there is an excess of electricity, three when there is a shortage of electricity, and two when there is a heat shortage. The microgrid action set is $a_t=\{a_t[0], a_t[1], a_t[2], a_t[3], a_t[4]\}$. Specifically, $a_t[0]$ can take the values 0, 1, 2, and 3, which represent the actions of the four gears of the TCLs. $a_t[1]$ can take the values 0, 1, 2, 3, and 4, which indicate the actions when the electricity price of the PRLs is set to -2, -1, 0, 1, and 2, respectively. $a_t[2]$ can take the values 0, 1, and 2, which represent the actions of supplying power to the ETC, BESC, and HESC, respectively, when the microgrid has excess power. $a_t[3]$ can take the values 0, 1, and 2, which represent the actions of purchasing electricity from the ETC, BESC, and HESC, respectively, when the microgrid is short of power. $a_t[4]$ can take the values 0 and 1, which indicate the actions of purchasing hydrogen from the HESC and purchasing natural gas from the external network, respectively, in case of a heat shortage in the microgrid. A sketch of how such a joint action can be encoded is given below.
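Because the joint action is a tuple of five independent discrete choices, the 4 x 5 x 3 x 3 x 2 = 360 combinations can be flattened into a single discrete index for a DQN output head. A minimal sketch of the decoding, assuming this flattened layout (the paper does not specify its encoding):

```python
import numpy as np

ACTION_DIMS = (4, 5, 3, 3, 2)  # TCL gear, price level, surplus, shortage, heat

def decode_action(index):
    """Map a flat DQN output index (0..359) to the tuple a_t[0..4]."""
    return list(np.unravel_index(index, ACTION_DIMS))

# Example: index 0 -> [0, 0, 0, 0, 0]. The price-level entry a_t[1]
# maps to the price offsets (-2, -1, 0, 1, 2) via offset = a_t[1] - 2.
```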

(3) Reward: The reward $R_{total,t}$ is the sum of the reward values of all microgrids; that is,

$$R_{total,t}=\sum_{i=1}^{N}R_{i,t}$$

where $R_{i,t}$ is the reward obtained by microgrid $i$ at time $t$, and is defined as the economic gain obtained by

$$R_{i,t}=R_{i,inc}-R_{i,cost}$$

where $R_{i,inc}$ is the gain obtained by microgrid $i$ after executing the relevant action and is calculated as

$$R_{i,inc}=\rho_{load}\sum_{m=1}^{N_{loads}}L_m(t)+\rho_{TCL}\sum_{j=1}^{N_{TCLs}}L_j^{TCL}(t)+\rho_{sell}(t)\,P_{i,sell}(t)$$

where $\rho_{load}$ is the price of the price-responsive load. It is calculated by the equation $\rho_{load}=\rho_{market}+\sigma\,(a_t[1]-2)$, where $\rho_{market}$ represents the market price and $\sigma$ is the parameter that adjusts the price level. $N_{loads}$ is the number of price-responsive loads, $\rho_{TCL}$ represents the price of electricity for the TCLs, $N_{TCLs}$ is the number of TCLs, and $P_{i,sell}(t)$ is the electricity sold by microgrid $i$ to the ETC.

$R_{i,cost}$ is the cost required by microgrid $i$ to execute the relevant action, and is given by

$$R_{i,cost}=\rho_{cost,W}\,P_W(t)+\rho_{cost,PV}\,P_{PV}(t)+c_{trans}\,P_{i,sell}(t)+\rho_{buy}(t)\,P_{i,buy}(t)+\rho_{gas}\,G_i(t)+C_{i,env}(t)$$

where $\rho_{cost,W}$ and $\rho_{cost,PV}$ denote the cost prices of electricity generated by the wind and photovoltaic power generation components, respectively; $\rho_{sell}(t)$ is the regulated price of electricity sold to the ETC; $c_{trans}$ represents the transmission cost of electricity sold; $\rho_{buy}(t)$ represents the regulated price of electricity bought from the ETC; $P_{i,buy}(t)$ is the electricity purchased by microgrid $i$ from the ETC; $\rho_{gas}$ is the price of natural gas; and $G_i(t)$ denotes the amount of natural gas purchased by microgrid $i$ from the external network at time $t$. The environmental cost $C_{i,env}(t)$ is the economic loss of microgrid $i$ owing to CO2 emissions from the burning of natural gas and the purchase of electricity from the ETC and is denoted as

$$C_{i,env}(t)=\rho_{CO_2}\left(\partial_{gas}\,G_i(t)+\partial_e\,P_{i,buy}(t)\right)$$

where the carbon intensities $\partial_{gas}$ and $\partial_e$ denote the CO2 emission rates associated with natural gas flaring and purchased net electricity, respectively, and the carbon tax price $\rho_{CO_2}$ translates the carbon emissions into an economic penalty.
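The environmental penalty is a linear carbon cost, which is straightforward to code. A minimal sketch, assuming the reconstructed form of the penalty; the carbon tax matches the value used later in the simulation, while the intensity values are placeholders.

```python
def environmental_cost(gas_burned, power_bought,
                       carbon_tax=0.05,      # rho_CO2 in EUR/kg (from Section 3)
                       intensity_gas=0.2,    # kg CO2 per unit of gas (assumed)
                       intensity_grid=0.3):  # kg CO2 per unit of bought power (assumed)
    """Economic penalty for CO2 from gas burning and net purchased electricity."""
    co2_kg = intensity_gas * gas_burned + intensity_grid * power_bought
    return carbon_tax * co2_kg
```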

2 Federated iDDQN based energy management strategy

Deep reinforcement learning algorithms can be used to design energy management strategies for the constructed MDP model of the MEMG. Each microgrid typically conducts independent local training to protect data privacy. However, the data diversity of a single microgrid is limited, and the agent is prone to falling into local optima during training. To improve the generalization of the microgrid agents, we introduce federated learning into the multi-energy microgrids to design a federated iDDQN energy management strategy. The following section describes the design of this strategy.

2.1 The improved DDQN algorithm

The classical reinforcement learning algorithm for solving the MDP problem is the DQN algorithm, which combines reinforcement learning with deep learning [17]. It uses neural networks to estimate Q values, replacing the Q table in the Q-learning algorithm, and can therefore be applied to scenarios with large state-action sets. The DQN consists of an evaluated Q-network and a target Q-network. The evaluated Q-network generates the current Q value $Q(s,a;\theta)$ based on the current state, and the target Q value is calculated from

$$y=r+\gamma \max_{a'} Q(s',a';\theta')$$

The mean squared error between the target Q value and the Q value of the current network is defined as the loss function:

$$L(\theta)=\mathbb{E}\left[\left(r+\gamma \max_{a'} Q(s',a';\theta')-Q(s,a;\theta)\right)^2\right]$$

where $\gamma$ denotes the discount rate; $r$ is the reward; $s$ and $a$ are the state and action, respectively, at the current time; $s'$ and $a'$ are the state and action, respectively, at the next time instant; and $\theta$ and $\theta'$ are the neural network parameters of the evaluated and target networks, respectively. In the DQN learning process, data samples are extracted from the experience pool and fed to the current and target networks. Once the target and current Q values are obtained, the network weights are adjusted to reduce the loss function using the gradient descent method.

The maximization operation used in the DQN algorithm to calculate the target Q value can make the estimated value function larger than its true value, thereby creating a nonuniform overestimation that can affect the final decision. The double DQN (DDQN) proposed in [18] no longer finds the maximum Q value of each action directly in the target Q-network; instead, it first finds the action corresponding to the maximum Q value in the current Q-network and then uses the selected action to calculate the Q value of the target network, that is,

$$y^{DDQN}=r+\gamma\, Q\left(s',\arg\max_{a} Q(s',a;\theta);\theta'\right)$$

where the Q function with weights $\theta$ is used to select the action, the Q function with weights $\theta'$ is used to evaluate the action, and $\arg\max_a Q(s',a;\theta)$ denotes the action with the maximum Q value over all action selections for state $s'$ at the next time instant.
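The difference between the DQN and DDQN targets is only which network performs the argmax. A minimal numpy sketch, assuming `q_eval` and `q_target` are callables returning the Q-value vectors of the evaluated and target networks:

```python
import numpy as np

def ddqn_target(reward, next_state, q_eval, q_target, gamma=0.99):
    """Double DQN target: the evaluated network (theta) selects the action,
    the target network (theta') evaluates it."""
    a_star = np.argmax(q_eval(next_state))                  # selection with theta
    return reward + gamma * q_target(next_state)[a_star]    # evaluation with theta'
```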

Compared with the DQN algorithm, the DDQN algorithm only changes the method for calculating the target value, which improves the performance of the DQN algorithm to a certain extent. However, problems remain, such as slow convergence and poor training effects, owing to the poor exploration ability of the algorithm. In this study, an improved DDQN algorithm is proposed to enhance the exploration ability of the conventional DDQN by modifying the procedure for calculating the Q value and reducing the number of repeated states. We define the equation for updating the Q value as:

$$\tilde{Q}(s,a)=Q(s,a;\theta)+\mu\,p_r(s,a) \tag{18}$$

where the coefficient $\mu\in(0,1)$ weights the exploration term, $p_r(s,a)=1/M$ is the probability of selecting action $a$ at state $s$, and $M$ is the number of times the agent has already selected action $a$ at state $s$. From (18), it can be seen that the more times action $a$ is selected at state $s$, that is, the larger $M$ is, the smaller $p_r(s,a)$ becomes. Thus, the value of $\tilde{Q}(s,a)$ decreases, and the DDQN algorithm becomes more inclined to select other actions in the action space to explore new states, which increases the convergence speed and helps avoid convergence to a local optimum.
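A minimal sketch of this count-based modification, assuming the reconstructed form of Eq. (18) in which a bonus $\mu\,p_r(s,a)$ shrinks as the visit count $M$ grows; the function and variable names are illustrative.

```python
from collections import defaultdict
import numpy as np

visit_counts = defaultdict(int)  # M: times action a was chosen in state s

def modified_q(state_key, q_values, mu=0.5):
    """Add an exploration bonus mu * p_r(s, a) = mu / M to each action's Q value."""
    bonus = np.array([mu / max(1, visit_counts[(state_key, a)])  # guard M = 0
                      for a in range(len(q_values))])
    return q_values + bonus

def select_action(state_key, q_values, mu=0.5):
    """Greedy selection on the modified Q values, updating the counts."""
    a = int(np.argmax(modified_q(state_key, q_values, mu)))
    visit_counts[(state_key, a)] += 1
    return a
```

Rarely tried actions keep a large bonus and are therefore revisited, which is what gives the iDDQN its stronger exploration compared with plain greedy selection.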

On the other hand, the conventional DDQN samples with equal probability from the replay memory, which ignores the significant differences between experience samples and suffers from poor utilization of important experiences, leading to slow convergence. To make better use of the experience samples, the prioritized experience replay of [19] is introduced into the improved DDQN algorithm. Different priorities are assigned to each sample based on the temporal difference error (TD error) of the experience samples, and samples with large absolute TD errors are sampled more frequently to improve the training speed and effectiveness. In this case, the TD error for evaluating the priority of a transition is defined as the absolute value of the difference between the target and current Q values:

$$\delta_j=r+\gamma\, Q\left(s',\arg\max_{a} Q(s',a;\theta);\theta'\right)-Q(s,a;\theta)$$

The sampling probability of sample $j$ is defined as

$$P(j)=\frac{p_j^{\phi}}{\sum_k p_k^{\phi}}$$

where $p_j$ is the sample priority and the exponent $\phi$ determines how much prioritization is used. This study adopts a variant of proportional priority, namely $p_j=|\delta_j|+\varepsilon$, where $\delta_j$ is the TD error and $\varepsilon$ is a very small positive constant.
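The proportional-priority variant can be sketched in a few lines, using the standard definitions $p_j=|\delta_j|+\varepsilon$ and $P(j)=p_j^{\phi}/\sum_k p_k^{\phi}$ from [19]; the default exponent here is illustrative.

```python
import numpy as np

def sampling_probabilities(td_errors, phi=0.6, eps=1e-5):
    """Proportional prioritized replay: P(j) = p_j^phi / sum_k p_k^phi."""
    priorities = (np.abs(td_errors) + eps) ** phi
    return priorities / priorities.sum()

# Drawing a minibatch of 32 transition indices according to the priorities:
# idx = np.random.choice(len(td_errors), size=32, p=sampling_probabilities(td_errors))
```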

2.2 The federated iDDQN strategy

Federated learning is a distributed machine-learning technique for solving the problem of data silos while protecting data security. Its principle is that multiple participants first train their models using local private data, periodically upload their local models to a server to build a global model, and then receive the updated global model broadcast back to each participant [20]. In the proposed federated iDDQN algorithm, which combines federated learning with the improved DDQN, federated learning is responsible for aggregating the iDDQN models to determine the optimal parameters that minimize the global loss function. Owing to the distributed architecture and the inaccessibility of the data in the federated learning framework, the global loss function cannot be calculated directly and must be expressed as a weighted average of the local loss functions:

$$L(\omega)=\sum_{i=1}^{N}\frac{|D_i|}{|D|}\,L_i(\omega)$$

where $L$ is the global loss function; $D$ is the dataset; $|D_i|/|D|$ denotes the percentage contribution of the neural network of the $i$-th agent to the global network; $L_i$ is the local loss function of agent $i$; and $|D|=\sum_{i=1}^{N}|D_i|$. The more data samples each participant collects, the more it helps in training the global model.

To reduce the communication cost caused by the transmission of model parameters, federated aggregation is performed only after several time steps of gradient descent at each agent, and the process is repeated until the model reaches the required accuracy. The local participants periodically communicate with the central server for model aggregation within the time period $eps_d$, and the server broadcasts the global model to the local participants, which update their models based on gradient descent, that is,

$$\omega_i^{k+1}=\omega_i^{k}-\eta\,\nabla L_i\left(\omega_i^{k}\right),\qquad \omega_G=\sum_{i=1}^{N}\frac{|D_i|}{|D|}\,\omega_i$$

where $\eta$ is the learning rate and $\omega_G$ is the aggregated global model.

The MEMG energy management strategy designed based on the federated iDDQN algorithm is described in Algorithm 1, and its workflow is as follows:

Step 1: Create the initial model and send it to each MEMG.

Step 2: Each MEMG trains its local model using private data and the improved DDQN algorithm.

Step 3: Each MEMG uploads its model parameters to the server.

Step 4: The server aggregates all the MEMG models to obtain the global model $\omega_G$.

Step 5: The updated global model is broadcast to the MEMGs.

Steps 2–5 are repeated until the maximum number of federated learning iterations is reached. The federated iDDQN model aggregation can be implemented using the federated averaging (FedAvg) algorithm, which averages the model weights of each MEMG to form a global model and is simple to implement, as sketched below.
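A minimal sketch of the FedAvg aggregation step, assuming each microgrid's model is a list of numpy weight arrays and a data-size-weighted average (plain averaging corresponds to equal data sizes); names are illustrative.

```python
import numpy as np

def fed_avg(local_models, data_sizes):
    """Weighted average of per-microgrid weight lists into a global model."""
    weights = np.array(data_sizes, dtype=float)
    weights /= weights.sum()                 # |D_i| / |D| for each agent
    n_layers = len(local_models[0])
    return [sum(w * model[k] for w, model in zip(weights, local_models))
            for k in range(n_layers)]

# Example: three MEMGs with equal data reduce to plain weight averaging.
# global_model = fed_avg([m1, m2, m3], [240, 240, 240])
```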

3 Simulation results

In our simulation, we assumed that three MEMGs were configured in the multi-microgrid system shown in Fig. 1. The relevant simulation parameters are listed in Table 1. Each MEMG uses the wind power, photovoltaic, and electrical load data provided in [11], [12]. The parameters of the energy conversion devices in the MEMGs are listed in Table 2. We set the cost of wind power generation to $\rho_{cost,W}$ = 32 €/MWh, the cost of photovoltaic power generation to $\rho_{cost,PV}$ = 42 €/MWh, the market electricity price to $\rho_{market}$ = 5.48 €/MWh, the natural gas price to $\rho_{gas}$ = 0.13 €/MWh, and the carbon tax to $\rho_{CO_2}$ = 0.05 €/kg. The convolutional neural network was trained with a 10-day data sample. Each training round corresponded to one day, and the time interval was one hour. The simulation software used was Python 3.6.1 and TensorFlow 1.8.0.

Table 1 Parameters of MEMGs

Table 2 Parameters of ECD

Based on the constructed microgrid simulation environment, the performance of the different EMSs was compared using the total reward value during agent training. First, the reward values of the federated DDQN (F-DDQN) and federated improved DDQN (F-iDDQN) algorithms were compared, as shown in Fig. 2. For visual clarity, the reward values in the figure are normalized. It can be seen that the MEMG energy management strategy using the F-iDDQN obtains better reward values than the F-DDQN algorithm.

Fig.2 Comparison of rewards for F-DDQN and F-iDDQN

Figure 3 shows the daily economic profits of the MEMG for ten consecutive days.It can be observed that the energy management strategy based on the F-iDDQN achieves better daily profits than the F-DDQN-based EMS.

Figure 4 shows the daily CO2 emissions of the MEMG system for 10 consecutive days. It can be observed that the energy management strategy based on the F-iDDQN achieved lower CO2 emissions than the F-DDQN-based EMS.

Fig.3 Comparison of daily profit

Fig.4 Comparison of CO2 emission

Next, we consider MEMG2 as an example to illustrate the operational status of the various components of the MEMG regulated by the proposed federated iDDQN strategy. Figure 5 shows how the electricity demand of MEMG2 is satisfied within one day, where ED, WG, PV, BS, ET, and WE represent the electricity demand, wind power generation, photovoltaic power generation, battery storage, electricity trade, and water electrolyzer, respectively. It can be seen that MEMG2 generates considerable power within the 24 hours; the generated power meets the electricity demand, and there is surplus electricity during hours 0-15. During hours 0-3, the BESC charges. During hours 3-5, the water electrolyzer converts electricity into hydrogen, and the HESC is charged with hydrogen. Because the surplus electricity exceeds the maximum capacity of the water electrolyzer, MEMG2 sells the remaining electricity to the ETC after hydrogen charging. During hours 5-15, MEMG2 chooses to sell electricity directly to the ETC to obtain maximum profits. From hours 15-23, the generated power cannot meet the electricity demand; therefore, the BESC first discharges. If the demand remains unsatisfied, MEMG2 purchases electricity from the ETC.

To reduce CO2 emissions, the heat demand is jointly satisfied by natural gas and hydrogen. Figure 6 shows how the heat demand of MEMG2 is satisfied within one day. Figure 7 shows the hourly CO2 emissions of MEMG2 from these sources.

Fig.5 Electricity demand and electricity sources

Fig.6 Heat demand and heat sources

Fig.7 Hourly CO2 emission

We further analyzed the impact of different carbon tax prices on the profits and CO2 emissions of the multi-microgrid system. As shown in Table 3, when the carbon tax increases, the profits of the MEMG system decrease owing to the increased environmental costs. When the carbon tax increases from 0 to 0.025 €/kg, CO2 emissions remain stable. When the carbon tax increases from 0.025 €/kg to 0.1 €/kg, CO2 emissions gradually decrease. This is because, as the carbon tax rises, the MEMG adopts a more environmentally friendly energy management strategy, using hydrogen as a clean energy source to reduce CO2 emissions.

Table 3 Profit and CO2 emissions under different carbon tax rates

4 Conclusion

An energy management strategy for a multi-energy microgrid system, which includes multiple types of energy sources such as wind, solar, electricity, and gas, and which can convert energy internally, was investigated in this study. First, we formulated an MDP model of an EMS with energy conversion. Subsequently, an iDDQN learning algorithm was proposed by modifying the calculation of the Q value and introducing prioritized experience replay. Finally, a federated iDDQN-based MEMG energy management strategy that aggregates heterogeneous local models through federated learning was designed to improve the capability of each microgrid to learn optimal actions. The simulation results show that the proposed energy management strategy achieves higher rewards and better convergence while protecting the data privacy of the multi-energy microgrids, increasing the economic profit of the MEMG, and encouraging energy conversion within the microgrids to reduce CO2 emissions for environmental protection purposes. In the future, we will investigate the performance of the proposed approach for MEMGs with energy trading between microgrids.

Acknowledgement

This work was supported by the Research and Development of Key Technologies of the Regional Energy Internet based on Multi-Energy Complementary and Collaborative Optimization (BE2020081).

Declaration of Competing Interest

The authors have no conflicts of interest to declare.