1.
Introduction
The promotion of clean energy is a key component of energy policy aimed at facilitating the global transition towards decarbonisation. The transition to low-carbon and climate-resilient economies requires the integration of renewable energy sources into the power generation mix. One of the most discussed issues is the impact of the high penetration of renewables on electricity prices, highlighting the need for comprehensive analysis and policy adaptation.
On the one hand, because their opportunity costs are low, renewable generators can bid at very low prices, or even at zero, and participate as price takers in the day-ahead auction market, which is based on a marginal pricing system. A sufficient number of low-price bids from renewable generators shifts the supply curve so that the resulting auction price is set at a lower level. As a result, more renewable production is expected to lead to lower marginal prices. This is the so-called merit order effect of renewables, that is, the reduction in day-ahead market prices due to the introduction of renewable generation sources into the electricity system, and it is well documented in the literature [Holttinen (2004) in the Nord Pool; Sensfuß et al. (2008) and De Lagarde and Lantz (2018) in the German market; McConnell et al. (2013) in the Australian market; and Sáenz de Miera et al. (2008) and Ballester and Furió (2015) in the Spanish market, among others¹].
1 For a complete overview of past research on the merit order effect of renewables, see Würzburg et al. (2013), and more recently for the Iberian market, Carvalho and Pereira (2019).
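The merit order effect described above can be illustrated with a minimal sketch of a uniform-price auction. It assumes a fixed, price-inelastic demand and a handful of hypothetical offers (none of the numbers come from the paper); the function name `clearing_price` is likewise illustrative.

```python
from typing import List, Tuple

Offer = Tuple[float, float]  # (price in EUR/MWh, quantity in MWh)


def clearing_price(offers: List[Offer], demand_mwh: float) -> float:
    """Uniform (marginal) price for a fixed, price-inelastic demand.

    Offers are stacked from cheapest to most expensive (the merit order);
    the price is the offer price of the last unit needed to cover demand.
    """
    served = 0.0
    for price, quantity in sorted(offers):
        served += quantity
        if served >= demand_mwh:
            return price
    raise ValueError("offered volume does not cover demand")


# Hypothetical offers, not data from the paper.
thermal_stack = [(20.0, 300.0), (45.0, 300.0), (70.0, 300.0)]
demand = 500.0

price_without_res = clearing_price(thermal_stack, demand)
# Adding 250 MWh of renewable output bid at zero shifts the supply curve right.
price_with_res = clearing_price(thermal_stack + [(0.0, 250.0)], demand)

print(price_without_res, price_with_res)
```

In this toy case the zero-price renewable volume displaces the 45 €/MWh offer as the marginal unit, so the clearing price falls to the next cheaper step of the supply curve.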
On the other hand, the non-storability of electricity at a large scale, together with the intermittency of the main renewable energy sources (wind and solar PV), may result in higher market balancing requirements and costs. As electricity is a non-storable commodity, its delivery must be planned in advance, typically on the day-ahead market². Up to the time of delivery, adjustments are usually necessary to deal with unexpected deviations from the scheduled delivery. The wholesale electricity market typically consists of a series of interrelated markets: (i) the day-ahead market, (ii) the intraday markets for short-term adjustments, and finally (iii) balancing markets to handle the remaining deviations and other technical issues. The variability and limited predictability of renewable generation could increase the need for load balancing to ensure the electricity supply at the moment of delivery. Greater renewable production could therefore require more balancing and hence raise balancing costs, which could ultimately drive up final wholesale electricity prices³.
2 The so-called spot market is actually a day-ahead market, where electricity is traded (at t) for delivery during the 24 hours of the following day (at t+1).
3 In this regard, Gianfreda et al. (2018) found a significant positive difference between real-time and day-ahead market prices, particularly for wind electricity in the Italian market.
While many previous studies have focused on the impact of renewables on day-ahead market prices, our analysis goes a step further by examining their potential influence on the additional costs incurred in the subsequent processes leading up to real-time electricity delivery. By extending the investigation beyond the immediate pricing effects on the day-ahead market, we address a significant and underexplored gap in the literature and shed light on a crucial aspect of renewable energy integration.
These additional price components, beyond day-ahead and intraday prices, mainly stem from intermediate processes aimed at ensuring continuous supply and system reliability. For the purposes of this study, the Iberian market is chosen as a paradigmatic example due to the high penetration of renewable energy in recent years, growing by 44% over the period 2017–2021⁴. To carry out the analysis, a complete dataset with 67 predictor variables is generated. To manage such an exhaustive list of variables, we use machine learning techniques, which are a relatively recent addition to this branch of the literature and are preferable to classical parametric models that impose assumptions our series do not necessarily meet. As Breiman (2001a) pointed out, when dealing with highly complex realities such as medical, genetic, or financial datasets, it may be more appropriate to assume that the observed data are generated by a complex and unknowable mechanism rather than to impose one of the established classical parametric models, such as linear regression, logistic regression, or the Cox model. Therefore, efforts should be redirected from the search for a reasonably good classical model to the identification of an algorithm, such as neural networks or decision trees (collectively known as machine learning algorithms, MLAs), capable of processing the observed data (input variables) to yield accurate predictions (response variables) through iteration and convergence. Although these algorithms lack the interpretability of classical models, they can provide greater accuracy and may be better suited to addressing a broader range of problems. Since the early 2000s, the use of MLAs has increased exponentially, and they have evolved to become not only more accurate but also more informative about how nature relates response variables to input variables, potentially revealing causal relationships.
This latter aspect is very much in line with the objectives of our work. In this regard, Prasanna et al. (2019) summarised the advances of MLAs in agent-based modelling of energy markets; Tschora et al. (2022) and Schnürch and Wagner (2019) evaluated the use of MLAs to forecast spot electricity prices; Qays et al. (2020) applied the backpropagation neural network MLA to check the charge condition of photovoltaic battery hybrid systems; and Duras et al. (2023) evaluated the performance of various machine learning techniques when selecting relevant input variables.
4 The growth rates by generation technology were 150% for solar photovoltaic; 61% for hydropower; 31% for the "other renewables" group (biogas, biomass, marine hydropower, and geothermal); 26% for wind; 14% for hydro-wind; and −12% for solar thermal.
One such approach, machine learning–based causal models, has recently gained traction across a wide range of fields to estimate conditional average treatment effects (CATE). Machine learning-based causal models have emerged as a robust methodological framework in academic research, providing a powerful means of analysing complex systems and discerning causal relationships. These models facilitate the identification of causal effects without being constrained by certain limitations inherent to conventional econometric models. Because of their flexible and adaptive nature, machine learning-based causal models are making a significant contribution to advancing our understanding of complex phenomena and increasing the depth of empirical analysis in various fields of study. In agriculture, Quigley et al. (2023) employed the causal random forest approach to explore the impact of persistent warming temperatures on the use of cover crops. Zhang et al. (2022) investigated the impact of speed cameras on road safety using generalised random forests, selected for their superior performance in simulation exercises compared with other causal methods such as outcome regression, propensity score, and doubly robust estimation. Li et al. (2024) applied a generalised doubly robust causal machine learning approach to study the effect of crashes on traffic speed, while in the labour market domain, Elamin (2023) used the random forest method to examine the effects of informal job search on wages and job satisfaction. Finally, Mizuguchi and Sawamura (2023) and Xu et al. (2024) focused on the areas of health and finance, respectively, employing random forest techniques for predictive purposes. All of these previous studies analysed causal relationships between variables. In this study, we use causal random forest regression and partial dependence plots to identify the main factors explaining Spanish final electricity prices.
The contributions of the paper are as follows. First, our research addresses a significant gap in the existing literature by examining electricity cost components beyond spot prices. This study focuses on the costs of balancing and other technical processes that have been largely overlooked in previous studies. Second, we employ a range of innovative techniques within this branch of literature for comparative purposes. This includes the use of machine learning-based causal models to estimate CATE, which provides insights into the impact of renewable generation on various cost components. Third, the methodology employed can be readily extrapolated to other electricity market areas. This adaptability allows for the study of the effect of renewable generation on balancing and adjustment costs across different wholesale market designs and regulatory environments, thereby enhancing the broader applicability of our findings.
The remainder of this paper is structured as follows. Section 2 provides a concise overview of the Spanish electricity system. Section 3 describes the dataset used. Section 4 presents the methodology adopted, including details on the data pre-processing and performance measures used to evaluate the algorithms. Section 5 provides and discusses the results. Finally, Section 6 outlines the main findings and concludes.
2.
The Iberian electricity market
Since 2007, the Spanish and Portuguese electricity systems have been integrated into a common market area, the Iberian Electricity Market. The wholesale electricity market5 consists mainly of (i) a day-ahead market, (ii) an intraday market, and (iii) other balancing processes.
5 The wholesale market represents 89% of the total energy generated in the Iberian market.
The day-ahead market is a daily uniform price auction in which participants submit their bids to purchase or sell electricity for each of the 24 hours of the following day. The resulting price for each specific hour is determined by the point at which the supply and demand curves meet, according to a marginal pricing system.
The intraday market allows participants to adjust the resulting day-ahead market schedule by utilising more up-to-date and accurate forecasts. This market is currently structured into an auction market and a continuous market. The intraday auction market consists of six consecutive auction sessions, each of which comprises several scheduled periods closer to the delivery date6. In contrast, the intraday continuous market is a continuous European cross-border market.
6 Details of the opening and closing times of each session of the intraday market can be found on the Iberian NEMO website (www.omie.es). Last accessed: March 2022
In the initial phase, the Nominated Electricity Market Operator (NEMO) obtains the resulting marginal price and assigned electricity from the auctions in both the day-ahead and intraday markets. This is done on the basis of purely economic criteria. It is then necessary to ensure the technical feasibility of the allocations of electricity to generators and retailers and/or end consumers, which initially originate from the day-ahead and intraday markets. Red Eléctrica de España (REE), as Transmission System Operator (TSO), is the entity responsible for validating them from a technical perspective through what is known as the management of the system's technical constraints, which involves resolving network congestion. In other words, the capacity of the network is analysed to determine whether it is sufficient to meet demand, given that the electricity should flow from the generation plants to the consumption points under conditions that are sufficiently reliable. Consequently, the outcomes of the day-ahead and intraday market auctions are provisional and subject to modification.
In addition to the management of technical constraints due to network capacity issues, there are other balancing or adjustment processes to ensure the operation of the system for which the TSO is responsible: (i) the market mechanism for additional upward reserve power, whose purpose is to provide the system with the estimated necessary level of upward reserve power; (ii) the secondary control band, designed to maintain the balance between generation and demand by correcting deviations in temporary action horizons ranging from 20 seconds to 15 minutes; (iii) the tertiary control, to resolve deviations between generation and consumption and to restore the secondary control band reserve used; and finally, (iv) the real-time deviation management processes7.
7 There are two other components of final electricity prices: capacity payments and the so-called interruptibility service. Capacity payments are paid to stand-by generators to act as a backup during periods of excess demand in order to prevent power outages. The interruptibility service, which ceased to be in force on June 30, 2020, was provided by some authorised large consumers by reducing their consumption (when requested by the TSO) to maintain the balance between generation and demand during periods when demand exceeds supply.
Consequently, final wholesale electricity prices include several costs other than the day-ahead market price (its main component) and the difference between it and the intraday market price (which may be positive or negative and is merely residual8). It is worth taking a closer look at these other costs included in final wholesale electricity prices in order to identify their determinants, focusing in particular on the impact of renewables, among an exhaustive list of potentially key variables.
8 That price difference is added to the day-ahead market price to capture the impact (positive or negative) of intraday trading when computing the final price.
3.
Data
The first dataset comprises the price series of the components of the final wholesale price, other than the day-ahead market price, at an hourly frequency, from January 2017 to December 2021⁹. In particular, it includes: (i) the daily average of the hourly price series of the intraday market (IM) component, which captures the net effect of the six sessions of the intraday market on the final price; (ii) the daily average of the hourly net effect on the final price of the procedure to solve technical constraints (TTCC), which includes the costs incurred to manage technical constraints after the day-ahead market auction, after each of the intraday market auctions, and in the real-time market; (iii) the daily average of the hourly costs resulting from ancillary services and deviation management (SO); (iv) the daily average of the hourly costs related to capacity payments (CP); and (v) the daily average of the hourly costs associated with the interruptibility service (IS).
9 Prices are all expressed in €/MWh and are publicly available on the website of the Spanish National Commission for Markets and Competition (www.cnmc.es).
In addition, we compute the daily average of the hourly series of bids (price and amount of power) individually submitted by market participants in order to buy or sell energy, distinguishing between matched and non-matched bids, both in the day-ahead and in the first session of the intraday market, since this session is the one in which most of the intraday market liquidity is concentrated10.
10 The entire supply and demand curves can be found on the OMIE website (www.omie.es), except for four suspended intraday trading sessions on 1 January 2019, 30 and 31 July 2021 and 1 November 2021.
Other energy-related price series included in the analysis are: (i) the Dutch TTF (Title Transfer Facility) futures price¹¹ as the natural gas benchmark in Europe (Chuliá et al., 2019); (ii) the API2 index for the coal price¹²; (iii) the European Emission Allowances (EUA) futures price¹³; and (iv) several series from the France–Spain interconnection, specifically the percentage of hours with 100% use (in each direction) and the spread, calculated as the Spanish day-ahead market price minus the French day-ahead market price¹⁴. In addition, the percentage of water reserves in the reservoirs of the Iberian Peninsula is also considered¹⁵. In total, the dataset used contains 122,432 observations.
11 Source: Thomson Reuters database
12 Source: Thomson Reuters database
13 Available at http://www.sendeco2.com.
14 Available on the webpage of the Electricity Interconnection in South–Western Europe (www.iesoe.eu).
15 Available on the webpage of the Spanish Ministry for the Ecological Transition and the Demographic Challenge (www.miteco.gov.es).
4.
Methodology
A regression tree, a non-parametric supervised machine learning algorithm for regression tasks, is used to estimate each of the aforementioned cost components of the final wholesale electricity price. The algorithm is based on a recursive partitioning of the feature space, represented as a growing tree. The starting point is a root node, which is the space containing all observations. The space is divided into regions, and the target is modelled as the mean of the observations in each region. Each split is defined by a condition on one predictor (e.g., day-ahead price above 50 €/MWh), and the split point chosen is the one with the best fit, i.e., the one with the lowest estimation error. Each split leads to new nodes (or sub-regions), from which new branches are derived until a stopping criterion is met (usually a minimum size of the sub-region or a maximum number of split points); the terminal nodes are called leaves. It should be noted that this non-parametric procedure has the advantage of handling non-normal data and multicollinearity. It is also robust to outliers and missing values.
There are several versions of the algorithm. The most basic version is known as the Classification and Regression Tree (hereinafter referred to as CART). The root node is split into two leaf nodes according to the following criterion: given a set of predictors $\{X_1, X_2, \ldots, X_p\}$, the goal is to select one of them, $X_j$, and the split point $c$ to obtain two sub-regions, $R_1 = \{X \mid X_j < c\}$ and $R_2 = \{X \mid X_j \geq c\}$, in such a way that the following measure is minimised:
$$RSS_1 + RSS_2 = \sum_{i:\, x_i \in R_1} \left(y_i - \hat{y}_{R_1}\right)^2 + \sum_{i:\, x_i \in R_2} \left(y_i - \hat{y}_{R_2}\right)^2$$
where $y_i$ denotes the target observed for observation $i$ in the corresponding region; $\hat{y}_{R_1}$ and $\hat{y}_{R_2}$ are the estimated targets (the means) for regions $R_1$ and $R_2$; and $RSS_1$ and $RSS_2$ refer to the residual sums of squares for regions $R_1$ and $R_2$. In this way, a partition is chosen that minimises the total residual sum of squares.
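The split search just described can be sketched as an exhaustive scan over predictors and candidate split points. This is a minimal illustration under stated assumptions, not the rpart implementation: the candidate split points are taken as midpoints between consecutive observed values, and the toy data and the name `best_split` are hypothetical.

```python
from typing import List, Sequence, Tuple


def rss(values: Sequence[float]) -> float:
    """Residual sum of squares around the mean (0 for an empty region)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X: List[List[float]], y: List[float]) -> Tuple[int, float, float]:
    """Return (predictor index j, split point c, RSS_1 + RSS_2) of the best split."""
    best = (-1, float("nan"), float("inf"))
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        # Candidate split points: midpoints between consecutive observed values.
        for low, high in zip(levels, levels[1:]):
            c = (low + high) / 2.0
            left = [t for row, t in zip(X, y) if row[j] < c]    # R_1: X_j < c
            right = [t for row, t in zip(X, y) if row[j] >= c]  # R_2: X_j >= c
            total = rss(left) + rss(right)
            if total < best[2]:
                best = (j, c, total)
    return best


# Hypothetical toy data: column 0 mimics a day-ahead price, column 1 is noise.
X = [[30.0, 1.0], [40.0, 0.0], [45.0, 1.0], [55.0, 0.0], [60.0, 1.0], [70.0, 0.0]]
y = [2.0, 2.2, 1.8, 5.0, 5.2, 4.8]

j, c, total = best_split(X, y)
print(j, c, round(total, 3))
```

Here the scan selects the first column with a split point of 50, because that partition separates the low-cost and high-cost observations and leaves the smallest total residual sum of squares.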
However, this first version of the algorithm does have some drawbacks, such as a lower prediction accuracy compared to other techniques, a high variance in the outcomes, and a tendency to overfit. To overcome these shortcomings, improved algorithms have been introduced, such as random forest and, more recently, its generalisation, causal forest.
The random forest version reduces the variance by estimating more trees and using bootstrap according to the following procedure. First, different samples with different sets of predictors are generated with bootstrap. Second, a regression tree is fitted to each of the samples. Finally, the average of the predictions using all the trees is the final prediction of the target.
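The three steps above can be sketched in a few lines. The sketch below is a deliberately simplified stand-in for a random forest: each "tree" is a one-split regression stump, the predictor subset is drawn once per tree (which coincides with drawing it per split when there is only one split), and all data and function names are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = Sequence[float]


def fit_stump(X: List[Row], y: List[float], features: List[int]) -> Callable[[Row], float]:
    """Fit a one-split regression tree using only the given predictor indices."""
    best_rss, best_rule = float("inf"), None
    for j in features:
        levels = sorted({row[j] for row in X})
        for low, high in zip(levels, levels[1:]):
            c = (low + high) / 2.0
            left = [t for row, t in zip(X, y) if row[j] < c]
            right = [t for row, t in zip(X, y) if row[j] >= c]
            m_left, m_right = mean(left), mean(right)
            rss = sum((t - m_left) ** 2 for t in left) + sum((t - m_right) ** 2 for t in right)
            if rss < best_rss:
                best_rss, best_rule = rss, (j, c, m_left, m_right)
    if best_rule is None:  # no valid split in this sample: predict its mean
        overall = mean(y)
        return lambda row: overall
    j, c, m_left, m_right = best_rule
    return lambda row: m_left if row[j] < c else m_right


def fit_forest(X: List[Row], y: List[float], n_trees: int = 200,
               n_features: int = 2, seed: int = 0) -> List[Callable[[Row], float]]:
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # step 1: bootstrap sample of rows
        feats = rng.sample(range(p), n_features)     # step 1: random subset of predictors
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))  # step 2
    return trees


def predict(trees: List[Callable[[Row], float]], row: Row) -> float:
    return mean([tree(row) for tree in trees])       # step 3: average over all trees


# Hypothetical toy data: the target depends on column 0 only; columns 1-2 are noise.
data_rng = random.Random(1)
X = [[float(price), data_rng.random(), data_rng.random()] for price in range(20, 80, 5)]
y = [2.0 if row[0] < 50 else 5.0 for row in X]

forest = fit_forest(X, y)
low = predict(forest, [10.0, 0.5, 0.5])
high = predict(forest, [90.0, 0.5, 0.5])
print(low, high)
```

Averaging over many trees fitted to perturbed samples is what reduces the variance of the prediction relative to a single tree.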
Causal forest (Athey and Wager, 2019; Credit and Lehnert, 2023) is a generalisation of random forest to estimate heterogeneous treatment effects. Keeping the same structure as random forest, an adaptive-kernel nearest-neighbour method is used to obtain the predicted values, where closeness is measured in terms of the characteristics of the training observations that fall into the same leaf. Each test observation drops into a particular leaf according to its characteristics, and a list of similar training observations is generated. A neighbourhood weight for each training observation is then calculated based on the number of times it falls into the same leaf as a given test observation. The predicted value for each test data point is the neighbourhood-weighted average difference of the outcome variable between treated and untreated observations. In addition, the splitting criterion used to construct the tree is to maximise the difference between the target observed in treated and untreated observations (according to a linear approximation of the mean difference gradient), rather than to minimise the prediction error as in traditional random forest. Following an honest estimation strategy, two samples are selected: one devoted to splitting each tree and the other to estimating causal effects. Whereas random forest produces predicted values of the outcome variable, causal forest obtains predicted values of the conditional average treatment effects at the unit level.
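The neighbourhood weighting can be made concrete with a small sketch. It assumes a binary treatment indicator and takes the leaf assignments as given (hypothetical values); it follows the verbal description above and is a simplification, since it omits honest sample splitting and the causal splitting criterion, so it should not be read as the estimator implemented in any particular package.

```python
from typing import List


def neighbourhood_weights(train_leaves: List[List[int]], test_leaves: List[int]) -> List[float]:
    """Weight of each training observation for one test point.

    train_leaves[b][i] is the leaf of training observation i in tree b;
    test_leaves[b] is the leaf the test point falls into in tree b.
    """
    n_trees, n_obs = len(train_leaves), len(train_leaves[0])
    weights = [0.0] * n_obs
    for leaves, target in zip(train_leaves, test_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == target]
        for i in members:
            # Each tree spreads a total weight of 1/n_trees over the shared leaf.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_effect(weights: List[float], outcome: List[float], treated: List[int]) -> float:
    """Weighted mean outcome of treated neighbours minus that of untreated ones."""
    w1 = sum(w for w, d in zip(weights, treated) if d == 1)
    w0 = sum(w for w, d in zip(weights, treated) if d == 0)
    if w1 == 0.0 or w0 == 0.0:
        raise ValueError("neighbourhood lacks treated or untreated observations")
    y1 = sum(w * v for w, v, d in zip(weights, outcome, treated) if d == 1) / w1
    y0 = sum(w * v for w, v, d in zip(weights, outcome, treated) if d == 0) / w0
    return y1 - y0


# Hypothetical leaf assignments for 6 training observations in 2 trees.
train_leaves = [[0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]]
test_leaves = [0, 0]                      # leaves reached by the test point
treated = [1, 0, 1, 0, 1, 0]              # assumed binary treatment indicator
outcome = [3.0, 1.0, 4.0, 9.0, 9.0, 9.0]  # hypothetical outcomes

weights = neighbourhood_weights(train_leaves, test_leaves)
effect = weighted_effect(weights, outcome, treated)
print(weights, effect)
```

Observations that share a leaf with the test point in both trees receive the largest weights, while those that never share a leaf receive zero weight and do not influence the estimated effect.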
In this paper, the three versions discussed are applied for comparative purposes: random forest, causal forest, and the most basic version, CART, as a benchmark. Note that the random forest versions are chosen instead of others, such as boosting, because they are considered to be more effective in assessing importance measures for predictors, which is our main point of interest (Chen et al., 2023). Calculating an importance measure is not an easy task when dealing with machine learning models, which are often referred to as black boxes. The most common random forest importance measures are based on the permutation principle: the values of a predictor are randomly permuted to break their association with the outcome, and the difference in the prediction accuracy of the random forest before and after the permutation is taken as a quantification of the importance of that predictor for predicting the outcome (Breiman, 2001b; Strobl et al., 2008). For the purposes of this research, we select the importance measure proposed by Debeer and Strobl (2020), a conditional permutation importance measure that is strongly recommended when the predictors are correlated.
The procedure therefore consists of several steps. First, the sample is randomly split into two samples: a training sample, which contains 70% of the total sample and is used to train the algorithm, and a test sample, which uses the remaining 30% of the sample to evaluate the predictive power of the model. Second, the cost arising from managing technical constraints and the cost derived from TSO balancing and technical processes are estimated using the three regression tree versions mentioned above: CART, random forest, and causal forest. Third, the mean absolute error (MAE) and the root mean squared error (RMSE) in both the training and test samples are calculated for comparison purposes. Next, the determinants of each cost are extracted based on unconditional and conditional permutation importance measures from random forest and causal forest fitted models, respectively. Finally, the relationship between these determinants and the costs is explored in order to obtain the marginal effect of each variable on the outcome of each of the fitted models16.
16 The software used is the R packages "rpart" (Therneau and Atkinson, 2018), "randomForest" (Liaw and Wiener, 2002), "moreparty" (Robette, 2022), and, for the calculation of conditional importance measures, "permimp" (Debeer et al., 2021). The random forest packages also provide the unconditional importance measure of the permutation type.
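The first and third steps of this procedure (the random 70/30 split and the two error metrics) can be sketched as follows. The function names, the seed, and the example values are illustrative assumptions rather than the settings used in the paper.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_test_split(n_obs: int, train_share: float = 0.7, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly assign observation indices to a training and a test sample."""
    idx = list(range(n_obs))
    random.Random(seed).shuffle(idx)
    cut = round(train_share * n_obs)
    return idx[:cut], idx[cut:]


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


train_idx, test_idx = train_test_split(10)

# Hypothetical observed and predicted costs (EUR/MWh) for three test days.
actual = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
print(len(train_idx), len(test_idx), mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares the errors before averaging, it penalises large deviations more heavily than MAE, which is why reporting both gives a fuller picture of model fit.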
5.
Results
Due to space constraints, we only present results for the cost derived from managing technical constraints and the cost derived from TSO balancing and technical processes. The remaining components of the final price, other than the day-ahead market price, have a very limited impact and are very stable, suggesting that their levels do not depend on the amount of renewable energy in the day-ahead market. This intuition is confirmed by the results obtained for these components, which are available upon request from the corresponding author. Therefore, the cost derived from managing technical constraints (TTCC) and the cost of the balancing processes managed by the TSO (SO) are chosen as dependent variables, or targets, for the machine learning algorithms.
As mentioned above, a total of 67 variables are used as potential predictors for each of the targets (Table 1). The variables are grouped as follows: (i) variables derived from the day-ahead market data: the difference between the day-ahead price for the Spanish market and the day-ahead price for the Portuguese market, in an attempt to capture the effect of network bottlenecks between the two areas; the mean offer price to sell electricity and the share of power sold in the day-ahead market, grouped by technology: combined cycle plants (denoted by CC), hydroelectric (CH), pumped hydroelectric (CH_B), nuclear (CN), renewables [mainly wind and solar]¹⁷ (CR), and thermal (CT); (ii) variables generated from the first session of the intraday market data: mean offer prices to either purchase or sell electricity, grouped by generation technology, and the percentage of matching offers to sell or purchase, as well as the difference between the two; (iii) balancing costs and regulated payments, and two daily dummies that take the value 1 if there is no capacity payment on day t-1 or if there is no interruptibility service on day t-1, respectively, and 0 otherwise; (iv) other energy-related commodity prices, such as natural gas prices (TTF), coal prices (API2) and carbon prices (EUA), as well as Spanish water reservoir levels; (v) several data series from the France–Spain interconnection, namely the difference between the Spanish and French spot prices, the percentage of hours at 100% load on the power flow from Spain to France and the percentage of hours at 100% load on the power flow from France to Spain; and finally, (vi) calendar variables to control for seasonality (yearly, monthly, day of the week) and a labour dummy that takes the value 1 if it is a non-holiday weekday and 0 otherwise. It should be noted that the variables from the day-ahead market enter as predictors for the same day as the target, while the rest of the variables are lagged by one period.
17 This category also includes bids coming from cogeneration and surplus production, but these latter bids are actually of minimal importance because of their relatively limited associated volume during the studied period.
The CART version requires an additional process, the so-called pruning method, to prevent overfitting. The aim is to reduce the number of branches by eliminating those that do not contribute to the prediction and may cause overfitting. To determine the optimal size of a tree, a pruning method based on cross-validation (CV) is used, following Hastie et al. (2001). In leave-one-out cross-validation, one observation is omitted from training and the resulting model is used to predict the omitted observation, and the process is repeated for every observation. However, full leave-one-out cross-validation is computationally costly, so it is preferable to work with k-fold cross-validation and a cost-complexity function to reduce the number of fits required. A cost-complexity function for a tree T is CC(T) = ∑RSSi + λ|T|, which is the sum of the squared residuals of all terminal nodes plus a penalty of λ for each of the |T| terminal nodes, where λ ≥ 0 is the complexity parameter. In practice, the parameter CP (cost complexity) is used, calculated as CP = λ/RSS, where RSS is the sum of squared residuals in a tree with no branches. Finally, the pruning strategy consists of growing a large tree and then pruning it back, selecting the smallest sub-tree with a CV error within one standard error of the minimum¹⁸.
18 The other parameters used are CP = 0.01 in the initial tree without pruning; MinSplit = 20, the minimum number of observations in a sub-region to be split; MinBucket = 20/3 (default value); xval = 10, the number of cross-validation folds; and maxdepth = 30, the maximum number of levels in a tree.
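The pruning rule can be illustrated with a short sketch. It assumes that a sequence of candidate sub-trees has already been obtained together with their cross-validated errors and standard errors; all numbers, as well as the names `Subtree`, `cost_complexity`, and `one_se_choice`, are hypothetical.

```python
from typing import List, NamedTuple


class Subtree(NamedTuple):
    n_leaves: int    # number of terminal nodes |T|
    rss: float       # training RSS summed over terminal nodes
    cv_error: float  # cross-validated error
    cv_se: float     # standard error of the cross-validated error


def cost_complexity(tree: Subtree, lam: float) -> float:
    """Training RSS plus a penalty of lam per terminal node."""
    return tree.rss + lam * tree.n_leaves


def one_se_choice(candidates: List[Subtree]) -> Subtree:
    """Smallest sub-tree whose CV error is within one SE of the minimum."""
    best = min(candidates, key=lambda t: t.cv_error)
    threshold = best.cv_error + best.cv_se
    eligible = [t for t in candidates if t.cv_error <= threshold]
    return min(eligible, key=lambda t: t.n_leaves)


# Hypothetical pruning sequence, from the root-only tree to a large tree.
candidates = [
    Subtree(1, 100.0, 1.00, 0.05),
    Subtree(2, 60.0, 0.70, 0.04),
    Subtree(4, 45.0, 0.58, 0.04),
    Subtree(7, 40.0, 0.55, 0.04),
    Subtree(12, 38.0, 0.57, 0.04),
]

chosen = one_se_choice(candidates)
print(chosen.n_leaves, cost_complexity(candidates[2], lam=2.0))
```

In this example the seven-leaf tree has the lowest cross-validated error, but the four-leaf tree lies within one standard error of it and is therefore preferred as the more parsimonious model.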
For the random forest version, the number of trees should not be set too low in order to ensure that each input row is predicted at least a few times, thereby obtaining more stable outcomes. Accordingly, the number of trees selected is 500. In the splitting process, variables are selected at random in order to avoid overfitting. The default number of variables to be considered in regression trees is p/3 in each step, where p is the number of predictors (67 in this case). Next, for the causal forest version, the number of trees and the number of variables to be used in the regression trees are the same as for the random forest (500 trees and p/3).
Table 2 presents the performance metrics obtained for each target under the CART, random forest, and causal forest approaches. As can be observed, the CART version, which is the simplest, has the poorest performance, as it gives the highest overall error metrics (MAE and RMSE) for both targets. Furthermore, despite the rigorous estimation process, there is some evidence of overfitting, since the overall error metrics are marginally higher for the test sample than for the training sample. However, as the main goal is to identify the factors influencing the targets, a satisfactory performance on the training sample without excessive overfitting is deemed sufficient. The random forest version outperforms the causal forest version in terms of the error metrics considered, with lower overall errors, which indicates that random forest has a higher predictive capacity than causal forest. Nevertheless, it is important to emphasise that our goal is not to predict future values of the targets but to identify the primary factors driving them.
Once the algorithms have been trained, we proceed to obtain an importance measure that will allow us to extract from the set of 67 variables included in this empirical exercise the main determinants of the costs arising from managing technical constraints and the TSO technical processes. The importance measure will provide the ranking of the fundamentals for each of the targets. It is computed using the out-of-bag samples that are reserved during the construction of the regression trees in accordance with the permutation concept. The out-of-bag samples are generated in the regression trees following Breiman (2001b). Before each tree is constructed, the training set is bootstrapped into two samples: one is used to construct the tree, and the other, the out-of-bag portion, is saved internally for validation and to estimate importance measures. The prediction is run twice on the out-of-bag observations, once with the values of the variables intact and once with the values of a variable randomly permuted. The differences in accuracy obtained are used to calculate the variable importance measure. As mentioned above, two importance measures are calculated: the unconditional permutation importance measure using the random forest approach¹⁹, where the values of each variable are permuted at random, and the conditional permutation importance measure with causal forest, according to Debeer and Strobl (2020).
19 The R package "randomForest" (importance function) is used.
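The unconditional permutation principle can be sketched as follows. For brevity the sketch uses a single held-out sample and a fixed prediction function instead of per-tree out-of-bag samples, and it does not implement the conditional variant of Debeer and Strobl (2020); the model, data, and function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def mse(model: Callable[[Row], float], X: List[List[float]], y: List[float]) -> float:
    """Mean squared prediction error of the model on (X, y)."""
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)


def permutation_importance(model: Callable[[Row], float], X: List[List[float]],
                           y: List[float], n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Average increase in MSE after randomly permuting each predictor."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)          # first run: values left intact
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)          # break the link between X_j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(model, X_perm, y) - baseline)  # second run
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical fitted model that uses column 0 only and ignores column 1.
def model(row: Row) -> float:
    return 2.0 if row[0] < 50 else 5.0


X = [[float(price), float(k % 2)] for k, price in enumerate(range(20, 80, 5))]
y = [model(row) for row in X]

importance = permutation_importance(model, X, y)
print(importance)
```

A predictor that the model never uses receives an importance of exactly zero, whereas permuting an informative predictor degrades accuracy and yields a positive value.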
Table 3 shows the ranking of the determinants or predictors for each target. Panel A exhibits the unconditional permutation importance measure ranking, based on random forest estimation, while Panel B shows the conditional permutation importance measure ranking, based on causal forest estimation.
The results obtained under the random forest approach (Table 3, Panel A) indicate that the main determinants of the technical constraints cost are, in order of importance: the one-period lagged technical constraints cost; the share of electricity sold by combined cycle plants in the day-ahead market; the mean offer price to sell electricity by combined cycle plants in the day-ahead market; the share of electricity sold by renewable energy plants in the day-ahead market; the mean offer price to sell electricity by pumped hydroelectric plants in the day-ahead market; the share of electricity sold by thermal plants in the day-ahead market; the one-period lagged water reservoir levels; the mean offer price to sell electricity by hydroelectric plants in the day-ahead market; the one-period lagged mean offer price to sell electricity by pumped hydroelectric plants in the intraday market; and the Sunday dummy variable.
Regarding the cost of TSO technical processes up to the real-time delivery of electricity, the unconditional permutation importance measure based on the random forest version provides the following ranking of the main predictors: the one-period lagged cost of TSO processes; the share of electricity sold by combined cycle plants in the day-ahead market; the mean offer price to sell electricity by combined cycle plants in the day-ahead market; the share of electricity sold by renewable plants in the day-ahead market; the one-period lagged TTF natural gas futures price; the share of electricity sold by hydroelectric plants in the day-ahead market; the one-period lagged EUA carbon futures price; the one-period lagged API2 coal futures index; the one-period lagged mean offer price to sell electricity by hydroelectric plants in the intraday market; and the mean offer price to sell electricity by renewable energy plants in the day-ahead market.
These are the primary factors influencing the costs associated with the management of technical constraints and other balancing processes within the Iberian electricity market according to the random forest methodology, which has been shown to offer lower prediction errors. However, the conditional permutation importance measure calculated under the causal forest approach is more effective at inferring causality, particularly in our case where the variables included as potential predictors can be highly correlated. In contrast to the random forest approach, the causal forest approach permits the potential determinants to be correlated, rendering it more suitable for the purposes of our work and ensuring the reliability of the results.
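The difference between unconditional and conditional permutation under correlated predictors can be illustrated with a small sketch. It is a simplified stand-in, not the procedure of the cited method: values are permuted within equal-width bins of a correlated covariate rather than within a tree-derived partition, and the data, the fitted model, and all names (permute_within_strata, permutation_importance, model) are hypothetical.

```python
import random

def permute_within_strata(values, strata, rng):
    """Shuffle values only among positions that share the same stratum label."""
    groups = {}
    for idx, label in enumerate(strata):
        groups.setdefault(label, []).append(idx)
    out = list(values)
    for idxs in groups.values():
        vals = [values[i] for i in idxs]
        rng.shuffle(vals)
        for i, v in zip(idxs, vals):
            out[i] = v
    return out

def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(y)

def permutation_importance(model, X, y, j, rng, strata=None, repeats=20):
    """Mean rise in MSE after permuting column j, unconditionally or within strata."""
    base = mse(model, X, y)
    column = [row[j] for row in X]
    total = 0.0
    for _ in range(repeats):
        if strata is None:
            new = column[:]
            rng.shuffle(new)                      # unconditional: shuffle over all rows
        else:
            new = permute_within_strata(column, strata, rng)  # conditional: within bins
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, new)]
        total += mse(model, X_perm, y) - base
    return total / repeats

# Synthetic data (assumption): x1 is a near-copy of x0, and y is driven by x0 alone.
rng = random.Random(0)
x0 = [rng.random() for _ in range(300)]
X = [[a, a + rng.gauss(0.0, 0.02)] for a in x0]
y = [3.0 * a for a in x0]

def model(row):
    # Hypothetical fitted model that splits the effect across the two correlated inputs.
    return 1.5 * row[0] + 1.5 * row[1]

# Ten equal-width bins of x0 act as the conditioning partition for x1.
strata = [min(int(row[0] * 10), 9) for row in X]
unconditional = permutation_importance(model, X, y, 1, rng)
conditional = permutation_importance(model, X, y, 1, rng, strata=strata)
```

Here the unconditional measure credits x1 with a large effect merely because it mirrors x0, whereas the conditional measure, which preserves the dependence between the two predictors while shuffling, assigns it a much smaller one.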
The causal forest estimation indicates that the factors that can explain the cost of managing technical constraints are, in order of decreasing relevance: the holiday dummy variable; the one-period lagged technical constraints cost; the Sunday dummy variable; the share of electricity sold by renewable plants in the day-ahead market; the spread between the Spanish and Portuguese day-ahead market price; the one-period lagged intraday market cost; the share of electricity sold by nuclear plants in the day-ahead market; the share of electricity sold by combined cycle plants in the day-ahead market; the Saturday dummy variable; and the Monday dummy variable (Table 3, Panel B). It is notable that three variables have been identified by both random forest and causal forest as being among the ten factors with the greatest impact on the cost of managing technical constraints. These variables are the one-period lagged technical constraints cost, the Sunday dummy, and the share of electricity sold by renewable plants in the day-ahead market.
With regard to the cost of TSO processes, only the share of electricity sold by renewable plants in the day-ahead market is selected as one of the main determinants by both approaches. Apart from the share of renewable generation in the day-ahead market, the other variables that feature among the top ten determinants and may help to explain the cost of TSO technical processes, according to the conditional permutation importance measure based on the causal forest estimation, are the holiday dummy variable; the one-period lagged cost of TSO processes; the October, December, January, and August dummy variables; and the spread calculated as the difference between the Spanish and the French spot price.
Each of the two approaches allows us to identify the top determinants of each target. However, they do not provide further details, such as whether the relationship between each determinant and its target is direct or inverse, or whether it is shown to be stable or may have breakpoints. Therefore, to complete the analysis, we use the accumulated local effects plots (ALE plots) methodology.
For each determinant, the ALE plot displays the mean effect of the variable at a given value relative to the average prediction. On the abscissa axis, we see the values of the predictor, while on the ordinate axis, we see the estimated local effect following the ALE method, which is recommended for explaining machine learning models when predictors are correlated (Apley, 2018)20. The estimated local effect is centred. For example, a negative (positive) ALE estimation value equal to −2 (+2) on the ordinate axis at x = 30 in the graph would indicate that the predicted value is estimated to be lower (higher) by two compared to the average prediction. Therefore, the relationship between each predictor and the target estimation can be observed by plotting the accumulated local effects.
20 We use the R package ALEPlot.
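The way a centred first-order ALE curve is built can be sketched as follows. This is an illustrative sketch under stated assumptions, not the ALEPlot implementation: bin edges are taken at empirical quantiles, the curve is centred so that its count-weighted average over bins is zero (one common centring choice), and the model, data, and names (ale_1d, model) are hypothetical.

```python
import random
from bisect import bisect_left

def ale_1d(model, X, j, n_bins=10):
    """Centred first-order accumulated local effects of feature j at the bin edges."""
    xs = sorted(row[j] for row in X)
    n = len(xs)
    # Bin edges at empirical quantiles; the set removes duplicate edges.
    edges = sorted({xs[round(k * (n - 1) / n_bins)] for k in range(n_bins + 1)})
    if len(edges) < 2:
        raise ValueError("feature has a single distinct value")
    n_intervals = len(edges) - 1
    sums, counts = [0.0] * n_intervals, [0] * n_intervals
    for row in X:
        # Interval k covers (edges[k-1], edges[k]]; the minimum falls in the first one.
        k = max(1, bisect_left(edges, row[j]))
        lower = row[:j] + [edges[k - 1]] + row[j + 1:]
        upper = row[:j] + [edges[k]] + row[j + 1:]
        # Local effect: change in prediction across the interval, other features held fixed.
        sums[k - 1] += model(upper) - model(lower)
        counts[k - 1] += 1
    effects = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    ale = [0.0]
    for e in effects:
        ale.append(ale[-1] + e)                   # accumulate the local effects
    # Centre: subtract the count-weighted mean of the interval midpoints.
    mean = sum(c * (ale[k] + ale[k + 1]) / 2.0 for k, c in enumerate(counts)) / sum(counts)
    return edges, [a - mean for a in ale]

# Synthetic data (assumption): x1 is strongly correlated with x0.
rng = random.Random(0)
X = []
for _ in range(400):
    a = rng.random()
    X.append([a, a + rng.gauss(0.0, 0.05)])

def model(row):
    # Hypothetical fitted model: additive in both predictors.
    return 2.0 * row[0] + 5.0 * row[1]

edges, ale = ale_1d(model, X, j=0)
```

In this example the curve for x0 rises with slope 2 between edges, recovering x0's own contribution even though x1 moves with it. Negative values at the low end and positive values at the high end read exactly as in the text: predictions below or above the average prediction by that amount.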
Figure 1 shows the ALE plots for the cost of managing technical constraints and the corresponding previously selected top ten significant factors. As can be seen, the cost of technical constraints is significantly higher than its average value on holidays, Sundays, and Saturdays, while it is lower on Mondays. It is also higher than its average value as long as the one-period lagged technical constraints cost is close to or higher than its mean. The cost of technical constraints is higher than its average value when the Spanish spot price is lower than the Portuguese spot price, as well as for negative values of the impact of the intraday market trading on final wholesale market prices, and lower than its average value otherwise. Finally, the greater or lesser participation of the different generation sources in the day-ahead market also has a significant effect on the cost associated with the management of technical constraints. Thus, according to our results, this cost is expected to be higher (lower) than the average cost when the share of renewable generation is above (below) 55%, as well as when the share of nuclear generation exceeds (does not reach) 11%, while it is expected to be lower (higher) when the share of combined cycle generation is higher (lower) than 7%.
Our results show that the cost of managing technical constraints appears to be higher in periods that are generally characterised by low day-ahead market prices. Indeed, this is the case for holidays, Saturdays, and Sundays, when electricity demand is often lower than on workdays. In line with these results, the cost of technical constraints is also found to be lower on (non-holiday) Mondays. Furthermore, an increase in the share of renewable generation and/or nuclear generation in the day-ahead market would be followed by higher technical constraints costs. Both renewable and nuclear plants often bid at very low prices because their opportunity costs are close to zero. Moreover, due to the merit order effect, in general, the higher the share of renewable and/or nuclear generation in the day-ahead market, the lower the spot price.
These findings may seem counterintuitive, as one would associate network congestion problems with situations of high demand (which usually leads to higher prices), or at least with sufficiently high levels of demand concentrated around network points identified as critical due to insufficient network capacity. However, our findings could also be explained by the strategic bidding behaviour of flexible generators, such as combined cycle plants, which may have an economic incentive to avoid (at least partially) being dispatched in the day-ahead market in order to participate in the technical restrictions market and obtain a higher price for their electricity, thereby maximising their overall profit. It should be noted that generators in the Iberian market are obliged to submit offers to sell all their available electricity in the day-ahead market. However, they could stay out of the day-ahead market by submitting artificially high offer prices in the corresponding auctions in order to have spare capacity to generate electricity in the technical restrictions market, similar to the strategies described in Furió and Lucia (2009). Such strategic behaviour would be consistent with our findings that the cost of managing technical restrictions increases when the share of combined cycle generation in the day-ahead market decreases and/or when the share of renewable and/or nuclear generation in the day-ahead market increases21.
21 A substitution effect is usually observed between generation technologies with low variable costs (such as nuclear, wind, or solar) and generation technologies with high variable costs (such as combined cycle natural gas or thermal, among others).
The ALE plots for the cost of TSO technical processes (SO) and each of its ten most important drivers are displayed in Figure 2. As can be seen, the SO exhibits monthly and daily patterns. For example, there is a greater need for these technical processes during holidays, on Sundays, and in the months of October, January, and August, as the cost appears to be higher than its average for these periods, while it is lower for December. The SO variable lagged one period shows an effect on the current level of SO, in the sense that the higher the SO lagged one period, the higher the current TSO cost. The spread between the Spanish and the Portuguese spot prices is also one of its determinants. Similarly to the case of the cost of managing technical constraints, the TSO cost would be higher (lower) than its average cost for negative (positive) values of the spread. Finally, the share of electricity sold by renewable plants in the day-ahead market would cause the TSO cost to be higher than the average as long as the share exceeds 57%, while the lower (higher) the average offer price to sell electricity by renewable plants, the higher (lower) the SO. Regarding the potential impact of renewable generation on the cost of TSO technical processes, both findings align, as, ceteris paribus, lower offer prices are likely to lead to higher shares of renewables in the day-ahead market.
6.
Conclusions
The factors driving the components of final wholesale electricity prices other than the day-ahead market price are much less studied. They are nevertheless key to gaining further insight into the dynamics between the interrelated trading segments and the technical processes involved, and should therefore be taken into account when assessing changes in market design aimed at creating a more efficient and resilient electricity system.
The aim of this paper is to investigate the impact of renewable generation on the costs of managing network congestion and maintaining the energy balance between supply and demand, up to the real-time delivery of electricity under standards of reliability. In addition, the methodology employed has enabled us to identify the primary drivers of the costs associated with the technical processes required to ensure the security of supply.
The results of our study indicate that the share of renewable generation in the Spanish day-ahead market is a significant factor influencing both the cost of managing technical constraints, which aims to solve network capacity problems, and the cost of managing balancing processes and adjustment issues by the TSO. In particular, higher levels of renewable generation in the day-ahead market will lead to (i) an increased cost of managing technical constraints and (ii) a greater need for the management of deviations by the TSO, which in turn will result in higher costs. It is evident that both factors will contribute to pushing up final prices. It is important to note, however, that these costs represent a very small percentage of the final price and do not appear to have a significant impact on final wholesale electricity prices throughout the period analysed. This finding suggests that the price increase resulting from the elevated share of renewables in the generation mix is not appreciably greater than the price reduction attributable to the merit order effect.
In conclusion, the results obtained shed light on the overall impact of renewable generation on electricity prices and suggest interesting avenues for further investigation. First, we present insightful evidence-based information on the successful integration of large amounts of renewable energy into the electricity generation mix, as required by the energy transition, without incurring excessive costs. Second, the empirical analysis could be extended to other countries in order to further investigate whether and to what extent progressively increasing shares of renewable generation will entail additional costs that ultimately lead to rising prices. Third, in pursuit of greater market efficiency, it would be advisable for regulators to investigate strategic bidding behaviour in this new paradigm, with the objective of identifying any potential abuse of market power. Finally, we anticipate that our findings will be of interest to both practitioners and regulators, as they provide a more comprehensive understanding of the market's functioning and have implications for the restructuring of the market towards a more sustainable and competitive electricity system.
Future research could investigate a number of avenues to enhance our understanding of the impact of renewable generation on electricity markets and inform policy decisions. For instance, extending the empirical analysis to other countries would provide valuable insights into whether the findings observed in the Spanish market are consistent across different regulatory environments and market structures. Furthermore, an investigation into the dynamics of strategic bidding behaviour in the context of renewable energy integration, with a focus on identifying and mitigating possible interferences in price or market abuse, would be of significant importance for ensuring market efficiency and fairness.
Acknowledgments
Financial support from the Spanish Ministry of Science, Innovation and Universities (Project PGC2018-093645-B-100) is gratefully acknowledged. We would like to express our gratitude to two anonymous reviewers for their constructive comments, which contributed to improving the paper, and to the assistant editor for their valuable work in handling this manuscript. All errors are our own responsibility.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Conflict of interest
The authors declare that there are no conflicts of interest in this paper.