We study how to perform tests on samples of pairs of observations and predictions in order to assess whether or not the predictions are prudent. Prudence requires that the mean of the difference of the observation-prediction pairs can be shown to be significantly negative. For safe conclusions, we suggest testing both unweighted (or equally weighted) and weighted means and explicitly taking into account the randomness of individual pairs. The test methods presented are mainly specified as bootstrap and normal approximation algorithms. The tests are general but can be applied in particular in the area of credit risk, both for regulatory and accounting purposes.
Citation: Dirk Tasche. Proving prediction prudence[J]. Data Science in Finance and Economics, 2022, 2(4): 335-355. doi: 10.3934/DSFE.2022017
Testing if the means of two samples significantly differ or the mean of one sample significantly exceeds the mean of the other sample is a problem that is widely covered in the statistical literature [see for instance Casella and Berger, 2002, Davison and Hinkley, 1997, Venables and Ripley, 2002]. In this paper, we study how to perform such tests on samples of pairs of observations and predictions in order to assess whether or not the predictions are prudent. Prudence is here understood as the requirement that the mean of the differences of the observations and predictions can be shown to be significantly negative.
At the latest since the introduction of the validation requirements for credit risk parameter estimates in the regulatory Basel II framework [BCBS, 2006, paragraph 501], such tests have also been an important issue in the banking industry*:
* PD means 'probability of default', IRB means 'internal ratings based', LGD means 'loss given default' and EAD is 'exposure at default'.
● "Banks must regularly compare realised default rates with estimated PDs for each grade and be able to demonstrate that the realised default rates are within the expected range for that grade", and "banks using the advanced IRB approach must complete such analysis for their estimates of LGDs and EADs".
More recently, as a consequence of the introduction of new rules for loss provisioning in financial reporting standards, the validation of risk parameter estimates also attracted interest in the accounting community [see, e.g., Bellini, 2019]. Over the course of the past fifteen years or so, a variety of statistical tests for the comparison of realised and predicted values have been proposed for use in the banks' validation exercises. For overviews on estimation and validation as well as references see Blümke [2019, PD], Loterman et al. [2014, LGD], and Gürtler et al. [2018, EAD]. Scandizzo [2016] presents validation methods for all these kinds of parameters in the general context of model risk management.
In order to make validation results by different banks to some extent comparable, in February 2019, the European Central Bank [ECB, 2019]† asked the banks it supervises under the Single Supervisory Mechanism (SSM) to deliver standardised annual reports on their internal model validation exercises. In particular, the requested reports are assumed to include data and tests regarding the "predictive ability (or calibration)" of PD, LGD and CCF (credit conversion factor)‡ parameters in the most recent observation period. Predictive ability for LGD estimation is explained through the statement "the analysis of predictive ability (or calibration) is aimed at ensuring that the LGD parameter adequately predicts the loss rate in the event of a default i.e. that LGD estimates constitute reliable forecasts of realised loss rates" [ECB, 2019, Section 2.6.2]. The meanings of predictive ability for PD and EAD / CCF respectively are illustrated in similar ways.
† In May 2020, this document could be downloaded at https://www.bankingsupervision.europa.eu/banking/tasks/internal_models/shared/pdf/instructions_validation_reporting_credit_risk.en.pdf.
‡ EAD and CCF of a credit facility are linked by the relation EAD = DA + CCF*(limit-DA) where DA is the already drawn amount.
ECB [2019] proposed "one-sample t-test[s] for paired observations" to test the "null hypothesis that estimated LGD [or CCF or EAD] is greater than true LGD" (or CCF or EAD). ECB [2019] also suggested a Jeffreys binomial test for the "null hypothesis that the PD applied in the portfolio/rating grade at the beginning of the relevant observation period is greater than the true one (one sided hypothesis test)".
Recall that the possible outcomes of testing a null hypothesis against an alternative are 'the null hypothesis is not rejected' or 'the null is rejected and the alternative is accepted'. Not rejecting the null hypothesis does not mean accepting it because in hypothesis testing the type II error (not rejecting the null hypothesis although the alternative is true) cannot be controlled and, therefore, can be rather large. In contrast, the type I error (rejecting the null hypothesis although it is true) can be controlled and usually is kept small by choosing a significance level like 5% or 1%. Hence, if the null hypothesis is rejected the alternative can be accepted at properly controlled risk. In the following, we understand the acceptance of an alternative hypothesis by rejection of the null hypothesis as statistical 'proof' with an error probability tag (i.e. the significance level or p-value).
In this paper,
● we make a case for also testing the null hypothesis that the estimated parameter is less than or equal to the true parameter in order to be able to 'prove' that the estimate is prudent (or conservative),
● we suggest additionally using exposure- (or limit-)weighted§ sample averages in order to better inform assessments of estimation (or prediction) prudence, and
§ ECB [2019] presumably only looks at "number-weighted" (i.e. equally weighted) averages because the Basel framework [BCBS, 2006] requires such averages for the risk parameter estimates. In banking practice, however, exposure-weighted averages are also considered [see, e.g., Li et al., 2009].
● we propose more elaborate statements of the hypotheses for the tests (by including 'variance expansion') in order to account for portfolio inhomogeneity in terms of composition (exposure sizes) and riskiness.
The proposal to look for a 'proof' of prediction prudence is inspired by the regulatory requirement [BCBS, 2006, paragraph 451]: "In order to avoid over-optimism, a bank must add to its estimates a margin of conservatism that is related to the likely range of errors".
As a matter of fact, the statistical tests discussed in this paper can be deployed both for proving prudence and for proving aggressiveness of estimates. However, an asymmetric approach is recommended for making use of the evidence from the tests:
● For proving prudence, request that both the equal-weights test and the exposure-weighted test reject the null hypothesis of the parameter being aggressive.
● For an alert of potential aggressiveness, request only that the equal-weights test or the exposure-weighted test reject the null hypothesis of the parameter being prudent.
The paper is organised as follows:
● In Section 2, we introduce a general non-parametric paired difference test approach to testing for the sign of a weighted mean value (Section 2.1). We compare this approach to the t-test for LGD, CCF and EAD proposed in ECB [2019] and note possible improvements of both approaches (Section 2.2). We then present in Section 2.3 a test approach to put into practice these improvements in the case of variables with values in the unit interval like LGD and CCF. Appendices A and B supplement Section 2.3 with regard to weight-adjustments as an alternative to sampling with inhomogeneous weights and to testing non-negative but not necessarily bounded variables like EAD.
● In Section 3, we discuss paired difference tests in the special case of differences between observed event indicators and the predicted probabilities of the events. We start in Section 3.1 with the presentation of a test approach that takes account of potential weighting of the observation pairs and variance expansion to deal with the individual randomness of the observations. In Section 3.2, we compare this test approach to the Jeffreys test proposed in ECB [2019] for assessing the 'predictive ability' of PD estimates.
● In Section 4, the test methods presented in the preceding sections are illustrated with two examples of test results.
● Section 5 concludes the paper with summarising remarks.
The statistical tests considered in this paper are 'paired difference tests'. This test design accounts for the strong dependence that is to be expected between the observation and the prediction in the matched observation-prediction pairs which the analysed samples consist of. See Mendenhall et al. [2008, Chapter 10] for a discussion of the advantages of such test designs.
Starting point.
● One sample of real-valued observations $\Delta_1, \ldots, \Delta_n$.
● Weights $0 < w_i < 1$, $i = 1, \ldots, n$, with $\sum_{i=1}^n w_i = 1$.
● Define the weighted-average observation $\Delta_w$ as
$$\Delta_w = \sum_{i=1}^n w_i\,\Delta_i. \tag{2.1}$$
Interpretation in the context of credit risk back-testing.
● Δ1,…,Δn may be a sample of differences (residuals) between observed and predicted LGD (or CCF or EAD) for defaulted credit facilities (matched pairs of observations and predictions).
● The weight $w_i$ reflects the relative importance of observation $i$. For instance, in the case of CCF or EAD estimates of credit facilities, one might choose
$$w_i = \frac{\mathrm{limit}_i}{\sum_{j=1}^n \mathrm{limit}_j}, \tag{2.2a}$$
where $\mathrm{limit}_j$ is the limit of credit facility $j$ at the time when the estimates were made.
● In case of LGD estimates, the weights $w_i$ could be chosen as [Li et al., 2009, Section 5]
$$w_i = \frac{\mathrm{EAD}_i}{\sum_{j=1}^n \mathrm{EAD}_j}, \tag{2.2b}$$
where $\mathrm{EAD}_j$ is the exposure at default estimate for credit facility $j$ at the time when the estimates were made.
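As an illustration of (2.1) and (2.2a), the following minimal Python sketch computes limit-based weights and the weighted-average residual. The function names and the toy figures are hypothetical and are not taken from the paper.

```python
def normalised_weights(sizes):
    """Turn positive sizes (limits or EADs) into weights summing to 1, as in (2.2a)/(2.2b)."""
    total = sum(sizes)
    return [s / total for s in sizes]


def weighted_mean(weights, deltas):
    """Weighted-average observation Delta_w from (2.1)."""
    return sum(w * d for w, d in zip(weights, deltas))


# Hypothetical toy data: residuals (observed minus predicted) and facility limits.
deltas = [-0.10, 0.05, -0.20, 0.02]
limits = [100.0, 300.0, 500.0, 100.0]

weights = normalised_weights(limits)
delta_w = weighted_mean(weights, deltas)
```

With at least two facilities and strictly positive limits, the resulting weights satisfy the condition 0 < w_i < 1 stated in the starting point.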
Goal. We consider Δw as defined by (2.1) the realisation of a test statistic to be defined below and want to answer the following two questions:
● If Δw<0, how safe is the conclusion that the observed (realised) values are on weighted average less than the predictions, i.e. the predictions are prudent / conservative?
● If Δw>0, how safe is the conclusion that the observed (realised) values are on weighted average greater than the predictions, i.e. the predictions are aggressive?
The safety of conclusions is measured by p-values which provide error probabilities for the conclusions to be wrong. The lower the p-value, the more likely the conclusion is right.
In order to be able to examine the properties of the sample and Δw with statistical methods, we have to make the assumption that the sample was generated with some random mechanism. The key idea for the mechanism is to interpret the weights wi as the probabilities of the corresponding observations Δi. Consequently, we look at an inhomogeneous version of the empirical distribution of the sample Δ1,…,Δn, with the weight wi replacing 1/n as the probability of observation Δi. The details of the mechanism are described in the following assumption.
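Assumption 2.1. The sample $\Delta_1, \ldots, \Delta_n$ consists of independent realisations of a random variable $X_\vartheta$ with distribution given by
$$P[X_\vartheta = \Delta_i - \vartheta] = w_i, \quad i = 1, \ldots, n, \tag{2.3}$$
where the value of the parameter $\vartheta \in \mathbb{R}$ is unknown.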
Note that (2.3) includes the case of equally weighted observations¶, by choosing wi=1/n for all i.
¶ See Appendix A for a more detailed discussion of special cases with equal weights.
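Proposition 2.2. For $X_\vartheta$ as described in Assumption 2.1, the expected value and the variance are given by
$$E[X_\vartheta] = \Delta_w - \vartheta, \quad \text{and} \tag{2.4a}$$
$$\mathrm{var}[X_\vartheta] = \sum_{i=1}^n w_i\,\Delta_i^2 - \Delta_w^2. \tag{2.4b}$$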
Proof. Obvious.
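For readers who prefer to see the omitted step, the computation can be sketched as follows. It uses only (2.3), the definition (2.1) of $\Delta_w$ and the fact that the weights sum to 1.

```latex
\begin{align*}
E[X_\vartheta] &= \sum_{i=1}^n w_i\,(\Delta_i - \vartheta) = \Delta_w - \vartheta, \\
\mathrm{var}[X_\vartheta] &= \sum_{i=1}^n w_i\,(\Delta_i - \vartheta)^2 - (\Delta_w - \vartheta)^2 \\
&= \Bigl(\sum_{i=1}^n w_i\,\Delta_i^2 - 2\,\vartheta\,\Delta_w + \vartheta^2\Bigr)
   - \bigl(\Delta_w^2 - 2\,\vartheta\,\Delta_w + \vartheta^2\bigr) \\
&= \sum_{i=1}^n w_i\,\Delta_i^2 - \Delta_w^2 .
\end{align*}
```

In particular, the variance does not depend on the location parameter $\vartheta$.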
By Assumption 2.1 and Proposition 2.2, the questions on the safety of conclusions from the sign of Δw can be translated into hypotheses on the value of the parameter ϑ:
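● If $\Delta_w < 0$, can we conclude that $H_0: \vartheta \le \Delta_w$ is false and $H_1: \vartheta > \Delta_w \Leftrightarrow E[X_\vartheta] < 0$ is true?
● If $\Delta_w > 0$, can we conclude that $H_0^*: \vartheta \ge \Delta_w$ is false and $H_1^*: \vartheta < \Delta_w \Leftrightarrow E[X_\vartheta] > 0$ is true?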
If we assume that the sample Δ1,…,Δn was generated by independent realisations of Xϑ then the distribution of the sample mean is different from the distribution of Xϑ, as shown in the following corollary to Proposition 2.2.
Corollary 2.3. Let $X_{1,\vartheta}, \ldots, X_{n,\vartheta}$ be independent and identically distributed copies of $X_\vartheta$ as in Assumption 2.1 and define $\bar{X}_\vartheta = \frac{1}{n}\sum_{i=1}^n X_{i,\vartheta}$. Then for the mean and the variance of $\bar{X}_\vartheta$, it holds that
$$E[\bar{X}_\vartheta] = \Delta_w - \vartheta, \tag{2.5a}$$
$$\mathrm{var}[\bar{X}_\vartheta] = \frac{1}{n}\left(\sum_{i=1}^n w_i\,\Delta_i^2 - \Delta_w^2\right). \tag{2.5b}$$
In the following, we use $\bar{X}_\vartheta$ as the test statistic and interpret $\Delta_w$ as its observed value||. Next we describe a bootstrap test to answer the above questions under Assumption 2.1 and then provide the rationale behind its design.
|| For arithmetic reasons, most of the time $\Delta_w$ actually cannot be a realisation of $\bar{X}_\vartheta$. As long as the sample size $n$ is not too small, however, by (2.5a) and the law of large numbers, considering $\Delta_w$ as a realisation of $\bar{X}_\vartheta$ is not unreasonable.
Bootstrap test. Generate a Monte Carlo sample** $\bar{x}_1, \ldots, \bar{x}_R$ from $\Delta_1, \ldots, \Delta_n$ as follows:
** According to Davison and Hinkley [1997, Section 5.2.3], sample size $R = 999$ should suffice for the purposes of this paper.
● For $j = 1, \ldots, R$: $\bar{x}_j$ is the equally weighted mean of $n$ independent draws from the distribution of $X_{\hat{\vartheta}}$ as given by (2.3), with $\hat{\vartheta} = 0$. Equivalently, $\bar{x}_j$ is the mean of $n$ draws with replacement from the sample $\Delta_1, \ldots, \Delta_n$, where $\Delta_i$ is drawn with probability $w_i$.
● $\bar{x}_1, \ldots, \bar{x}_R$ are realisations of independent, identically distributed random variables.
Then a bootstrap p-value for the test of $H_0: \vartheta \le \Delta_w$ against $H_1: \vartheta > \Delta_w$ can be calculated as††
†† $\#S$ denotes the number of elements of the set $S$.
$$\text{p-value} = \frac{1 + \#\{j : j \in \{1, \ldots, R\},\ \bar{x}_j \le 2\,\Delta_w\}}{R + 1}. \tag{2.6a}$$
A bootstrap p-value for the test of $H_0^*: \vartheta \ge \Delta_w$ against $H_1^*: \vartheta < \Delta_w$ is given by
$$\text{p-value}^* = \frac{1 + \#\{j : j \in \{1, \ldots, R\},\ \bar{x}_j \ge 2\,\Delta_w\}}{R + 1}. \tag{2.6b}$$
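The bootstrap procedure can be sketched in a few lines of Python. This is a minimal illustration under Assumption 2.1, not the author's implementation: the function name, the fixed seed and the toy data are hypothetical, and the counts in (2.6a) and (2.6b) are taken over the R bootstrap means.

```python
import random


def bootstrap_p_values(deltas, weights, R=999, seed=0):
    """Bootstrap p-values in the spirit of (2.6a) and (2.6b).

    Each bootstrap mean is the equally weighted mean of n draws with
    replacement from deltas, where deltas[i] is drawn with probability weights[i].
    """
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    n = len(deltas)
    delta_w = sum(w * d for w, d in zip(weights, deltas))
    means = []
    for _ in range(R):
        draws = rng.choices(deltas, weights=weights, k=n)
        means.append(sum(draws) / n)
    threshold = 2.0 * delta_w
    p_prudent = (1 + sum(m <= threshold for m in means)) / (R + 1)     # (2.6a)
    p_aggressive = (1 + sum(m >= threshold for m in means)) / (R + 1)  # (2.6b)
    return p_prudent, p_aggressive


# Hypothetical residuals: every observed value lies below its prediction.
deltas = [-0.30, -0.20, -0.25, -0.35] * 5
weights = [1.0 / len(deltas)] * len(deltas)
p_prudent, p_aggressive = bootstrap_p_values(deltas, weights)
```

In this toy sample no bootstrap mean can fall below twice the weighted average, so the p-value for prudence takes its smallest possible value 1/(R+1).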
Rationale. By (2.3), for each $\vartheta$ the distributions of $X_0 - \vartheta$ and $X_\vartheta$ are identical. As a consequence, if under $H_0$ the true parameter is $\vartheta \le \Delta_w$ and $(-\infty, x]$ is the critical (rejection) range for the test of $H_0$ against $H_1$ based on the test statistic $\bar{X}_\vartheta$, then it holds that
$$P[\bar{X}_\vartheta \in (-\infty, x]] = P[\bar{X}_0 \le x + \vartheta] \le P[\bar{X}_0 \le x + \Delta_w]. \tag{2.7}$$
Hence, by Theorem 8.3.27 of Casella and Berger [2002], in order to obtain a p-value for $H_0: \vartheta \le \Delta_w$ against $H_1: \vartheta > \Delta_w$, according to (2.7) it suffices to specify:
● The upper limit $x$ of the critical range for rejection of $H_0: \vartheta \le \Delta_w$ as the 'observed' value $\Delta_w$ of $\bar{X}_\vartheta$, and
● an approximation of the distribution of $\bar{X}_0$, as it is done by generating the bootstrap sample $\bar{x}_1, \ldots, \bar{x}_R$.
This implies Equation (2.6a) for the bootstrap p-value‡‡ of the test of $H_0$ against $H_1$. The rationale for (2.6b) is analogous.
‡‡ We adopt here the definition provided by Davison and Hinkley [1997, Eq. (4.11)].
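Normal approximate test. By Corollary 2.3 for $\vartheta = \Delta_w$, we find that the distribution of $\bar{X}_{\Delta_w}$ can be approximated by a normal distribution with mean 0 and variance as shown on the right-hand side of (2.5b). With $x = \Delta_w$, therefore, we obtain the following expression for the normal approximate p-value of $H_0: \vartheta \le \Delta_w$ against $H_1: \vartheta > \Delta_w$:
$$\text{p-value} = P[\bar{X}_{\Delta_w} \le x] \approx \Phi\left(\frac{\sqrt{n}\,\Delta_w}{\sqrt{\sum_{i=1}^n w_i\,\Delta_i^2 - \Delta_w^2}}\right). \tag{2.8a}$$
Here $\Phi$ denotes the standard normal distribution function. The same reasoning gives for the normal approximate p-value of $H_0^*: \vartheta \ge \Delta_w$ against $H_1^*: \vartheta < \Delta_w$:
$$\text{p-value}^* \approx 1 - \Phi\left(\frac{\sqrt{n}\,\Delta_w}{\sqrt{\sum_{i=1}^n w_i\,\Delta_i^2 - \Delta_w^2}}\right). \tag{2.8b}$$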
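A minimal Python sketch of (2.8a) and (2.8b) follows. The function name and the two-point toy sample are hypothetical, and the sketch assumes that the residuals are not all identical, so that the variance in the denominator is positive.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx_p_values(deltas, weights):
    """Normal approximate p-values following (2.8a) and (2.8b)."""
    n = len(deltas)
    delta_w = sum(w * d for w, d in zip(weights, deltas))
    # Right-hand side of (2.4b); must be positive for the test to be defined.
    var_x = sum(w * d * d for w, d in zip(weights, deltas)) - delta_w ** 2
    z = sqrt(n) * delta_w / sqrt(var_x)
    p_value = NormalDist().cdf(z)   # (2.8a)
    return p_value, 1.0 - p_value   # second entry is (2.8b)


# Hypothetical toy sample with equal weights.
p_value, p_value_star = normal_approx_p_values([-0.3, -0.1], [0.5, 0.5])
```

Because (2.8b) is one minus (2.8a), at most one of the two tests can reject at any significance level below 50%.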
In Sections 2.6.2 (for LGD back-testing), 2.9.3.1 (for CCF back-testing) and 2.9.3.2 (for EAD back-testing) of ECB [2019], the ECB proposes a t-test for (in the terms of Section 2.1 of this paper) $H_0^*: \vartheta \ge \Delta_w$ against $H_1^*: \vartheta < \Delta_w$. Transcribed into the notation of Section 2.1, the test can be described as follows:
● $n$ is the number of matched pairs of observations and predictions in the sample.
● $\Delta_i$ is the difference of
- the realised LGD for facility $i$ and the estimated LGD for facility $i$ in ECB Section 2.6.2,
- the realised CCF for facility $i$ and the estimated CCF for facility $i$ in ECB Section 2.9.3.1, and
- the drawings (balance sheet exposure) at the time of default of facility $i$ and the estimated EAD of facility $i$ in ECB Section 2.9.3.2.
● All $w_i$ equal $1/n$, so that $\Delta_w$ becomes the equally weighted mean $\Delta_{1/n}$.
● The right-hand side of (2.5b) is replaced by the sample variance
$$s_n^2 = \frac{1}{n-1}\left(\frac{1}{n}\sum_{i=1}^n \Delta_i^2 - \Delta_{1/n}^2\right).$$
● The p-value is computed as
$$\text{p-value}^* = 1 - \Psi_{n-1}\left(\frac{\Delta_{1/n}}{s_n}\right), \tag{2.9}$$
where $\Psi_{n-1}$ denotes the distribution function of Student's t-distribution with $n-1$ degrees of freedom.
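The test statistic entering (2.9) can be sketched as below. The function name is hypothetical. The sketch stops at the statistic because the Python standard library has no Student t distribution function; with SciPy installed, one option would be `1 - scipy.stats.t.cdf(t_stat, dof)`.

```python
from math import sqrt


def paired_t_statistic(deltas):
    """Return the t statistic and the degrees of freedom used in (2.9)."""
    n = len(deltas)
    mean = sum(deltas) / n  # equally weighted mean of the residuals
    # s_n^2 as defined above: an estimate of the variance of the mean.
    s2 = (sum(d * d for d in deltas) / n - mean ** 2) / (n - 1)
    return mean / sqrt(s2), n - 1


# Hypothetical toy residuals.
t_stat, dof = paired_t_statistic([1.0, 2.0, 3.0])
```

Note that $s_n^2$ already contains the division by the sample size, which is why no further factor $\sqrt{n}$ appears in the statistic.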
By the Central Limit Theorem, the p-values according to (2.6b), (2.8b) and (2.9) will come out almost identical for large sample sizes $n$ and equal weights $w_i = 1/n$ for all $i = 1, \ldots, n$. For smaller $n$, the value of (2.9) would be exact if the variables $X_{i,\vartheta}$ in Corollary 2.3 were normally distributed.
Criticisms of the basic approach. The basic approach as described in Sections 2.1 and 2.2 fails to take account of the following issues:
● The random mechanism reflected by (2.3) can be interpreted as an expression of uncertainty about the cohort / portfolio composition. The randomness of the loss rate / exposure of the individual facilities – the degree of which potentially can differ between facilities – is not captured by (2.3).
● The parametrisation of the distribution by a location parameter in (2.3) could result in distributions with features that are not realistic, for instance negative exposures or loss rates greater than one.
In the following section and in Appendix B, we are going to modify the basic approach for LGD / CCF on the one hand and EAD on the other hand in such a way as to take into account these two issues.
By definition, both LGD and CCF take values only in the unit interval [0,1]. This fact allows for more specific tests than the ones considered in the previous sections. In this section, we talk only about LGD most of the time. But the concepts discussed also apply with little or no modification to CCF or any other variables with values in the unit interval.
Starting point.
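● A sample of paired observations $(\lambda_1, \ell_1), \ldots, (\lambda_n, \ell_n)$, with predicted LGDs $0 < \lambda_i < 1$ and realised loss rates $0 \le \ell_i \le 1$.
● Weights $0 < w_i < 1$, $i = 1, \ldots, n$, with $\sum_{i=1}^n w_i = 1$.
● Weighted average loss rate $\ell_w = \sum_{i=1}^n w_i\,\ell_i$ and weighted average loss prediction $\lambda_w = \sum_{i=1}^n w_i\,\lambda_i$.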
Interpretation in the context of LGD back-testing.
● A sample of n defaulted credit facilities / loans is analysed.
● The LGD $\lambda_i$ is an estimate of loan $i$'s loss rate as a consequence of the default, measured as a percentage of the exposure at the time of default (EAD).
● The realised loss rate $\ell_i$ shows the percentage of loan $i$'s exposure at the time of default that cannot be recovered.
● The weight $w_i$ reflects the relative importance of observation $i$. In the case of LGD predictions, one might choose (2.2b) for the definition of the weights; for CCF, one might choose (2.2a) instead.
● Define $\Delta_i = \ell_i - \lambda_i$, $i = 1, \ldots, n$. If $|\Delta_i| \approx 0$ then $\lambda_i$ is a good LGD prediction. If $|\Delta_i| \approx 1$ then $\lambda_i$ is a poor LGD prediction.
Goal. We want to use the observed weighted average difference / residual $\Delta_w = \sum_{i=1}^n w_i\,\Delta_i = \ell_w - \lambda_w$ to assess the quality of the calibration of the model / approach for the $\lambda_i$ to predict the realised loss rates $\ell_i$. Again we want to answer the following two questions:
● If Δw<0, how safe is the conclusion that the observed (realised) values are on weighted average less than the predictions, i.e. the predictions are prudent / conservative?
● If Δw>0, how safe is the conclusion that the observed (realised) values are on weighted average greater than the predictions, i.e. the predictions are aggressive?
The safety of such conclusions is measured by p-values which provide error probabilities for the conclusions to be wrong. The lower the p-value, the more likely the conclusion is right.
In order to be able to examine the specific properties of the sample and Δw with statistical methods, we have to make the assumption that the sample was generated with some random mechanism. This mechanism is described in the following modification of Assumption 2.1.
Assumption 2.4. The sample $\Delta_1, \ldots, \Delta_n$ consists of independent realisations of a random variable $X_\vartheta$ with distribution given by
$$X_\vartheta = \ell_I - Y_\vartheta, \tag{2.10a}$$
where $I$ is a random variable with values in $\{1, \ldots, n\}$ and $P[I = i] = w_i$, $i = 1, \ldots, n$. $Y_\vartheta$ is a beta$(\alpha_i, \beta_i)$-distributed random variable§§ conditional on $I = i$ for $i = 1, \ldots, n$. The parameters $\alpha_i$ and $\beta_i$ of the beta-distribution depend on the unknown parameter $0 < \vartheta < 1$ by
§§ See Casella and Berger [2002, Section 3.3] for a definition of the beta-distribution.
$$\alpha_i = \vartheta_i\,\frac{1-v}{v}, \quad \text{and} \quad \beta_i = (1 - \vartheta_i)\,\frac{1-v}{v}. \tag{2.10b}$$
In (2.10b), the constant $0 < v < 1$ is the same for all $i$. The $\vartheta_i$ are determined by
$$\vartheta_i = (\lambda_i)^{h(\vartheta)}, \tag{2.10c}$$
where $0 < h(\vartheta) < \infty$ is the unique solution $h$ of the equation
$$\vartheta = \sum_{i=1}^n w_i\,(\lambda_i)^h. \tag{2.10d}$$
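The recalibration step (2.10c) and (2.10d) has no closed-form solution in general, but the right-hand side of (2.10d) is strictly decreasing in $h$ because all $\lambda_i$ lie strictly between 0 and 1. A minimal Python sketch using bisection follows. The function names, the search bracket and the toy inputs are assumptions of this sketch, not part of the paper.

```python
def solve_h(lgds, weights, target, lo=1e-9, hi=1e3):
    """Bisection for h in (2.10d); the bracket [lo, hi] is an assumption of this sketch."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        avg = sum(w * lam ** mid for w, lam in zip(weights, lgds))
        if avg > target:  # weighted average still too high, so h must increase
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def beta_parameters(lgds, weights, target, v):
    """Return the recalibrated means (2.10c) and the beta parameters (2.10b)."""
    h = solve_h(lgds, weights, target)
    thetas = [lam ** h for lam in lgds]
    scale = (1.0 - v) / v
    params = [(t * scale, (1.0 - t) * scale) for t in thetas]
    return thetas, params


# Hypothetical inputs: three LGD predictions, equal weights, target 0.4, v = 0.1.
lgds = [0.2, 0.5, 0.8]
weights = [1.0 / 3.0] * 3
thetas, params = beta_parameters(lgds, weights, target=0.4, v=0.1)
```

When the target equals the weighted average prediction $\lambda_w$, the solution is $h = 1$ and the predictions are left unchanged.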
Assumption 2.4 introduces randomness of the difference between loss rate and LGD prediction for individual facilities. Comparison between (2.13b) below and (2.4b) shows that this entails variance expansion of the sample Δ1,…,Δn.
Note that Assumption 2.4 also describes a method for recalibrating the LGD estimates $\lambda_1, \ldots, \lambda_n$ such that the weighted average of the $\vartheta_i$ matches the target $\vartheta$. In contrast to (2.3), the transformation (2.10c) ensures that the transformed LGD parameters remain values in the unit interval. By definition of $Y_\vartheta$, it holds that $E[Y_\vartheta \mid I = i] = \vartheta_i$.
The constant $v$ specifies the variance of $Y_\vartheta$ conditional on $I = i$ as a percentage of the supremum $\vartheta_i(1 - \vartheta_i)$ of its possible conditional variance, i.e. it holds that
$$\mathrm{var}[Y_\vartheta \mid I = i] = v\,\vartheta_i\,(1 - \vartheta_i), \quad i = 1, \ldots, n. \tag{2.11}$$
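The constant $v$ must be pre-defined or separately estimated. We suggest estimating it from the sample $\ell_1, \ldots, \ell_n$ as
$$\hat{v} = \frac{\sum_{i=1}^n w_i\,\ell_i^2 - \ell_w^2}{\ell_w\,(1 - \ell_w)}. \tag{2.12}$$
This approach yields $0 \le \hat{v} \le 1$ because the fact that $0 \le \ell_i \le 1$, $i = 1, \ldots, n$, implies
$$\sum_{i=1}^n w_i\,\ell_i^2 - \ell_w^2 \le \ell_w\,(1 - \ell_w).$$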
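The estimator (2.12) is straightforward to compute. The following Python sketch uses a hypothetical function name and toy data, and assumes that the weighted average loss rate lies strictly between 0 and 1 so that the denominator is positive.

```python
def estimate_v(losses, weights):
    """Estimate the variance constant v as in (2.12)."""
    l_w = sum(w * l for w, l in zip(weights, losses))
    second_moment = sum(w * l * l for w, l in zip(weights, losses))
    # Weighted sample variance divided by its upper bound l_w * (1 - l_w).
    return (second_moment - l_w ** 2) / (l_w * (1.0 - l_w))


# Hypothetical realised loss rates with equal weights.
v_hat = estimate_v([0.1, 0.4, 0.2, 0.9], [0.25] * 4)
```

The upper bound holds because $\ell_i^2 \le \ell_i$ on the unit interval, so the weighted second moment cannot exceed $\ell_w$. The boundary values occur for degenerate samples: identical loss rates give 0, and loss rates that are all either 0 or 1 give 1. In those cases the estimate violates the condition $0 < v < 1$ of Assumption 2.4 and would need separate treatment.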
A simpler alternative to the definition (2.10c) of $\vartheta_i$ would be linear scaling: $\vartheta_i = \lambda_i\,\frac{\vartheta}{\lambda_w}$. However, with this definition $\vartheta_i > 1$ may be incurred. This is not desirable because then the beta-distribution for $Y_\vartheta \mid I = i$ would be ill-defined.
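Proposition 2.5. For $X_\vartheta$ as described in Assumption 2.4, the expected value and the variance are given by
$$E[X_\vartheta] = \ell_w - \vartheta, \quad \text{and} \tag{2.13a}$$
$$\mathrm{var}[X_\vartheta] = \sum_{i=1}^n w_i\,(\ell_i - \vartheta_i)^2 - (\ell_w - \vartheta)^2 + v \sum_{i=1}^n w_i\,\vartheta_i\,(1 - \vartheta_i). \tag{2.13b}$$
Proof. For deriving the formula for $\mathrm{var}[X_\vartheta]$, make use of the well-known variance decomposition
$$\mathrm{var}[X_\vartheta] = E\bigl[\mathrm{var}[X_\vartheta \mid I]\bigr] + \mathrm{var}\bigl[E[X_\vartheta \mid I]\bigr].$$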
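The two terms of the decomposition can be worked out as in the following sketch, which relies on (2.10a), on the conditional mean and variance of $Y_\vartheta$ implied by the beta parameters (2.10b), and on (2.10d) in the form $\sum_{i=1}^n w_i \vartheta_i = \vartheta$.

```latex
\begin{align*}
E[X_\vartheta \mid I = i] &= \ell_i - \vartheta_i,
\qquad
\mathrm{var}[X_\vartheta \mid I = i] = \mathrm{var}[Y_\vartheta \mid I = i]
  = v\,\vartheta_i\,(1 - \vartheta_i), \\
E\bigl[\mathrm{var}[X_\vartheta \mid I]\bigr]
  &= v \sum_{i=1}^n w_i\,\vartheta_i\,(1 - \vartheta_i), \\
\mathrm{var}\bigl[E[X_\vartheta \mid I]\bigr]
  &= \sum_{i=1}^n w_i\,(\ell_i - \vartheta_i)^2
     - \Bigl(\sum_{i=1}^n w_i\,(\ell_i - \vartheta_i)\Bigr)^2
   = \sum_{i=1}^n w_i\,(\ell_i - \vartheta_i)^2 - (\ell_w - \vartheta)^2 .
\end{align*}
```

Adding the last two lines gives the variance formula of the proposition.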
In contrast to (2.4b), the variance of $X_\vartheta$ as shown in (2.13b) depends on the parameter $\vartheta$ and has an additional component $v \sum_{i=1}^n w_i\,\vartheta_i\,(1 - \vartheta_i)$ which reflects the potentially different variances of the loss rates in an inhomogeneous portfolio.
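By Assumption 2.4 and Proposition 2.5, the questions on the safety of conclusions from the sign of $\Delta_w = \ell_w - \lambda_w$ again can be translated into hypotheses on the value of the parameter $\vartheta$:
● If $\Delta_w < 0$, can we conclude that $H_0: \vartheta \le \ell_w$ is false and $H_1: \vartheta > \ell_w \Leftrightarrow E[X_\vartheta] < 0$ is true?
● If $\Delta_w > 0$, can we conclude that $H_0^*: \vartheta \ge \ell_w$ is false and $H_1^*: \vartheta < \ell_w \Leftrightarrow E[X_\vartheta] > 0$ is true?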
If we assume that the sample Δ1,…,Δn was generated by independent realisations of Xϑ then the distribution of the sample mean is different from the distribution of Xϑ, as shown in the following corollary to Proposition 2.5.
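Corollary 2.6. Let $X_{1,\vartheta}, \ldots, X_{n,\vartheta}$ be independent and identically distributed copies of $X_\vartheta$ as in Assumption 2.4 and define $\bar{X}_\vartheta = \frac{1}{n}\sum_{i=1}^n X_{i,\vartheta}$. Then for the mean and variance of $\bar{X}_\vartheta$, it holds that
$$E[\bar{X}_\vartheta] = \ell_w - \vartheta, \tag{2.14a}$$
$$\mathrm{var}[\bar{X}_\vartheta] = \frac{1}{n}\left(\sum_{i=1}^n w_i\,(\ell_i - \vartheta_i)^2 - (\ell_w - \vartheta)^2 + v \sum_{i=1}^n w_i\,\vartheta_i\,(1 - \vartheta_i)\right). \tag{2.14b}$$
In the following, we use $\bar{X}_\vartheta$ as the test statistic and interpret $\Delta_w = \ell_w - \lambda_w$ as its observed value.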
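Proposition 2.7. In the setting of Assumption 2.4 and Corollary 2.6, $\vartheta \le \hat{\vartheta}$ implies that
$$P[\bar{X}_\vartheta \le x] \le P[\bar{X}_{\hat{\vartheta}} \le x], \quad \text{for all } x \in \mathbb{R}.$$
Proof. Observe that $\vartheta \le \hat{\vartheta}$ implies $\vartheta_i \le \hat{\vartheta}_i$ for all $i = 1, \ldots, n$. For fixed $i$, the family of beta$(\alpha_i, \beta_i)$-distributions, parametrised by $\vartheta \in (0, 1)$, has a monotone likelihood ratio in the sense of Definition 8.3.16 of Casella and Berger [2002]. This implies that for $\vartheta \le \hat{\vartheta}$, conditional on $I = i$, the distribution of $Y_{\hat{\vartheta}}$ is stochastically not less than the distribution of $Y_\vartheta$, i.e. it holds that
$$P[Y_\vartheta \le x \mid I = i] \ge P[Y_{\hat{\vartheta}} \le x \mid I = i], \quad \text{for all } x \in \mathbb{R}.$$
From this and $X_\vartheta = \ell_I - Y_\vartheta$, it follows that for all $i = 1, \ldots, n$
$$P[X_\vartheta \le x \mid I = i] \le P[X_{\hat{\vartheta}} \le x \mid I = i], \quad \text{for all } x \in \mathbb{R}.$$
But this inequality implies for all $x \in \mathbb{R}$ that
$$P[X_\vartheta \le x] = \sum_{i=1}^n w_i\,P[X_\vartheta \le x \mid I = i] \le P[X_{\hat{\vartheta}} \le x]. \tag{2.15}$$
Property (2.15) is passed on to convolutions of independent copies of $X_\vartheta$ and $X_{\hat{\vartheta}}$. This proves the assertion.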
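Bootstrap test. Generate a Monte Carlo sample $\bar{x}_1, \ldots, \bar{x}_R$ from $X_\vartheta$ with $\vartheta = \ell_w$ as follows:
● For $j = 1, \ldots, R$: $\bar{x}_j$ is the equally weighted mean of $n$ independent draws from the distribution of $X_\vartheta$ as given by Assumption 2.4, with $\vartheta = \ell_w$.
● $\bar{x}_1, \ldots, \bar{x}_R$ are realisations of independent, identically distributed random variables.
Then a bootstrap p-value for the test of $H_0: \vartheta \le \ell_w$ against $H_1: \vartheta > \ell_w$ can be calculated as
$$\text{p-value} = \frac{1 + \#\{j : j \in \{1, \ldots, R\},\ \bar{x}_j \le \ell_w - \lambda_w\}}{R + 1}. \tag{2.16a}$$
A bootstrap p-value for the test of $H_0^*: \vartheta \ge \ell_w$ against $H_1^*: \vartheta < \ell_w$ is given by
$$\text{p-value}^* = \frac{1 + \#\{j : j \in \{1, \ldots, R\},\ \bar{x}_j \ge \ell_w - \lambda_w\}}{R + 1}. \tag{2.16b}$$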
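The sampling scheme of Assumption 2.4 can be sketched in Python as follows. This is an illustration only: the function names, the bisection bracket, the fixed seed and the toy portfolio are hypothetical, the constant v is passed in rather than estimated, and the counts in (2.16a) and (2.16b) are taken over the R bootstrap means.

```python
import random


def solve_h(lgds, weights, target, lo=1e-9, hi=1e3):
    """Bisection for h in (2.10d); the bracket [lo, hi] is an assumption of this sketch."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(w * lam ** mid for w, lam in zip(weights, lgds)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lgd_bootstrap_p_values(lgds, losses, weights, v, R=999, seed=0):
    """Bootstrap p-values in the spirit of (2.16a) and (2.16b)."""
    rng = random.Random(seed)
    n = len(lgds)
    l_w = sum(w * l for w, l in zip(weights, losses))
    lam_w = sum(w * lam for w, lam in zip(weights, lgds))
    h = solve_h(lgds, weights, l_w)          # recalibrate to theta = l_w
    thetas = [lam ** h for lam in lgds]      # (2.10c)
    scale = (1.0 - v) / v                    # common factor in (2.10b)
    indices = list(range(n))
    means = []
    for _ in range(R):
        total = 0.0
        for i in rng.choices(indices, weights=weights, k=n):  # draw I
            y = rng.betavariate(thetas[i] * scale, (1.0 - thetas[i]) * scale)
            total += losses[i] - y           # X = l_I - Y, see (2.10a)
        means.append(total / n)
    observed = l_w - lam_w
    p_value = (1 + sum(m <= observed for m in means)) / (R + 1)
    p_value_star = (1 + sum(m >= observed for m in means)) / (R + 1)
    return p_value, p_value_star


# Hypothetical portfolio with very conservative LGD predictions.
lgds = [0.7, 0.8, 0.9] * 10
losses = [0.1, 0.2, 0.3] * 10
weights = [1.0 / 30.0] * 30
p_value, p_value_star = lgd_bootstrap_p_values(lgds, losses, weights, v=0.1)
```

In the toy portfolio the predictions exceed the realised loss rates by 0.6 on average, so the test for prudence should reject at any conventional significance level.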
Rationale. By Proposition 2.7, if under $H_0$ the true parameter is $\vartheta \le \ell_w$ and $(-\infty, x]$ is the critical (rejection) range for the test of $H_0: \vartheta \le \ell_w$ against $H_1: \vartheta > \ell_w$ based on the test statistic $\bar{X}_\vartheta$, then it holds that
$$P[\bar{X}_\vartheta \in (-\infty, x]] \le P[\bar{X}_{\ell_w} \le x]. \tag{2.17}$$
Hence, by Theorem 8.3.27 of Casella and Berger [2002], in order to obtain a p-value for $H_0: \vartheta \le \ell_w$ against $H_1: \vartheta > \ell_w$, according to (2.17) it suffices to specify:
● The upper limit $x$ of the critical range for rejection of $H_0: \vartheta \le \ell_w$ as our realisation $\Delta_w = \ell_w - \lambda_w$ of $\bar{X}_\vartheta$, and
● an approximation of the distribution of $\bar{X}_{\ell_w}$, as it has been done by generating the bootstrap sample $\bar{x}_1, \ldots, \bar{x}_R$.
This implies Equation (2.16a) for the bootstrap p-value. The rationale for (2.16b) is analogous.
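Normal approximate test. By Corollary 2.6, we find that the distribution of $\bar{X}_{\ell_w}$ can be approximated by a normal distribution with mean 0 and variance as shown on the right-hand side of (2.14b) with $\vartheta = \ell_w$. With $x = \ell_w - \lambda_w$, one obtains for the approximate p-value of $H_0: \vartheta \le \ell_w$ against $H_1: \vartheta > \ell_w$:
$$\text{p-value} = P[\bar{X}_{\ell_w} \le x] \approx \Phi\left(\frac{\sqrt{n}\,(\ell_w - \lambda_w)}{\sqrt{\sum_{i=1}^n w_i\,(\ell_i - \hat{\vartheta}_i)^2 + v \sum_{i=1}^n w_i\,\hat{\vartheta}_i\,(1 - \hat{\vartheta}_i)}}\right), \tag{2.18a}$$
with $\hat{\vartheta}_i = (\lambda_i)^{h(\ell_w)}$ as in Assumption 2.4. The same reasoning gives for the normal approximate p-value of $H_0^*: \vartheta \ge \ell_w$ against $H_1^*: \vartheta < \ell_w$:
$$\text{p-value}^* \approx 1 - \Phi\left(\frac{\sqrt{n}\,(\ell_w - \lambda_w)}{\sqrt{\sum_{i=1}^n w_i\,(\ell_i - \hat{\vartheta}_i)^2 + v \sum_{i=1}^n w_i\,\hat{\vartheta}_i\,(1 - \hat{\vartheta}_i)}}\right). \tag{2.18b}$$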
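A minimal Python sketch of (2.18a) and (2.18b) is given below. As before, the function names, the bisection bracket and the toy data are hypothetical, and the constant v is supplied by the caller.

```python
from math import sqrt
from statistics import NormalDist


def solve_h(lgds, weights, target, lo=1e-9, hi=1e3):
    """Bisection for h in (2.10d); the bracket [lo, hi] is an assumption of this sketch."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(w * lam ** mid for w, lam in zip(weights, lgds)) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lgd_normal_p_values(lgds, losses, weights, v):
    """Normal approximate p-values following (2.18a) and (2.18b)."""
    n = len(lgds)
    l_w = sum(w * l for w, l in zip(weights, losses))
    lam_w = sum(w * lam for w, lam in zip(weights, lgds))
    h = solve_h(lgds, weights, l_w)
    thetas = [lam ** h for lam in lgds]  # recalibrated means with target l_w
    # Variance of X for theta = l_w; the term (l_w - theta)^2 vanishes.
    var_x = sum(w * (l - t) ** 2 for w, l, t in zip(weights, losses, thetas))
    var_x += v * sum(w * t * (1.0 - t) for w, t in zip(weights, thetas))
    z = sqrt(n) * (l_w - lam_w) / sqrt(var_x)
    p_value = NormalDist().cdf(z)
    return p_value, 1.0 - p_value


# Hypothetical portfolio whose predictions match the realised average.
p_value, p_value_star = lgd_normal_p_values(
    [0.2, 0.5, 0.8], [0.3, 0.5, 0.7], [1.0 / 3.0] * 3, v=0.1
)
```

When the weighted averages of predictions and realisations coincide, as in the toy example, both p-values are close to 0.5 and neither hypothesis is rejected.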
Starting point.
● A sample of paired observations $(p_1, b_1), \ldots, (p_n, b_n)$, with probabilities $0 < p_i < 1$ and status indicators $b_i \in \{0, 1\}$ (1 for defaulted, 0 for performing).
● Weights $0 < w_i < 1$, $i = 1, \ldots, n$, with $\sum_{i=1}^n w_i = 1$.
● Weighted default rate $b_w = \sum_{i=1}^n w_i\,b_i$ and weighted average PD $p_w = \sum_{i=1}^n w_i\,p_i$.
Interpretation in the context of PD back-testing.
● A sample of n borrowers is observed for a certain period of time, most commonly one year.
● The PD pi is an estimate of borrower i's probability to default during the observation period, estimated before the beginning of the period.
● The status indicator bi shows borrower i's performance status at the end of the observation period. bi=1 means "borrower has defaulted", bi=0 means "borrower is performing".
● wi could be the relative importance of observation i. In the case of default predictions, one might choose weights as in (2.2b).
● Define $\Delta_i = b_i - p_i$, $i = 1, \ldots, n$. If $|\Delta_i| \approx 0$ then $p_i$ is a good default prediction. If $|\Delta_i| \approx 1$ then $p_i$ is a poor default prediction.
Goal. We want to use the observed weighted average difference / residual $\Delta_w = \sum_{i=1}^n w_i\,\Delta_i = b_w - p_w$ to assess the quality of the calibration of the model / approach for the $p_i$ to predict the realised status indicators $b_i$. Again we want to answer the following two questions:
● If Δw<0, how safe is the conclusion that the observed (realised) values are on weighted average less than the predictions, i.e. the predictions are prudent / conservative?
● If Δw>0, how safe is the conclusion that the observed (realised) values are on weighted average greater than the predictions, i.e. the predictions are aggressive?
The safety of such conclusions is measured by p-values which provide error probabilities for the conclusions to be wrong. The lower the p-value, the more likely the conclusion is right. In determining the p-values, we take into account the criticisms of the basic approach as mentioned at the end of Section 2.2.
In order to be able to examine the PD-specific properties of the sample and Δw=bw−pw with statistical methods, we have to make the assumption that the sample was generated with some random mechanism. This mechanism is described in the following modification of Assumptions 2.1 and 2.4.
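Assumption 3.1. The sample $\Delta_1, \ldots, \Delta_n$ consists of independent realisations of a random variable $X_\vartheta$ with distribution given by
$$X_\vartheta = b_I - Y_\vartheta, \tag{3.1a}$$
where $I$ is a random variable with values in $\{1, \ldots, n\}$ and $P[I = i] = w_i$, $i = 1, \ldots, n$. $Y_\vartheta$ is a Bernoulli variable with
$$P[Y_\vartheta = 1 \mid I = i] = \vartheta_i, \quad i = 1, \ldots, n. \tag{3.1b}$$
Define $\varrho_i = \frac{1 - p_i}{p_i}\,\frac{p_w}{1 - p_w}$. Then the $\vartheta_i$ depend on the unknown parameter $0 < \vartheta < 1$ by
$$\vartheta_i = \frac{\vartheta}{\vartheta + (1 - \vartheta)\,\varrho_i\,h(\vartheta)}, \tag{3.1c}$$
where $0 < h(\vartheta) < \infty$ is the unique¶¶ solution of the equation
¶¶ See Tasche [2013a, Section 4.2.4].
$$1 = \sum_{i=1}^n \frac{w_i}{\vartheta + (1 - \vartheta)\,\varrho_i\,h}, \tag{3.1d}$$
when solved for $h$.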
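The recalibration in (3.1c) and (3.1d) can be sketched numerically as follows. The right-hand side of (3.1d) is strictly decreasing in $h$, which makes bisection applicable. The function name, the search bracket, the bisection on a logarithmic scale and the toy PDs are assumptions of this sketch.

```python
from math import sqrt


def recalibrate_pds(pds, weights, target, lo=1e-12, hi=1e12):
    """Solve (3.1d) for h by bisection and return h with the theta_i from (3.1c)."""
    p_w = sum(w * p for w, p in zip(weights, pds))
    rhos = [(1.0 - p) / p * p_w / (1.0 - p_w) for p in pds]

    def rhs(h):
        # Right-hand side of (3.1d); strictly decreasing in h.
        return sum(w / (target + (1.0 - target) * r * h) for w, r in zip(weights, rhos))

    for _ in range(200):
        mid = sqrt(lo * hi)  # geometric midpoint, since h may span many magnitudes
        if rhs(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    h = sqrt(lo * hi)
    thetas = [target / (target + (1.0 - target) * r * h) for r in rhos]
    return h, thetas


# Hypothetical PDs with equal weights, recalibrated to a target of 9%.
pds = [0.01, 0.02, 0.05, 0.10]
weights = [0.25] * 4
h, thetas = recalibrate_pds(pds, weights, target=0.09)
```

For a target equal to the weighted average PD $p_w$, the solution is $h = 1$ and (3.1c) returns the original PDs.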
Assumption 3.1 introduces randomness of the difference between status indicator and PD prediction for individual facilities. Comparison between (3.2b) below and (2.4b) shows that this entails variance expansion of the sample Δ1,…,Δn.
Note that Assumption 3.1 also describes a method for recalibrating the PD estimates $p_1,\ldots,p_n$ such that the weighted average of the $\vartheta_i$ matches a target value $\vartheta$. In contrast to (2.3), the transformation (3.1c) ensures that the transformed PD parameters remain in the unit interval. In principle, the transformation (2.10c) could have been used instead of (3.1c); (3.1c) was preferred because it has a probabilistic foundation through Bayes' theorem. By definition of $Y_\vartheta$, it holds that $E[Y_\vartheta\,|\,I=i]=\vartheta_i$.
Another simple alternative to the definition (3.1c) of $\vartheta_i$ would be linear scaling: $\vartheta_i = p_i\,\vartheta/p_w$. However, this definition may yield $\vartheta_i>1$, which is undesirable because the Bernoulli distribution for $Y_\vartheta\,|\,I=i$ would then be ill-defined.
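The recalibration in (3.1c) and (3.1d) can be sketched numerically as follows. This is only an illustrative sketch, not the author's R implementation: the function name `recalibrate` and the choice of bisection on $\log h$ are assumptions made here, relying on the fact that the right-hand side of (3.1d) is strictly decreasing in $h$.

```python
import math

def recalibrate(p, w, theta, tol=1e-12):
    """Hypothetical helper: solve (3.1d) for h and return (h, theta_i) as in (3.1c).

    p: PD predictions p_i in (0, 1); w: weights summing to 1;
    theta: target weighted average in (0, 1).
    """
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must lie strictly between 0 and 1")
    if any(not 0.0 < pi < 1.0 for pi in p):
        raise ValueError("all p_i must lie strictly between 0 and 1")
    p_w = sum(wi * pi for wi, pi in zip(w, p))
    rho = [(1.0 - pi) / pi * p_w / (1.0 - p_w) for pi in p]

    def g(log_h):
        # Right-hand side of (3.1d) minus 1; strictly decreasing in h.
        h = math.exp(log_h)
        return sum(wi / (theta + (1.0 - theta) * ri * h)
                   for wi, ri in zip(w, rho)) - 1.0

    # Bracket the root on the log scale, then bisect.
    lo, hi = -1.0, 1.0
    while g(lo) < 0.0:
        lo *= 2.0
    while g(hi) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    h = math.exp(0.5 * (lo + hi))
    theta_i = [theta / (theta + (1.0 - theta) * ri * h) for ri in rho]
    return h, theta_i
```

For instance, with `p = [0.01, 0.05, 0.2]`, `w = [0.5, 0.3, 0.2]` and target `theta = 0.5`, linear scaling would map the largest PD to a value above 1, whereas the values returned by this sketch stay inside the unit interval and have weighted average 0.5.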
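Proposition 3.2. For $X_\vartheta$ as described in Assumption 3.1, the expected value and the variance are given by
$E[X_\vartheta] = b_w - \vartheta,$ and | (3.2a) |
$\mathrm{var}[X_\vartheta] = \sum_{i=1}^n w_i\,(b_i-\vartheta_i)^2 - (b_w-\vartheta)^2 + \sum_{i=1}^n w_i\,\vartheta_i\,(1-\vartheta_i).$ | (3.2b) |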
Proof. Similar to the proof of Proposition 2.5.
Note that $\sum_{i=1}^n w_i\,(b_i-\vartheta_i)^2$ is a weighted version of the Brier score [see, e.g., Hand, 1997] for the observation-prediction sample $(b_1,\vartheta_1),\ldots,(b_n,\vartheta_n)$. This observation suggests that the better the discriminatory power of the PD predictions (as reflected by lower Brier scores), the greater the power of the calibration tests considered in this section.
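As a plausibility check of Proposition 3.2, the following sketch compares the closed-form moments (3.2a) and (3.2b) with moments obtained by enumerating all outcomes of $X_\vartheta$. The function names are hypothetical, and $\vartheta$ is taken to be the weighted average of the supplied $\vartheta_i$, which is what (3.1d) guarantees.

```python
def moments_formula(w, b, theta_i):
    """Mean and variance of X_theta according to (3.2a) and (3.2b)."""
    theta = sum(wi * ti for wi, ti in zip(w, theta_i))
    b_w = sum(wi * bi for wi, bi in zip(w, b))
    brier = sum(wi * (bi - ti) ** 2 for wi, bi, ti in zip(w, b, theta_i))
    bernoulli = sum(wi * ti * (1.0 - ti) for wi, ti in zip(w, theta_i))
    return b_w - theta, brier - (b_w - theta) ** 2 + bernoulli

def moments_enumerated(w, b, theta_i):
    """Mean and variance by listing every outcome of X_theta = b_I - Y_theta."""
    outcomes = []
    for wi, bi, ti in zip(w, b, theta_i):
        outcomes.append((bi - 1, wi * ti))        # I = i and Y = 1
        outcomes.append((bi, wi * (1.0 - ti)))    # I = i and Y = 0
    mean = sum(x * pr for x, pr in outcomes)
    var = sum((x - mean) ** 2 * pr for x, pr in outcomes)
    return mean, var
```

Both functions should agree up to rounding error for any weights summing to one, which illustrates that the second sum in (3.2b) is exactly the variance added by the Bernoulli randomness of the individual pairs.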
By Assumption 3.1 and Proposition 3.2, the questions on the safety of conclusions from the sign of Δw=bw−pw again can be translated into hypotheses on the value of the parameter ϑ:
● If Δw<0, can we conclude that H0:ϑ≤bw is false and H1:ϑ>bw⇔E[Xϑ]<0 is true?
● If Δw>0, can we conclude that H∗0:ϑ≥bw is false and H∗1:ϑ<bw⇔E[Xϑ]>0 is true?
If we assume as before in Section 2 that the sample Δ1,…,Δn was generated by independent realisations of Xϑ then the distribution of the sample mean is different from the distribution of Xϑ, as shown in the following corollary to Proposition 3.2.
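Corollary 3.3. Let $X_{1,\vartheta},\ldots,X_{n,\vartheta}$ be independent and identically distributed copies of $X_\vartheta$ as in Assumption 3.1 and define $\bar X_\vartheta=\frac{1}{n}\sum_{i=1}^n X_{i,\vartheta}$. Then for the mean and variance of $\bar X_\vartheta$, it holds that
$E[\bar X_\vartheta] = b_w - \vartheta,$ and | (3.3a) |
$\mathrm{var}[\bar X_\vartheta] = \frac{1}{n}\left(\sum_{i=1}^n w_i\,(b_i-\vartheta_i)^2 - (b_w-\vartheta)^2 + \sum_{i=1}^n w_i\,\vartheta_i\,(1-\vartheta_i)\right).$ | (3.3b) |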
In the following, we use ˉXϑ as the test statistic and interpret Δw=bw−pw as its observed value.
Lemma 3.4. In the setting of Assumption 3.1, ϑ<ˆϑ implies that ϑi<ˆϑi for all i=1,…,n.
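Proof. Assume $\vartheta<\hat\vartheta$ and let $h=h(\vartheta)$ and $\hat h=h(\hat\vartheta)$. Along the same lines of algebra as in Section 3 of Tasche [2013b], it can be shown that (with $w_i$ and $\varrho_i$ as in Assumption 3.1) for $0<t<1$ and $\eta>0$ the following two equations are equivalent:
$1 = \sum_{i=1}^n \frac{w_i}{t+(1-t)\,\varrho_i\,\eta} \iff 0 = \sum_{i=1}^n \frac{w_i\,(1-\varrho_i\,\eta)}{t+(1-t)\,\varrho_i\,\eta}.$ | (3.4) |
Define $f(t,\eta)=\sum_{i=1}^n \frac{w_i\,(1-\varrho_i\,\eta)}{t+(1-t)\,\varrho_i\,\eta}$. Then we obtain
$\frac{\partial f}{\partial t}(t,\eta) = -\sum_{i=1}^n \frac{w_i\,(1-\varrho_i\,\eta)^2}{(t+(1-t)\,\varrho_i\,\eta)^2} < 0,$ | (3.5a) |
$\frac{\partial f}{\partial \eta}(t,\eta) = -\sum_{i=1}^n \frac{w_i\,\varrho_i}{(t+(1-t)\,\varrho_i\,\eta)^2} < 0.$ | (3.5b) |
By definition, (3.1d) holds for $\vartheta$ and $h$. From (3.4) and (3.5a) it then follows that
$0 > \sum_{i=1}^n \frac{w_i\,(1-\varrho_i\,h)}{\hat\vartheta+(1-\hat\vartheta)\,\varrho_i\,h}.$ |
However, by (3.4) we also have
$0 = \sum_{i=1}^n \frac{w_i\,(1-\varrho_i\,\hat h)}{\hat\vartheta+(1-\hat\vartheta)\,\varrho_i\,\hat h}.$ |
By (3.5b), this is only possible if $h>\hat h$. Hence it follows that
$\frac{(1-\vartheta)\,h}{\vartheta} > \frac{(1-\hat\vartheta)\,\hat h}{\hat\vartheta}.$ |
By (3.1c) (i.e. the definition of $\vartheta_i$ and $\hat\vartheta_i$), this inequality implies $\vartheta_i<\hat\vartheta_i$.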
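The final implication in the proof of Lemma 3.4 may be easier to see after dividing numerator and denominator of (3.1c) by $\vartheta$. The following rewriting is only a restatement of (3.1c), added here for illustration:

```latex
\vartheta_i
  \;=\; \frac{\vartheta}{\vartheta + (1-\vartheta)\,\varrho_i\,h(\vartheta)}
  \;=\; \frac{1}{1 + \varrho_i\,\dfrac{(1-\vartheta)\,h(\vartheta)}{\vartheta}},
\qquad i = 1,\ldots,n.
```

Since $\varrho_i>0$, each $\vartheta_i$ is strictly decreasing in the quantity $(1-\vartheta)\,h(\vartheta)/\vartheta$, so a smaller value of this quantity for $\hat\vartheta$ gives $\hat\vartheta_i>\vartheta_i$ for every $i$.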
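Theorem 3.5. In the setting of Assumption 3.1 and Corollary 3.3, $\vartheta\le\hat\vartheta$ implies that
$P[\bar X_\vartheta\le x] \le P[\bar X_{\hat\vartheta}\le x], \quad \text{for all } x\in\mathbb{R}.$ |
Proof. By Lemma 3.4, $\vartheta\le\hat\vartheta$ implies for all $i=1,\ldots,n$ that $\vartheta_i\le\hat\vartheta_i$ and therefore also
$P[Y_\vartheta\le x\,|\,I=i] \ge P[Y_{\hat\vartheta}\le x\,|\,I=i], \quad \text{for all } x\in\mathbb{R}.$ |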
The remainder of the proof is identical to the last part of the proof of Proposition 2.7.
Exact p-values. Since, up to the constant factor $1/n$, the test statistic $\bar X_\vartheta$ as defined in Assumption 3.1 and Corollary 3.3 takes only integer values in the range $\{-n,\ldots,-1,0,1,\ldots,n\}$, its distribution can readily be determined exactly by means of an inverse Fourier transform [Rolski et al., 1999, Section 4.7]. By Theorem 3.5 and Theorem 8.3.27 of Casella and Berger [2002], a p-value for the test of $H_0:\vartheta\le b_w$ against $H_1:\vartheta>b_w$ can then be computed exactly as
$\text{p-value} = P[\bar X_{b_w}\le b_w-p_w].$ | (3.6a) |
A p-value for the test of $H^*_0:\vartheta\ge b_w$ against $H^*_1:\vartheta<b_w$ is given by
$\text{p-value}^* = P[\bar X_{b_w}\ge b_w-p_w].$ | (3.6b) |
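The exact p-values (3.6a) and (3.6b) can be illustrated with a short sketch. Note the assumptions: instead of the inverse Fourier transform mentioned above, the sketch uses a direct $n$-fold convolution, which yields the same distribution for moderate $n$; the function names are hypothetical; and `theta_hat` is expected to contain the values $\hat\vartheta_i$ obtained from (3.1c) with $\vartheta=b_w$.

```python
def step_pmf(w, b, theta_hat):
    """Probabilities of X = b_I - Y taking the values -1, 0 and +1."""
    p_minus = sum(wi * ti for wi, bi, ti in zip(w, b, theta_hat) if bi == 0)
    p_plus = sum(wi * (1.0 - ti) for wi, bi, ti in zip(w, b, theta_hat) if bi == 1)
    return p_minus, 1.0 - p_minus - p_plus, p_plus

def sum_pmf(step, n):
    """Distribution of the sum S of n iid steps; pmf[k] = P[S = k - n]."""
    pmf = [1.0]  # empty sum: S = 0 with probability 1
    for _ in range(n):
        new = [0.0] * (len(pmf) + 2)
        for k, pr in enumerate(pmf):
            new[k] += pr * step[0]      # step -1
            new[k + 1] += pr * step[1]  # step 0
            new[k + 2] += pr * step[2]  # step +1
        pmf = new
    return pmf

def exact_p_values(w, b, theta_hat, x, eps=1e-9):
    """Return (P[mean <= x], P[mean >= x]) as in (3.6a) and (3.6b), with x = b_w - p_w."""
    n = len(b)
    pmf = sum_pmf(step_pmf(w, b, theta_hat), n)
    # eps guards against rounding when n * x is an integer in exact arithmetic.
    p_lower = sum(pr for k, pr in enumerate(pmf) if k - n <= n * x + eps)
    p_upper = sum(pr for k, pr in enumerate(pmf) if k - n >= n * x - eps)
    return p_lower, p_upper
```

The convolution costs on the order of $n^2$ operations, which is unproblematic for samples of a few thousand pairs; for much larger samples a Fourier-based approach as referenced in the text would be the natural choice.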
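Normal approximate test. By Corollary 3.3, the distribution of $\bar X_{b_w}$ can be approximated by a normal distribution with mean 0 and variance as shown on the right-hand side of (3.3b). With $x=b_w-p_w$, one obtains for the approximate p-value of $H_0:\vartheta\le b_w$ against $H_1:\vartheta>b_w$:
$\text{p-value} = P[\bar X_{b_w}\le x] \approx \Phi\left(\frac{\sqrt{n}\,(b_w-p_w)}{\sqrt{\sum_{i=1}^n w_i\,(b_i-\hat\vartheta_i)^2+\sum_{i=1}^n w_i\,\hat\vartheta_i\,(1-\hat\vartheta_i)}}\right),$ | (3.7a) |
with $\hat\vartheta_i=\frac{b_w}{b_w+(1-b_w)\,\varrho_i\,h(b_w)}$ as in Assumption 3.1. The same reasoning gives for the normal approximate p-value of $H^*_0:\vartheta\ge b_w$ against $H^*_1:\vartheta<b_w$:
$\text{p-value}^* \approx 1-\Phi\left(\frac{\sqrt{n}\,(b_w-p_w)}{\sqrt{\sum_{i=1}^n w_i\,(b_i-\hat\vartheta_i)^2+\sum_{i=1}^n w_i\,\hat\vartheta_i\,(1-\hat\vartheta_i)}}\right).$ | (3.7b) |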
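A minimal sketch of the normal approximations (3.7a) and (3.7b) is given below. The function names are hypothetical, and `theta_hat` is assumed to hold the values $\hat\vartheta_i$ already recalibrated to the target $b_w$; the standard normal distribution function is expressed through the error function of the Python standard library.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_p_values(w, b, theta_hat, p_w):
    """Return the approximate p-values (3.7a) and (3.7b)."""
    n = len(b)
    b_w = sum(wi * bi for wi, bi in zip(w, b))
    # Variance of X under theta = b_w: the term (b_w - theta)^2 in (3.2b) vanishes.
    var = (sum(wi * (bi - ti) ** 2 for wi, bi, ti in zip(w, b, theta_hat))
           + sum(wi * ti * (1.0 - ti) for wi, ti in zip(w, theta_hat)))
    z = math.sqrt(n) * (b_w - p_w) / math.sqrt(var)
    p_value = std_normal_cdf(z)       # H0: theta <= b_w  vs  H1: theta > b_w
    return p_value, 1.0 - p_value     # second entry: H0*: theta >= b_w
```

By construction the two approximate p-values add up to one, so a small value for one hypothesis pair automatically means a large value for the other.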
In Section 2.5.3.1 of ECB [2019], the ECB proposes "PD back testing using a Jeffreys test". Transcribed into the notation of Section 3.1 of this paper, the starting point for the test can be described as follows:
● $n=N$, where "N is the number of customers in the portfolio/rating grade".
● $\sum_{i=1}^n b_i=D$, where "D is the number of those customers that have defaulted within that observation period".
● $\frac{1}{n}\sum_{i=1}^n p_i=PD$, where PD means the "PD [probability of default] of the portfolio/rating grade".
● All wi equal 1/n.
The Jeffreys test for the success parameter of a binomial distribution.
● In a Bayesian setting, an "objective Bayesian" prior distribution beta(1/2, 1/2) for the PD is chosen such that, assuming a binomial distribution for the number of defaults, the posterior distribution (i.e. conditional on the observed number of defaults) of the PD is beta(D+1/2, N−D+1/2). See Kazianka [2016] for the rationale for choosing this test method. If estimated as the mean of the posterior distribution, the Bayesian PD estimate is $\frac{D+1/2}{N+1}$.
● The Null hypothesis is "the PD applied in the portfolio/rating grade … is greater than the true one (one sided hypothesis test)", i.e. $H_0:\theta\le\hat\theta$ with $\hat\theta=$ "applied PD" and $\theta=$ "true PD". In the notation of Section 3.1, this can be phrased as testing $H^*_0:\vartheta\ge b_{1/n}$ against $H^*_1:\vartheta<b_{1/n}$.
● ECB [2019]: "The test statistic is the PD of the portfolio/rating grade." The construction principle for the Jeffreys test is to determine a credibility interval for the PD and then to check if the applied PD is inside or outside of the interval.
● The p-value for this kind of Jeffreys test is
$\text{p-value}_{\text{Jeffreys}} = F_{D+1/2,\,N-D+1/2}(PD),$ | (3.8) |
where $F_{\alpha,\beta}$ denotes the distribution function of the beta($\alpha,\beta$)-distribution.
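The Jeffreys p-value (3.8) only requires the beta distribution function. The sketch below evaluates it with a standard continued-fraction expansion of the regularised incomplete beta function, so that no external library is needed; the function names and the iteration and tolerance settings are choices made here for illustration, not part of the ECB instructions or of the author's scripts.

```python
import math

def _beta_cf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(a, b, x):
    """Distribution function F_{a,b}(x) of the beta(a, b) distribution."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def jeffreys_p_value(N, D, PD):
    """p-value (3.8): posterior probability that the true PD is at most PD."""
    return beta_cdf(D + 0.5, N - D + 0.5, PD)
```

A small p-value from `jeffreys_p_value` means that, under the posterior distribution, the true PD is unlikely to be as low as the applied PD.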
Comments.
● The standard (frequentist) one-sided binomial test would be: 'Reject $H_0$ if $D\ge c$' where $c$ is a 'critical' value such that the probability under $H_0$ to observe $c$ or more defaults is small. For this test, the p-value is
$\text{p-value}_{\text{freq}} = \sum_{i=D}^N \binom{N}{i}\,PD^i\,(1-PD)^{N-i} = F_{D,\,N-D+1}(PD).$ | (3.9) |
Hence, unless the observed number of defaults $D$ is very small or even zero, a comparison of (3.8) and (3.9) shows that in practice the Jeffreys test and the standard binomial test give similar results most of the time.
● For a 'fair' comparison of the Jeffreys test and the test proposed in Section 3.1, we have to modify Assumption 3.1 such that there is no variance expansion and all weights are equal, i.e. the random variable $X_\vartheta$ is simply defined by
$P[X_\vartheta=b_i-\vartheta_i] = \frac{1}{n}, \quad i=1,\ldots,n,$ | (3.10) |
where the $\vartheta_i$ depend on the unknown parameter $0<\vartheta<1$ in the way described by (3.1c) and (3.1d). The normal approximate p-value of $H_0$ against $H_1$ is then (using the ECB notation)
$\text{p-value} \approx 1-\Phi\left(\frac{\sqrt{N}\,(D/N-PD)}{\sqrt{D/N\,(1-D/N)}}\right).$ | (3.11) |
● The normal approximation of the frequentist (and by (3.8) and (3.9) also Jeffreys) binomial test p-value is
$\text{p-value}_{\text{freq}} \approx 1-\Phi\left(\frac{\sqrt{N}\,(D/N-PD)}{\sqrt{PD\,(1-PD)}}\right).$ | (3.12) |
● The test for H0 as required by the ECB would typically be performed when D/N>PD, i.e. when there are doubts with regard to the conservatism of the PD estimate. Rejection of H0 would then be regarded as 'proof' of the estimate being aggressive while non-rejection would entail 'acquittal' for lack of evidence. In case of 1/2≥D/N>PD, it holds that PD(1−PD)<D/N(1−D/N) such that the p-value according to the ECB test is lower than the p-value according to (3.10) and (3.11), i.e. the ECB test would reject H0 earlier than the simplified version of the test according to Section 3.1.
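The comparison made in the comments above can be reproduced with a few lines of code. The sketch below implements the exact binomial tail probability (3.9) and the two normal approximations (3.11) and (3.12); the function names are hypothetical, and the direct summation in `binomial_tail_p_value` is only meant for moderate values of N.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binomial_tail_p_value(N, D, PD):
    """Exact frequentist p-value (3.9): probability of D or more defaults."""
    return sum(math.comb(N, i) * PD ** i * (1.0 - PD) ** (N - i)
               for i in range(D, N + 1))

def normal_p_value_simplified(N, D, PD):
    """Approximation (3.11): variance estimated from the default rate D/N (needs 0 < D < N)."""
    rate = D / N
    return 1.0 - std_normal_cdf(math.sqrt(N) * (rate - PD)
                                / math.sqrt(rate * (1.0 - rate)))

def normal_p_value_binomial(N, D, PD):
    """Approximation (3.12): variance based on the applied PD."""
    rate = D / N
    return 1.0 - std_normal_cdf(math.sqrt(N) * (rate - PD)
                                / math.sqrt(PD * (1.0 - PD)))
```

For any input with $1/2\ge D/N>PD$, for example `N = 200`, `D = 6`, `PD = 0.02`, `normal_p_value_binomial` returns a smaller value than `normal_p_value_simplified`, in line with the last comment above.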
The test methods of Section 2 and the appendices are illustrated in Section 4.1 below with numerical results from tests on a data set from Fischer and Pfeuffer [2014, Table 1]. The test methods of Section 3 are illustrated in Section 4.2 below with numerical results from tests on a data set consisting of simulated data. However, the exposures in the data set are again from Fischer and Pfeuffer [2014, Table 1]. A zip-archive with the R-scripts and csv-files that were used for computing the results can be downloaded from https://www.researchgate.net/profile/Dirk_Tasche.
Explanations.
● Sample means: According to (2.1). Weights according to (2.2b) with EAD from the column 'raw.w' of the data set, and wi=1/100 in the equally weighted case.
● Sample standard deviations: First two values according to the square root of the right-hand side of (2.4b). Third value also according to (2.4b), but with ˜Δi from (A3.a) and equal weights.
● Weights according to (2.2b) with EAD from the column 'raw.w' of the data set.
● Sample quantiles: Based on sample Δ1,…,Δ100 computed as difference of columns 'obs' and 'pred' of the data set.
● Weight-adjusted sample quantiles: Based on sample ˜Δ1,…,˜Δ100 according to (A3.a).
● t-test results: 'Eq-weighted' according to (2.9) and 1−p-value∗ for the first row of the t-test results. 'Weighted' analogously adapted for the weighted case (but without strong theoretical foundation). 'W-adjusted' like 'Eq-weighted' but for the sample ˜Δ1,…,˜Δ100.
● 'Basic' results: Bootstrapped according to (2.6a) and (2.6b) respectively, with weights and samples like for the t-test rows.
● 'Basic normal' results: Normal approximations according to (2.8b) and (2.8c) respectively, with weights and samples like for the t-test rows.
● 'Expanded variance' results: With weights and samples like for the t-test rows, bootstrapped according to (2.16a) and (2.16b) respectively for the first two values, and according to (B6.a) and (B6.b) respectively for the third value.
● 'Exp var normal' results: With weights and samples like for the t-test rows, normal approximations according to (2.18b) and (2.18c) respectively for the first two values, and according to (B7.a) and (B7.b) respectively for the third value.
This example demonstrates that
● test results based on equally weighted means and means with inhomogeneous weights can lead to contradictory conclusions,
● variance expansion to capture the individual randomness of single observation-prediction pairs can have some impact on the degree of certainty of the test results, by entailing greater p-values, and
● the two different approaches to account for the weights of the observation-prediction pairs discussed in this paper can deliver similar but still clearly different results.
Explanations.
● See Section 4.1 for an explanation of the summary of the sample distribution.
● 'Jeffreys' results: The 'Eq-weighted' value for 'H0: mean(obs-pred)≤0 vs. H1: mean(obs-pred)>0' is computed according to (3.8). The 'Eq-weighted' value for 'H0: mean(obs-pred)≥0 vs. H1: mean(obs-pred)<0' is 1−p-valueJeffreys. No 'Weighted' results are computed because there is no obvious 'weighted mean'-version of the binomial Jeffreys test.
● 'Basic' results: Bootstrapped according to (2.6a) and (2.6b) respectively.
● 'Basic normal' results: Normal approximations according to (2.8b) and (2.8c) respectively.
● 'Expanded variance' results: Exact p-values by inverse Fourier transform according to (3.6a) and (3.6b) respectively.
● 'Exp var normal' results: Normal approximations according to (3.7a) and (3.7b) respectively.
This example demonstrates that
● as mentioned in Section 3.2, the Jeffreys test tends to reject 'H0: mean(obs-pred)≤0' earlier than the other tests discussed in Section 3,
● test results based on equally weighted means and means with inhomogeneous weights can lead to different outcomes (no conclusion vs. rejection of the null hypothesis), and
● variance expansion to capture the individual randomness of single observation-prediction pairs can have some impact on the degree of certainty of the test results, by entailing greater p-values.
In this paper, we have made suggestions of how to improve on the t-test and the Jeffreys test presented in ECB [2019] for assessing the 'predictive ability (or calibration)' of credit risk parameters. The improvements refer to
● also testing the null hypothesis that the estimated parameter is less than or equal to the true parameter in order to be able to 'prove' that the estimate is prudent (or conservative),
● additionally using exposure- or limit-weighted sample averages in order to better inform assessments of estimation (or prediction) prudence, and
● 'variance expansion' in order to account for sample inhomogeneity in terms of composition (exposures sizes) and riskiness.
The suggested test methods have been illustrated with exemplary test results. R-scripts with code for the tests are available.
The author is grateful to two anonymous reviewers whose comments redounded to significant improvements of the paper.
The author declares no conflicts of interest in this paper.
[1] | BCBS. International Convergence of Capital Measurement and Capital Standards. A Revised Framework, Comprehensive Version, 2006. |
[2] | Bellini T (2019) IFRS 9 and CECL Credit Risk Modelling and Validation: A Practical Guide with Examples Worked in R and SAS. Academic Press. |
[3] | Blümke O (2019) Out-of-Time Validation of Default Probabilities within the Basel Accord: A comparative study. SSRN Electron J. http://dx.doi.org/10.2139/ssrn.2945931 |
[4] | Casella G, Berger RL (2002) Statistical Inference. Duxbury Press, second edition. |
[5] | Davison AC, Hinkley DV (1997) Bootstrap Methods and their Application. Cambridge University Press. |
[6] | ECB. Instructions for reporting the validation results of internal models: IRB Pillar I models for credit risk. European Central Bank – Banking Supervision, 2019. |
[7] | Fischer M, Pfeuffer M (2014) A statistical repertoire for quantitative loss given default validation: overview, illustration, pitfalls and extensions. J Risk Model Validation 8: 3–29. http://dx.doi.org/10.21314/JRMV.2014.115 |
[8] | Gürtler M, Hibbeln MT, Usselmann P (2018) Exposure at default modeling – A theoretical and empirical assessment of estimation approaches and parameter choice. J Bank Financ 91: 176–188. http://dx.doi.org/10.1016/j.jbankfin.2017.03.004 |
[9] | Hand DJ (1997) Construction and Assessment of Classification Rules. John Wiley & Sons, Chichester. |
[10] | Kazianka H (2016) Objective Bayesian estimation of the probability of default. J R Stat Soc Ser C 65: 1–27. http://dx.doi.org/10.1111/rssc.12107 |
[11] | Li D, Bhariok R, Keenan S, et al. (2009) Validation techniques and performance metrics for loss given default models. J Risk Model Validation 3: 3–26. http://dx.doi.org/10.21314/JRMV.2009.045 |
[12] | Loterman G, Debruyne M, Vanden BK, et al. (2014) A proposed framework for backtesting loss given default models. J Risk Model Validation 8: 69–90. http://dx.doi.org/10.21314/JRMV.2014.117 |
[13] | Mendenhall W, Beaver RJ, Beaver BM (2008) Introduction to probability and statistics. Cengage Learning, 13th edition. |
[14] | Rolski T, Schmidli H, Schmidt V, et al. (1999) Stochastic Processes for Insurance and Finance. Wiley Series in Probability and Statistics. John Wiley & Sons. |
[15] | Scandizzo S (2016) The validation of risk models: A handbook for practitioners. Springer. https://doi.org/10.1057/9781137436962 |
[16] | Tasche D (2013a) The art of probability-of-default curve calibration. J Credit Risk 9: 63–103. https://doi.org/10.21314/JCR.2013.169 |
[17] | Tasche D (2013b) The Law of Total Odds. Working paper. https://doi.org/10.48550/arXiv.1312.0365. |
[18] | Venables WN, Ripley BD (2002) Modern Applied Statistics with S. Springer, fourth edition. |