Research article

Resolutions to flip-over credit risk and beyond: least squares estimates and maximum likelihood estimates with monotonic constraints

  • Received: 03 December 2018 Accepted: 18 March 2019 Published: 22 March 2019
  • Given a risk outcome $y$ over a rating system $\{R_i\}_{i=1}^k$ for a portfolio, we show in this paper that the maximum likelihood estimates with monotonic constraints, when $y$ is binary (the Bernoulli likelihood) or takes values in the interval $0 \le y \le 1$ (the quasi-Bernoulli likelihood), are each given by the average of the observed outcomes over some consecutive rating indexes. These estimates are on average equal to the sample average risk over the portfolio and coincide with the least squares estimates under the same monotonic constraints. These results are the exact solution of the corresponding constrained optimization. A non-parametric algorithm for the exact solution is proposed. For the least squares estimates, this algorithm is compared with the "pool adjacent violators" algorithm for isotonic regression. The proposed approaches provide a resolution to flip-over credit risk and a tool for determining fair risk scales over a rating system.

    Citation: Bill Huajian Yang. Resolutions to flip-over credit risk and beyond: least squares estimates and maximum likelihood estimates with monotonic constraints[J]. Big Data and Information Analytics, 2018, 3(2): 54-67. doi: 10.3934/bdia.2018007



    1. Introduction

    Flip-over is a phenomenon where a low-risk segment has a larger risk estimate than a high-risk segment. It is usually caused by over-segmentation, when practitioners greedily seek discriminatory power in the model development stage: a segment is forced to split further into several small segments for a seeming in-sample increase in discriminatory power, even though the resulting segments show no material difference at the population level. When flip-over occurs, practitioners typically combine segments manually or through hierarchical clustering.

    We show in this paper that the flip-over phenomenon can be resolved by approaches based on least squares estimates or maximum likelihood estimates with monotonic constraints.

    Let $\{R_i\}_{i=1}^k$ denote a segmentation or the non-default risk ratings for a risk-rated portfolio. Let $y$, $-\infty < y < +\infty$, be a general risk outcome, for example, the loan loss, the exposure at default, or the default indicator. A monotonicity rule is assumed: a rating with a higher index is expected to carry higher risk, i.e., the expected value of $y$ is higher for a higher-indexed rating.

    Monotonic constraints are widely used in learning processes. Examples of learning tasks where monotonic constraints are imposed include isotonic regression [2,3,5,8], risk scale estimation for a rating system [18], classification trees [11], rule learning [6], binning [1,4], and deep lattice networks [19].

    We use the following notations: For a given sample $S$, let $y_{ij}$ denote the $j$th observation of the risk outcome over $R_i$, and $n_i$ the total number of observations for $R_i$. We assume $n_i > 0$. Let $d_i = \sum_{j=1}^{n_i} y_{ij}$ be the sum of all the observed $y$-values, and $r_i = d_i/n_i$ the average observed risk for $R_i$.

    We are interested in the least squares estimates $\{p_i\}_{i=1}^k$ that minimize the sum squared error (1.1) subject to the monotonic constraints (1.2) below:

    SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij}-p_i)^2, \qquad (1.1)
    p_1 \le p_2 \le \dots \le p_k. \qquad (1.2)

    When $y$ is binary (e.g., the default indicator) or takes values in the interval $0 \le y \le 1$, we are interested in the maximum likelihood estimates $\{p_i\}_{i=1}^k$ that maximize the log-likelihood (1.3) below subject to (1.2):

    LL = \sum_{i=1}^{k}\big[d_i\log(p_i) + (n_i-d_i)\log(1-p_i)\big], \qquad (1.3)

    where the additive term $d_i\log(p_i) + (n_i-d_i)\log(1-p_i)$ corresponds to the Bernoulli log-likelihood when $y$ is binary, i.e., we assume that the risk outcome $y$ over rating $R_i$ follows a Bernoulli distribution with probability $p_i$. It corresponds to the quasi-Bernoulli log-likelihood when $y$ takes values in the interval $0 \le y \le 1$ [10].

    Main results. In this paper, we show (see Propositions 3.1 and 4.1) that, for a given sample $S = \{y_{ij} \mid 1 \le i \le k,\ 1 \le j \le n_i\}$, there exist partition integers $\{k_i\}_{i=0}^m$, where $0 = k_0 < k_1 < \dots < k_m = k$, such that the values $\{p_j\}_{j=1}^k$ given by (1.4) below minimize (1.1) and maximize (1.3), subject to (1.2):

    p_j = \frac{d_{k_{i-1}+1} + d_{k_{i-1}+2} + \dots + d_{k_i}}{n_{k_{i-1}+1} + n_{k_{i-1}+2} + \dots + n_{k_i}}, \qquad k_{i-1}+1 \le j \le k_i. \qquad (1.4)

    These $\{p_j\}_{j=1}^k$ satisfy the equation below:

    \frac{n_1p_1 + n_2p_2 + \dots + n_kp_k}{n} = \frac{d}{n}, \qquad (1.5)

    where

    n = n_1 + n_2 + \dots + n_k, \qquad (1.6)
    d = d_1 + d_2 + \dots + d_k. \qquad (1.7)

    These results are the exact solution for the corresponding constrained optimization and are proved in a more general setting under weighted least squares and weighted maximum likelihood.

    Given the above results, flip-over credit risk can be resolved by combining the ratings in each group $k_{i-1}+1 \le j \le k_i$ and replacing their estimates by the average risk over the group.

    One of the most important estimations with monotonic constraints is isotonic regression [2]. Given values $\{r_j\}_{j=1}^k$, the goal of isotonic regression is to find $\{p_i\}_{i=1}^k$, subject to (1.2), that minimize the weighted sum of squares $\sum_{i=1}^k w_i(r_i-p_i)^2$, where $\{w_i\}_{i=1}^k$ are the given weights. A unique exact solution to the isotonic regression problem exists and can be obtained by a non-parametric algorithm called Pool Adjacent Violators (PAV) [2,3,5,8].

    A non-parametric algorithm (Algorithm 5.1) with time complexity $O(k^2)$ is proposed in Section 5 for finding the partition integers in (1.4), hence the estimates. For estimates with general monotonic constraints, we propose a parametric algorithm (Algorithm 5.2): for least squares estimates with constraints $p_{i+1} \ge p_i + \epsilon_i$ for $1 \le i < k$ and $\epsilon_i \ge 0$, and for maximum likelihood estimates with constraints $p_{i+1}/p_i \ge 1+\epsilon$ for $1 \le i < k$ and $\epsilon \ge 0$. A detailed comparison between the PAV algorithm and the non-parametric algorithm proposed in this paper can be found in Section 6.1.

    The key idea behind the proof of (1.4) and behind the algorithms proposed in this paper is a re-parameterization of the estimates so that (1.2) is automatically satisfied. Consequently, the constrained optimization is transformed into a tractable unconstrained mathematical programming problem (see Sections 3, 4 and 5).

    The paper is organized as follows: In Section 2, we define the partition integers for a given sample. A formula like (1.4) is shown in Section 3 for weighted maximum likelihood estimates and in Section 4 for weighted least squares estimates. The non-parametric algorithm for the exact solution is proposed in Section 5. In Section 6, we illustrate how the proposed non-parametric algorithm can be used to determine the fair risk scales over a rating system. Applications to risk-supervised monotonic binning are also discussed.


    2. The partition integers

    For a given sample $S = \{y_{ij} \mid 1 \le i \le k,\ 1 \le j \le n_i\}$, let $\{w_i\}_{i=1}^k$ denote the given weights, where $w_i > 0$ is the weight assigned to the observed outcomes $\{y_{ij}\}_{j=1}^{n_i}$ for $R_i$. We use the notations introduced in Section 1, with $d_i = \sum_{j=1}^{n_i} y_{ij}$ and $r_i = d_i/n_i$.

    For $1 \le i \le j \le k$, let

    u(i,j) = \frac{r_in_iw_i + r_{i+1}n_{i+1}w_{i+1} + \dots + r_jn_jw_j}{n_iw_i + n_{i+1}w_{i+1} + \dots + n_jw_j} \qquad (2.1)
    = \frac{d_iw_i + d_{i+1}w_{i+1} + \dots + d_jw_j}{n_iw_i + n_{i+1}w_{i+1} + \dots + n_jw_j}. \qquad (2.2)

    By (2.1), $u(i,j)$ is the weighted average of $\{r_i, r_{i+1}, \dots, r_j\}$, where $r_h$ is weighted by $n_hw_h$. In particular, we have

    u(1,k) = \frac{d_1w_1 + d_2w_2 + \dots + d_kw_k}{n_1w_1 + n_2w_2 + \dots + n_kw_k} = \frac{D}{N}, \qquad (2.3)

    the weighted average of $\{r_i\}_{i=1}^k$ over the portfolio, where $N$ and $D$ are defined respectively by (2.4) and (2.5) below:

    N = n_1w_1 + n_2w_2 + \dots + n_kw_k, \qquad (2.4)
    D = d_1w_1 + d_2w_2 + \dots + d_kw_k. \qquad (2.5)

    Let $\{k_i\}_{i=0}^m$ be the partition integers, where $0 = k_0 < k_1 < \dots < k_m = k$, such that (2.6) and (2.7) below hold for $0 < i \le m$:

    u(k_{i-1}+1,\,k_i) = \min\{u(k_{i-1}+1,\,j) \mid k_{i-1}+1 \le j \le k\}, \qquad (2.6)
    u(k_{i-1}+1,\,k_i) < u(k_{i-1}+1,\,j) \quad \text{for } k_i < j \le k. \qquad (2.7)

    That is, given $k_{i-1}$, the integer $k_i$ is the largest index at which $u(k_{i-1}+1,\,j)$ attains its minimum over all remaining indexes $j \ge k_{i-1}+1$. When $\{r_i\}_{i=1}^k$ is strictly increasing, we have $m = k$ and $\{k_i\}_{i=1}^m = \{1, 2, \dots, k\}$.
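    As a toy illustration (our own numbers, not from the original data): take $k = 3$, $n_i = w_i = 1$, and $(r_1, r_2, r_3) = (0.2, 0.5, 0.3)$. Then $u(1,1) = 0.2$, $u(1,2) = 0.35$, and $u(1,3) \approx 0.33$, so $k_1 = 1$; next, $u(2,2) = 0.5$ and $u(2,3) = 0.4$, so $k_2 = 3$ and $m = 2$. The partition pools ratings 2 and 3, and (1.4) gives $(p_1, p_2, p_3) = (0.2, 0.4, 0.4)$.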

    By (2.6) and (2.7), we have the following inequalities:

    u(1,k_1) < u(k_1+1,\,k_2) < \dots < u(k_{m-1}+1,\,k_m). \qquad (2.8)

    This is because if, for example, $u(1,k_1) \ge u(k_1+1,\,k_2)$, then we have:

    u(1,k_2) = \frac{n_1w_1 + \dots + n_{k_1}w_{k_1}}{n_1w_1 + \dots + n_{k_2}w_{k_2}}\,u(1,k_1) + \frac{n_{k_1+1}w_{k_1+1} + \dots + n_{k_2}w_{k_2}}{n_1w_1 + \dots + n_{k_2}w_{k_2}}\,u(k_1+1,\,k_2)
    \le \frac{n_1w_1 + \dots + n_{k_1}w_{k_1}}{n_1w_1 + \dots + n_{k_2}w_{k_2}}\,u(1,k_1) + \frac{n_{k_1+1}w_{k_1+1} + \dots + n_{k_2}w_{k_2}}{n_1w_1 + \dots + n_{k_2}w_{k_2}}\,u(1,k_1) = u(1,k_1).

    This contradicts the fact that $k_1$ is the largest index at which $u(1,j)$ attains its minimum over all $j \ge 1$.


    3. Weighted maximum likelihood estimates with monotonic constraints

    Under the weighted maximum likelihood framework, the log-likelihood (1.3) becomes

    LL = \sum_{i=1}^{k} w_i\big[d_i\log(p_i) + (n_i-d_i)\log(1-p_i)\big]. \qquad (3.1)

    We are interested in the weighted maximum likelihood estimates $\{p_i\}_{i=1}^k$ that maximize (3.1) subject to (1.2).

    Let $f_i(p_i) = d_i\log(p_i) + (n_i-d_i)\log(1-p_i)$ be an additive term of (3.1). The values $f_i(1)$ and $f_i(0)$ are defined as follows: taking the limit of $f_i(p_i)$ as $p_i$ approaches 1 from the left, we set $f_i(1) = 0$ if $d_i = n_i$ and $f_i(1) = -\infty$ if $d_i < n_i$; similarly, taking the limit of $f_i(p_i)$ as $p_i$ approaches 0 from the right, we set $f_i(0) = 0$ if $d_i = 0$ and $f_i(0) = -\infty$ if $d_i > 0$. In the absence of (1.2), the sample means $\{r_i\}_{i=1}^k$ maximize (3.1), because each $f_i(p_i)$ is maximized at $p_i = r_i$. This is immediate when $r_i = 0$ or $1$; for $0 < r_i < 1$, it follows by setting the derivative of the additive term with respect to $p_i$ to zero (see [18]).
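    Spelling out that derivative step for completeness:

    f_i'(p_i) = \frac{d_i}{p_i} - \frac{n_i-d_i}{1-p_i} = 0 \iff d_i(1-p_i) = (n_i-d_i)\,p_i \iff p_i = \frac{d_i}{n_i} = r_i.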

    Proposition 3.1. With the partition integers $\{k_i\}_{i=0}^m$ defined by (2.6) and (2.7), the values $\{p_j\}_{j=1}^k$ given by (3.2) below maximize (3.1) subject to (1.2):

    p_j = u(k_{i-1}+1,\,k_i) = \frac{d_{k_{i-1}+1}w_{k_{i-1}+1} + d_{k_{i-1}+2}w_{k_{i-1}+2} + \dots + d_{k_i}w_{k_i}}{n_{k_{i-1}+1}w_{k_{i-1}+1} + n_{k_{i-1}+2}w_{k_{i-1}+2} + \dots + n_{k_i}w_{k_i}}, \quad \text{where } k_{i-1}+1 \le j \le k_i. \qquad (3.2)

    In addition, the following equation holds:

    \frac{n_1w_1p_1 + n_2w_2p_2 + \dots + n_kw_kp_k}{n_1w_1 + n_2w_2 + \dots + n_kw_k} = \frac{D}{N}. \qquad (3.3)

    Proof. First, by (2.8), the estimates $\{p_i\}_{i=1}^k$ given by (3.2) satisfy (1.2). By (3.2) and (2.2), the sum of $\{n_jw_jp_j \mid k_{i-1}+1 \le j \le k_i\}$ is equal to the sum of $\{d_jw_j \mid k_{i-1}+1 \le j \le k_i\}$. Thus, we have:

    n_1w_1p_1 + n_2w_2p_2 + \dots + n_kw_kp_k = d_1w_1 + d_2w_2 + \dots + d_kw_k = D.

    Therefore, with these specific values for $\{p_i\}_{i=1}^k$, Eq (3.3) holds.

    Next, write

    LL = \sum_{i=1}^{k} w_i\big[d_i\log(p_i) + (n_i-d_i)\log(1-p_i)\big] = \sum_{i=1}^{m} LL(k_{i-1}+1,\,k_i),

    where

    LL(k_{i-1}+1,\,k_i) = \sum_{h=k_{i-1}+1}^{k_i} w_h\big[d_h\log(p_h) + (n_h-d_h)\log(1-p_h)\big].

    Because of (2.8), it suffices to show that each log-likelihood $LL(k_{i-1}+1,\,k_i)$ is maximized at $p_j = u(k_{i-1}+1,\,k_i)$ for $k_{i-1}+1 \le j \le k_i$, subject to (1.2) within the range $k_{i-1}+1 \le j \le k_i$. We show only the case $i = 1$, where $LL(k_{i-1}+1,\,k_i)$ is $LL(1,k_1)$; the proof for the other cases is similar. Without loss of generality, we assume $k_1 = k$. In this case, $m = 1$, $k_1 = k$, and $LL(1,k) = LL$.

    As maximum likelihood estimates of probabilities, $0 \le p_j \le 1$ for $1 \le j \le k$. Consider the following four cases. (a) $p_k = 1$. The additive term $f_k(p_k)$, hence $LL$, takes the value $-\infty$ if $d_k < n_k$; hence $d_k = n_k$ and $r_k = 1$. Because $u(1,j)$ reaches its minimum at $j = k$ for $1 \le j \le k$, we must have $r_j = 1$ for all $1 \le j \le k$, by (2.1). Therefore $u(1,k) = 1$ and, by (3.2), $p_j = 1$ for all $1 \le j \le k$; these values of $\{p_j\}_{j=1}^k$ do maximize $LL$ subject to (1.2). (b) $p_1 = 0$. As in case (a), we have $p_j = 0$ for all $1 \le j \le k$; we must have $d_j = 0$ for all $1 \le j \le k$ and $u(1,k) = 0$ (thus the proposition holds), since otherwise $LL$ takes the value $-\infty$. (c) $u(1,k) = 1$. Then $r_j = 1$ for all $1 \le j \le k$, and as in case (a) the proposition holds. (d) $u(1,k) = 0$. Then $r_j = 0$ for all $1 \le j \le k$, and as in case (b) the proposition holds.

    Therefore, we can assume $0 < u(1,k) < 1$, $p_k < 1$, and $p_1 > 0$. We can then parameterize $p_j$ for $1 \le j \le k$ by letting

    p_{k+1-j} = \exp\big[-(b_1 + b_2 + \dots + b_j)\big], \qquad b_j = a_j^2, \qquad (3.4)

    where $-\infty < a_j < +\infty$ for $1 \le j \le k$. With this parameterization, (1.2) is satisfied. By plugging (3.4) into $LL$, we transform the constrained optimization problem into an unconstrained mathematical programming problem. The partial derivative of $LL$ with respect to $a_j$ is given by:

    \frac{\partial LL}{\partial a_j} = \sum_{i=1}^{k} \frac{\partial}{\partial a_j}\, w_{k+1-i}\big[d_{k+1-i}\log(p_{k+1-i}) + (n_{k+1-i}-d_{k+1-i})\log(1-p_{k+1-i})\big]
    = \sum_{i=j}^{k} w_{k+1-i}\Big[-2a_jd_{k+1-i} + 2a_j(n_{k+1-i}-d_{k+1-i})\frac{p_{k+1-i}}{1-p_{k+1-i}}\Big]
    = \sum_{i=j}^{k} w_{k+1-i}\Big[-2a_jd_{k+1-i} - 2a_j(n_{k+1-i}-d_{k+1-i}) + 2a_j\frac{n_{k+1-i}-d_{k+1-i}}{1-p_{k+1-i}}\Big]
    = 2a_j\sum_{i=j}^{k} w_{k+1-i}\Big[\frac{n_{k+1-i}-d_{k+1-i}}{1-p_{k+1-i}} - n_{k+1-i}\Big]
    = 2a_j\sum_{i=1}^{k+1-j} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big] = 2a_j\,g(j),

    where

    g(j) = \sum_{i=1}^{k+1-j} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big]. \qquad (3.5)

    Setting this partial derivative to zero, we have either $a_j = 0$ or $g(j) = 0$. For $j = 1$, we have $a_1 \ne 0$, since otherwise $p_k = 1$, contrary to our assumption. Thus, we have

    0 = g(1) = \sum_{i=1}^{k} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big]. \qquad (3.6)

    We claim that $a_j = 0$ for all $1 < j \le k$. If this is true, then $p_1 = p_2 = \dots = p_k$; by (3.6) we then have $p_1 = \frac{D}{N} = u(1,k)$, and the proof follows. Suppose $1 = i_1 < \dots < i_H$ (where $1 < H$ and $i_H \le k$) are all the indexes such that $g(i_h) = 0$ and $a_{i_h} \ne 0$ for $1 \le h \le H$. For $1 < h \le H$, we have:

    0 = g(i_{h-1}) - g(i_h) = \sum_{i=1}^{k+1-i_{h-1}} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big] - \sum_{i=1}^{k+1-i_h} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big] = \sum_{i=k+2-i_h}^{k+1-i_{h-1}} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big]. \qquad (3.7)

    Since $a_j = 0$ when $i_{h-1} < j < i_h$, all $\{p_i\}_{i=k+2-i_h}^{k+1-i_{h-1}}$ are equal to $p_{k+1-i_{h-1}}$. Thus (3.7) becomes

    0 = \sum_{i=k+2-i_h}^{k+1-i_{h-1}} w_i\big[(n_i-d_i) - n_i(1-p_i)\big]. \qquad (3.8)

    Solving (3.8) for $p_{k+1-i_{h-1}}$, we have:

    p_{k+1-j} = \frac{d_{k+2-i}w_{k+2-i} + d_{k+3-i}w_{k+3-i} + \dots + d_{k+1-j}w_{k+1-j}}{n_{k+2-i}w_{k+2-i} + n_{k+3-i}w_{k+3-i} + \dots + n_{k+1-j}w_{k+1-j}} = u(k+2-i,\,k+1-j), \qquad (3.9)

    where $i = i_h$ and $j = i_{h-1}$. Similarly, for $i_H$ we have $g(i_H) = 0$, thus:

    0 = \sum_{i=1}^{k+1-i_H} w_i\Big[\frac{n_i-d_i}{1-p_i} - n_i\Big]. \qquad (3.10)

    Since $a_j = 0$ when $i_H < j \le k$, all $\{p_i\}_{i=1}^{k+1-i_H}$ are equal to $p_1$. By (3.10) we have $p_1 = u(1,\,k+1-i_H)$. Consequently, each of $\{p_i\}_{i=1}^k$ is either $u(1,\,k+1-i_H)$ or is given by one of $\{u(k+2-i_h,\,k+1-i_{h-1})\}$. Thus Eq (3.3) holds, because by (2.2) the sum of $\{n_jw_jp_j \mid k+2-i_h \le j \le k+1-i_{h-1}\}$ is equal to the sum of $\{d_jw_j \mid k+2-i_h \le j \le k+1-i_{h-1}\}$, and the sum of $\{n_jw_jp_j \mid 1 \le j \le k+1-i_H\}$ is equal to the sum of $\{d_jw_j \mid 1 \le j \le k+1-i_H\}$.

    Now, the weighted average on the left-hand side of (3.3) must be larger than $p_1$, because $p_i > p_1$ for all $i \ge k+2-i_H$. Since this weighted average equals $\frac{D}{N} = u(1,k)$, we have:

    u(1,\,k+1-i_H) = p_1 < u(1,k).

    This contradicts the assumption that $j = k$ is the largest index within $1 \le j \le k$ at which $u(1,j)$ attains its minimum.


    4. Weighted least squares estimates with monotonic constraints

    We use the notations introduced in Section 1. Under the weighted least squares framework, (1.1) changes to (4.1) below:

    SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} w_i(y_{ij}-p_i)^2. \qquad (4.1)
    This decomposes as
    SSE = \sum_{i=1}^{k}\sum_{j=1}^{n_i} w_i(y_{ij}-r_i)^2 + \sum_{i=1}^{k} n_iw_i(r_i-p_i)^2 = SSE_1 + SSE_2,

    where $SSE_1 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} w_i(y_{ij}-r_i)^2$ and

    SSE_2 = \sum_{i=1}^{k} n_iw_i(r_i-p_i)^2. \qquad (4.2)

    Since $SSE_1$ is a constant term, the weighted least squares estimates are the estimates $\{p_i\}_{i=1}^k$ that minimize (4.2) subject to (1.2). Note that, in the absence of (1.2), $\{r_i\}_{i=1}^k$ minimizes (4.1).

    Proposition 4.1. With the partition integers $\{k_i\}_{i=0}^m$ defined by (2.6) and (2.7), the values $\{p_i\}_{i=1}^k$ given, as in Proposition 3.1, by (4.3) below minimize (4.2) subject to (1.2):

    p_j = u(k_{i-1}+1,\,k_i), \quad \text{where } k_{i-1}+1 \le j \le k_i. \qquad (4.3)

    In addition, the following equation holds:

    \frac{n_1w_1p_1 + n_2w_2p_2 + \dots + n_kw_kp_k}{n_1w_1 + n_2w_2 + \dots + n_kw_k} = \frac{D}{N}. \qquad (4.4)

    Proof. As shown in the proof of Proposition 3.1, the values $\{p_i\}_{i=1}^k$ given by (4.3) satisfy (1.2) and (4.4). Next, write $SSE = \sum_{i=1}^{m} SSE(k_{i-1}+1,\,k_i)$, where

    SSE(k_{i-1}+1,\,k_i) = \sum_{h=k_{i-1}+1}^{k_i}\sum_{g=1}^{n_h} w_h(y_{hg}-p_h)^2.

    Because of (2.8), it suffices to show that $SSE(k_{i-1}+1,\,k_i)$ is minimized at $p_j = u(k_{i-1}+1,\,k_i)$ subject to (1.2), where $k_{i-1}+1 \le j \le k_i$. We show only the case $i = 1$, where $SSE(k_{i-1}+1,\,k_i)$ is $SSE(1,k_1)$; the proof for the other cases is similar. Without loss of generality, we assume $k_1 = k$. In this case, $m = 1$, $k_1 = k$, and $SSE(1,k) = SSE$.

    Parameterize $p_j$ by letting $p_1 = a_1$ and, for $2 \le j \le k$,

    p_j = a_1 + (b_2 + \dots + b_j), \qquad b_j = a_j^2, \qquad (4.5)

    where $-\infty < a_j < +\infty$ for $1 \le j \le k$. With this parameterization, (1.2) is satisfied. By plugging (4.5) into (4.1), we transform the constrained optimization problem into an unconstrained mathematical programming problem. We take the partial derivative of $SSE$ with respect to $a_j$. For $j \ge 2$, we have

    \frac{\partial SSE}{\partial a_j} = -\sum_{i=j}^{k}\sum_{g=1}^{n_i} 4a_jw_i(y_{ig}-p_i) = -4a_j\sum_{i=j}^{k} w_i(d_i-n_ip_i) = -4a_j\,f(j),

    where $f(j) = \sum_{i=j}^{k} w_i(d_i-n_ip_i)$. Setting this derivative to zero, we have either $a_j = 0$ or $f(j) = 0$. For $j = 1$, we have

    \frac{\partial SSE}{\partial a_1} = -\sum_{i=1}^{k}\sum_{g=1}^{n_i} 2w_i(y_{ig}-p_i) = -2\sum_{i=1}^{k} w_i(d_i-n_ip_i) = -2f(1).

    Setting this derivative to zero, we have

    0 = f(1) = \sum_{i=1}^{k} w_i(d_i-n_ip_i).

    This implies:

    \frac{n_1w_1p_1 + n_2w_2p_2 + \dots + n_kw_kp_k}{n_1w_1 + n_2w_2 + \dots + n_kw_k} = \frac{D}{N} = u(1,k).

    This shows that the weighted least squares estimates $\{p_i\}_{i=1}^k$, before their actual values are determined, satisfy (4.4). We claim that $a_j = 0$ for all $1 < j \le k$. If this is true, then $p_1 = p_2 = \dots = p_k$; by (4.4) we then have $p_1 = \frac{D}{N} = u(1,k)$, and the proof follows. Otherwise, let $i_0 > 1$ be the smallest integer such that $a_{i_0} \ne 0$, so that $a_j = 0$ whenever $1 < j < i_0$. Then we have $f(1) = 0$ and $f(i_0) = 0$. Thus

    0 = f(1) - f(i_0) = \sum_{i=1}^{i_0-1} w_i(d_i-n_ip_i). \qquad (4.6)

    Since $a_j = 0$ when $1 < j < i_0$, all $\{p_j\}_{j=1}^{i_0-1}$ are equal to $p_1$. Thus, by (4.6) and (2.2), we have

    p_1 = \frac{d_1w_1 + d_2w_2 + \dots + d_{i_0-1}w_{i_0-1}}{n_1w_1 + n_2w_2 + \dots + n_{i_0-1}w_{i_0-1}} = u(1,\,i_0-1). \qquad (4.7)

    However, $a_{i_0} \ne 0$, thus $p_1 < p_{i_0}$. Thus, by (4.4), (1.2), and (4.7), we have

    \frac{D}{N} = \sum_{i=1}^{k}\frac{n_iw_i}{N}\,p_i > \sum_{i=1}^{k}\frac{n_iw_i}{N}\,p_1 = p_1 = u(1,\,i_0-1).

    Thus, we have $u(1,\,i_0-1) < \frac{D}{N} = u(1,k)$. This contradicts the fact that $j = k$ is the largest index at which $u(1,j)$ attains its minimum over all $j \ge 1$. Therefore, we have $a_2 = a_3 = \dots = a_k = 0$, and all $\{p_i\}_{i=1}^k$ are equal to $p_1$.


    5. Algorithms for least squares estimates or maximum likelihood estimates with monotonic constraints

    First, we propose a non-parametric search algorithm, with time complexity $O(k^2)$, for finding the partition integers $0 = k_0 < k_1 < \dots < k_m = k$ defined by (2.6) and (2.7); the estimates $\{p_i\}_{i=1}^k$ subject to (1.2) are then calculated by (3.2) or (4.3).

    Algorithm 5.1 (Non-parametric). Set $k_0 = 0$. Assume that the partition integers $\{k_h\}_{h=1}^{i-1}$ have been found for an integer $i > 0$, and that $\{p_j\}_{j=1}^{k_{i-1}}$ have been calculated by (3.2) or (4.3).

    (a) Scan the remaining index range $k_{i-1}+1 \le j \le k$ for a value $j = k_i$ such that

    u(k_{i-1}+1,\,j) = \frac{d_{k_{i-1}+1}w_{k_{i-1}+1} + d_{k_{i-1}+2}w_{k_{i-1}+2} + \dots + d_jw_j}{n_{k_{i-1}+1}w_{k_{i-1}+1} + n_{k_{i-1}+2}w_{k_{i-1}+2} + \dots + n_jw_j}

    reaches its minimum over the range $k_{i-1}+1 \le j \le k$, with $j = k_i$ the largest index attaining this minimum.

    (b) Calculate $p_j$, for $k_{i-1}+1 \le j \le k_i$, by (3.2) or (4.3) as $u(k_{i-1}+1,\,k_i)$.

    Repeat steps (a) and (b) until there are no more remaining indexes to partition.
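    For concreteness, the following is a minimal Python sketch of Algorithm 5.1 (our own illustration; the function and variable names are ours). The inputs are the per-rating totals $d_i$, counts $n_i$, and optional weights $w_i$:

```python
def monotone_estimates(d, n, w=None):
    """Exact solution of (1.1)/(1.3) subject to (1.2), per Algorithm 5.1.

    d[i]: sum of observed y-values for rating R_{i+1}
    n[i]: number of observations for R_{i+1} (assumed > 0)
    w[i]: optional positive weights (all 1 by default)
    Returns the pooled estimates p_1 <= ... <= p_k as a list.
    """
    k = len(d)
    w = [1.0] * k if w is None else w
    p, start = [], 0                      # 'start' plays the role of k_{i-1}
    while start < k:
        best_j, best_u = start, None
        num = den = 0.0
        for j in range(start, k):         # step (a): forward scan of u(start+1, j)
            num += d[j] * w[j]
            den += n[j] * w[j]
            u = num / den
            if best_u is None or u <= best_u:   # '<=' keeps the largest minimizer
                best_u, best_j = u, j
        p.extend([best_u] * (best_j - start + 1))  # step (b): pool the block
        start = best_j + 1
    return p
```

    On the toy data of Section 2 (d = [0.2, 0.5, 0.3] and n = [1, 1, 1], so that d holds the observed values), this returns [0.2, 0.4, 0.4].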

    For the optimization problems (1.1) and (1.3), when general monotonic constraints are required (including strictly monotonic constraints), we propose the following parametric algorithm, which can be implemented using the SAS procedure PROC NLMIXED [14].

    Algorithm 5.2 (Parametric). For problem (1.1), parameterize $p_i$ by letting $p_1 = a_1$ and, for $2 \le i \le k$,

    p_i = a_1 + (b_2 + \dots + b_i), \qquad b_i = a_i^2 + \epsilon_i, \quad 2 \le i \le k, \qquad (5.1)

    where the $\epsilon_i \ge 0$ are given constants. Then $p_i - p_{i-1} \ge \epsilon_i$. For problem (1.3), let $b_1 = a_1^2$ and $b_i = a_i^2 + \epsilon$ for $2 \le i \le k$, where $\epsilon \ge 0$. Parameterize $p_i$ by letting

    p_{k+1-i} = \exp\big(-(b_1 + b_2 + \dots + b_i)\big). \qquad (5.2)

    Then $p_i/p_{i-1} \ge \exp(\epsilon)$. Plug the corresponding parameterization into (1.1) or (1.3) and perform the unconstrained mathematical programming to obtain the estimates $\{a_i\}_{i=1}^k$, hence $\{p_i\}_{i=1}^k$ by (5.1) or (5.2).
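    The paper implements Algorithm 5.2 in SAS; as a rough Python analogue (our substitution, using scipy's general-purpose BFGS optimizer in place of PROC NLMIXED, with a small clipping guard of our own to keep the log terms finite), the likelihood version for problem (1.3) might look like:

```python
import numpy as np
from scipy.optimize import minimize

def mle_strictly_monotone(d, n, eps=0.0):
    """Sketch of Algorithm 5.2 for problem (1.3): enforces p_i/p_{i-1} >= exp(eps).

    Parameterization (5.2): p_{k+1-i} = exp(-(b_1 + ... + b_i)), with
    b_1 = a_1^2 and b_i = a_i^2 + eps for i >= 2, so (1.2) holds by construction.
    """
    d, n = np.asarray(d, float), np.asarray(n, float)
    k = len(d)

    def p_from(a):
        b = a ** 2
        b[1:] += eps
        return np.exp(-np.cumsum(b))[::-1]        # ascending: p_1 <= ... <= p_k

    def neg_log_lik(a):
        p = np.clip(p_from(a), 1e-12, 1 - 1e-12)  # guard against log(0)
        return -np.sum(d * np.log(p) + (n - d) * np.log(1 - p))

    result = minimize(neg_log_lik, x0=np.full(k, 0.5), method="BFGS")
    return p_from(result.x)
```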


    6. Applications


    6.1. Isotonic regression

    Given real numbers $\{r_i\}_{i=1}^k$, the task of isotonic regression is to find $\{p_i\}_{i=1}^k$ that minimize the weighted sum of squares $\sum_{i=1}^k w_i(r_i-p_i)^2$, where $\{w_i\}_{i=1}^k$ are the given weights. When $w_i = 1$ and $r_i$ takes the value 0 or 1 for all $i$, it is known [13] that the results of isotonic regression coincide with the maximum likelihood estimates subject to (1.2) for the log-likelihood $\sum_{i=1}^k \big[r_i\log(p_i) + (1-r_i)\log(1-p_i)\big]$.

    A unique exact solution to the isotonic regression problem exists and can be found by a non-parametric algorithm called Pool Adjacent Violators (PAV) [2]. The basic idea, as described in [5], is the following: Starting with $r_1$, we move to the right and stop at the first place where $r_i > r_{i+1}$. Since $r_{i+1}$ violates the monotonicity assumption, we pool $r_i$ and $r_{i+1}$, replacing both with their weighted average $r_i^* = r_{i+1}^* = (w_ir_i + w_{i+1}r_{i+1})/(w_i + w_{i+1})$. We then move to the left to make sure that $r_{i-1} \le r_i^*$; if not, we pool $r_{i-1}$ with $r_i^*$ and $r_{i+1}^*$, replacing all three with their weighted average. We continue to the left until the monotonicity requirement is satisfied, then proceed again to the right (see [2,3,5,8]). This algorithm finds the exact solution via forward and backward averaging. Another, parametric, algorithm, called the Active Set Method, approximates the solution using the Karush-Kuhn-Tucker (KKT) conditions for linearly constrained optimization [3,9].
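    For comparison with the sketch of Algorithm 5.1 above, here is a compact Python sketch of PAV (our own illustration; it maintains a stack of pooled blocks rather than literally walking left and right, which is a standard equivalent formulation):

```python
def pav(r, w):
    """Weighted isotonic regression by Pool Adjacent Violators (sketch).

    Pushes each point as a new block, then merges backward while the last
    two blocks violate monotonicity. Returns one fitted value per input.
    """
    blocks = []                                   # each block: [mean, weight, count]
    for ri, wi in zip(r, w):
        blocks.append([ri, wi, 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()             # pool the two offending blocks
            m1, w1, c1 = blocks.pop()
            blocks.append([(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2])
    out = []
    for mean, _, count in blocks:
        out.extend([mean] * count)
    return out
```

    On the toy data above, pav([0.2, 0.5, 0.3], [1, 1, 1]) returns the same pooled values [0.2, 0.4, 0.4] as monotone_estimates, consistent with both computing the same exact solution.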

    The PAV algorithm repeatedly searches backward and forward for violators and takes averages whenever a violator is found. In contrast, Algorithm 5.1 determines the groups of consecutive indexes explicitly, by a forward search for the partition integers; the average is then taken over each of these groups. For Algorithm 5.2, the constrained optimization is transformed into an unconstrained mathematical programming problem through a re-parameterization; no KKT conditions or active set methods are used.


    6.2. An empirical example: the fair risk scales over a rating system

    In this section, we show by example how the non-parametric search algorithm (Algorithm 5.1, labelled "NPSM") can be used to estimate the default risk scales of a rating system under monotonic constraints. We use the following two benchmarks:

    EXP-CDF—the method proposed by Van der Burgt [17]. The rating-level PD is estimated by $p_i = \exp(a + bx)$, where $x$ denotes, for a rating $R_i$, the adjusted sample cumulative distribution:

    x(i) = \frac{n_1 + n_2 + \dots + n_i}{n_1 + n_2 + \dots + n_k}, \qquad (6.1)

    where $\{n_i\}_{i=1}^k$ is defined as in Section 1. Instead of estimating the parameters via the cap ratio [17], we estimate them by maximizing the log-likelihood (1.3).

    LGST-INVCDF—the method proposed by Tasche [16]. The rating-level PD is estimated by $p_i = \frac{1}{1 + \exp(a + b\,\Phi^{-1}(x))}$, where $x$ is as in (6.1), and $\Phi^{-1}$ is the inverse of the cumulative distribution function of the standard normal distribution. Parameters are estimated by maximizing the log-likelihood (1.3).

    The sample consists of the default and non-default frequencies for six non-default ratings (labelled "RTG" in Table 1 below). Table 1 shows the number of defaults by rating (labelled "D"), the count by rating (labelled "N"), the sample distribution (labelled "Dist"), and the default rate (labelled "DFR"). It is assumed that lower-index ratings carry higher default risks; for the proposed method NPSM in Table 1, we therefore first reverse the indexes of the ratings and then apply Algorithm 5.1.

    Table 1. Smoothing rating-level default rates.

    RTG          1        2        3        4        5        6        LL        AVG       SSE
    D            1        11       22       124      62       170
    N            5529     11566    29765    52875    4846     4318
    Dist         5%       11%      27%      49%      4%       4%
    DFR          0.0173%  0.0993%  0.0739%  0.2352%  1.2833%  3.9442%  -2208.01  0.003594  0
    NPSM         0.0173%  0.0810%  0.0810%  0.2352%  1.2833%  3.9442%  -2208.33  0.003594  0.00053
    EXP-CDF      0.0061%  0.0086%  0.0294%  0.3431%  1.9081%  2.5057%  -2264.46  0.003601  1.15966
    LGST-INVCDF  0.0104%  0.0188%  0.0585%  0.2795%  1.5457%  3.4388%  -2223.17  0.003594  0.16221

    The quality of an estimation is measured by the log-likelihood (labelled "LL"; larger values are better), the sum squared error (labelled "SSE", in the sense of (1.1); smaller values are better), and the portfolio-level count-weighted average of the estimates (labelled "AVG"; closer to the sample portfolio default rate is better).

    As shown in Table 1, the sample default rate is not monotonic between ratings 2 and 3. The proposed non-parametric algorithm (NPSM) simply takes their average. It achieves the highest log-likelihood and the lowest sum squared error among the three methods, and its count-weighted average equals the sample portfolio default rate. For the other two benchmarks, the sum squared error is higher; both overestimate the risk for ratings 4 and 5, and underestimate the risk for ratings 1, 2, 3 and 6.
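    As a usage illustration (our own code, reusing the monotone_estimates sketch from Section 5), the default counts and totals of Table 1 can be fed directly to the forward scan. Note that the displayed D, N and DFR values are rounded, so the pooled value for ratings 2 and 3 reproduces the NPSM row only up to that rounding:

```python
# Defaults and counts from Table 1, arranged in increasing order of risk.
d = [1, 11, 22, 124, 62, 170]
n = [5529, 11566, 29765, 52875, 4846, 4318]

p = monotone_estimates(d, n)
print([f"{x:.4%}" for x in p])
# Ratings 2 and 3 are pooled to their count-weighted average,
# (11 + 22) / (11566 + 29765); the other ratings keep their sample rates.
```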


    6.3. Risk-supervised monotonic binning for univariate data

    Given a sample, let $S_x = \{x_i\}_{i=1}^k$ be the ordered set of all the distinct sample values of an explanatory variable $x$, ordered so that $x_i < x_{i+1}$. Denote by $\{y_{ij}\}_{j=1}^{n_i}$ the set of all the observed $y$-values conditional on $x = x_i$. Discretization of continuous attributes is usually required in machine learning processes [7]. Binning is also widely used in retail portfolio credit scoring [1,4,15]. A discretization or binning of a numerical variable $x$ consists of a list of partition numbers $\{c_i\}_{i=1}^{M}$ and intervals $\{I_i\}_{i=1}^{M}$, where

    -\infty = c_0 < c_1 < \dots < c_{M-1} < c_M = +\infty, \qquad I_1 = (-\infty, c_1],\ I_2 = (c_1, c_2],\ \dots,\ I_{M-1} = (c_{M-2}, c_{M-1}],\ I_M = (c_{M-1}, +\infty),

    and where each intersection $B_i = S_x \cap I_i$ is non-empty for all $1 \le i \le M$. Let $N_i$ denote the number of observations in $B_i$, and $b_i = \frac{1}{N_i}\sum_{x_h \in B_i}\sum_{j=1}^{n_h} y_{hj}$ the sample average of $y$ over $B_i$. A monotonic binning of the explanatory variable $x$ is a binning where $\{b_i\}_{i=1}^M$ satisfies the monotonic condition (6.2) or (6.3) below:

    b_1 < b_2 < \dots < b_M, \qquad (6.2)
    b_1 > b_2 > \dots > b_M. \qquad (6.3)

    The quality of a binning can be measured by its sum squared error (smaller values are better), which is defined as:

    SSE = \sum_{i=1}^{M}\sum_{x_j \in B_i}\sum_{h=1}^{n_j}(y_{jh}-b_i)^2 = \sum_{i=1}^{M}\sum_{x_j \in B_i}\sum_{h=1}^{n_j}(y_{jh}-r_j)^2 + \sum_{i=1}^{M}\sum_{x_j \in B_i} n_j(r_j-b_i)^2 = SSE_A + SSE_B,

    where $SSE_A = \sum_{i=1}^{M}\sum_{x_j \in B_i}\sum_{h=1}^{n_j}(y_{jh}-r_j)^2$ and

    SSE_B = \sum_{i=1}^{M}\sum_{x_j \in B_i} n_j(r_j-b_i)^2. \qquad (6.4)

    Because $SSE_A$ does not depend on the binning, minimizing the sum squared error $SSE$ over binnings amounts to minimizing $SSE_B$.

    When $y$ is binary or takes values in the range $0 \le y \le 1$, the quality of the binning can also be measured by the log-likelihood (Bernoulli, quasi-Bernoulli, or binomial; higher values are better):

    LL_B = \sum_{i=1}^{M}\sum_{x_j \in B_i}\big[d_j\log(b_i) + (n_j-d_j)\log(1-b_i)\big]. \qquad (6.5)

    With the estimates given by Propositions 3.1 and 4.1, and in the absence of bin-size requirements, a preliminary yet optimal monotonic binning, in the sense of maximum likelihood or minimum sum squared error subject to (6.2), can be obtained as:

    I_1 = (-\infty, x_{k_1}],\ I_2 = (x_{k_1}, x_{k_2}],\ \dots,\ I_{m-1} = (x_{k_{m-2}}, x_{k_{m-1}}],\ I_m = (x_{k_{m-1}}, +\infty),

    where $\{k_i\}_{i=0}^m$ are the partition integers given by (2.6) and (2.7).
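    A short Python sketch of this construction (our own illustration, reusing the unweighted forward scan of Algorithm 5.1 to return the interior cut points $x_{k_1}, \dots, x_{k_{m-1}}$; bin-size constraints are ignored, as in the text):

```python
def monotone_bin_cuts(x, d, n):
    """Risk-supervised monotonic binning via the partition integers (sketch).

    x: distinct sample values, sorted ascending
    d[i], n[i]: sum of y-values and observation count at x[i]
    Returns the interior cut points (x_{k_1}, ..., x_{k_{m-1}}).
    """
    k, cuts, start = len(x), [], 0
    while start < k:
        best_j, best_u = start, None
        num = den = 0.0
        for j in range(start, k):          # forward scan for the partition integer
            num += d[j]
            den += n[j]
            u = num / den
            if best_u is None or u <= best_u:
                best_u, best_j = u, j
        if best_j < k - 1:                 # interior k_i becomes a cut point
            cuts.append(x[best_j])
        start = best_j + 1
    return cuts
```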


    7. Conclusions

    This paper shows that the maximum Bernoulli likelihood (or quasi-Bernoulli likelihood) estimates with monotonic constraints are each given by the average risk observed over some consecutive indexes. These estimates coincide with the least squares estimates with the same monotonic constraints. The proposed non-parametric algorithm provides a resolution to flip-over credit risk, and a tool to determine the fair risk scales over a rating system.


    Acknowledgements

    The author is very grateful to the second reviewer for suggesting the title, for suggesting the removal of (1.5) as a condition in Propositions 3.1 and 4.1 in the original version, and for verifying Propositions 3.1 and 4.1 by alternative approaches. The first paragraph comes directly from the comments by the second reviewer.

    The author also thanks Carlos Lopez for his support of this research. Special thanks go to Clovis Sukam for his critical reading of this manuscript, and to Biao Wu, Glenn Fei, Wallace Law, Zunwei Du, Kaijie Cui, Lan Gong, Wilson Kan, and Amada Huang for many valuable conversations.


    Conflict of interest

    The views expressed in this article are not necessarily those of Royal Bank of Canada or any of its affiliates. Please direct any comments to the author Bill Huajian Yang at: h_y02@yahoo.ca.


    [1] Anderson R, (2007) The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation, Oxford: Oxford University Press.
    [2] Barlow RE, Bartholomew DJ, Bremner JM, et al. (1972) Statistical inference under order restrictions: The theory and application of isotonic regression, New York: Wiley.
    [3] Best MJ, Chakravarti N, (1990) Active set algorithms for isotonic regression: A unifying framework. Math Program 47: 425–439. doi: 10.1007/BF01580873
    [4] Eichenberg T, (2018) Supervised weight of evidence binning of numeric variables and factors, R-Package Woebinning.
    [5] Friedman J, Tibshirani R, (1984) The monotone smoothing of scatterplots. Technometrics 26: 243–250. doi: 10.1080/00401706.1984.10487961
    [6] Kotlowski W, Slowinski R, (2009) Rule learning with monotonicity constraints. Proceedings of the 26th Annual International Conference on Machine Learning, 537–544.
    [7] Kotsiantis S, Kanellopoulos D, (2006) Discretization techniques: A recent survey. GESTS Int Trans Comput Sci Engin 32: 47–58.
    [8] Leeuw JD, Hornik K, Mai P, (2009) Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. J Stat Software 32.
    [9] Nocedal J, Wright SJ, (2006) Numerical Optimization, 2nd Ed., New York: Springer.
    [10] Papke LE, Wooldridge JM, (1996) Econometric methods for fractional response variables with an application to 401(k) plan participation rates. J Appl Econometrics 11: 619–632. doi: 10.1002/(SICI)1099-1255(199611)11:6<619::AID-JAE418>3.0.CO;2-1
    [11] Potharst R, Feelders AJ, (2002) Classification trees for problems with monotonicity constraints. SIGKDD Explor 14: 1–10.
    [12] Ramsay JO, Wickham H, Graves S, et al. (2018) Package 'fda'-CRAN.R-Project, 265–267
    [13] Robertson T, Wright FT, Dykstra RL, (1998) Order Restricted Statistical Inference, New Jersey: John Wiley and Sons.
    [14] SAS Institute Inc (2014) SAS/STAT(R) 13.2 User's Guide.
    [15] Siddiqi N, (2006) Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring, Hoboken, New Jersey: John Wiley and Sons.
    [16] Tasche D, (2013) The art of PD curve calibration. J Credit Risk 9: 63–103. doi: 10.21314/JCR.2013.169
    [17] Van der Burgt M, (2008) Calibrating low-default portfolios, using the cumulative accuracy profile. J Risk Model validation 1: 17–33. doi: 10.21314/JRMV.2008.016
    [18] Yang BH, (2018) Smoothing algorithms by constrained maximum likelihood. J Risk Model Validation 12: 89–102.
    [19] You S, Ding D, Canini K, et al. (2017) Deep lattice networks and partial monotonic functions. 31st Conf Neural Inf Process Syst (NIPS).
  • © 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
