
A note on equivalent conditions for majorization

  • In this paper, we introduce novel characterizations of the classical concept of majorization in terms of upper triangular (resp., lower triangular) row-stochastic matrices, and in terms of sequences of linear transforms on vectors. We use these new characterizations of majorization to derive an improved entropy inequality.

    Citation: Roberto Bruno, Ugo Vaccaro. A note on equivalent conditions for majorization[J]. AIMS Mathematics, 2024, 9(4): 8641-8660. doi: 10.3934/math.2024419




    The concept of majorization has a rich history in mathematics with applications that span a wide range of disciplines. Majorization theory originated in economics [1], where it was employed to rigorously explain the vague notion that the components of a given vector are "more nearly equal" than the components of a different vector. Nowadays, majorization theory finds applications in numerous areas, ranging from pure mathematics to combinatorics [2,3,4], from information and communication theory [5,6,7,8,9,10,11,12,13] to thermodynamics and quantum theory [14,15], from mathematical chemistry [16] to optimization [17], and much more.

    There are many equivalent conditions for majorization. We review the most common ones in Section 2. We then present our new conditions for majorization in Sections 3 and 4. Finally, in Section 5, we present an application to entropy inequalities.

    Throughout the paper, we will consider vectors $x=(x_1,\ldots,x_n)\in\mathbb{R}^n_+$ ordered component-wise, that is, for which $x_1\ge x_2\ge\cdots\ge x_n$.

    Definition 2.1. [2, Def. A.1] For $x,y\in\mathbb{R}^n_+$,

    $$x\preceq y \quad\text{if}\quad \begin{cases}\sum_{i=1}^{k}x_i\le\sum_{i=1}^{k}y_i, & k=1,\ldots,n-1,\\[2pt] \sum_{i=1}^{n}x_i=\sum_{i=1}^{n}y_i.\end{cases} \tag{2.1}$$

    When $x\preceq y$, we say that vector $x$ is majorized by $y$ (equivalently, that $y$ majorizes $x$).
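    For concreteness, the two conditions in (2.1) can be checked directly from the partial sums. The following minimal Python sketch is our own illustration (the function name `is_majorized` and the tolerance parameter are assumptions, not part of the paper):

```python
import math

def is_majorized(x, y, tol=1e-12):
    """Check x <= y in the majorization order (2.1); x, y sorted nonincreasingly."""
    assert len(x) == len(y)
    sx = sy = 0.0
    for k in range(len(x) - 1):
        sx += x[k]
        sy += y[k]
        if sx > sy + tol:        # a prefix sum of x exceeds that of y
            return False
    return math.isclose(sum(x), sum(y), abs_tol=tol)   # total sums must coincide

print(is_majorized([0.3, 0.3, 0.3, 0.1], [0.6, 0.2, 0.1, 0.1]))   # True
```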

    There are many equivalent conditions for majorization (e.g., see [2], Chapter 4). The conditions that are most closely related to the subject matter of our paper are expressed in terms of doubly stochastic matrices and T-transforms.

    Definition 2.2. An $n\times n$ matrix $P=[P_{ij}]$ is doubly stochastic if

    $$P_{ij}\ge 0 \quad \forall\, i,j\in\{1,\ldots,n\},$$

    and

    $$\sum_{i=1}^{n}P_{ij}=1,\; j=1,\ldots,n; \qquad \sum_{j=1}^{n}P_{ij}=1,\; i=1,\ldots,n.$$

    Definition 2.3. An $n\times n$ matrix $T$ is a T-transform if

    $$T=\lambda I+(1-\lambda)Q,$$

    where $0\le\lambda<1$, $I$ is the $n\times n$ identity matrix, and $Q=[Q_{\ell m}]$ is a permutation matrix such that

    $$Q_{\ell m}=\begin{cases}1 & \text{for } \ell=m,\ \ell,m\notin\{i,j\},\\ 1 & \text{for } \ell=j \text{ and } m=i,\\ 1 & \text{for } \ell=i \text{ and } m=j,\\ 0 & \text{otherwise},\end{cases} \tag{2.2}$$

    for some indices $i,j\in\{1,\ldots,n\}$, $i\ne j$.

    Thus, for an arbitrary $x=(x_1,\ldots,x_n)\in\mathbb{R}^n_+$, it holds that the vector $xT$ has the form

    $$xT=(x_1,\ldots,x_{i-1},\lambda x_i+(1-\lambda)x_j,x_{i+1},\ldots,x_{j-1},\lambda x_j+(1-\lambda)x_i,x_{j+1},\ldots,x_n). \tag{2.3}$$
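    As a small illustration of Definition 2.3, the sketch below (our own, with 0-based indices; the helper name `t_transform` is ours) builds $T=\lambda I+(1-\lambda)Q$ for a chosen pair $i,j$ and applies it to a vector as in (2.3):

```python
import numpy as np

def t_transform(n, i, j, lam):
    """Return T = lam*I + (1-lam)*Q, where Q swaps coordinates i and j."""
    Q = np.eye(n)
    Q[[i, j]] = Q[[j, i]]          # permutation exchanging rows i and j
    return lam * np.eye(n) + (1 - lam) * Q

x = np.array([0.6, 0.2, 0.1, 0.1])
T = t_transform(4, 0, 1, 0.75)
print(x @ T)   # [0.5, 0.3, 0.1, 0.1]: the (2.3)-style averaging of x1 and x2
```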

    Notice that each T-transform is a doubly stochastic matrix. It holds that:

    Theorem 2.1. [2, Proposition B.6, Ch. 2] For any $x,y\in\mathbb{R}^n$, the following conditions are equivalent:

    (1) $x\preceq y$;

    (2) $x=yP$ for some doubly stochastic matrix $P$;

    (3) $x$ can be derived from $y$ by successive applications of at most $n-1$ T-transforms, as described in Definition 2.3.

    We start this section by introducing the concept of A-transform.

    Informally, an A-transform of a vector $x=(x_1,\ldots,x_n)$ is a transformation that involves two vector components, $x_i$ and $x_j$, with $i<j$. The transformation operates on the vector $x$ by increasing the value of the component $x_i$ by the quantity $\lambda x_j$ and decreasing the value of the component $x_j$ by the same quantity $\lambda x_j$, where $\lambda$ is a real number in $[0,1]$. More formally, an A-transform can be described by the matrix:

    $$A=I+X, \tag{3.1}$$

    where $I$ is the $n\times n$ identity matrix and $X=[X_{\ell m}]$ is a matrix with all entries equal to $0$ except for two elements $X_{ji}$ and $X_{jj}$, for a given pair of indices $i,j$, $j>i$, where $X_{ji}=\lambda$ and $X_{jj}=-\lambda$. Thus, the vector $xA$ has the form

    $$xA=(x_1,\ldots,x_{i-1},x_i+\lambda x_j,x_{i+1},\ldots,x_{j-1},x_j-\lambda x_j,x_{j+1},\ldots,x_n). \tag{3.2}$$

    Note that the matrix $A=[A_{\ell m}]$ is lower triangular and row-stochastic, that is,

    $$A_{\ell m}\ge 0 \quad\text{for each } \ell,m, \tag{3.3}$$
    $$A_{\ell m}=0 \quad\text{for } \ell<m, \tag{3.4}$$
    $$\sum_{m=1}^{n}A_{\ell m}=1 \quad\text{for all } \ell. \tag{3.5}$$
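    For illustration, an A-transform can be realized as a matrix exactly as in (3.1). The following sketch (our own, with 0-based indices and a helper name of our choosing) applies one to a vector:

```python
import numpy as np

def a_transform(n, i, j, lam):
    """A = I + X with X[j,i] = lam and X[j,j] = -lam (i < j), as in (3.1)."""
    A = np.eye(n)
    A[j, i] += lam
    A[j, j] -= lam
    return A

x = np.array([0.3, 0.3, 0.3, 0.1])
A = a_transform(4, 0, 1, 1/3)      # moves (1/3)*x2 from x2 to x1
print(x @ A)                       # [0.4, 0.2, 0.3, 0.1], cf. (3.2)
```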

    The following theorem holds.

    Theorem 3.1. Let $x,y\in\mathbb{R}^n_+$. It holds that $x\preceq y$ if, and only if, $y$ can be derived from $x$ by the successive application of a finite number of A-transforms.

    Proof. Let $x\preceq y$. To avoid trivialities, we assume $x\ne y$. We shall prove that $y$ can be derived from $x$ by the successive application of a finite number of A-transforms.

    Since the first condition of (2.1) holds, there is an index $j$ such that

    $$\sum_{i=1}^{j-1}x_i=\sum_{i=1}^{j-1}y_i \quad\text{and}\quad \sum_{i=1}^{j}x_i<\sum_{i=1}^{j}y_i. \tag{3.6}$$

    For such an index $j$, it holds that $x_j<y_j$. From (3.6) and the second condition of (2.1), we get that there exists an index $k>j$ such that $x_k>y_k$.

    Let $j$ be the smallest index such that $y_j>x_j$, and let $k$ be the smallest index greater than $j$ such that $x_k>y_k$.

    Let

    $$\delta=\min\{y_j-x_j,\;x_k-y_k\}. \tag{3.7}$$

    We define an A-transform as in (3.2), with $\lambda=\delta/x_k$ and $X=[X_{\ell m}]$ defined as follows:

    $$X_{\ell m}=\begin{cases}\lambda & \text{if } \ell=k \text{ and } m=j,\\ -\lambda & \text{if } \ell=m=k,\\ 0 & \text{otherwise}.\end{cases}$$

    The application of such a matrix $A$ to the vector $x$ gives the vector $xA=x'$ with components

    $$x'=(x_1,\ldots,x_{j-1},x_j+\delta,x_{j+1},\ldots,x_{k-1},x_k-\delta,x_{k+1},\ldots,x_n). \tag{3.8}$$

    We pause here to illustrate the rest of our proof technique, which proceeds through the following steps:

    (1) We compute the smallest index $j$ for which (3.6) holds. This means that vectors $x$ and $y$ coincide on the first $j-1$ components.

    (2) We modify vector $x$ according to the A operator defined above, to get vector $x'$ as described in (3.8).

    (3) We prove (3.9) below, without altering the order of the components of $x'=xA$ (this is crucial).

    (4) The number of components on which $x'$ and $y$ coincide is greater than the number of components on which $x$ and $y$ coincide.

    Let us show that the new vector $x'$ satisfies the following property:

    $$\begin{cases}\sum_{i=1}^{\ell}x'_i\le\sum_{i=1}^{\ell}y_i, & \ell=1,\ldots,n-1,\\[2pt] \sum_{i=1}^{n}x'_i=\sum_{i=1}^{n}y_i.\end{cases} \tag{3.9}$$

    From (3.8) and since the vector $x$ satisfies the first condition of (2.1), we get

    $$\sum_{i=1}^{\ell}x'_i=\sum_{i=1}^{\ell}x_i\le\sum_{i=1}^{\ell}y_i,\quad \ell=1,\ldots,j-1. \tag{3.10}$$

    From (3.7), we know that $x_j+\delta\le y_j$. Thus, from (3.8) and (3.10), we get

    $$\sum_{i=1}^{j}x'_i=\sum_{i=1}^{j-1}x_i+(x_j+\delta)\le\sum_{i=1}^{j-1}y_i+y_j=\sum_{i=1}^{j}y_i. \tag{3.11}$$

    By definition, the index $k$ is the smallest index greater than $j$ for which $x_k>y_k$. It follows that

    $$x_\ell\le y_\ell,\quad \ell=j+1,\ldots,k-1. \tag{3.12}$$

    Therefore, from (3.8) and (3.11), we obtain that for each $\ell=j+1,\ldots,k-1$, it holds that

    $$\begin{aligned}\sum_{i=1}^{\ell}x'_i&=\sum_{i=1}^{j-1}x_i+(x_j+\delta)+\sum_{i=j+1}^{\ell}x_i\\ &\le\sum_{i=1}^{j-1}y_i+y_j+\sum_{i=j+1}^{\ell}x_i \quad(\text{since } x_j+\delta\le y_j)\\ &\le\sum_{i=1}^{j-1}y_i+y_j+\sum_{i=j+1}^{\ell}y_i \quad(\text{from } (3.12))\\ &=\sum_{i=1}^{\ell}y_i.\end{aligned} \tag{3.13}$$

    From (3.8) and since the vector $x$ satisfies the first condition of (2.1), we get

    $$\sum_{i=1}^{k}x'_i=\sum_{\substack{i=1\\ i\ne j,k}}^{k}x_i+(x_j+\delta)+(x_k-\delta)=\sum_{i=1}^{k}x_i\le\sum_{i=1}^{k}y_i. \tag{3.14}$$

    Finally, since the vector $x$ satisfies the first and second conditions of (2.1), we have that

    $$\begin{cases}\sum_{i=1}^{\ell}x'_i=\sum_{i=1}^{\ell}x_i\le\sum_{i=1}^{\ell}y_i, & \ell=k+1,\ldots,n-1,\\[2pt] \sum_{i=1}^{n}x'_i=\sum_{i=1}^{n}x_i=\sum_{i=1}^{n}y_i.\end{cases} \tag{3.15}$$

    Therefore, from (3.10), (3.11), and (3.13)–(3.15), we have that (3.9) holds.

    Notice that if $\delta=y_j-x_j$, then $x'_j$ is equal to $y_j$; equivalently, if $\delta=x_k-y_k$, then $x'_k$ will be equal to $y_k$. Thus, the vector $x'=xA$ has at least one additional component (with respect to $x$) that is equal to a component of $y$. Moreover, since each A-transform preserves the property (3.9), we can iterate the process starting from $x'=xA$. It follows that $y$ can be derived from $x$ by the application of a finite number of A-transforms.

    Let us prove the converse part of the theorem. Hence, we assume that $y$ can be derived from $x$ by a finite number of A-transforms, and we prove that $x\preceq y$. Let

    $$x^1=xA_1,\ x^2=x^1A_2,\ \ldots,\ x^k=x^{k-1}A_k=y \tag{3.16}$$

    be the vectors obtained by the successive applications of a number $k$ of A-transforms. Given an arbitrary vector $z\in\mathbb{R}^n_+$, let us denote with $z^{\downarrow}$ the vector with the same components as $z$, ordered in a nonincreasing fashion. From the definition (3.8) of an A-transform, it follows that the partial sums of $(xA)^{\downarrow}$ are certainly greater than or equal to the corresponding partial sums of $x^{\downarrow}$, for every A-transform $A$. Therefore,

    $$x\preceq x^1\preceq x^2\preceq\cdots\preceq x^k=y.^*$$

    $^*$Recall that $x=x^{\downarrow}$ and $y=y^{\downarrow}$.

    By the transitivity property of the partial order relation $\preceq$, we get $x\preceq y$.

    Corollary 3.2. Let $x,y\in\mathbb{R}^n_+$. If $x\preceq y$, then $y$ can be derived from $x$ by the successive application of, at most, $n-1$ A-transforms.

    Proof. In the proof of Theorem 3.1, we have shown that the application of each A-transform equalizes at least one component of the intermediate vectors $x^j=x^{j-1}A_j$ to a component of $y$. Since all vectors appearing in (3.16) have an equal sum, the last A-transform always equalizes both the affected components to the respective components of $y$. As a result, $y$ can be obtained by the application of at most $n-1$ A-transforms.

    Although it is evident, let us explicitly mention that if $y$ can be derived from $x$ by the application of at most $n-1$ A-transforms, then it holds that $y=xL$, where the matrix $L$ is the product of the individual A-transforms.
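    The constructive argument in the proof of Theorem 3.1 translates directly into an algorithm. The following sketch is our own illustration (0-based indices; the helper name `build_L` is hypothetical): it repeatedly picks the smallest $j$ with $y_j>x_j$ and the smallest $k>j$ with $x_k>y_k$, applies the corresponding A-transform, and accumulates the product $L$.

```python
import numpy as np

def build_L(x, y, tol=1e-12):
    """Derive y from x via A-transforms (proof of Thm 3.1); return L with y = x @ L.
    Precondition: x is majorized by y, both sorted nonincreasingly."""
    x = np.array(x, dtype=float)
    n = len(x)
    L = np.eye(n)
    while not np.allclose(x, y, atol=tol):
        j = next(i for i in range(n) if y[i] > x[i] + tol)        # smallest j: y_j > x_j
        k = next(i for i in range(j + 1, n) if x[i] > y[i] + tol) # smallest k > j: x_k > y_k
        delta = min(y[j] - x[j], x[k] - y[k])                     # as in (3.7)
        A = np.eye(n)
        A[k, j] += delta / x[k]
        A[k, k] -= delta / x[k]
        x, L = x @ A, L @ A
    return L

L = build_L([0.3, 0.3, 0.3, 0.1], [0.6, 0.2, 0.1, 0.1])
print(np.round(L, 4))   # reproduces the matrix L of Example 3.7, cf. (3.22)
```

    By Corollary 3.2, the loop above performs at most $n-1$ iterations.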

    The following technical lemma is probably already known in the literature. Since we have not found a source that explicitly states it, we provide a proof to keep the paper self-contained.

    Lemma 3.3. Let C and D be two n×n lower triangular row-stochastic matrices. The product matrix CD is still a lower triangular row-stochastic matrix.

    Proof. Since CD is the product of two lower triangular matrices, one can see that CD is a lower triangular matrix too. Thus, we need only to show that it is row-stochastic.

    First of all, each entry $(CD)_{ij}$ of $CD$ is nonnegative, since it is a sum of nonnegative values. Let us consider the sum of the elements of the $i$-th row:

    $$\sum_{j=1}^{n}(CD)_{ij}=\sum_{j=1}^{n}\left(\sum_{k=1}^{n}C_{ik}D_{kj}\right)=\sum_{k=1}^{n}C_{ik}\left(\sum_{j=1}^{n}D_{kj}\right)=\sum_{k=1}^{n}C_{ik}\quad\left(\text{since } \sum_{j=1}^{n}D_{kj}=1\right)=1\quad\left(\text{since } \sum_{k=1}^{n}C_{ik}=1\right).$$

    Thus, since the above reasoning holds for each $i=1,\ldots,n$, the matrix $CD$ is a lower triangular row-stochastic matrix.

    It may be worth commenting that Lemma 3.3 also holds for a product of column-stochastic matrices, which gives a column-stochastic matrix. This holds since $(CD)^{\mathsf T}=D^{\mathsf T}C^{\mathsf T}$, where, for an arbitrary matrix $D$, $D^{\mathsf T}$ denotes the transpose of $D$, and the transpose of a row-stochastic matrix is a column-stochastic matrix.

    The next theorem characterizes majorization in terms of lower triangular row-stochastic matrices.

    Theorem 3.4. Let $x,y\in\mathbb{R}^n_+$. It holds that $x\preceq y$ if, and only if, there exists a lower triangular row-stochastic matrix $L$ such that $y=xL$.

    Proof. The implication that if $x\preceq y$, then there exists a lower triangular row-stochastic matrix $L$ such that $y=xL$ directly follows from the results of Theorem 3.1 and Lemma 3.3. Indeed, from Theorem 3.1 we know that $y$ can be obtained from $x$ by the successive application of A-transforms, and from Lemma 3.3 we know that the product of two consecutive A-transforms is still a lower triangular row-stochastic matrix. Hence, the matrix $L$ obtained as the product of all the A-transforms is a lower triangular row-stochastic matrix for which $y=xL$.

    We notice that a different proof of the above result has been given by Li in [18, Lemma 1] by an application of Algorithm 1 in [19].

    Let us now prove the converse implication. Assume that there exists a lower triangular row-stochastic matrix $L=[L_{ij}]$ for which $y=xL$. Let us prove that $x\preceq y$. For such a purpose, we can express each component of $y$ in the following way:

    $$y_i=\sum_{j=i}^{n}L_{ji}x_j,\quad i=1,\ldots,n. \tag{3.17}$$

    Hence, by using (3.17), we can rewrite the sum of the first $k$ components of $y$ as follows:

    $$y_1+\cdots+y_k=\sum_{j=1}^{n}L_{j1}x_j+\sum_{j=2}^{n}L_{j2}x_j+\cdots+\sum_{j=k}^{n}L_{jk}x_j.$$

    By grouping the multiplicative coefficients of each $x_i$, we get:

    $$y_1+\cdots+y_k=x_1\sum_{i=1}^{1}L_{1i}+x_2\sum_{i=1}^{2}L_{2i}+\cdots+x_k\sum_{i=1}^{k}L_{ki}+\cdots+x_n\sum_{i=1}^{k}L_{ni}. \tag{3.18}$$

    Since the matrix $L$ is lower triangular row-stochastic, we have that for each $j=1,\ldots,n$, it holds that $\sum_{i=1}^{j}L_{ji}=1$. Hence, from (3.18), we get

    $$y_1+\cdots+y_k\ge x_1\sum_{i=1}^{1}L_{1i}+x_2\sum_{i=1}^{2}L_{2i}+\cdots+x_k\sum_{i=1}^{k}L_{ki}=x_1+\cdots+x_k$$

    for each $k=1,\ldots,n$, and

    $$y_1+\cdots+y_n=x_1+\cdots+x_n.$$

    Thus, $x\preceq y$.

    We point out the following interesting property of the matrix L that appears in the "if" part of Theorem 3.4.

    Corollary 3.5. Let $x,y\in\mathbb{R}^n_+$. If $x\preceq y$, then there exists a lower triangular row-stochastic matrix $L$, with at most $2n-1$ nonzero elements, such that $y=xL$.

    Proof. From Corollary 3.2 and Theorem 3.4, we know that the matrix $L$, for which it holds that $y=xL$, is the product of at most $n-1$ A-transforms. Let

    $$A_1,\ldots,A_t \tag{3.19}$$

    be the individual matrices associated with such $t$ A-transforms, $t<n$. By (3.1), the matrix $A_1$ has $n+1$ nonzero elements.

    Let $C$ be the matrix equal to the product of the first $m-1$ A-transforms of (3.19), $m\le t<n$, that is, $C=\prod_{i=1}^{m-1}A_i$. We show that the product

    $$CA_m \tag{3.20}$$

    gives a matrix with only one additional nonzero element with respect to C.

    Let $j,k$ be the pair of indices chosen to construct $A_m=I+X_m$. From (3.20), we get

    $$CA_m=C(I+X_m)=C+CX_m. \tag{3.21}$$

    For every A-transform, we recall that we always choose the smallest index $j$ such that $y_j>x_j$, and the smallest index $k$ greater than $j$ for which $y_k<x_k$. Therefore, all the previous A-transforms have chosen indices less than or at most equal to $k$. Consequently, in the matrix $C$, all the rows after the $k$-th row are equal to the rows of the identity matrix. By definition, the matrix $X_m$ has nonzero elements only in the $k$-th row (in positions $(X_m)_{kj}$ and $(X_m)_{kk}$, respectively). Hence, the matrix $CX_m$ has nonzero elements only in the entries $(CX_m)_{kj}$ and $(CX_m)_{kk}$. Since the element $(CX_m)_{kk}$ will be added to the diagonal element of $C$, the only new nonzero element in $CX_m$ (with respect to $C$) is $(CX_m)_{kj}$. Hence, from (3.21), we get that the product $CA_m$ gives a matrix with only one new nonzero element with respect to $C$.

    Since each product generates a matrix with only one additional nonzero element with respect to the previous one, we obtain that the final matrix $L$ has at most $n+1+n-2=2n-1$ nonzero elements.

    We summarize our results in the next theorem, mirroring the classic Theorem 2.1.

    Theorem 3.6. If $x,y\in\mathbb{R}^n_+$, the following conditions are equivalent:

    (1) $x\preceq y$;

    (2) $y=xL$ for some lower triangular row-stochastic matrix $L$;

    (3) $y$ can be derived from $x$ by successive applications of at most $n-1$ A-transforms, as defined in (3.2).

    Proof. The equivalences follow from the results of Theorems 3.1 and 3.4.

    Let us look at an example of how the matrix L is constructed.

    Example 3.7. Let $y=[0.6,0.2,0.1,0.1]$ and $x=[0.3,0.3,0.3,0.1]$, with $x\preceq y$.

    The first A-transform affects the elements $x_1$ and $x_2$ with $\delta=\min\{y_1-x_1,\,x_2-y_2\}=0.1$ and $\lambda=\delta/x_2=1/3$. As a result, the associated matrix $A_1$ is:

    $$A_1=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}+\begin{bmatrix}0&0&0&0\\ \tfrac{1}{3}&-\tfrac{1}{3}&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}=\begin{bmatrix}1&0&0&0\\ \tfrac{1}{3}&\tfrac{2}{3}&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

    and $x^1=xA_1=[0.4,0.2,0.3,0.1]$.

    The second A-transform affects the elements $x^1_1$ and $x^1_3$ with $\delta=\min\{y_1-x^1_1,\,x^1_3-y_3\}=0.2$ and $\lambda=\delta/x^1_3=2/3$. Hence, the matrix $A_2$ is

    $$A_2=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}+\begin{bmatrix}0&0&0&0\\0&0&0&0\\ \tfrac{2}{3}&0&-\tfrac{2}{3}&0\\0&0&0&0\end{bmatrix}=\begin{bmatrix}1&0&0&0\\0&1&0&0\\ \tfrac{2}{3}&0&\tfrac{1}{3}&0\\0&0&0&1\end{bmatrix}$$

    and $x^2=x^1A_2=[0.6,0.2,0.1,0.1]=y$. Therefore, the final matrix for which $y=xL$ is

    $$L=A_1A_2=\begin{bmatrix}1&0&0&0\\ \tfrac{1}{3}&\tfrac{2}{3}&0&0\\ \tfrac{2}{3}&0&\tfrac{1}{3}&0\\0&0&0&1\end{bmatrix}. \tag{3.22}$$

    It is worth noticing that the matrix $L$ is not the inverse of the doubly stochastic matrix $P$, for which $x=yP$, obtained by applying a series of T-transforms (Theorem 2.1). In fact, the inverse of $L$ is

    $$L^{-1}=\begin{bmatrix}1&0&0&0\\ -\tfrac{1}{2}&\tfrac{3}{2}&0&0\\ -2&0&3&0\\0&0&0&1\end{bmatrix},$$

    which is not a doubly stochastic matrix.
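    These computations can be reproduced numerically; the following check is our own addition:

```python
import numpy as np

A1 = np.array([[1, 0, 0, 0], [1/3, 2/3, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
A2 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [2/3, 0, 1/3, 0], [0, 0, 0, 1]])
L = A1 @ A2
x = np.array([0.3, 0.3, 0.3, 0.1])
print(x @ L)             # [0.6, 0.2, 0.1, 0.1] = y, as in (3.22)
print(np.linalg.inv(L))  # contains negative entries, hence not doubly stochastic
```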

    We now present our characterization of majorization through upper triangular row-stochastic matrices.

    A B-transformation or, more briefly, a B-transform, is a transformation of a vector $y=(y_1,\ldots,y_n)$ that involves two vector components, $y_i$ and $y_j$, with $i<j$. The transformation operates on the vector $y$ by decreasing the component $y_i$ by the quantity $\lambda y_i$ and increasing the component $y_j$ by the same quantity $\lambda y_i$, where $\lambda$ is a real number such that $\lambda\in[0,1]$.

    We can describe a B-transform as a matrix

    $$B=I+Y, \tag{4.1}$$

    where $I$ is the identity matrix and $Y=[Y_{\ell m}]$ is a matrix with all entries equal to $0$ except for two elements $Y_{ii}$ and $Y_{ij}$, where $Y_{ii}=-\lambda$ and $Y_{ij}=\lambda$. Thus, $yB$ has the form

    $$yB=(y_1,\ldots,y_{i-1},y_i-\lambda y_i,y_{i+1},\ldots,y_{j-1},y_j+\lambda y_i,y_{j+1},\ldots,y_n). \tag{4.2}$$

    Note that the matrix $B=[B_{\ell m}]$ is upper triangular and row-stochastic, that is,

    $$B_{\ell m}\ge 0 \quad\text{for each } \ell,m, \tag{4.3}$$
    $$B_{\ell m}=0 \quad\text{for } \ell>m, \tag{4.4}$$
    $$\sum_{m=1}^{n}B_{\ell m}=1 \quad\text{for all } \ell. \tag{4.5}$$
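    Analogously to the A-transform sketch of Section 3 (again our own illustration, with 0-based indices and a helper name of our choosing):

```python
import numpy as np

def b_transform(n, i, j, lam):
    """B = I + Y with Y[i,i] = -lam and Y[i,j] = lam (i < j), as in (4.1)."""
    B = np.eye(n)
    B[i, i] -= lam
    B[i, j] += lam
    return B

y = np.array([0.6, 0.2, 0.1, 0.1])
B = b_transform(4, 0, 1, 1/6)      # moves (1/6)*y1 from y1 to y2
print(y @ B)                       # [0.5, 0.3, 0.1, 0.1], cf. (4.2)
```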

    The next theorem relates the B-transforms to majorization.

    Theorem 4.1. Let $x,y\in\mathbb{R}^n_+$. It holds that $x\preceq y$ if, and only if, $x$ can be derived from $y$ by the successive application of a finite number of B-transforms.

    Proof. Let $x=(x_1,\ldots,x_n)\preceq y=(y_1,\ldots,y_n)$, with $x\ne y$. We shall prove that $x$ can be derived from $y$ by the successive application of a finite number of B-transforms.

    Let $j$ be the largest index for which $y_j>x_j$, and let $k$ be the smallest index greater than $j$ such that $x_k>y_k$. Note that such a pair $j,k$ must exist (as we argued in the proof of Theorem 3.1). We set the quantity $\delta$ as

    $$\delta=\min\{y_j-x_j,\;x_k-y_k\}. \tag{4.6}$$

    We define a B-transform as in (4.1), with $\lambda=\delta/y_j$ and $Y$ such that

    $$Y_{\ell m}=\begin{cases}-\lambda & \text{if } \ell=m=j,\\ \lambda & \text{if } \ell=j \text{ and } m=k,\\ 0 & \text{otherwise}.\end{cases}$$

    The application of such a matrix $B$ to the vector $y$ gives the vector $yB=y'$ with components

    $$y'=(y_1,\ldots,y_{j-1},y_j-\delta,y_{j+1},\ldots,y_{k-1},y_k+\delta,y_{k+1},\ldots,y_n). \tag{4.7}$$

    Let us show that the new vector $y'$ still majorizes $x$.

    From (4.7) and since $x\preceq y$, we get

    $$\sum_{i=1}^{\ell}x_i\le\sum_{i=1}^{\ell}y_i=\sum_{i=1}^{\ell}y'_i,\quad \ell=1,\ldots,j-1. \tag{4.8}$$

    From (4.6) and (4.7), we know that

    $$y'_j\ge x_j.$$

    Furthermore, by definition, the index $j$ is the largest index such that $y_j>x_j$, and $k$ is the smallest index greater than $j$ such that $y_k<x_k$. It follows that

    $$y'_\ell=y_\ell=x_\ell,\quad \ell=j+1,\ldots,k-1. \tag{4.9}$$

    Thus, from (4.8), we obtain that for each $\ell=j,\ldots,k-1$, it holds that

    $$\begin{aligned}\sum_{i=1}^{\ell}y'_i&=\sum_{i=1}^{j-1}y_i+y'_j+\sum_{i=j+1}^{\ell}y'_i\\ &\ge\sum_{i=1}^{j-1}x_i+y'_j+\sum_{i=j+1}^{\ell}y'_i\\ &\ge\sum_{i=1}^{j-1}x_i+x_j+\sum_{i=j+1}^{\ell}y'_i \quad(\text{since } y'_j\ge x_j)\\ &=\sum_{i=1}^{j-1}x_i+x_j+\sum_{i=j+1}^{\ell}x_i \quad(\text{from } (4.9))\\ &=\sum_{i=1}^{\ell}x_i.\end{aligned} \tag{4.10}$$

    From (4.7), we get

    $$\sum_{i=1}^{k}y'_i=\sum_{\substack{i=1\\ i\ne j,k}}^{k}y_i+(y_j-\delta)+(y_k+\delta)=\sum_{i=1}^{k}y_i\ge\sum_{i=1}^{k}x_i \quad(\text{since } x\preceq y). \tag{4.11}$$

    Finally, since $x\preceq y$, we have that

    $$\begin{cases}\sum_{i=1}^{\ell}y'_i=\sum_{i=1}^{\ell}y_i\ge\sum_{i=1}^{\ell}x_i, & \ell=k+1,\ldots,n-1,\\[2pt] \sum_{i=1}^{n}y'_i=\sum_{i=1}^{n}y_i=\sum_{i=1}^{n}x_i.\end{cases} \tag{4.12}$$

    Therefore, from (4.8) and (4.10)–(4.12), we have that $x\preceq y'$.

    Notice that if $\delta=y_j-x_j$, then $y'_j$ is equal to $x_j$; equivalently, if $\delta=x_k-y_k$, then $y'_k$ is equal to $x_k$. Moreover, since each B-transform preserves the majorization, we can iterate the process starting from $y'=yB$. It follows that $x$ can be derived from $y$ by the application of a finite number of B-transforms.

    Let us now prove the converse part of the theorem. We prove it by contradiction. Hence, we assume that $x\not\preceq y$, and show that if $x$ can be derived from $y$ by the successive application of a finite number of B-transforms, we get a contradiction.

    Since $x\not\preceq y$, there exists an index $\ell\in\{1,\ldots,n\}$ such that

    $$\sum_{i=1}^{\ell}y_i<\sum_{i=1}^{\ell}x_i. \tag{4.13}$$

    Moreover, by the definition (4.1) of a B-transform, a quantity $\lambda y_j$ can be moved between two components $y_j$ and $y_k$ of $y$, with $j<k$, only from $y_j$ to $y_k$, that is, only forward. Therefore, the sum of the first $\ell$ components of $y$ cannot be increased in any way through B-transforms. Consequently, this leads to a contradiction, because not all components $y_1,\ldots,y_\ell$ can be transformed into their respective components of $x$. Thus, it must hold that $x\preceq y$.

    Corollary 4.2. Let $x,y\in\mathbb{R}^n_+$. If $x\preceq y$, then $x$ can be derived from $y$ by the application of at most $n-1$ B-transforms.

    Proof. In the proof of Theorem 4.1, we have shown that the application of each B-transform equalizes at least one component of the intermediate vectors $y^j=y^{j-1}B_j$ to a component of $x$. Observe that since all vectors appearing in the sequence of transformations from $y$ to $x$ have an equal sum, the last B-transform always equalizes both the affected components to the respective components of $x$. As a result, $x$ can be obtained by the application of at most $n-1$ B-transforms.
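    Mirroring the `build_L` sketch given after Corollary 3.2, the constructive procedure of Theorem 4.1 can be sketched as follows (our own illustration, 0-based indices, hypothetical helper name):

```python
import numpy as np

def build_U(x, y, tol=1e-12):
    """Derive x from y via B-transforms (proof of Thm 4.1); return U with x = y @ U.
    Precondition: x is majorized by y, both sorted nonincreasingly."""
    y = np.array(y, dtype=float)
    n = len(y)
    U = np.eye(n)
    while not np.allclose(y, x, atol=tol):
        j = max(i for i in range(n) if y[i] > x[i] + tol)         # largest j: y_j > x_j
        k = next(i for i in range(j + 1, n) if x[i] > y[i] + tol) # smallest k > j: x_k > y_k
        delta = min(y[j] - x[j], x[k] - y[k])                     # as in (4.6)
        B = np.eye(n)
        B[j, j] -= delta / y[j]
        B[j, k] += delta / y[j]
        y, U = y @ B, U @ B
    return U

U = build_U([0.3, 0.3, 0.3, 0.1], [0.6, 0.2, 0.1, 0.1])
print(np.round(U, 4))   # reproduces the matrix U of Example 4.7
```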

    With the same technique as in Lemma 3.3, we can prove the following result.

    Lemma 4.3. Let C and D be two n×n upper triangular row-stochastic matrices. The product matrix CD is still an upper triangular row-stochastic matrix.

    The following theorem characterizes majorization in terms of upper triangular row-stochastic matrices.

    Theorem 4.4. Let $x,y\in\mathbb{R}^n_+$. It holds that $x\preceq y$ if, and only if, there exists an upper triangular row-stochastic matrix $U$ such that $x=yU$.

    Proof. The implication that if $x\preceq y$, then there exists an upper triangular row-stochastic matrix $U$ such that $x=yU$ directly derives from the results of Theorem 4.1 and Lemma 4.3. Indeed, from Theorem 4.1, we know that $x$ can be derived from $y$ by the successive application of B-transforms, and from Lemma 4.3 that the product of B-transforms is still an upper triangular row-stochastic matrix. Hence, the matrix $U$ obtained as the product of all the B-transforms is an upper triangular row-stochastic matrix such that $x=yU$.

    We now prove the converse implication. Let $U$ be an upper triangular row-stochastic matrix such that $x=yU$. It follows that each component of $x$ can be written as follows:

    $$x_j=\sum_{i=1}^{j}U_{ij}y_i,\quad j=1,\ldots,n. \tag{4.14}$$

    By (4.14), we can express the sum of the first $k$ components of $x$, with $k=1,\ldots,n$, as follows:

    $$x_1+\cdots+x_k=\sum_{j=1}^{k}\sum_{i=1}^{j}U_{ij}y_i=\sum_{i=1}^{k}\left(\sum_{j=i}^{k}U_{ij}\right)y_i\le\sum_{i=1}^{k}y_i \quad\left(\text{since } \sum_{j=i}^{k}U_{ij}\le 1,\ \text{given that } U \text{ is row-stochastic}\right).$$

    Thus, $x\preceq y$.

    We now bound the number of nonzero elements in the matrix U of Theorem 4.4.

    Corollary 4.5. Let $x,y\in\mathbb{R}^n_+$. If $x\preceq y$, then there exists an upper triangular row-stochastic matrix $U$, with at most $2n-1$ nonzero elements, such that $x=yU$.

    Proof. From Theorem 4.1 and Corollary 4.2, we know that the matrix $U$, for which it holds that $x=yU$, is the product of at most $n-1$ B-transforms. Let

    $$B_1,\ldots,B_t$$

    be the individual matrices associated with such $t$ B-transforms, $t<n$. By (4.1), the matrix $B_1$ has $n+1$ nonzero elements.

    Let $C$ be the matrix equal to the product of the first $m-1$ B-transforms, $m\le t<n$, that is, $C=\prod_{i=1}^{m-1}B_i$. We show that the product

    $$CB_m \tag{4.15}$$

    gives a matrix with only one additional nonzero element with respect to $C$.

    Let $j,k$ be the pair of indices chosen to construct $B_m=I+Y_m$. From (4.15), we get

    $$CB_m=C(I+Y_m)=C+CY_m. \tag{4.16}$$

    For every B-transform, we recall that we always choose the largest index $j$ such that $y_j>x_j$, and the smallest index $k$ greater than $j$ for which $y_k<x_k$. Therefore, all the previous B-transforms have chosen pairs of indices whose first element is greater than or at most equal to $j$. Consequently, in the matrix $C$, all the rows above the $j$-th row are equal to the rows of the identity matrix. In addition, by definition, the matrix $Y_m$ has nonzero elements only in the $j$-th row, in positions $(Y_m)_{jj}$ and $(Y_m)_{jk}$, respectively. Hence, the matrix $CY_m$ has nonzero elements only in the entries $(CY_m)_{jj}$ and $(CY_m)_{jk}$. Since the element $(CY_m)_{jj}$ will be added to the diagonal element of $C$, the only new nonzero element is $(CY_m)_{jk}$. Hence, from (4.16), we get that the product $CB_m$ gives a matrix with only one additional nonzero element with respect to $C$.

    Since each product generates a matrix with only one additional nonzero element with respect to the previous one, we obtain that the final matrix $U$ has at most $n+1+n-2=2n-1$ nonzero elements.

    We summarize our results in the next theorem, in the fashion of the classic Theorem 2.1.

    Theorem 4.6. If $x,y\in\mathbb{R}^n_+$, the following conditions are equivalent:

    (1) $x\preceq y$;

    (2) $x=yU$ for some upper triangular row-stochastic matrix $U$;

    (3) $x$ can be derived from $y$ by the successive application of at most $n-1$ B-transforms, as defined in (4.2).

    Proof. The equivalences are a direct consequence of the results of Theorem 4.1, Corollary 4.2, and Theorem 4.4.

    Let us now see an example of the construction of the matrix U.

    Example 4.7. Let $y=[0.6,0.2,0.1,0.1]$ and $x=[0.3,0.3,0.3,0.1]$, with $x\preceq y$.

    The first B-transform modifies the elements $y_1$ and $y_2$ with $\delta=\min\{y_1-x_1,\,x_2-y_2\}=0.1$ and $\lambda=\delta/y_1=1/6$. As a result, the associated matrix $B_1$ is the following one:

    $$B_1=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}+\begin{bmatrix}-\tfrac{1}{6}&\tfrac{1}{6}&0&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}=\begin{bmatrix}\tfrac{5}{6}&\tfrac{1}{6}&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

    and $y^1=yB_1=[0.5,0.3,0.1,0.1]$.

    The second B-transform affects the elements $y^1_1$ and $y^1_3$ with $\delta=\min\{y^1_1-x_1,\,x_3-y^1_3\}=0.2$ and $\lambda=\delta/y^1_1=2/5$. Hence, the matrix $B_2$ is the following one

    $$B_2=\begin{bmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}+\begin{bmatrix}-\tfrac{2}{5}&0&\tfrac{2}{5}&0\\0&0&0&0\\0&0&0&0\\0&0&0&0\end{bmatrix}=\begin{bmatrix}\tfrac{3}{5}&0&\tfrac{2}{5}&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}$$

    and $y^2=y^1B_2=[0.3,0.3,0.3,0.1]=x$. Hence, the final matrix is:

    $$U=B_1B_2=\begin{bmatrix}\tfrac{1}{2}&\tfrac{1}{6}&\tfrac{1}{3}&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{bmatrix}.$$

    It is worth pointing out that the above example also shows that the matrices U of this section are not simply the inverses of the matrices L of Section 3.

    We recall that a real-valued function $\phi:A\subseteq\mathbb{R}^n\to\mathbb{R}$ is said to be Schur-concave [2] if

    $$x\preceq y \implies \phi(x)\ge\phi(y).$$

    In the rest of this section, we will assume that the set $A$ corresponds to the $(n-1)$-dimensional probability simplex $\mathcal{P}_n$, defined as

    $$\mathcal{P}_n=\left\{x=(x_1,\ldots,x_n)\,:\,x_1\ge\cdots\ge x_n>0 \ \text{and}\ \sum_{i=1}^{n}x_i=1\right\}.$$

    It is well known that the Shannon entropy $H(x)=-\sum_{i=1}^{n}x_i\log_2 x_i$ is Schur-concave over $\mathcal{P}_n$. Therefore, for any $x,y\in\mathcal{P}_n$, if $x\preceq y$, it holds that

    $$H(x)\ge H(y). \tag{5.1}$$

    The inequality (5.1) is widely used in information theory, and several improvements of it are known. For instance, the authors of [20] proved that for any $x,y\in\mathcal{P}_n$, if $x\preceq y$, then it holds that

    $$H(x)\ge H(y)+D(y\|x), \tag{5.2}$$

    where $D(y\|x)=\sum_i y_i\log(y_i/x_i)$ is the relative entropy between $y$ and $x$.

    The paper [21] proved a different strengthening of (5.1) (see also Proposition A.7.e of [2]). More precisely, Proposition A.7.e of [2] states that if $x,y\in\mathcal{P}_n$ and $x\preceq y$, then it holds that

    $$H(x)\ge\alpha(P)\log_2 n+(1-\alpha(P))H(y)\ge H(y), \tag{5.3}$$

    where $P=[P_{ij}]$ is a doubly stochastic matrix for which $x=yP$, and $\alpha(P)$ is the Dobrushin coefficient of ergodicity of $P$, defined as:

    $$\alpha(P)=\min_{\ell,m}\sum_{i=1}^{n}\min\{P_{\ell i},P_{m i}\}.$$
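    The Dobrushin coefficient can be computed directly from its definition; a small sketch (the function name is ours), evaluated here on the matrix $U$ of Example 4.7:

```python
import numpy as np

def dobrushin_alpha(P):
    """alpha(P) = min over row pairs (l, m) of sum_i min(P[l,i], P[m,i])."""
    n = P.shape[0]
    return min(np.minimum(P[l], P[m]).sum()
               for l in range(n) for m in range(n) if l != m)

U = np.array([[1/2, 1/6, 1/3, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
print(dobrushin_alpha(U))   # 0.0: rows 2 and 3 of U have disjoint supports
```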

    It might be useful to recall that some papers define the Dobrushin coefficient of ergodicity of $P$ as $1-\alpha(P)$.

    We show that our results from the previous sections can be used to obtain a different improvement of the basic inequality (5.1). In fact, we prove the following result.

    Theorem 5.1. Let $x,y\in\mathcal{P}_n$ with $x\preceq y$. Moreover, let $U$ be the upper triangular matrix obtained through the sequence of B-transforms described in Theorem 4.1, for which $x=yU$. If $x\ne y$, it holds that

    $$H(x)\ge(1-\alpha(U))H(y)+\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}-(1-\alpha(U))\log_2 n \tag{5.4}$$
    $$>\alpha(U)\log_2 n+(1-\alpha(U))H(y), \tag{5.5}$$

    where $u=(1/n,\ldots,1/n)$ and $\alpha(U)$ is the Dobrushin coefficient of ergodicity of $U$.

    To prove the theorem, we need some intermediate results. We recall that a matrix $C$ is said to be column-allowable if each column contains at least one positive element [21]. From [21, Thm. 1.4], we obtain the following lemma.*

    *Theorem 1.4 of [21] is stated for general f-divergences.

    Lemma 5.2. Let $C\in\mathbb{R}^{n\times n}_+$ be a row-stochastic and column-allowable matrix, and let $x,y\in\mathcal{P}_n$. Then

    $$D(xC\|yC)\le(1-\alpha(C))D(x\|y), \tag{5.6}$$

    where $D(x\|y)=\sum_{i=1}^{n}x_i\log_2(x_i/y_i)$ is the relative entropy between $x$ and $y$.
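    A quick numeric illustration of (5.6), which we add for concreteness (the matrix $C$ and the distributions below are arbitrary choices of ours):

```python
import numpy as np

D = lambda p, q: (p * np.log2(p / q)).sum()   # relative entropy, base 2

C = np.array([[0.7, 0.2, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.2, 0.7]])               # row-stochastic, column-allowable
alpha = min(np.minimum(C[l], C[m]).sum()
            for l in range(3) for m in range(3) if l != m)   # alpha(C) = 0.4

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
print(D(p @ C, q @ C) <= (1 - alpha) * D(p, q))   # True
```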

    We note that Lemma 5.2 is a classical example of the role of contraction coefficients. Contraction coefficients are important quantities in strong data-processing inequalities [8].

    By exploiting the knowledge of the structure of the matrices U of Section 4, we obtain the following result.

    Lemma 5.3. Let $x,y\in\mathcal{P}_n$ be such that $x\preceq y$. Moreover, let $U$ be the upper triangular matrix obtained through the sequence of B-transforms described in Theorem 4.1, such that $x=yU$. If $x\ne y$, it holds that

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}>\log_2 n. \tag{5.7}$$

    Proof. Let $U=\prod_{i=1}^{s}B_i$ be the upper triangular matrix obtained as the product of the $s$ B-transforms $B_1,\ldots,B_s$. To prove the lemma, we first show that

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(uB_1)_i}>\log_2 n. \tag{5.8}$$

    Next, we prove that for each $\ell=2,\ldots,s$, it holds that

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(u(CB_\ell))_i}-\sum_{i=1}^{n}x_i\log_2\frac{1}{(uC)_i}=\sum_{i=1}^{n}x_i\log_2\frac{(uC)_i}{(u(CB_\ell))_i}>0, \tag{5.9}$$

    where $C=\prod_{i=1}^{\ell-1}B_i$.

    Let $j,k$, with $j<k$, be the pair of indices chosen in the first B-transform $B_1$; then we know that

    $$(B_1)_{\ell m}=\begin{cases}1-\lambda & \text{if } \ell=m=j,\\ \lambda & \text{if } \ell=j \text{ and } m=k,\\ 1 & \text{if } \ell=m\ne j,\\ 0 & \text{otherwise}.\end{cases} \tag{5.10}$$

    Hence, from (5.10) and by noticing that for each $i=1,\ldots,n$ it holds that $(uB_1)_i=(1/n)\sum_{\ell=1}^{n}(B_1)_{\ell i}$, we get the following series of equalities and inequalities:

    $$\begin{aligned}\sum_{i=1}^{n}x_i\log_2\frac{1}{(uB_1)_i}-\log_2 n &=\sum_{i=1}^{n}x_i\log_2\frac{1}{(uB_1)_i}-\sum_{i=1}^{n}x_i\log_2 n\\ &=\sum_{i=1}^{n}x_i\log_2\frac{1}{n\,(uB_1)_i}=\sum_{i=1}^{n}x_i\log_2\frac{1}{n\cdot\frac{1}{n}\sum_{\ell=1}^{n}(B_1)_{\ell i}}=\sum_{i=1}^{n}x_i\log_2\frac{1}{\sum_{\ell=1}^{n}(B_1)_{\ell i}}\\ &=\sum_{i\ne j,k}x_i\log_2\frac{1}{(B_1)_{ii}}+x_j\log_2\frac{1}{(B_1)_{jj}}+x_k\log_2\frac{1}{(B_1)_{kk}+(B_1)_{jk}} \quad(\text{from } (5.10))\\ &=x_j\log_2\frac{1}{1-\lambda}+x_k\log_2\frac{1}{1+\lambda}\\ &\ge x_k\log_2\frac{1}{(1-\lambda)(1+\lambda)} \quad(\text{since } x_j\ge x_k)\\ &=x_k\log_2\frac{1}{1-\lambda^2}>0.\end{aligned} \tag{5.11}$$

    Thus, we have proved that (5.8) holds.

    Now let $C=\prod_{i=1}^{\ell-1}B_i$ for an arbitrary $\ell\in\{2,\ldots,s\}$. Let $j,k$, with $j<k$, be the pair of indices chosen in the successive B-transform $B_\ell=I+Y$; then we know that

    $$Y_{tm}=\begin{cases}-\lambda & \text{if } t=m=j,\\ \lambda & \text{if } t=j \text{ and } m=k,\\ 0 & \text{otherwise}.\end{cases} \tag{5.12}$$

    Therefore, we get

    $$CB_\ell=C(I+Y)=C+CY. \tag{5.13}$$

    For every B-transform, we recall that we always choose the largest index $j$ such that $y_j>x_j$, and the smallest index $k$ greater than $j$ for which $y_k<x_k$. Therefore, all the previous B-transforms have chosen pairs of indices $t,m$ such that $t$ is greater than or at most equal to $j$. Consequently, in the matrix $C$, all the rows above the $j$-th row are equal to the rows of the identity matrix. Thus, from (5.12), it follows that the only nonzero entries in the matrix $CY$ are $(CY)_{jk}=\lambda C_{jj}$ and $(CY)_{jj}=-\lambda C_{jj}$. Therefore, from (5.13), the only entries of $CB_\ell$ that differ from the corresponding entries of $C$ are those in positions $(j,j)$ and $(j,k)$. In particular, in the matrix $CB_\ell$, the entry $C_{jj}$ of $C$ is decremented by $\lambda C_{jj}$, and $C_{jk}$ is incremented by $\lambda C_{jj}$. Therefore, we have

    $$(CB_\ell)_{tm}=\begin{cases}C_{jj}-\lambda C_{jj} & \text{if } t=m=j,\\ C_{jk}+\lambda C_{jj} & \text{if } t=j \text{ and } m=k,\\ C_{tm} & \text{otherwise}.\end{cases} \tag{5.14}$$

    Thus, we get

    $$\begin{aligned}\sum_{i=1}^{n}x_i\log_2\frac{(uC)_i}{(u(CB_\ell))_i}&=\sum_{i=1}^{n}x_i\log_2\frac{(1/n)\sum_{t=1}^{n}C_{ti}}{(1/n)\sum_{t=1}^{n}(CB_\ell)_{ti}}=\sum_{i=1}^{n}x_i\log_2\frac{\sum_{t=1}^{n}C_{ti}}{\sum_{t=1}^{n}(CB_\ell)_{ti}}\\ &=\sum_{i\ne j,k}x_i\log_2\frac{\sum_{t=1}^{n}C_{ti}}{\sum_{t=1}^{n}C_{ti}}+x_j\log_2\frac{\sum_{t=1}^{n}C_{tj}}{\sum_{t=1}^{n}C_{tj}-\lambda C_{jj}}+x_k\log_2\frac{\sum_{t=1}^{n}C_{tk}}{\sum_{t=1}^{n}C_{tk}+\lambda C_{jj}} \quad(\text{from } (5.14))\\ &=x_j\log_2\frac{\sum_{t=1}^{n}C_{tj}}{\sum_{t=1}^{n}C_{tj}-\lambda C_{jj}}+x_k\log_2\frac{\sum_{t=1}^{n}C_{tk}}{\sum_{t=1}^{n}C_{tk}+\lambda C_{jj}}.\end{aligned} \tag{5.15}$$

    Observe that, by construction of $C$, it holds that $C_{jj}\le 1$ and that the only nonzero element in the $j$-th column is $C_{jj}$. Thus, it follows that $\sum_{t=1}^{n}C_{tj}=C_{jj}$. Similarly, it also holds that $C_{kk}=1$. In fact, since $y_k<x_k$, the B-transforms do not modify the $k$-th row (which remains equal to the row of the identity matrix). Thus, it follows that $\sum_{t=1}^{n}C_{tk}\ge 1$. Therefore, we can rewrite (5.15) as follows

    $$\begin{aligned}\sum_{i=1}^{n}x_i\log_2\frac{(uC)_i}{(u(CB_\ell))_i}&=x_j\log_2\frac{C_{jj}}{C_{jj}-\lambda C_{jj}}+x_k\log_2\frac{\sum_{t=1}^{n}C_{tk}}{\sum_{t=1}^{n}C_{tk}+\lambda C_{jj}}\\ &=x_j\log_2\frac{1}{1-\lambda}+x_k\log_2\frac{1}{1+\lambda\frac{C_{jj}}{\sum_{t=1}^{n}C_{tk}}}\\ &\ge x_k\log_2\frac{1}{(1-\lambda)\left(1+\lambda\frac{C_{jj}}{\sum_{t=1}^{n}C_{tk}}\right)} \quad(\text{since } x_j\ge x_k)\\ &\ge x_k\log_2\frac{1}{(1-\lambda)(1+\lambda)} \quad\left(\text{since } 1+\lambda\ge 1+\lambda\frac{C_{jj}}{\sum_{t=1}^{n}C_{tk}}\right)\\ &>0.\end{aligned} \tag{5.16}$$

    Thus, we have proved that (5.9) holds.

    Since both (5.8) and (5.9) hold, we have proved the lemma, given that

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}\ge\sum_{i=1}^{n}x_i\log_2\frac{1}{(uB_1)_i}>\log_2 n.$$

    We can now prove Theorem 5.1.

    Proof. Observe that the matrix $U$ is a row-stochastic and column-allowable matrix. Thus, from Lemma 5.2, we obtain

    $$D(yU\|uU)\le(1-\alpha(U))D(y\|u),$$

    and, since $x=yU$,

    $$D(x\|uU)\le(1-\alpha(U))D(y\|u). \tag{5.17}$$

    Expanding the divergences in (5.17), we get

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}-H(x)\le(1-\alpha(U))(\log_2 n-H(y)). \tag{5.18}$$

    From (5.18), we get the following lower bound on the entropy of $x$:

    $$H(x)\ge(1-\alpha(U))H(y)+\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}-(1-\alpha(U))\log_2 n. \tag{5.19}$$

    From Lemma 5.3, we know that

    $$\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}>\log_2 n. \tag{5.20}$$

    Thus, by applying (5.20) to (5.19), we finally get

    $$\begin{aligned}H(x)&\ge(1-\alpha(U))H(y)+\sum_{i=1}^{n}x_i\log_2\frac{1}{(uU)_i}-(1-\alpha(U))\log_2 n\\ &>(1-\alpha(U))H(y)+\log_2 n-(1-\alpha(U))\log_2 n\\ &=(1-\alpha(U))H(y)+\alpha(U)\log_2 n\\ &\ge H(y) \quad(\text{since } H(y)\le\log_2 n).\end{aligned}$$
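    As a numeric sanity check of Theorem 5.1 (our own addition), one can evaluate both sides of (5.4) and (5.5) on the vectors and the matrix $U$ of Example 4.7:

```python
import numpy as np

x = np.array([0.3, 0.3, 0.3, 0.1])
y = np.array([0.6, 0.2, 0.1, 0.1])
U = np.array([[1/2, 1/6, 1/3, 0],   # the U of Example 4.7, for which x = y @ U
              [0,   1,   0,   0],
              [0,   0,   1,   0],
              [0,   0,   0,   1]])
u = np.full(4, 1/4)
alpha = min(np.minimum(U[l], U[m]).sum()
            for l in range(4) for m in range(4) if l != m)   # Dobrushin coefficient

H = lambda p: -(p * np.log2(p)).sum()
bound_54 = (1 - alpha) * H(y) + (x * np.log2(1 / (u @ U))).sum() - (1 - alpha) * np.log2(4)
bound_55 = alpha * np.log2(4) + (1 - alpha) * H(y)
print(H(x) >= bound_54 > bound_55 >= H(y))   # True: 1.895 >= 1.680 > 1.571 >= 1.571
```

    Here $\alpha(U)=0$, so (5.5) degenerates to $H(y)$, yet (5.4) still improves on the basic inequality (5.1) thanks to Lemma 5.3.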

    In this paper, we have introduced two novel characterizations of the classical concept of majorization in terms of upper triangular (resp., lower triangular) row-stochastic matrices. An interesting feature of our upper triangular (resp., lower triangular) row-stochastic matrices is that they are quite sparse, in the sense that they have few nonzero elements; this property might be useful in practical applications. Finally, we have used our new characterization of majorization in terms of upper triangular row-stochastic matrices to derive an improved entropy inequality. We mention that one could derive a similar (albeit nonequivalent) improved entropy inequality by using the characterization of majorization in terms of lower triangular row-stochastic matrices that we have given in Section 3. To do so, one proceeds in a way similar to the one we presented in Section 5.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors would like to thank the referees and the Guest Editor Professor I. Sason for their corrections and useful suggestions.

    This work was partially supported by project SERICS (PE00000014) under the NRRP MUR program funded by the EU-NGEU.

    The authors declare that they have no conflicts of interest.



    [1] B. C. Arnold, J. M. Sarabia, Majorization and the Lorenz Order with Applications in Applied Mathematics and Economics, Berlin: Springer, 2018.
    [2] A. W. Marshall, I. Olkin, B. C. Arnold, Inequalities: Theory of Majorization and its Applications, 2nd edition, Berlin: Springer, 2010.
    [3] G. H. Hardy, J. E. Littlewood, G. Pólya, Inequalities, Cambridge: Cambridge University Press, 1934.
    [4] M. Madiman, L. Wang, J. O. Woo, Majorization and Rényi entropy inequalities via Sperner theory, Discrete Math., 342 (2019), 2911–2923. https://doi.org/10.1016/j.disc.2019.03.002 doi: 10.1016/j.disc.2019.03.002
    [5] M. Adil Khan, S. I. Bradanovic, N. Latif, D. Pecaric, J. Pecaric, Majorization Inequality and Information Theory, Zagreb: Element, 2019.
    [6] M. P. Mueller, M. Pastena, A generalization of majorization that characterizes Shannon entropy, IEEE Trans. Inf. Theory, 62 (2016), 1711–1720. https://doi.org/10.1109/TIT.2016.2528285 doi: 10.1109/TIT.2016.2528285
    [7] I. Sason, Tight bounds on the Rényi entropy via majorization with applications to guessing and compression, Entropy, 20 (2018), 896. https://doi.org/10.3390/e20120896 doi: 10.3390/e20120896
    [8] I. Sason, On data-processing and majorization inequalities for f-divergences with applications, Entropy, 21 (2019), 1022. https://doi.org/10.3390/e21101022 doi: 10.3390/e21101022
    [9] F. Cicalese, U. Vaccaro, Supermodularity and subadditivity properties of the entropy on the majorization lattice, IEEE T. Inform. Theory, 48 (2002), 933–938. https://doi.org/10.1109/18.992785 doi: 10.1109/18.992785
    [10] H. Witsenhausen, Some aspects of convexity useful in information theory, IEEE T. Inform. Theory, 26 (1980), 265–271.
    [11] D. P. Palomar, Y. Jiang, MIMO Transceiver Design via Majorization Theory, New York: Now Publishers, 2007.
    [12] E. Jorswieck, H. Boche, Majorization and matrix-monotone functions in wireless communications, Foundat. Trends Commun. Inform. Theory, 3 (2007), 553–701. http://dx.doi.org/10.1561/0100000026 doi: 10.1561/0100000026
    [13] J. Wang, D. P. Palomar, Majorization theory with applications in signal processing and communication systems, In: Mathematical Foundations for Signal Processing, Communications and Networking, 2011. https://doi.org/10.1201/9781351105668
    [14] G. Bellomo, G. Bosyk, Majorization, Across the (Quantum) Universe, Cambridge: Cambridge University Press, 2019. https://doi.org/10.1017/9781108562218.018
    [15] T. Sagawa, Entropy, Divergence, and Majorization in Classical and Quantum Thermodynamics, Berlin: Springer, 2022.
    [16] M. Bianchi, G. P. Clemente, A. Cornaro, J. L. Palacios, A. Torreiro, New trends in majorization techniques for bounding topological indices, In: Bounds in Chemical Graph Theory-Basics, 2017, 3–66.
    [17] G. Dahl, Principal majorization ideals and optimization, Linear Algebra Appl., 331 (2001), 113–130. https://doi.org/10.1016/S0024-3795(01)00268-3 doi: 10.1016/S0024-3795(01)00268-3
    [18] C. Li, Efficient approximate minimum entropy coupling of multiple probability distributions, IEEE T. Inform. Theory, 67 (2021), 5259–5268. https://doi.org/10.1109/TIT.2021.3076986 doi: 10.1109/TIT.2021.3076986
    [19] F. Cicalese, L. Gargano, U. Vaccaro, Minimum-entropy couplings and their applications, IEEE T. Inform. Theory, 65 (2019), 3436–3451. https://doi.org/10.1109/TIT.2019.2894519 doi: 10.1109/TIT.2019.2894519
    [20] S. W. Ho, S. Verdú, On the interplay between conditional entropy and error probability, IEEE T. Inform. Theory, 56 (2010), 5930–5942. https://doi.org/10.1109/TIT.2010.2080891 doi: 10.1109/TIT.2010.2080891
    [21] J. Cohen, Y. Derriennic, G. Zbaganu, Majorization, monotonicity of relative entropy, and stochastic matrices, Contemp. Math., 149 (1993), 251–259.
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)