Research article

(s,S) Inventory policies for stochastic controlled system of Lindley-type with lost-sales

  • Received: 21 February 2023 Revised: 05 June 2023 Accepted: 06 June 2023 Published: 09 June 2023
  • MSC : 49J53, 49K99

  • This paper presents a characterization of $(s,S)$ inventory policies for Lindley systems with possibly unbounded costs, where the objective is to minimize the expected total discounted cost over ordering (production) strategies. Moreover, the existence of a subsequence of minimizers of the value iteration functions that converges to an $(s,S)$ optimal inventory policy is shown. A numerical example is given to illustrate the theory.

    Citation: Rubén Blancas-Rivera, Hugo Cruz-Suárez, Gustavo Portillo-Ramírez, Ruy López-Ríos. (s,S) Inventory policies for stochastic controlled system of Lindley-type with lost-sales[J]. AIMS Mathematics, 2023, 8(8): 19546-19565. doi: 10.3934/math.2023997




    This paper considers a discrete-time, single-item inventory system with unbounded inventory levels, in which excess demand is not backlogged. The dynamics of the system are determined by a random walk with a lower barrier; in particular, the state zero acts as the barrier. It is also assumed that the control variable is affected by a random noise in the dynamics of the system; see (2.1). Furthermore, the following costs are considered: ordering/production, holding, and shortage. The objective function is then the infinite-horizon expected total discounted cost. The goal of the paper is to guarantee the existence of optimal stationary policies and to characterize them as $(s,S)$ policies (see [5,18,23,28]). An $(s,S)$ policy operates in the following manner: if the inventory level falls below a minimum level $s \ge 0$, the controller places a replenishment order to restore the inventory stock to a maximum level $S \ge s$.

    As mentioned earlier, a crucial component of the inventory model is the dynamics of the system; this paper considers a controlled version of a Lindley random walk. This random walk has been used to model waiting times in a single-server queueing system (see [20]). A controlled version of this dynamics was introduced in [14] to illustrate the convergence of the value iteration algorithm for Markov control processes with average costs. Furthermore, this dynamics has been used in several contexts of Markov control processes; see, e.g., [16,27]. Nevertheless, in these works the optimal policy is not characterized, except in [11], which considers a compact action set and a bounded cost. The characterization of $(s,S)$ policies is an interesting problem in inventory theory; it can be traced back to the papers [13,22,28]. One advantage of guaranteeing the existence of $(s,S)$ policies is their ease of implementation. However, it is not always possible to guarantee the existence of an $(s,S)$ inventory policy; see, for instance, [2]. Therefore, it is necessary to establish conditions on the stochastic control model that guarantee the existence of $(s,S)$ inventory policies. In addition, there is recent work in applied areas where $(s,S)$ strategies are implemented in the real world, such as in healthcare [1,29] and in machine learning [8], to name a few. A survey presenting applications of inventory theory in the real world can be found in [22]. On the other hand, recent research from the theoretical point of view is presented in [6], where the optimality of $(s,S)$ strategies is proved as a particular case of a setup with piecewise-linear concave ordering costs. A perishable inventory with backlogged demand was studied in [30]. In contrast to the work described in this manuscript, there is an extensive literature on continuous-time inventories, e.g., the recent work [4], which studies perishable inventories.

    Based on the review conducted, the main objectives of this work can be described as follows:

    ● To provide conditions that guarantee the optimality of inventory strategies for a system with lost sales (see Theorem 4.10).

    ● To propose approximation procedures for the strategies described in the previous point (see Theorem 5.3).

    It is important to note that the first objective is a more complex challenge than its counterpart for a backlog inventory system, as described in [22].

    The methodology of the paper is as follows. First, the dynamic programming approach is validated, and the existence of optimal stationary policies is verified. Second, results of convex analysis are applied to guarantee that the minimizers of the value iteration functions and the optimal policies of the inventory system are $(s,S)$ policies. This characterization is achieved under assumptions of continuity and monotonicity on the components of the inventory control model; these conditions guarantee that the cost function is convex. The convexity property has been used previously in the inventory systems literature; see, for example, [9,10,13,18,23,25,26,28,31].

    The paper is organized as follows. Section 2 introduces the inventory control model and the assumptions about the components of the inventory control model. Section 3 introduces the dynamic programming approach. Section 4 presents the characterization of (s,S) policies. Section 5 contains an analysis of the convergence of minimizers of value iteration. Finally, a numerical example is given in Section 6.

    Consider a discrete-time inventory control system. If $X_t$ denotes the inventory level at time $t = 0, 1, \ldots$, the evolution of the system is modeled by the following Lindley-type dynamical system:

$$X_{t+1} = \left(X_t + \eta_t a_t - D_{t+1}\right)^+, \tag{2.1}$$

    with $X_0 = x \in \mathbf{X} := [0, \infty)$ known, where

    i) $\{\eta_t\}$ is a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables with parameter $p$, $0 < p < 1$; in this case, the event $\{\eta_t = 1\}$ means that an order placed at time $t$ has been supplied.

    ii) $a_t$ denotes the control or decision applied at time $t$ and represents the quantity ordered by the inventory manager (or decision maker).

    iii) The sequence $\{D_t\}$ consists of i.i.d. non-negative random variables with a common distribution $F$; $D_t$ denotes the demand within period $t$. It is assumed that the sequence $\{D_t\}$ is independent of the sequence $\{\eta_t\}$.
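    For illustration, the following minimal simulation sketch (not part of the paper; the exponential demand and all numeric values are assumptions chosen for the example) generates a trajectory of (2.1) under an arbitrary ordering rule:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x0, policy, T, p=0.5, demand_rate=0.1):
    """Simulate X_{t+1} = (X_t + eta_t*a_t - D_{t+1})^+ for T periods."""
    x, path = x0, [x0]
    for _ in range(T):
        a = policy(x)                           # order placed at time t
        eta = rng.random() < p                  # order supplied with prob. p
        d = rng.exponential(1.0 / demand_rate)  # demand D_{t+1}
        x = max(x + eta * a - d, 0.0)           # lost sales: barrier at zero
        path.append(x)
    return path

# Hypothetical (s,S) rule with illustrative levels s = 20, S = 50.
path = simulate(x0=40.0, policy=lambda x: 50.0 - x if x <= 20.0 else 0.0, T=10)
```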

    Note that the difference equation given in (2.1) induces a stochastic kernel $Q$ on $\mathbf{X}$ given

$$\mathbf{K} := \{(x,a) : x \in \mathbf{X},\ a \in [0,\infty)\},$$

    defined as follows:

$$Q\big(X_{t+1} \in (-\infty, y] \mid X_t = x, a_t = a\big) = p\big(F(x+a) - F(x+a-y)\big) + p\big(1 - F(x+a)\big) + (1-p)\big(F(x) - F(x-y)\big) + (1-p)\big(1 - F(x)\big), \tag{2.2}$$

    with $y, x, a \in [0,\infty)$, and

$$Q\big(X_{t+1} \in (-\infty, y] \mid X_t = x, a_t = a\big) = 0,$$

    if $x, a \in [0,\infty)$ and $y < 0$.

    Suppose further that the system is associated with a one-step cost function $C : \mathbf{K} \to [0,\infty)$, defined as follows:

$$C(x,a) = K\, I_{\{a:\, a > 0\}}(a) + c\,a + E\big[h\big((x + \eta a - D)^+\big)\big] + E\big[l\big((D - (x + \eta a))^+\big)\big], \tag{2.3}$$

    where $K \ge 0$ is a fixed ordering cost, $c > 0$ is the ordering cost per unit, $h : [0,\infty) \to [0,\infty)$ denotes the holding cost per period, $l : [0,\infty) \to [0,\infty)$ indicates the shortage cost for unfilled demand, and $E$ denotes the expectation with respect to the joint distribution of the random vector $(\eta, D)$, where $(\eta, D)$ is a generic element of the sequence $\{(\eta_t, D_t)\}$. In all the following sections, the cost function is written as the sum of two cost functions:

$$C(x,a) = g(a) + H(x,a),$$

    where $g(a) = K\, I_{\{a:\, a > 0\}}(a) + c\,a$ and

$$H(x,a) = E\big[h\big((x + \eta a - D)^+\big)\big] + E\big[l\big((D - (x + \eta a))^+\big)\big], \tag{2.4}$$

    with $(x,a) \in \mathbf{K}$.
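    As an illustration (a sketch under assumptions: the linear holding and shortage costs anticipate the Section 6 example, and all parameter values are hypothetical), the one-step cost (2.3) can be estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)

def one_step_cost(x, a, K=1.5, c=2.5, p=0.5, lam=0.1,
                  h=lambda u: 30.0 * u, l=lambda u: 30.0 * u, n=100_000):
    """Monte Carlo estimate of C(x,a) = g(a) + H(x,a), per (2.3)-(2.4)."""
    eta = rng.random(n) < p                  # Bernoulli(p) supply indicator
    d = rng.exponential(1.0 / lam, size=n)   # i.i.d. demand sample
    post = x + eta * a - d                   # net stock before the barrier
    holding = h(np.maximum(post, 0.0))       # h((x + eta*a - D)^+)
    shortage = l(np.maximum(-post, 0.0))     # l((D - (x + eta*a))^+)
    return K * (a > 0) + c * a + np.mean(holding + shortage)
```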

    Then, the inventory control model is identified with a Markov control process [17], in short, MCP. The components of the associated Markov control model are as follows: $\mathbf{X} := [0,\infty)$ is the state space, $\mathbf{A} := [0,\infty)$ is the action space, and the dynamics and the cost function are given by (2.2) and (2.3), respectively. Consequently, the inventory system evolves as follows: if the stock of the inventory system occupies state $X_t = x$ at time $t$ and the controller (inventory manager) orders the quantity of product $a_t = a$, then a cost $C(x,a)$ is incurred and the system jumps to a state $X_{t+1}$ according to the transition law $Q(\cdot \mid x, a)$. Once the transition to the new state has occurred, a new order (control) is requested and the process is repeated. Thus, for each $t \ge 1$, an admissible history $h_t$ of the inventory system up to transition $t$ is given by $h_t = (X_0, a_0, \ldots, X_{t-1}, a_{t-1}, X_t)$. Let $H_t$, $t \ge 1$, be the set of all admissible histories of the system up to transition $t$. The following definition will be used to characterize optimal strategies in the inventory control model.

    Definition. A policy $\pi = \{\pi_t\}$ is a sequence of stochastic kernels $\pi_t$ on $\mathbf{A}$ given $H_t$, satisfying the constraint $\pi_t(\mathbf{A} \mid h_t) = 1$, for each $h_t \in H_t$, $t \ge 1$. The collection of all policies is denoted by $\Pi$. Define $\mathbb{F}$ as the set of all measurable functions $f : \mathbf{X} \to \mathbf{A}$. Thus, a Markov policy is a sequence $\{f_t\}$ such that $f_t \in \mathbb{F}$, for $t \ge 1$. In particular, a Markov policy $\pi = \{f_t\}$ is said to be stationary if $f_t$ is independent of $t$, i.e., $f_t = f \in \mathbb{F}$ for all $t \ge 1$; in this case, the policy is denoted by $f$, and $\mathbb{F}$ is referred to as the set of stationary policies.

    In what follows, for each $x \in \mathbf{X}$ and $\pi \in \Pi$, $P_x^\pi$ denotes the probability measure defined on the measurable space $(\Omega, \mathcal{F})$, where $\Omega := (\mathbf{X} \times \mathbf{A})^\infty$ and $\mathcal{F}$ is the corresponding product $\sigma$-algebra. The measure $P_x^\pi$ is induced by the Ionescu-Tulcea theorem [3]. The expectation operator with respect to $P_x^\pi$ is denoted by $E_x^\pi$.

    Inventory control problem

    The goal of this subsection is to introduce the inventory control problem. Consider $\pi \in \Pi$, $x \in \mathbf{X}$, and define the following objective function:

$$v(\pi, x) := E_x^\pi\left[\sum_{t=0}^\infty \alpha^t C(X_t, a_t)\right], \tag{2.5}$$

    where $\alpha \in (0,1)$ denotes a discount factor. The performance criterion defined in (2.5) is called the expected total discounted cost (see [17]). Hence, the optimal inventory control problem consists in determining a policy $\pi^* \in \Pi$ such that

$$v(\pi^*, x) = \inf_{\pi \in \Pi} v(\pi, x), \tag{2.6}$$

    for each $x \in \mathbf{X}$; in this case, $\pi^*$ is called an optimal policy. The function $V$ defined by

$$V(x) := \inf_{\pi \in \Pi} v(\pi, x), \tag{2.7}$$

    for each $x \in \mathbf{X}$, is called the optimal value function.
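    As a further sketch (same illustrative assumptions as above; the horizon truncation $T$ is harmless because $\alpha^t$ decays geometrically), the objective function (2.5) of a fixed stationary policy can be estimated by simulation, feeding in an expected one-step cost such as the estimator sketched after (2.4):

```python
import numpy as np

rng = np.random.default_rng(3)

def discounted_cost(x0, policy, cost, alpha=0.2, p=0.5, lam=0.1,
                    T=60, n_paths=2_000):
    """Monte Carlo estimate of v(pi, x) in (2.5) for a stationary policy."""
    total = 0.0
    for _ in range(n_paths):
        x, acc = x0, 0.0
        for t in range(T):                  # alpha**T is negligible here
            a = policy(x)
            acc += alpha**t * cost(x, a)    # expected one-step cost C(x,a)
            eta = rng.random() < p
            x = max(x + eta * a - rng.exponential(1.0 / lam), 0.0)
        total += acc
    return total / n_paths
```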

    In subsequent sections, the following assumption will be considered.

    Basic Assumption (BA):

    i) $h$ is a non-decreasing convex function such that $h(0^+) = h(0)$, where $h(0^+) := \inf\{h(y) : y > 0\}$.

    ii) $l$ is a convex function such that $l(0) = 0$ and $l'(u)$ exists and is non-negative for each $u \ge 0$.

    iii) $E[h((u - D)^+)] < \infty$ and $E[l((D - u)^+)] < \infty$ for each $u \ge 0$, where $D$ is a generic element of the sequence $\{D_t\}$.

    iv) If $D$ is a generic element of the sequence $\{D_t\}$, it is assumed that $D$ has a continuous density, denoted by $\Delta$.

    BA is assumed to hold throughout the paper and will not be restated in each lemma or theorem.

    Remark 2.1. i) Observe that the cost function (2.4) is convex, due to the convexity and monotonicity of the functions $h$ and $l$ (see BA i) and ii)). Therefore, the cost function (2.3) is a convex function; this property will be used in the proof of Lemma 4.3.

    ii) Moreover, under BA i) and ii), $h$ and $l$ are continuous functions. This property will be used to establish Lemma 3.1.

    Under BA, the inventory control model is identified in the MCP literature as a semi-continuous model [17]. Moreover, the following lemmas guarantee the existence of optimal stationary policies and the validity of the dynamic programming approach; see Theorem 3.7 below.

    Lemma 3.1. The cost function $C$ is a lower semi-continuous (l.s.c.) function, i.e., the set $\{(x,a) \in \mathbf{K} : C(x,a) \le \lambda\}$ is closed for all $\lambda \in \mathbb{R}$.

    Proof. First, observe directly that the cost $g$ is continuous for $a > 0$ and l.s.c. at $a = 0$. We now prove that the function $H$ is continuous at any $(x,a) \in \mathbf{K}$. For this purpose, fix $x \in \mathbf{X}$ and $a \in \mathbf{A}$, and take $x_n \in \mathbf{X}$, $a_n \in \mathbf{A}$, $n \ge 1$, such that $x_n \to x$ and $a_n \to a$ as $n$ goes to infinity. Consequently, Eq (2.4) gives $H(x_n, a_n) \to H(x, a)$ as $n \to \infty$, by applying the dominated convergence theorem [3] and the continuity of the functions $h$ and $l$; the latter holds by Remark 2.1. Therefore, the cost function $C$ is l.s.c.

    Corollary 3.2. The cost $C$ is an inf-compact function, namely, the set $\{a \in \mathbf{A} : C(x,a) \le \lambda\}$ is compact for each $x \in \mathbf{X}$ and $\lambda \in \mathbb{R}$.

    Proof. Note that $\{a \in \mathbf{A} : ca \le \lambda\} = [0, \lambda/c]$ for any $\lambda \ge 0$. On the other hand, by Lemma 3.1, for all $\lambda \in \mathbb{R}$ and $x \in \mathbf{X}$, the set $M := \{a \in \mathbf{A} : C(x,a) \le \lambda\}$ is closed. Consequently, since $C(x,a) \ge ca$ (see (2.3)), $M$ is contained in the compact set $\{a \in \mathbf{A} : ca \le \lambda\}$, and it is concluded that the cost function $C$ is inf-compact.

    Lemma 3.3. The transition law $Q$ in (2.2), induced by the difference equation (2.1), is strongly continuous, i.e., $w(x,a) := \int_{\mathbf{X}} u(y)\, Q(dy \mid x, a)$ is continuous and bounded on $\mathbf{K}$ for every measurable bounded function $u$ on $\mathbf{X}$.

    Proof. Let $u$ be a measurable and bounded function defined on $\mathbf{X}$, and let $\{(x_n, a_n)\}$ be a sequence in $\mathbf{K}$ convergent to $(x,a) \in \mathbf{K}$. Then, from (2.2), we have that

$$\begin{aligned} w(x_n, a_n) &= \int_{\mathbf{X}} u(y)\, Q(dy \mid x_n, a_n) = p \int_0^\infty u\big((x_n + a_n - s)^+\big) \Delta(s)\, ds + (1-p) \int_0^\infty u\big((x_n - s)^+\big) \Delta(s)\, ds \\ &= p \int_0^{x_n + a_n} u(x_n + a_n - s)\, \Delta(s)\, ds + (1-p) \int_0^{x_n} u(x_n - s)\, \Delta(s)\, ds + (1-p)\, u(0) \int_{x_n}^\infty \Delta(s)\, ds + p\, u(0) \int_{x_n + a_n}^\infty \Delta(s)\, ds. \end{aligned}$$

    Observe that, by applying an adequate change of variable, the following identity holds:

$$\int_0^{x_n + a_n} u(x_n + a_n - s)\, \Delta(s)\, ds = \int I_{(-\infty,\, x_n + a_n]}(s)\, u(s)\, \Delta(x_n + a_n - s)\, ds.$$

    On the other hand, it is easy to prove that $\{I_{(-\infty,\, x_n + a_n]}\}$ converges to $I_{(-\infty,\, x + a]}$ almost everywhere (a.e.) with respect to the Lebesgue measure $m$ on $\mathbb{R}$. Moreover, let $\theta \in (0, \infty)$ be such that $|u(x)| \le \theta$ for all $x \in \mathbf{X}$, and consider the following functions:

$$\begin{aligned} r_n(s) &= I_{(-\infty,\, x_n + a_n]}(s)\, u(s)\, \Delta(x_n + a_n - s), \qquad & r(s) &= I_{(-\infty,\, x + a]}(s)\, u(s)\, \Delta(x + a - s), \\ g_n(s) &= \theta\, I_{(-\infty,\, x_n + a_n]}(s)\, \Delta(x_n + a_n - s), \qquad & g(s) &= \theta\, I_{(-\infty,\, x + a]}(s)\, \Delta(x + a - s), \end{aligned}$$

    with $s \in [0, \infty)$. Then, notice that the following statements hold:

    i) $\{r_n\}$ converges to $r$ a.e. with respect to $m$.

    ii) $\{g_n\}$ converges to $g$ a.e. with respect to $m$.

    iii) For each $n = 1, 2, \ldots$, $|r_n(\cdot)| \le g_n(\cdot)$.

    iv) $\int g_n(s)\, ds = \int g(s)\, ds = \theta$.

    Statements i)–iv) guarantee the hypotheses of the Dominated Convergence Theorem (see [3]); in consequence, it is obtained that $\lim_{n \to \infty} \int r_n(s)\, ds = \int r(s)\, ds$, i.e.,

$$\int_0^{x_n + a_n} u(x_n + a_n - s)\, \Delta(s)\, ds \to \int_0^{x + a} u(x + a - s)\, \Delta(s)\, ds. \tag{3.1}$$

    Similarly, it is possible to prove that

$$\int_0^{x_n} u(x_n - s)\, \Delta(s)\, ds = \int I_{(-\infty,\, x_n]}(s)\, u(s)\, \Delta(x_n - s)\, ds \to \int I_{(-\infty,\, x]}(s)\, u(s)\, \Delta(x - s)\, ds. \tag{3.2}$$

    Besides, due to the continuity of the distribution function $F$, it yields that

$$\int_{x_n}^\infty \Delta(s)\, ds = 1 - F(x_n) \to 1 - F(x) = \int_x^\infty \Delta(s)\, ds, \qquad \int_{x_n + a_n}^\infty \Delta(s)\, ds = 1 - F(x_n + a_n) \to 1 - F(x + a) = \int_{x + a}^\infty \Delta(s)\, ds. \tag{3.3}$$

    In consequence, from (3.1)–(3.3), it is obtained that

$$\lim_{n \to \infty} w(x_n, a_n) = p \int_0^{x + a} u(s)\, \Delta(x + a - s)\, ds + (1-p) \int_0^{x} u(s)\, \Delta(x - s)\, ds + (1-p)\, u(0) \int_x^\infty \Delta(s)\, ds + p\, u(0) \int_{x + a}^\infty \Delta(s)\, ds = w(x, a).$$

    Therefore, $w$ is a continuous function on $\mathbf{K}$.

    The proof of Lemma 3.4 is based on a result on renewal processes used in the classical theory of inventories (see, e.g., [4,12]). For this purpose, define, for each $x \in \mathbf{X}$, the renewal process

$$N(x) := \sup\{t \ge 0 \mid W_t \le x\}, \tag{3.4}$$

    where $W_0 := 0$ and $W_t := \sum_{j=1}^t D_j$. Observe that $E[N(x)] < \infty$ for each $x \ge 0$; a proof of this fact can be consulted in [15]. Furthermore, consider the residual lifetime defined as

$$R(x) := W_{N(x)+1} - x, \quad x \in \mathbf{X}. \tag{3.5}$$

    Lemma 3.4. For all $x \in \mathbf{X}$, $E[l(R(x))] < \infty$.

    Proof. Let $x \in \mathbf{X}$ be fixed. Observe that the tail of the distribution of the residual lifetime is given, for $z \ge 0$, as follows:

$$P(R(x) > z) = \sum_{n=1}^\infty P\big(W_{n-1} \le x,\ W_n > x + z\big) = 1 - F(x + z) + \int_0^x \big(1 - F(x + z - s)\big)\, dU(s),$$

    where $U(x) := E[N(x)]$ is the renewal function. On the other hand, from assumption BA ii), it is obtained that

$$E[l(R(x))] = \int_0^{l(\infty)} P\big(l(R(x)) > z\big)\, dz = \int_0^\infty P\big(l(R(x)) > l(z)\big)\, dl(z) = \int_0^\infty P(R(x) > z)\, l'(z)\, dz,$$

    where $l(\infty) := \lim_{u \to \infty} l(u)$. In consequence, by substituting the tail distribution of the function $R$ in the last equation, it yields that

$$E[l(R(x))] = \int_0^\infty l'(z)\big(1 - F(x + z)\big)\, dz + \int_0^\infty \left( \int_0^x l'(z)\big(1 - F(x + z - s)\big)\, dU(s) \right) dz.$$

    Now, observe that

$$\int_0^\infty l'(z)\big(1 - F(x + z)\big)\, dz \le \int_0^\infty l'(z)\, P(D > z)\, dz = E[l(D)],$$

    and from Fubini's theorem [3], it is obtained that

$$\int_0^\infty \int_0^x l'(z)\big(1 - F(x + z - s)\big)\, dU(s)\, dz \le \int_0^x \left( \int_0^\infty P(D > z)\, l'(z)\, dz \right) dU(s) = \int_0^x E[l(D)]\, dU(s) = E[l(D)]\, E[N(x)].$$

    Therefore,

$$E[l(R(x))] \le E[l(D)]\big(1 + E[N(x)]\big) < \infty.$$

    Since the state $x$ is arbitrary, the result follows.

    Lemma 3.5. There exists $\tilde\pi \in \Pi$ such that $v(\tilde\pi, x) < \infty$ for all $x \in \mathbf{X}$.

    Proof. Consider $x \in \mathbf{X}$ fixed and the stationary policy $\tilde\pi = \{\tilde g, \tilde g, \ldots\} \in \mathbb{F}$ with $\tilde g(y) = 0$ for all $y \in \mathbf{X}$. Hence, the stochastic path $\{X_t^{\tilde g}\}$ generated by $\tilde\pi$ is given by

$$X_{t+1}^{\tilde g} = \big(X_t^{\tilde g} - D_{t+1}\big)^+, \tag{3.6}$$

    with $X_0^{\tilde g} = x$. Consequently, observe that

$$X_t^{\tilde g} = \begin{cases} x - W_t & \text{if } t = 0, \ldots, N(x), \\ 0 & \text{if } t \ge N(x) + 1. \end{cases} \tag{3.7}$$

    Then, due to (2.3) and (2.4), it yields that

$$v(\tilde\pi, x) = E_x^{\tilde\pi}\left[\sum_{t=0}^\infty \alpha^t H\big(X_t^{\tilde g}, 0\big)\right] = E_x^{\tilde\pi}\left[\sum_{t=0}^\infty \alpha^t \Big( E\big[h\big((X_t^{\tilde g} - D_{t+1})^+\big)\big] + E\big[l\big((D_{t+1} - X_t^{\tilde g})^+\big)\big] \Big)\right].$$

    Now, from (3.6) and the renewal process (3.4), it may be found that

$$v(\tilde\pi, x) \le \frac{E[l(D)] + h(0)}{1 - \alpha} + E_x^{\tilde\pi}\left[\sum_{t=0}^{N(x)} \alpha^t \Big( E\big[h\big((X_t^{\tilde g} - D_{t+1})^+\big)\big] + E\big[l\big((D_{t+1} - X_t^{\tilde g})^+\big)\big] \Big)\right].$$

    On the other hand, observe that

$$\begin{aligned} \sum_{t=0}^{N(x)} \alpha^t \Big( E\big[h\big((X_t^{\tilde g} - D_{t+1})^+\big)\big] + E\big[l\big((D_{t+1} - X_t^{\tilde g})^+\big)\big] \Big) &= \sum_{t=0}^{N(x)} \alpha^t \Big( E\big[h\big((x - W_{t+1})^+\big)\big] + E\big[l\big((W_{t+1} - x)^+\big)\big] \Big) \\ &\le \frac{1}{\alpha} \sum_{t=1}^{N(x)} \alpha^t \Big( E\big[h\big((x - W_t)^+\big)\big] + E\big[l\big((W_t - x)^+\big)\big] \Big) + h(0) + E\big[l\big(W_{N(x)+1} - x\big)\big] \\ &\le \frac{1}{1 - \alpha}\big(h(x) + l(0)\big) + h(0) + E\big[l\big(W_{N(x)+1} - x\big)\big], \quad P_x^{\tilde\pi}\text{-a.s.}, \end{aligned}$$

    where the second inequality holds since $h$ is a non-decreasing function (see BA i)), and both inequalities follow from (3.4). In consequence, since $l(0) = 0$, it is obtained that

$$v(\tilde\pi, x) \le \frac{E[l(D)] + h(x) + h(0)}{1 - \alpha} + E\big[l\big(W_{N(x)+1} - x\big)\big].$$

    By applying Lemma 3.4 to the last relation, and since $E[l(D)] < \infty$ as a consequence of BA iii), it is obtained that $v(\tilde\pi, x) < \infty$. Since $x \in \mathbf{X}$ is arbitrary, the result holds.

    Remark 3.6. Note that Lemma 3.5 guarantees that $V(x) < \infty$ for all $x \in \mathbf{X}$, since $V(x) \le v(\tilde\pi, x)$ for all $x \in \mathbf{X}$.

    As a consequence of the previous results, the following theorem is valid (see Theorem 4.2.3 and Lemma 4.2.8 in [17]).

    Theorem 3.7. The following statements hold:

    i) For each $x \in \mathbf{X}$, the optimal value function satisfies the dynamic programming equation:

$$V(x) = \min_{a \in \mathbf{A}} \Big\{ C(x,a) + \alpha E\big[V\big((x + \eta a - D)^+\big)\big] \Big\}.$$

    ii) There exists an optimal stationary policy $f^* \in \mathbb{F}$ such that, for each $x \in \mathbf{X}$, the following equation holds:

$$V(x) = C\big(x, f^*(x)\big) + \alpha E\big[V\big((x + \eta f^*(x) - D)^+\big)\big].$$

    iii) The value iteration functions, defined as $V_0(x) := 0$ and

$$V_n(x) := \min_{a \in \mathbf{A}} \Big\{ C(x,a) + \alpha E\big[V_{n-1}\big((x + \eta a - D)^+\big)\big] \Big\},$$

    $n \ge 1$, $x \in \mathbf{X}$, converge monotonically increasing to the optimal value function $V$.

    Remark 3.8. The minimizers of the value iteration functions will be denoted by $f_n$, $n \ge 0$. These minimizers satisfy $f_0(x) = 0$ and, for each $n \ge 1$,

$$V_n(x) = C\big(x, f_n(x)\big) + \alpha E\big[V_{n-1}\big((x + \eta f_n(x) - D)^+\big)\big],$$

    for all $x \in \mathbf{X}$.
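    As an illustration of Theorem 3.7 iii) and Remark 3.8, the value iteration can be carried out on a truncated grid. The sketch below is an assumption-laden approximation (truncation level, grid size, and the Monte Carlo demand sample are all choices of the illustration, not of the paper); it uses the decomposition $C(x,a) = g(a) + p\hat H(x+a) + (1-p)\hat H(x)$ with $\hat H(u) := H(u,0)$, which is made explicit before (4.4) below:

```python
import numpy as np

def value_iteration(Hhat, K=1.5, c=2.5, p=0.5, alpha=0.2,
                    lam=0.1, xmax=150.0, nx=301, n_iter=30, seed=2):
    """Grid value iteration; Hhat(u) must return E[h((u-D)^+)] + E[l((D-u)^+)]."""
    rng = np.random.default_rng(seed)
    xs = np.linspace(0.0, xmax, nx)          # common state/order grid
    d = rng.exponential(1.0 / lam, 20_000)   # demand sample for expectations
    Hg = Hhat(xs)                            # Hhat tabulated on the grid
    V = np.zeros(nx)                         # V_0 = 0
    for _ in range(n_iter):
        # \hat V_{n-1}(u) = E[V_{n-1}((u - D)^+)], estimated by Monte Carlo
        Vhat = np.array([np.mean(np.interp(np.maximum(u - d, 0.0), xs, V))
                         for u in xs])
        Q = np.empty((nx, nx))               # Q[i, j]: state xs[i], order xs[j]
        for i, x in enumerate(xs):
            y = np.minimum(x + xs, xmax)     # y = x + a, truncated at xmax
            g = K * (xs > 0) + c * xs        # ordering cost g(a)
            Q[i] = (g + p * np.interp(y, xs, Hg) + (1 - p) * Hg[i]
                    + alpha * (p * np.interp(y, xs, Vhat) + (1 - p) * Vhat[i]))
        V, f = Q.min(axis=1), xs[Q.argmin(axis=1)]  # V_n and its minimizer f_n
    return xs, V, f
```

    For the Section 6 example, $\hat H$ admits a closed form (derived there), so no sampling is needed for it.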

    This section deals with the characterization of the minimizers of the value iteration functions and of the optimal policies as $(s,S)$ policies.

    Definition 4.1. Let $f \in \mathbb{F}$ be a stationary policy. If there exists $(s, S) \in \mathbb{R}^2$ such that $0 \le s \le S$ and

$$f(x) = \begin{cases} S - x & \text{if } x \le s, \\ 0 & \text{if } x > s, \end{cases} \tag{4.1}$$

    then $f$ is called an $(s,S)$ stationary policy.

    Define the following functions:

$$\hat V_n(u) := E\big[V_n\big((u - D)^+\big)\big], \quad n = 0, 1, \ldots, \tag{4.2}$$

    and

$$G_n(u) := cu + p\,\hat H(u) + \alpha p\, \hat V_{n-1}(u), \quad n = 1, 2, \ldots, \tag{4.3}$$

    for $u \in \mathbf{X}$, where $\hat H(u) := H(u, 0)$. In consequence, the value iteration functions $\{V_n\}$ can be expressed, for each $n \ge 1$ and $x \in \mathbf{X}$, as follows:

$$\begin{aligned} V_n(x) &= \min\Big\{ cx + p\hat H(x) + \alpha p \hat V_{n-1}(x),\ \inf_{a > 0}\big\{K + c(x + a) + p\hat H(x + a) + \alpha p \hat V_{n-1}(x + a)\big\} \Big\} - cx + (1-p)\hat H(x) + \alpha(1-p)\hat V_{n-1}(x) \\ &= \min\Big\{ G_n(x),\ \inf_{a > 0}\big\{K + G_n(x + a)\big\} \Big\} - cx + (1-p)\hat H(x) + \alpha(1-p)\hat V_{n-1}(x). \end{aligned}$$

    Making the change of variable $y := x + a$, the previous equation is equivalent to

$$V_n(x) = \min\Big\{ G_n(x),\ \inf_{y \ge x}\big\{K + G_n(y)\big\} \Big\} - cx + (1-p)\hat H(x) + \alpha(1-p)\hat V_{n-1}(x), \tag{4.4}$$

    for $n \ge 1$, and $V_0(x) = 0$, $x \in \mathbf{X}$.

    The following definition will be applied to characterize the optimal policy [21].

    Definition 4.2. A function $\vartheta : [0,\infty) \to [0,\infty)$ is called norm-like if $\vartheta(x) \to \infty$ as $x \to \infty$; this means that the sub-level sets $\{x : \vartheta(x) \le r\}$ are precompact for each $r > 0$.

    Lemma 4.3. For each $n = 1, 2, \ldots$, the following statements hold:

    i) $V_n$ is a non-decreasing convex function.

    ii) $G_n$ is a convex function on $\mathbf{X}$.

    iii) $G_n$ is norm-like.

    Proof. i) The proof is by induction. First, it will be proved that $V_1$ is a non-decreasing function. Note that

$$\min_{a \in [0,\infty)} C(x, a) \le \min_{a \in [0,\infty)} C(y, a),$$

    if $x, y \in \mathbf{X}$ and $x \le y$, because $C(\cdot, a)$ is a non-decreasing function for each $a \in [0,\infty)$ (see (2.3) and BA). Then, $V_1$ is a non-decreasing function. On the other hand, using Lemma 1 in [19] together with the fact that $C$ is a convex function, it is obtained that $V_1$ is a convex function. Now, suppose that $V_n$ is a non-decreasing convex function. To prove that $V_{n+1}$ is non-decreasing, observe that, as a consequence of $V_n$ being non-decreasing, the following inequality holds:

$$V_n\big((x + \eta a - s)^+\big) \le V_n\big((y + \eta a - s)^+\big),$$

    for all $x, y \in \mathbf{X}$ with $x \le y$ and $a, s \in [0,\infty)$. In consequence,

$$C(x, a) + \alpha E\big[V_n\big((x + \eta a - D)^+\big)\big] \le C(y, a) + \alpha E\big[V_n\big((y + \eta a - D)^+\big)\big], \tag{4.5}$$

    for all $x, y \in \mathbf{X}$ with $x \le y$ and $a \in [0,\infty)$. Consequently, taking the minimum with respect to $a$ on each side of inequality (4.5), it results that $V_{n+1}(x) \le V_{n+1}(y)$ for $x, y \in \mathbf{X}$ with $x \le y$. Now, it will be proved that $V_{n+1}$ is a convex function. To this end, note that $C(x, a) + \alpha E\big[V_n\big((x + \eta a - D)^+\big)\big]$ is a convex function of $(x, a) \in \mathbf{K}$; this statement is a consequence of the induction hypothesis and the convexity of the cost function. Then, from Lemma 1 of [19], it may be concluded that $V_{n+1}$ is a convex function.

    ii) The previous statement implies that $\hat V_n$ is a convex function for each $n \ge 1$, as the following relations show:

$$\hat V_n\big(\lambda u_1 + (1-\lambda) u_2\big) = E\big[V_n\big((\lambda u_1 + (1-\lambda) u_2 - D)^+\big)\big] \le E\big[\lambda V_n\big((u_1 - D)^+\big) + (1-\lambda) V_n\big((u_2 - D)^+\big)\big] = \lambda \hat V_n(u_1) + (1-\lambda) \hat V_n(u_2),$$

    for each $u_1, u_2 \in \mathbf{X}$ and $\lambda \in [0,1]$; the inequality uses the convexity of $u \mapsto (u - D)^+$ together with the fact that $V_n$ is non-decreasing and convex. Consequently, since $\hat H$ is convex, (4.3) implies that $G_n$ is a convex function for each $n \ge 1$.

    iii) Observe that $G_n(u) \ge cu$ for $n \ge 1$, with $c > 0$. Hence, for each $n \ge 1$, $G_n(u) \to \infty$ as $u \to \infty$. This implies that $G_n$ is a norm-like function for each $n \ge 1$.

    Remark 4.4. Observe that, for each $n \ge 1$, we have the following:

    i) $G_n$ in (4.3) is a continuous function on $(0, \infty)$, as a consequence of the convexity of the function $G_n$.

    ii) Given that $G_n$ is a norm-like function, it follows that

$$\operatorname*{argmin}_{y \in \mathbf{X}} G_n(y) := \Big\{ z \ge 0 : G_n(z) = \min_{y \in \mathbf{X}} G_n(y) \Big\}$$

    is a non-empty set.

    Lemma 4.5 is a modified version of Lemma 2.1 of [7]; the proof of this lemma is presented here for the completeness of the paper.

    Lemma 4.5. For each $n \ge 1$, let $S_n \in \operatorname{argmin}_{y \ge 0} G_n(y)$ and $s_n := \inf\{0 \le x \le S_n : G_n(x) \le K + G_n(S_n)\}$. Then the following statements hold:

    i) $G_n(s_n) = K + G_n(S_n)$, if $s_n > 0$.

    ii) $G_n(x) \le K + G_n(S_n)$, for $0 \le s_n \le x \le S_n$.

    iii) $K + G_n(S_n) \le G_n(x)$, for $0 \le x < s_n$.

    iv) $G_n$ is a decreasing function on $[0, s_n)$.

    Proof. Let $n \ge 1$. i) Suppose that $s_n > 0$; then $G_n(0) > K + G_n(S_n) \ge G_n(s_n)$. Hence, there exists $u \in (0, s_n]$ such that $G_n(u) = G_n(S_n) + K$, because of the continuity of the function $G_n$ (see Remark 4.4). Now, observe that $s_n \le u$, due to the definition of $s_n$; then $s_n = u$ and the result holds.

    ii) This statement follows directly from the definition of $s_n$.

    iii) Consider $0 \le x < s_n$. By Lemma 4.3 ii), it follows that

$$G_n(s_n) \le \frac{s_n - x}{S_n - x}\, G_n(S_n) + \frac{S_n - s_n}{S_n - x}\, G_n(x),$$

    and then

$$G_n(s_n) + \frac{S_n - s_n}{s_n - x}\big(G_n(s_n) - G_n(x)\big) = \frac{S_n - s_n + s_n - x}{s_n - x}\, G_n(s_n) - \frac{S_n - s_n}{s_n - x}\, G_n(x) \le G_n(S_n) \le K + G_n(S_n).$$

    Thus,

$$G_n(s_n) + \frac{S_n - s_n}{s_n - x}\big(G_n(s_n) - G_n(x)\big) \le K + G_n(S_n). \tag{4.6}$$

    Now, since $K + G_n(S_n) - G_n(s_n) = 0$ by statement i), and $x < s_n \le S_n$, (4.6) implies that $G_n(s_n) \le G_n(x)$. Therefore, $K + G_n(S_n) = G_n(s_n) \le G_n(x)$ with $0 \le x < s_n$.

    iv) Let $x_1, x_2 \in [0, s_n)$ with $x_1 < x_2$. Hence, by Lemma 4.3 ii),

$$K + G_n(S_n) \ge G_n(x_2) + \frac{S_n - x_2}{x_2 - x_1}\big(G_n(x_2) - G_n(x_1)\big). \tag{4.7}$$

    Now, from statement iii), it is obtained that $G_n(x_2) \ge K + G_n(S_n)$; this relation and (4.7) together lead to $0 \ge G_n(x_2) - G_n(x_1)$, and the result follows.

    A consequence of Lemma 4.5 is the following result.

    Theorem 4.6. Consider $\{(s_n, S_n) : n = 1, 2, \ldots\}$ as in Lemma 4.5. Then, the minimizers of the value iteration functions are given by

$$f_n(x) = \begin{cases} S_n - x & \text{if } x \le s_n, \\ 0 & \text{if } x > s_n, \end{cases} \tag{4.8}$$

    with $0 < s_n \le S_n$ and $n = 1, 2, \ldots$

    Proof. Let $n \ge 1$ and $x \in \mathbf{X}$. The proof proceeds by considering three cases, depending on the order relation between the state $x$, $s_n$, and $S_n$. The following simple but important claim is established first.

    Claim 1: $u \in \operatorname{argmin}_{y \ge x}\{K + G_n(y)\}$ if and only if $u - x \in \operatorname{argmin}_{a \ge 0}\{K + G_n(x + a)\}$.

    Case 1. Suppose that $x < s_n$. Lemma 4.5 iii) yields that $K + G_n(S_n) \le G_n(x)$. Consequently, by (4.4) and Claim 1, it follows that $f_n(x) = S_n - x$.

    Case 2. Consider $s_n \le x \le S_n$. Then, by Lemma 4.5 ii), it is obtained that $G_n(x) \le K + G_n(S_n)$. In consequence, (4.4) implies that $f_n(x) = 0$.

    Case 3. Finally, assume that $S_n < x$. Since $G_n$ is a convex function, $G_n$ is non-decreasing on $(S_n, \infty)$; this fact implies that $x \in \operatorname{argmin}_{y \ge x}\{K + G_n(y)\}$. Now, from Claim 1, it follows that $0 \in \operatorname{argmin}_{a \ge 0}\{K + G_n(x + a)\}$. Therefore, $f_n(x) = 0$.

    The previous cases guarantee the conclusion of Theorem 4.6.
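    Numerically, once $G_n$ has been tabulated on a grid, the pair $(s_n, S_n)$ of Lemma 4.5 and the minimizer (4.8) can be read off directly. A small sketch (grid-based, so the levels are accurate only up to the grid step) follows:

```python
import numpy as np

def s_S_from_G(xs, G, K):
    """Recover (s_n, S_n) of Lemma 4.5 from G_n sampled on the grid xs."""
    i_S = int(np.argmin(G))                 # S_n: a global minimizer of G_n
    S = xs[i_S]
    level = K + G[i_S]                      # reorder threshold K + G_n(S_n)
    below = np.nonzero(G[: i_S + 1] <= level)[0]
    s = xs[below[0]]                        # first grid point with G_n <= level
    f = lambda x: S - x if x <= s else 0.0  # the minimizer f_n of (4.8)
    return s, S, f
```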

    Remark 4.7. Observe that Theorem 3.7 and (4.4) imply that the optimal value function satisfies the following equation:

$$V(x) = \min\Big\{ G(x),\ \inf_{y \ge x}\big\{K + G(y)\big\} \Big\} - cx + (1-p)\hat H(x) + \alpha(1-p)\hat V(x), \tag{4.9}$$

    with

$$G(u) := cu + p\,\hat H(u) + \alpha p\, \hat V(u), \quad u \ge 0, \tag{4.10}$$

    and

$$\hat V(u) := E\big[V\big((u - D)^+\big)\big].$$

    Since $V_n \uparrow V$ because of Theorem 3.7, the Dominated Convergence Theorem yields that

$$G_n \uparrow G. \tag{4.11}$$

    Lemma 4.8. $G$ is a norm-like and convex function.

    Proof. Observe that $G(x) \ge cx$ for each $x \ge 0$. Hence, $G(x) \to \infty$ as $x$ goes to infinity; thus, by Definition 4.2, $G$ is a norm-like function. Furthermore, (4.11) implies that $G$ is a convex function, since $G_n$ is a convex function for each $n \ge 1$ (see Lemma 4.3).

    The proofs of Lemma 4.9 and Theorem 4.10 follow the same lines as the proofs of Lemma 4.5 and Theorem 4.6, respectively; thus, they are omitted.

    Lemma 4.9. Let $S \in \operatorname{argmin}_{y \ge 0} G(y)$ and $s := \inf\{0 \le x \le S : G(x) \le K + G(S)\}$. Then the following statements are valid:

    i) $G(s) = G(S) + K$, if $s > 0$.

    ii) $G(x) \le K + G(S)$, for $0 \le s \le x \le S$.

    iii) $G(S) + K \le G(x)$, for $0 \le x < s$.

    iv) $G$ is a decreasing function on $[0, s)$.

    Theorem 4.10. Consider $(s, S)$ as in Lemma 4.9. Hence, the optimal policy for the inventory system is an $(s,S)$ policy given by

$$f^*(x) = \begin{cases} S - x & \text{if } x \le s, \\ 0 & \text{if } x > s, \end{cases}$$

    with $0 < s \le S$.

    Remark 4.11. Observe that, in Theorem 4.10, if $s = 0$ then $G(x) \le G(S) + K$ for all $x \ge 0$, which implies that $f^*(x) = 0$ for all $x \ge 0$. A similar argument guarantees the following assertion: $f_n(x) = 0$, $x \in \mathbf{X}$, if $s_n = 0$, for $n \ge 1$.

    In this section, the convergence of the minimizers of the value iteration functions is analyzed. First, the following auxiliary results are presented.

    Lemma 5.1. Let $\hat{C} \subset (0, \infty)$ be a compact set. Then, the following statements hold:

    i) $\{G_n\}$ converges uniformly to $G$ on $\hat{C}$.

    ii) For each $\{u_n\} \subset \hat{C}$ such that $u_n \to u \in \hat{C}$, $\lim_{n \to \infty} G_n(u_n) = G(u)$.

    Proof. i) This statement is a direct consequence of (4.11) and Theorem 7.13 in [24].

    ii) Applying i) of Lemma 5.1, it follows that for all $\epsilon > 0$ there exists $N_1 \ge 1$ such that, for each $x \in \hat{C}$,

$$|G_n(x) - G(x)| < \epsilon/2, \tag{5.1}$$

    if $n \ge N_1$. Furthermore, by the continuity of the function $G$ on $\hat{C}$, there exists $\delta > 0$ such that

$$|G(y) - G(x)| < \epsilon/2, \tag{5.2}$$

    if $|y - x| < \delta$. Now, since $u_n$ converges to $u$ as $n$ goes to infinity, there exists $N_2 \ge 1$ such that

$$|u_n - u| < \delta, \tag{5.3}$$

    if $n \ge N_2$. Let $N = \max\{N_1, N_2\}$. Thus, (5.2) and (5.3) together lead to

$$|G(u_n) - G(u)| < \epsilon/2, \tag{5.4}$$

    if $n \ge N$. Then, taking $x = u_n$ in (5.1), it yields that

$$|G_n(u_n) - G(u_n)| < \epsilon/2, \tag{5.5}$$

    if $n \ge N$. Therefore, by (5.4) and (5.5), $|G_n(u_n) - G(u)| < \epsilon$, if $n \ge N$. This last relation implies that $G_n(u_n)$ converges to $G(u)$ as $n$ goes to infinity.

    Lemma 5.2. Let $B := \{x > 0 \mid G_1(x) \le \inf_{y \ge 0} G(y) + K\}$ and $\{(s_n, S_n) : n \ge 1\}$ be as in Lemma 4.5. Then, the following statements hold:

    i) $B$ is a compact set in $\mathbb{R}$.

    ii) $s_n, S_n \in B$, for each $n = 1, 2, \ldots$

    iii) There exists a subsequence $\{(s_{n_k}, S_{n_k})\}$ of $\{(s_n, S_n)\}$ which converges to a pair $(s, S)$ such that $S \in \operatorname{argmin}_{y \ge 0} G(y)$ and $G(s) = K + G(S)$, with $0 < s \le S$.

    Proof. i) First, observe that $B$ is a closed set, since $G_1$ is a continuous function on $(0, \infty)$. Now, suppose that $B$ is not bounded; then there exists a sequence $\{b_n\} \subset B$ such that $b_n$ converges to infinity. Thus, since $G_1$ is norm-like,

$$\infty = \liminf_{n \to \infty} G_1(b_n) \le \inf_{y \ge 0} G(y) + K.$$

    The previous relation is a contradiction, since $\inf_{y \ge 0} G(y) + K < \infty$. Therefore, $B$ is a compact set.

    ii) Observe that, for each $n = 1, 2, \ldots$, the following inequalities are valid:

$$G_1(S_n) \le G_n(S_n) = \inf_{y \ge 0} G_n(y) \le \inf_{y \ge 0} G(y) + K, \qquad G_1(s_n) \le G_n(s_n) = \inf_{y \ge 0} G_n(y) + K \le \inf_{y \ge 0} G(y) + K,$$

    since $\{G_n\}$ is a non-decreasing sequence whose limit is $G$, and the equalities are valid due to Lemma 4.5. Then, for each $n = 1, 2, \ldots$, $s_n, S_n \in B$.

    iii) The previous statements i) and ii) imply that there exists a subsequence $\{(s_{n_k}, S_{n_k})\}$ convergent to some $(s, S) \in B^2$, with $0 < s \le S$. Furthermore, due to Lemma 5.1 ii), it yields that

$$\lim_{k \to \infty} G_{n_k}(S_{n_k}) = G(S), \tag{5.6}$$

$$\lim_{k \to \infty} G_{n_k}(s_{n_k}) = G(s). \tag{5.7}$$

    Now, Lemma 4.5 implies that $G_{n_k}(S_{n_k}) \le G_{n_k}(x) \le G(x)$, $x \ge 0$. Then, letting $k$ go to infinity in the last inequality, it is obtained that $G(S) \le G(x)$, $x \ge 0$; in consequence, $S \in \operatorname{argmin}_{y \ge 0} G(y)$. On the other hand, as a consequence of Lemma 4.5, (5.6) and (5.7), it follows that $G(s) = K + G(S)$.

    Theorem 5.3. Let $\{f_n\}$ be the sequence of minimizers of the value iteration functions. Then, there exists a subsequence $\{f_{n_k}\}$ that converges uniformly on $\mathbf{X}$ to an $(s,S)$ optimal policy.

    Proof. First, in view of Theorem 4.6, for each $n \ge 1$, $f_n(x) = (S_n - x)\, I_{\{x:\, x \le s_n\}}(x)$, with $0 < s_n \le S_n$. Then, using iii) of Lemma 5.2, a subsequence $\{(s_{n_k}, S_{n_k})\}$ is obtained such that $(s_{n_k}, S_{n_k}) \to (s, S)$ as $k$ goes to infinity, where $(s, S)$ satisfies statements i)–iv) of Lemma 4.9. Thus, consider

$$f^*(x) = \begin{cases} S - x & \text{if } x \le s, \\ 0 & \text{if } x > s, \end{cases}$$

    and observe that

$$\sup_{x \in \mathbf{X}} |f_{n_k}(x) - f^*(x)| \le |S_{n_k} - S| \to 0,$$

    as $k \to \infty$. Furthermore, $f^*$ is an optimal policy due to Theorem 4.10. This concludes the proof of the theorem.

    Consider an inventory control system in which the demand has an exponential distribution with parameter $\lambda = 0.1$, and suppose that the parameter of the variable $\eta$ is $p = 0.5$. On the other hand, for each $u \ge 0$, suppose that $h(u) = \gamma_1 u$ and $l(u) = \gamma_2 u$, where $\gamma_1, \gamma_2$ are non-negative constants. In this case, $\gamma_1$ represents the holding cost per unit and $\gamma_2$ represents the cost per unit of demand not supplied.

    Furthermore, observe that for each $u \ge 0$,

$$E\big[h\big((u - D)^+\big)\big] = \frac{\gamma_1}{\lambda}\big(\lambda u + e^{-\lambda u} - 1\big), \qquad E\big[l\big((D - u)^+\big)\big] = \gamma_2\, \frac{e^{-\lambda u}}{\lambda}.$$
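    These closed forms follow by direct integration against the exponential density $\Delta(t) = \lambda e^{-\lambda t}$ (a routine check, included here for completeness):

$$E\big[(u - D)^+\big] = \int_0^u (u - t)\, \lambda e^{-\lambda t}\, dt = u\big(1 - e^{-\lambda u}\big) - \left( \frac{1 - e^{-\lambda u}}{\lambda} - u e^{-\lambda u} \right) = \frac{1}{\lambda}\big(\lambda u + e^{-\lambda u} - 1\big),$$

$$E\big[(D - u)^+\big] = \int_u^\infty (t - u)\, \lambda e^{-\lambda t}\, dt = \frac{e^{-\lambda u}}{\lambda},$$

    and multiplying by $\gamma_1$ and $\gamma_2$, respectively, yields the displayed expectations.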

    Then, by (4.3) with $n = 1$ (recall that $\hat V_0 = 0$), it yields that

$$G_1(u) = (p\gamma_1 + c)\, u + \frac{p(\gamma_1 + \gamma_2)\, e^{-\lambda u}}{\lambda} - \frac{p\gamma_1}{\lambda}, \quad u \ge 0.$$

    Lemma 6.1. For each $n = 1, 2, \ldots$, $G_n$ is a strictly convex function.

    Proof. First, observe that the derivative of $G_1$ is given by $G_1'(u) = p\gamma_1 + c - p(\gamma_1 + \gamma_2)\, e^{-\lambda u}$, and the second derivative is $G_1''(u) = \lambda p (\gamma_1 + \gamma_2)\, e^{-\lambda u} > 0$, $u \ge 0$. Consequently, the function $G_1$ is strictly convex. Then, since $G_n(u) = G_1(u) + \alpha p\, \hat V_{n-1}(u)$, $u \ge 0$ (see (4.3)), it is concluded that, for each $n = 1, 2, \ldots$, $G_n$ is a strictly convex function, because $\hat V_{n-1}$ is a convex function (see the proof of Lemma 4.3) and $G_1$ is strictly convex. Then, the result follows.

    Remark 6.2. A consequence of Lemma 6.1 is the uniqueness of the minimizers of the value iteration functions (see (4.4)). Furthermore, observe that (4.10) can be rewritten, for each $u \ge 0$, as $G(u) = G_1(u) + \alpha p\, \hat V(u)$, which is a strictly convex function as a consequence of Lemma 6.1 and Remark 4.7. Hence, the optimal policy is unique, due to Eq (4.9).

    Lemma 6.3. If $c/p < \gamma_2$, the unique minimizer of the function $G_1$ is $y^* = \lambda^{-1}\big(\ln(\gamma_1 + \gamma_2) - \ln(\gamma_1 + c/p)\big) > 0$.

    Proof. Observe that, due to Lemma 6.1, the minimizer $y^*$ is characterized by the first-order condition:

$$p\gamma_1 + c - p(\gamma_1 + \gamma_2)\, e^{-\lambda y^*} = 0. \tag{6.1}$$

    Thus, $y^* = \lambda^{-1} \ln\big((\gamma_1 + \gamma_2)(\gamma_1 + c/p)^{-1}\big)$. In particular, since $c/p < \gamma_2$, it is obtained that $(\gamma_1 + c/p)(\gamma_1 + \gamma_2)^{-1} < 1$; in consequence, $y^* > 0$.

    The following algorithm is applied to approximate the optimal value function $V$ and the optimal policy $f^*$ for each state $x$ given as an initial condition. The validity of the algorithm is supported by the following results: Lemma 4.9, Theorem 4.10, Lemmas 5.2 and 6.3.

    Algorithm 1. Approximation of the optimal policy $f^*$ and the optimal value function $V$.

    Require: $K > 0$, $c > 0$, $\lambda > 0$, $\gamma_1 > 0$, $\gamma_2 > c/p > 0$, $x > 0$ and $\epsilon, \alpha \in (0,1)$
    1: $G_1(u) \leftarrow (p\gamma_1 + c)u + p(\gamma_1 + \gamma_2)e^{-\lambda u}/\lambda - p\gamma_1/\lambda$
    2: $S_1 \leftarrow y^*$ (see Lemma 6.3)
    3: $s_1 \leftarrow \tilde{x}$ such that $G_1(\tilde{x}) = G_1(S_1) + K$ and $\tilde{x} \le S_1$
    4: repeat for $n = 1, 2, \ldots$
    5:   if $u \le s_n$ then $V_n(u) \leftarrow G_n(S_n) + K - cu + (1-p)\hat{H}(u) + \alpha(1-p)\hat{V}_{n-1}(u)$
    6:   else $V_n(u) \leftarrow G_n(u) - cu + (1-p)\hat{H}(u) + \alpha(1-p)\hat{V}_{n-1}(u)$
    7:   $\hat{V}_n(u) \leftarrow E[V_n((u - D)^+)]$
    8:   $G_{n+1}(u) \leftarrow G_1(u) + \alpha p \hat{V}_n(u)$
    9:   $S_{n+1} \leftarrow \operatorname{argmin}_{u \ge 0} G_{n+1}(u)$
    10:  $s_{n+1} \leftarrow \tilde{x}$ such that $G_{n+1}(\tilde{x}) = G_{n+1}(S_{n+1}) + K$ and $\tilde{x} \le S_{n+1}$
    11: until $|S_{n+1} - S_n| < \epsilon$ and $|s_{n+1} - s_n| < \epsilon$
    12: $S \leftarrow S_{n+1}$ and $s \leftarrow s_{n+1}$
    13: $f^*(x) \leftarrow S - x$ if $x \le s$; otherwise $f^*(x) \leftarrow 0$
    14: $V(x) \leftarrow V_n(x)$
    15: return $f^*(x)$ and $V(x)$

    Consider the following parameters: $\gamma_1 = \gamma_2 = 30$, $K = 1.5$, $c = 2.5$, $\epsilon = 0.01$, $\alpha = 0.2$ and $x = 40$; observe that $\gamma_2 > c/p$. Table 1 illustrates the numerical results obtained by applying Algorithm 1.

    Table 1. Convergence of the value iteration functions and their minimizers.

    $n$    $s_n$    $S_n$    $V_n(x)$    $f_n(x)$
    1      49.79    53.89    1200.22     13.89
    3      52.75    55.28    1438.65     14.80
    5      55.72    55.75    1496.62     15.28
    7      53.82    53.89    1497.04     15.75
    8      53.82    53.89    1498.92     15.75
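    The following Python sketch implements Algorithm 1 on a truncated grid for the exponential-demand example (the grid, the truncation level and the Monte Carlo sample size are assumptions of this illustration, so its output only approximates the iterates reported in Table 1):

```python
import numpy as np

K, c, p, lam, alpha, eps = 1.5, 2.5, 0.5, 0.1, 0.2, 0.01
g1, g2 = 30.0, 30.0                                    # gamma_1, gamma_2
xs = np.linspace(0.0, 200.0, 1001)                     # truncated state grid
d = np.random.default_rng(4).exponential(1 / lam, 20_000)

Hhat = g1 / lam * (lam * xs + np.exp(-lam * xs) - 1) + g2 * np.exp(-lam * xs) / lam
G1 = (p * g1 + c) * xs + p * (g1 + g2) * np.exp(-lam * xs) / lam - p * g1 / lam
V, Vhat = np.zeros_like(xs), np.zeros_like(xs)         # V_0 = \hat V_0 = 0
s_old = S_old = np.inf
for n in range(1, 200):
    G = G1 + alpha * p * Vhat                          # G_n, see (4.3)
    iS = int(np.argmin(G)); S = xs[iS]                 # S_n
    s = xs[np.nonzero(G[:iS + 1] <= K + G[iS])[0][0]]  # s_n, per Lemma 4.5
    # V_n by (4.4): the inner min is K + G_n(S_n) for x <= s_n, G_n(x) otherwise
    V = (np.where(xs <= s, K + G[iS], G) - c * xs
         + (1 - p) * Hhat + alpha * (1 - p) * Vhat)
    Vhat = np.array([np.mean(np.interp(np.maximum(u - d, 0.0), xs, V))
                     for u in xs])                     # \hat V_n for next step
    if abs(S - S_old) < eps and abs(s - s_old) < eps:
        break
    s_old, S_old = s, S

f_star = lambda x: S - x if x <= s else 0.0            # approximate (s,S) policy
```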

    Observe that the convergence of the optimal value function and of the $(s,S)$ policy is illustrated; see Theorem 4.10 and Lemma 5.2, respectively. Therefore, $f^*(x) = 15.75$ and $V(x) = 1498.92$ for $x = 40$, within an error tolerance of $\epsilon = 0.01$.

    In this paper, a discrete-time inventory system was presented. In the manuscript, conditions of convexity and monotonicity on the components of the Markov control model were proposed. For this inventory system, the existence of optimal stationary $(s, S)$ policies was proved. To achieve this goal, the methodology of discrete-time dynamic programming was applied. Moreover, the existence of a subsequence of minimizers of the value iteration functions converging to an $(s, S)$ optimal policy of the inventory system was proved. Finally, a numerical algorithm for approximating the optimal inventory cost and policy was presented and applied in a numerical example. Future work in this direction includes the following:

    ● Incorporating Markovian demand into the inventory model.

    ● Investigating the lost-sales inventory system under other performance measures, e.g., considering the long-run average or risk-sensitive criteria.

    ● Implementing the manuscript's proposal on real-world data, by finding a suitable database compatible with the assumptions presented in this manuscript.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors would like to thank the referees for their valuable comments, corrections and suggestions, which lead to an improvement of the original paper.

    The authors declare that they have no competing interests.



    References

    [1] E. Ahmadi, H. Mosadegh, R. Maihami, I. Ghalehkhondabi, M. Sun, G. A. Süer, Intelligent inventory management approaches for perishable pharmaceutical products in a healthcare supply chain, Comput. Oper. Res., 147 (2022), 105968. https://doi.org/10.1016/j.cor.2022.105968
    [2] Y. Aneja, A. Noori, The optimality of (s, S) policies for a stochastic inventory problem with proportional and lump-sum penalty cost, Manag. Sci., 33 (1987), 750–755. https://doi.org/10.1287/mnsc.33.6.750
    [3] R. Ash, Real analysis and probability, New York: Academic Press, 1972. https://doi.org/10.1016/C2013-0-06164-6
    [4] Y. Barron, O. Baron, QMCD approach for perishability models: The (S, s) control policy with lead time, IISE Trans., 52 (2020), 133–150. https://doi.org/10.1080/24725854.2019.1614697
    [5] A. Bensoussan, Dynamic programming and inventory control, In: Studies in Probability, Amsterdam: IOS Press, 3 (2011). https://doi.org/10.3233/978-1-60750-770-3-i
    [6] A. Bensoussan, M. A. Helal, V. Ramakrishna, Optimal policies for inventory systems with piecewise-linear concave ordering costs, available at SSRN 3601262, 2020. https://dx.doi.org/10.2139/ssrn.3601262
    [7] D. Bertsekas, Dynamic programming and optimal control, 2 Eds., Athena Scientific, 2012.
    [8] B. Chen, X. Chao, C. Shi, Nonparametric learning algorithms for joint pricing and inventory control with lost sales and censored demand, Math. Oper. Res., 46 (2021), 726–756. https://doi.org/10.1287/moor.2020.1084
    [9] D. Cruz-Suárez, R. Montes-de-Oca, Uniform convergence of value iteration policies for discounted Markov decision processes, B. Soc. Mat. Mex., 12 (2006), 133–148.
    [10] D. Cruz-Suárez, R. Montes-de-Oca, F. Salem-Silva, Conditions for the uniqueness of optimal policies of discounted Markov decision processes, Math. Method. Oper. Res., 60 (2004), 415–436. https://doi.org/10.1007/s001860400372
    [11] H. Daduna, P. Knopov, L. Tur, Optimal strategies for an inventory system with cost functions of general form, Cybern. Syst. Anal., 35 (1999), 602–618. https://doi.org/10.1007/bf02835856
    [12] E. Feinberg, D. Kraemer, Continuity of discounted values and the structure of optimal policies for periodic-review inventory systems with setup costs, Nav. Res. Log., 2023, 1–13. https://doi.org/10.1002/nav.22108
    [13] E. Feinberg, Optimality conditions for inventory control, Optim. Chall. Complex Netw. Risk. Syst., INFORMS, 2016, 14–45. https://doi.org/10.1287/educ.2016.0145
    [14] E. Gordienko, O. Hernández-Lerma, Average cost Markov control processes with weighted norms: Value iteration, Appl. Math., 23 (1995), 219–237. https://doi.org/10.4064/am-23-2-219-237
    [15] A. Gut, Stopped random walks: Limit theorems and applications, 2 Eds., New York: Springer, 2009. https://doi.org/10.1007/978-0-387-87835-5
    [16] X. Guo, Q. Zhu, Average optimality for Markov decision processes in Borel spaces: A new condition and approach, J. Appl. Probab., 43 (2006), 318–334. https://doi.org/10.1239/jap/1152413725
    [17] O. Hernández-Lerma, J. Lasserre, Discrete-time Markov control processes: Basic optimality criteria, New York: Springer Science & Business Media, 2012.
    [18] D. Iglehart, Optimality of (s, S) policies in the infinite horizon dynamic inventory problem, Manag. Sci., 9 (1963), 259–267. https://doi.org/10.1287/mnsc.9.2.259
    [19] D. Iglehart, Capital accumulation and production for the firm: Optimal dynamic policies, Manag. Sci., 12 (1965), 193–205. https://doi.org/10.1287/mnsc.12.3.193
    [20] D. Lindley, The theory of queues with a single server, Math. Proc. Cambridge, 48 (1952), 277–289. https://doi.org/10.1017/S0305004100027638
    [21] S. Meyn, R. Tweedie, Markov chains and stochastic stability, New York: Springer Science & Business Media, 1993. https://doi.org/10.1007/978-1-4471-3267-7
    [22] S. Perera, S. Sethi, A survey of stochastic inventory models with fixed costs: Optimality of (s, S) and (s, S)-type policies, discrete-time case, Prod. Oper. Manag., 32 (2022), 131–153. https://doi.org/10.1111/poms.13820
    [23] E. Porteus, On the optimality of generalized (s, S) policies, Manag. Sci., 17 (1971), 411–426. https://doi.org/10.1287/mnsc.17.7.411
    [24] W. Rudin, Principles of mathematical analysis, 3 Eds., New York: McGraw-Hill, 1976.
    [25] M. Schäl, On the optimality of (s, S)-policies in dynamic inventory models with finite horizon, SIAM J. Appl. Math., 30 (1976), 528–537. https://doi.org/10.1137/0130048
    [26] S. Sethi, F. Feng, Optimality of (s, S) policies in inventory models with Markovian demand, Oper. Res., 45 (1997), 931–939. https://doi.org/10.1287/opre.45.6.931
    [27] O. Vega, R. Montes-de-Oca, Application of average dynamic programming to inventory systems, Math. Method. Oper. Res., 47 (1998), 451–471. https://doi.org/10.1007/bf01198405
    [28] A. Veinott, H. Wagner, Computing optimal (s, S) inventory policies, Manag. Sci., 11 (1965), 525–552. https://doi.org/10.1287/mnsc.11.5.525
    [29] X. Xu, S. P. Sethi, S. H. Chung, Ordering COVID-19 vaccines for social welfare with information updating: Optimal dynamic order policies and vaccine selection in the digital age, IISE Trans., 2023, 1–28. https://doi.org/10.1080/24725854.2023.2204329
    [30] H. Zhang, J. Zhang, R. Q. Zhang, Simple policies with provable bounds for managing perishable inventory, Prod. Oper. Manag., 29 (2020), 2637–2650.
    [31] Y. Zheng, A simple proof for optimality of (s, S) policies in infinite-horizon inventory systems, J. Appl. Probab., 28 (1991), 802–810. https://doi.org/10.2307/3214683
    © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
