Research article Special Issues

Predictability and identifiability assessment of models for prostate cancer under androgen suppression therapy

  • The past two decades have seen the development of numerous mathematical models to study various aspects of prostate cancer in clinical settings. These models often contain large sets of parameters and rely on limited data sets for validation. The quantitative analysis of the dynamics of prostate cancer under treatment may be hindered by the lack of identifiability of the parameters from the available data, which limits the predictive ability of the model. Using three ordinary differential equation models as case studies, we carry out a numerical investigation of the identifiability and uncertainty quantification of the model parameters. In most cases, the parameters are not identifiable from time series of prostate-specific antigen, which is used as a clinical proxy for tumor progression. It may not be possible to define a finite confidence bound on an unidentifiable parameter, and the relative uncertainties in even identifiable parameters may be large in some cases. The Fisher information matrix may be used to determine identifiable parameter subsets for a given model. The use of biological constraints and additional types of measurements, should they become available, may reduce these uncertainties. Ensemble Kalman filtering may provide clinically useful, short-term predictions of patient outcomes from imperfect models, though care must be taken when estimating ``patient-specific'' parameters. Our results demonstrate the importance of parameter identifiability in the validation and predictive ability of mathematical models of prostate tumor treatment. Observing-system simulation experiments, widely used in meteorology, may prove useful in the development of biomathematical models intended for future clinical application.

    Citation: Zhimin Wu, Tin Phan, Javier Baez, Yang Kuang, Eric J. Kostelich. Predictability and identifiability assessment of models for prostate cancerunder androgen suppression therapy[J]. Mathematical Biosciences and Engineering, 2019, 16(5): 3512-3536. doi: 10.3934/mbe.2019176

    Related Papers:

    [1] Alacia M. Voth, John G. Alford, Edward W. Swim . Mathematical modeling of continuous and intermittent androgen suppression for the treatment of advanced prostate cancer. Mathematical Biosciences and Engineering, 2017, 14(3): 777-804. doi: 10.3934/mbe.2017043
    [2] Tin Phan, Changhan He, Alejandro Martinez, Yang Kuang . Dynamics and implications of models for intermittent androgen suppression therapy. Mathematical Biosciences and Engineering, 2019, 16(1): 187-204. doi: 10.3934/mbe.2019010
    [3] Avner Friedman, Harsh Vardhan Jain . A partial differential equation model of metastasized prostatic cancer. Mathematical Biosciences and Engineering, 2013, 10(3): 591-608. doi: 10.3934/mbe.2013.10.591
    [4] Heyrim Cho, Allison L. Lewis, Kathleen M. Storey, Anna C. Zittle . An adaptive information-theoretic experimental design procedure for high-to-low fidelity calibration of prostate cancer models. Mathematical Biosciences and Engineering, 2023, 20(10): 17986-18017. doi: 10.3934/mbe.2023799
    [5] Alex Capaldi, Samuel Behrend, Benjamin Berman, Jason Smith, Justin Wright, Alun L. Lloyd . Parameter estimation and uncertainty quantification for an epidemic model. Mathematical Biosciences and Engineering, 2012, 9(3): 553-576. doi: 10.3934/mbe.2012.9.553
    [6] Cassidy K. Buhler, Rebecca S. Terry, Kathryn G. Link, Frederick R. Adler . Do mechanisms matter? Comparing cancer treatment strategies across mathematical models and outcome objectives. Mathematical Biosciences and Engineering, 2021, 18(5): 6305-6327. doi: 10.3934/mbe.2021315
    [7] Leo Turner, Andrew Burbanks, Marianna Cerasuolo . PCa dynamics with neuroendocrine differentiation and distributed delay. Mathematical Biosciences and Engineering, 2021, 18(6): 8577-8602. doi: 10.3934/mbe.2021425
    [8] Leonardo Schultz, Antonio Gondim, Shigui Ruan . Gompertz models with periodical treatment and applications to prostate cancer. Mathematical Biosciences and Engineering, 2024, 21(3): 4104-4116. doi: 10.3934/mbe.2024181
    [9] Yuganthi R. Liyanage, Nora Heitzman-Breen, Necibe Tuncer, Stanca M. Ciupe . Identifiability investigation of within-host models of acute virus infection. Mathematical Biosciences and Engineering, 2024, 21(10): 7394-7420. doi: 10.3934/mbe.2024325
    [10] Abeer S. Alnahdi, Muhammad Idrees . Nonlinear dynamics of estrogen receptor-positive breast cancer integrating experimental data: A novel spatial modeling approach. Mathematical Biosciences and Engineering, 2023, 20(12): 21163-21185. doi: 10.3934/mbe.2023936
  • The past two decades have seen the development of numerous mathematical models to study various aspects of prostate cancer in clinical settings. These models often contain large sets of parameters and rely on limited data sets for validation. The quantitative analysis of the dynamics of prostate cancer under treatment may be hindered by the lack of identifiability of the parameters from the available data, which limits the predictive ability of the model. Using three ordinary differential equation models as case studies, we carry out a numerical investigation of the identifiability and uncertainty quantification of the model parameters. In most cases, the parameters are not identifiable from time series of prostate-specific antigen, which is used as a clinical proxy for tumor progression. It may not be possible to define a finite confidence bound on an unidentifiable parameter, and the relative uncertainties in even identifiable parameters may be large in some cases. The Fisher information matrix may be used to determine identifiable parameter subsets for a given model. The use of biological constraints and additional types of measurements, should they become available, may reduce these uncertainties. Ensemble Kalman filtering may provide clinically useful, short-term predictions of patient outcomes from imperfect models, though care must be taken when estimating ``patient-specific'' parameters. Our results demonstrate the importance of parameter identifiability in the validation and predictive ability of mathematical models of prostate tumor treatment. Observing-system simulation experiments, widely used in meteorology, may prove useful in the development of biomathematical models intended for future clinical application.


    By 1941, Huggins and Hodes [1] had demonstrated that castration induces the regression of prostate tumors, which implies that the growth of prostate cancer cells is highly dependent on androgen (male sex hormones). Their work showed, perhaps for the first time, that some cancers potentially could be clinically treatable by chemical means, and Huggins shared the 1966 Nobel Prize in Medicine and Physiology for this discovery [2]. Nowadays, hormone therapy—specifically, androgen suppression therapy—is a common treatment for localized and locally advanced prostate cancer and has been shown to have a significant survival benefit, albeit with considerable side effects [3]. Typically, continuous androgen suppression (CAS) therapy is given until the tumor evolves resistance, at which point the patient's prognosis becomes unfavorable [4,5].

    Intermittent androgen suppression (IAS) therapy aims to reduce the side effects of CAS and potentially delay the onset of hormonal resistance [6]. During IAS, the patient is put on treatment until his blood serum level of prostate-specific antigen (PSA, a biomarker for prostate cancer) falls below a predetermined lower threshold. Treatment is halted until the patient's PSA level later rises above a predetermined upper threshold, when it is restarted. The on-off cycles are repeated until the tumor becomes resistant. Although IAS reduces the side effects of treatment, its ability to forestall treatment resistance is not yet clear [7].

    Over the last decade, various mathematical models have been developed to describe the dynamics of prostate cancer under IAS, as measured by clinical time series of blood serum levels of PSA, and possibly also of androgen, of individual patients in clinical trials. The models are clinically validated insofar as initial conditions and rate parameters can be selected so that the numerical solutions reproduce the observed time series with reasonable fidelity. Several such models are described in Section 1.2.

    One of the main clinical problems in any individual patient case, and one for which a validated mathematical model of prostate cancer might provide insight, is whether another round of IAS therapy will be helpful. Thus it is reasonable to ask, if a given mathematical model accurately reproduces past observations of PSA, can it predict future observations with an acceptable degree of reliability? If not, then why not?

    To illustrate, we consider a particular patient in a clinical trial of IAS whose serum PSA levels are measured approximately every 30 days. This patient starts treatment on day 0, goes off treatment around day 300, restarts around day 600, and so on for five intervals of approximately 300 days each. Using a fitting procedure described in Section 2.4, parameters for two models, which we call Model H and Model P in Section 1.2, are selected to provide "optimal" values of the model parameters, insofar as each model solution, starting from a fixed initial condition and integrated for 900 days, yields the smallest root-mean-square difference with the data. In addition, we vary the relative value of one of the model parameters by up to 10 percent from the "optimal" one.

    Figure 1 shows the results. The solid curves in each panel show the respective model's solutions obtained from both the "optimal" parameters and from varying one of the parameters slightly as just described. The models are integrated for 1500 days from a suitable initial condition to demonstrate 900 days of approximate agreement with clinical measurements of PSA (the "fitted" interval) and another 600 days of predicted PSA levels (the "predicted" interval), assuming that the last cycle of IAS therapy begins around day 1200. Although Model P (Figure 1(b)) reproduces the observed PSA levels through day 900 better than Model H, its predictions of PSA levels for days 900–1500 are much worse. Both Model H and Model P significantly overestimate—by factors of greater than 2 and 4, respectively—the observed PSA level in the second off-treatment interval, depending on the choice of parameter. One choice of parameter vector for Model P yields a pretty good prediction of the effect of the next cycle of treatment, but the others do not, even though all the choices accurately reproduce the observations over the "fitted" interval.

    Figure 1.  Results of fitting model parameters to approximately 900 days of clinical measurements of PSA, and the corresponding model predictions, for two models of prostate cancer dynamics under IAS for the same patient. Red dots: clinical measurements. Solid curves: model solutions for "optimal" parameters chosen according to the procedure described in Section 2.4 and for a parameter value with a relative difference of 10% from optimal. (a) Model H; (b) Model P.

    We investigate two principal mathematical questions in this paper. First, can the parameters of a given model be estimated reliably from available clinical data, which typically consists of serial measurements of serum PSA? This question motivates our investigation into parameter identifiability, which includes issues of structural and practical identifiability [8,9]. Second, what are some useful approaches to quantify the uncertainties in parameter estimates and the reliability of "patient-specific" forecasts of treatment response, based on a particular model? This question motivates our use of an ensemble Kalman filter (EnKF), which is widely used for data assimilation and parameter estimation in weather forecasting and similar predictive settings.

    Our main conclusions are that, while parameters for a deterministic model may reproduce clinical measurements well, they may be subject to large uncertainties, as may the predictions of such a model. Additional types of clinical measurements, if available, may help constrain the values of more parameters. To be clinically practicable, model design must take into consideration the nature of the available observations. The next sections provide an overview of some dynamical models of IAS therapy and some background on parameter identifiability and uncertainty quantification.

    Jackson [10] laid the groundwork for several data-driven mathematical models for prostate cancer [11,12,13]. The three models considered in this paper all assume that the tumor is composed of two or more subpopulations of cells that differ in their degree of androgen dependence. However, the models differ in the way they represent the biological details, particularly the role in which androgen fuels the growth of the tumor cells.

    Hirata et al. [14,15,16] introduced a piecewise-linear model ("Model H") of the dynamics of prostate cancer under IAS therapy. Model H proposes that there are three cancer cell subpopulations, x1, x2, and x3, that represent, respectively, treatment-sensitive, -insensitive, and irreversibly treatment-insensitive cells. It has been used to study parameter estimation and optimal control of IAS therapy as well as forecasting of treatment resistance and uncertainty quantification [14,17,18,19].

    Model H is constructed by assuming that the rates of biphasic increase and decrease of PSA levels during the respective off- and on-treatment intervals are linear combinations of the tumor cell subpopulations. At any given instant, cells in subpopulation j grow at the rate wjjxj; cells transition from subpopulation j to k at the rate wkjxj, j,k{1,2,3}, and w3j=0, j=1,2. There are two versions of each parameter: w0jk during the off-treatment intervals and w1jk during treatment. During the on-treatment intervals, Model H takes the form

    ddt(x1(t)x2(t)x3(t))=(w11,100w12,1w12,20w13,1w13,2w13,3)(x1(t)x2(t)x3(t)). (1.1)

    This form implicitly takes into account the lack of androgen, which retards the growth of the x1 and x2 subpopulations and also selects for the fully treatment-resistant phenotype, x3.

    During off-treatment intervals, Model H assumes that the partially treatment-resistant (x2) subpopulation reverts back to the sensitive one (x1) at a certain rate, while the growth of the x3 population depends only on its present size:

    ddt(x1(t)x2(t)x3(t))=(w01,1w01,200w02,2000w03,3)(x1(t)x2(t)x3(t)). (1.2)

    Model H also assumes that the level of serum PSA at time t is proportional to the sum of the three tumor subpopulations,

    P(t)=H(x(t))=α(x1(t)+x2(t)+x3(t)). (1.3)

    For simplicity, the units of tumor cell populations and PSA level are scaled such that α=1. Although the validity of the biological assumptions in Model H is debatable, it reproduces clinical time series of PSA data well [14,16,20].

    Model H has 10 parameters and 3 initial conditions that must be estimated from PSA data (androgen is not explicitly incorporated). There are no a priori constraints on the parameters, other than clinically realistic bounds on the net tumor growth rate.

    In contrast to the phenomenological approach in Model H, Portz, Kuang, and Nagy [12] have formulated a mechanistic model ("Model P"), inspired by Droop's nutrient-limiting theory [21], that regards androgen as the limiting nutrient for cancer growth and transformation. Model P has inspired several variants [9,20,22,23,24,25], but we focus on the original formulation here. Model P presumes that the tumor is comprised of treatment-sensitive and -insensitive prostate cancer cells, each of which requires a minimum level of androgen, called the cell quota. When the available androgen falls below the cell quota, the respective subpopulation declines. Cells may mutate from one subpopulation to the other in response to selection pressure from IAS therapy.

    Model P contains explicit equations that govern the intracellular androgen levels, Q1 and Q2, of the treatment-sensitive and -insensitive subpopulations x1 and x2, respectively. Androgen data is used to interpolate values for the serum androgen function A(t), which acts as an external forcing to represent the on- and off-treatment cycles. The growth and transformation rates for both cell subpopulations depend on the intracellular androgen levels, while the death rates are constant [12].

    Model P is defined by the following system of differential equations:

    dx1dt=μm(1q1Q1)x1growthd1x1deathc1λ1(Q1)x1+c2λ2(Q2)x2mutation (1.4)
    dx2dt=μm(1q2Q2)x2growthd2x2deathc2λ2(Q2)x2+c1λ1(Q1)x1mutation (1.5)
    dQ1dt=vm(qmQ1qmq1)(A(t)A(t)+vh)androgen influx to x1 cellsμ(Q1q1)x1 androgen uptakebQ1degradation (1.6)
    dQ2dt=vm(qmQ2qmq2)(A(t)A(t)+vh)androgen influx to x2 cellsμ(Q2q2)x2 androgen uptakebQ2degradation (1.7)
    dPdt=σ(x1+x2)baseline PSA production+σ1x1Qm1Qm1+ρm1+σ2x2Qm2Qm2+ρm2tumor PSA productionδPdegradation. (1.8)

    The cell quota for each subpopulation is qj, the death rate is dj, and the mutation rate from subpopulation i is λj(Qj)=Kj/(Qj+Kj), j=1,2. Since x2 is the less treatment-sensitive subpopulation, q2<q1. The PSA production rate depends on the tumor cell subpopulations. Further details about the development of this model are found in [12].

    Model P contains 21 parameters and a 5-dimensional state vector. It is possible to find parameters so that model solutions replicate clinically observed time series of serum PSA levels [12,20]. While the PSA data is utilized for estimating the model parameters, the available androgen data is used only for the interpolation of A(t). The parameters in Model P are biologically meaningful, and ranges for most of them can be found in the literature.

    One difficulty with Model P is that it cannot reproduce PSA relapse under CAS [26] within the parameter ranges given in [12]. The problem may arise from the assumption that the cell subpopulations can mutate from one to the other [9]. To address these issues, Baez and Kuang [22] introduced a simplification of Model P, which has been further refined into Model T, below. (See [24] for the derivation.) Key changes include the addition of an irreversibly treatment-resistant tumor subpopulation (similar to that of Model H) and the addition of a compartment for serum androgen, A:

    dx1dt=μ1(1q1Q)x1growth(D1(Q)+δ1x1)x1deathλ(Q)x1mutation (1.9)
    dx2dt=μ2(1q2Q)x2growth(D2(Q)+δ2x2)x2death+λ(Q)x1mutation (1.10)
    dQdt=m(AQ)androgen diffusion A  Qμ1(Qq1)x1+μ2(Qq2)x2x1+x2androgen uptake (1.11)
    dAdt=γ2+γ1(A0A)productionA0γ1u(t)suppression of production (1.12)
    dPdt=bQbaseline PSA production+σ(Qx1+Qx2)tumor PSA productionϵPdegradation, (1.13)

    where the respective androgen-dependent death and mutation rates are Dj(Q)=djRj/(Q+Rj), j=1,2 and λ(Q)=cK/(Q+K). The unit step function u(t) is 1 when when the patient is on treatment and is 0 otherwise; the treatment time intervals are determined directly from the data.

    Model T has 21 parameters and a 5-dimensional state vector. Biologically realistic estimates of the parameters are given in Table 1. In this paper, we fix the parameters c=0.00015, K=1, δ1=5, δ2=5, and γ2=0.005 in all simulations, as their values have relatively little effect on the model output [24]. For simplicity, we also set μ1=μ2=μm, which leaves a model with 13 adjustable parameters that we call T-13. Based on the results of a prior a sensitivity analysis [24], we also consider a version of Model T, which we call T-5, in which all but 5 of the parameters are fixed. (The adjustable parameters in Eqs. 1.9–1.13 in Model T-5 are μm, q2, d1, γ1, and A0.) The parameter estimation and time series prediction procedures processes described in Section 2 are carried out on versions T-13 and T-5 using time series of serum PSA and androgen levels.

    Table 1.  Estimated biologically realistic ranges of the parameters for Model T. The values of parameters with a single asterisk are fixed in all simulations described here (see Section 1.2.3), and column T-5 shows the values of the parameters that are fixed in Model T-5. Parameters with double asterisks may vary considerably between patients [24].
    Param Description Range T-5 Unit
    μmmax proliferation rate[0.001,0.09]varies[day]1
    μ1max proliferation rate (AD cells)[0.001,0.09]μm[day]1
    μ2max proliferation rate (AI cells)[0.001,0.09]μm[day]1
    q1min AD cell quota[0.41,1.73]0.6130[nmol][day]1
    q2min AI cell quota[0.01,0.41]varies[nmol][day]1
    bbaseline PSA production rate[0.0001,0.1]0.0379[ μg][nmol]1[day]1
    σtumor PSA production rate[0.001,1]0.8667[ μg][nmol]1[L]1[day]1
    ϵPSA clearance rate[0.0001,0.1]0.0565[day]1
    d1max AD cell death rate[0.001,0.09]varies[day]1
    d2max AI cell death rate[0.01,0.001]0.0633[day]1
    δ1density death rate[1,90]5[L]1[day]1
    δ2density death rate[1,90]5[L]1[day]1
    R1AD death rate half-saturation[0,3]1.2499[nmol][L]1
    R2AI death rate half-saturation[1,6]2.7351[nmol][L]1
    cmaximum mutation rate[105,104]0.00015[day]1
    Kmutation rate half-saturation level[0.8,1.7]1[nmol][day]1
    γ1primary androgen production rate[0.008,0.8]varies[day]1
    γ2secondary androgen production rate[0.001,0.1]0.005[day]1
    mdiffusion rate from A to Q[0.01,0.9]0.7188[day]1
    A0maximum serum androgen level[27,35]varies[nmol][day]1
    x1(0)Initial population of AD cells[0.009,0.02]0.01[L]
    x2(0)Initial population of AI cells[104,103]0.0001[L]

     | Show Table
    DownLoad: CSV

    As described in the introduction, we are concerned with two main questions: the reliability with which model parameters can be estimated from the available data, and the trustworthiness of model predictions after the parameters have been selected to maximize the agreement between model observables and prior clinical measurements of PSA. In this section, we briefly survey the data sources that we use and the mathematical approaches that we apply to each question.

    The data for this study comes from a clinical trial at the Vancouver Prostate Center, which admitted patients who demonstrated a rising serum PSA level after they received radiotherapy and had no evidence of metastasis [27]. The study admitted patients who had not had hormonal suppression therapy, with the exception of less than 3 months of neoadjuvant androgen suppression, and who experienced PSA relapses after radiotherapy with no evidence of distant metastasis (e.g., bone metastasis) [27]. The study contains patients with high serum PSA levels prior to androgen suppression therapy (i.e., PSA>6 μg/L).

    The data used in the parameter-fitting and prediction efforts in this paper come from two disjoint subsets totaling 54 patients, for whom we have approximately monthly measurements of serum PSA and androgen levels. One subetaoup of 28 patients has data for at least one cycle of androgen-suppression treatment, and the second subetaoup of 26 patients, 2.5 cycles. We do not have imaging data or other independent measures of tumor volume, nor any biopsy, genomic, or cell receptor information from these patients.

    Each of the models considered here defines a vector field F of an m-dimensional state vector x(t) that depends upon a fixed q-vector of parameters p. At each measurement time, the clinical observations are assumed to be a function H:RmRs of the tumor state; in this paper, s=1 or 2 depending on whether both androgen and PSA levels are observed. We may represent the mathematical setup as a system of the form

    dxdt=F(x(t,p)) (2.1)
    y(t,p)=H(x(t,p)). (2.2)

    Here y(t,p) represents the observables (predicted observations) from the given initial condition and parameter vector at time t. We represent by the s-vector y obs(t) the actual data (observations) at selected measurement times.

    If any component of p is an implicit function of some of the others, then, in general, different values of p yield the same solution vector x(t,p) for a given initial condition. In such a case, p is said to be structurally non-identifiable because p cannot be estimated uniquely from a given set of observables. In general, it is not possible to define a confidence interval for numerical estimates of structurally non-identifiable parameters.

    If no implicit functional relationship exists between the elements, then p is structurally identifiable [28,29]. Practical identifiability refers to the question of whether the components of p can be estimated with suitably small confidence bounds from observations with noise of finite variance. Structural identifiability is a necessary condition for practical identifiability [30,31,32,33].

    The Fisher information matrix (FIM) measures the sensitivity of the model output at measurement times tn with respect to the model's parameter vector. It can detect combinations of identifiable parameters [34,35,36]. Given a set of measurements at N distinct time points and an ODE model with an m-dimensional state vector and q-vector of parameters, the Nm×q sensitivity matrix S consists of N time-dependent m×q blocks A(tn):

    S=(A(t1)A(t2)A(tN)),whereAjk(tn)=xj(tn,p)pk,n=1,,N,k=1,,q. (2.3)

    The q×q Fisher information matrix is M=STS. (In practice, each Ajk must be approximated numerically.) The parameter vector p is structurally identifiable if and only if M has full rank [31]. The rank of M corresponds to the number of identifiable parameter combinations in p. The condition number of M is a measure of the practical identifiability of the parameters: if the condition number is large, then small amounts of measurement noise can render reliable estimation of p impossible.

    The Cramér-Rao bound covariance matrix is C=M1. The diagonal element Cjj is the asymptotic variance of the jth parameter; C1/2jj is the standard deviation (SDj) of pj. The coefficient of variation [34] for the jth parameter is

    CVj=SDj|pj|. (2.4)

    If CVj1, then the uncertainty in pj is at least as great as its value, which may be considered a useful upper bound for practical identifiability. If CVj is large (say, CVj>104), then pj simply may not be estimable from the data, which renders it structurally unidentifiable [8,34].

    The available observations in this study are time series of serum PSA, and, in some cases, of serum androgen, so the observation vector y obs(tn) has at most 2 components (with respective standard errors σjn). For a fixed time series of such observation vectors, and a given set of observables y(t,p) (Eq. 2.9), the negative log likelihood function takes the form

    LL(yp)=sN2ln(2π)+12sj=1Nn=1(yj(tn,p)y obsj(tn)σjn)2, (2.5)

    where p is the model parameter vector. There are N measurement time points and s observations at each time point (s=1 or 2 for the models discussed here). When the measurement variances σjn are all equal to the same known constant, the minimization of LL is equivalent to the minimization of the third term on the right-hand side of Eq. 2.5, and, thus, parameter estimation with the negative log likelihood is equivalent to least-squares minimization. (We take σjn=1 for the PSA and androgen time series used in this paper.)

    The profile likelihood is computed by fixing one element of p and choosing the remaining elements over appropriate ranges to minimize Eq. 2.5. The profile likelihood PLj is computed as

    PLj(p)=min{p|pj=c}LL(yp), (2.6)

    where all but the jth component of p is varied over an appropriate interval. The confidence interval α for the profile likelihood of parameter pj is

    CIα={c:PLj(c)minLL(yp)+Δ(α)/2}. (2.7)

    For a sufficiently large data set, Δ(α) takes the following form asymptotically:

    Δ(α)=icdf(χ2,α,df). (2.8)

    Here icdf(χ2,α,df) is the inverse cumulative distribution function for χ2 with α-quantiles (e.g., α=0.95 for a 95% confidence interval) and df degrees of freedom, which can be taken as 1 when all but one element of the parameter parameter vector is fixed, or as q for a simultaneous confidence interval for all components of the model parameter vector [8,29,37].

    The ensemble Kalman filter (EnKF) is a common approach to data assimilation [38,39]. It is used to estimate the m-dimensional state vector of a dynamical model given a set of noisy observations that can be regarded as a "predictor-corrector" scheme.

    The simulations presented here use the ensemble transform Kalman filter [40], which is one of several possible square-root filters. Details of the implementation are given in the appendix, but we briefly summarize the approach here. Starting with a set of K initial conditions at tn, we integrate the model to time tn+1 to produce a background ensemble {x(tn+1)b(k)}Kk=1. Let y obs(tn+1) denote the vector of measurements at tn+1. (Depending on the time series, the measurement is either a 1-vector of serum PSA level or a 2-vector of serum PSA and androgen levels.) Here we assume that the observables (i.e., predicted observations) are related to the model state vector by the function H, sometimes called the forward operator, as

    y(tn+1)=H(x(tn+1)). (2.9)

    In the models considered in this paper, H may be a linear (e.g., Eq. 1.3) or a nonlinear (e.g., Eqs. 1.8 and 1.13) function of the model state vector.

    The objective of the ensemble transform Kalman filter is to find a linear combination of the elements of the background ensemble that best match the data, in the sense that the weight vector w minimizes the sum of squares

    J(w)=(K1)wTw+[yobsH(¯xb+Xbw)]TR1[yobsH(¯xb+Xbw)], (2.10)

    where

    ¯xb=K1Kk=1x(tn+1)b(k) (2.11)

    is the ensemble mean, Xb is the m×K ensemble perturbation matrix whose ith column is x(tn+1)b(k)¯xb, k=1,,K, R is the covariance matrix of the errors in the observations, and H is the forward operator associated with the model. (The ensemble mean and perturbation matrix vary with time, as may H and R, but the explicit time dependence is omitted here to simplify the notation.) The filter produces the analysis ensemble {wa(k)}Kk=1, which in model space becomes

    x(tn+1)a(k)=¯xb+Xbwa(k),i=1,,K. (2.12)

    (See the appendix for further details.) The analysis ensemble defined in Eq. 2.12 forms a new set of initial conditions, which are then integrated to the next observation time tn+2, and the process is repeated until the end of the data series.

    The process begins with a preselected set of initial conditions at t0. The results described below include patients for whom there are at least 20 observation time points.

    State augmentation. The Kalman filter can be used to estimate q model parameters with the rest of the m-dimensional state vector by considering the (m+q)-dimensional vector x=(x,p)T that solves the augmented system

    dxdt=(dx/dtdp/dt)=(F(t,x,p)G(t,x,p)). (2.13)

    In our case, the associated forward operator is H=(H,0)T because the parameter vector is not observed directly. Here G can be any model for the evolution of the parameter vector p. In the simplest case, which is considered here, the parameters are assumed constant over the course of treatment, so G=0, but a more sophisticated model of parameter drift could be used.

    Our metric for the agreement between the observations and the model observables at the nth measurement time, n=1,2,,N is the mean squared error (MSE) defined by

    E(x0,p)=1sNsj=1Nn=1(Hj(x(tn,p))y obsj(tn)σjn)2, (2.14)

    where σjn is the variance of the measurement error in the kth component of the observation vector at time tn. (We take σjn=1 for the data used in this paper.) If the measurement errors are normally distributed, then E has a χ2 distribution, and minimizing E is equivalent to maximum likelihood estimation [29]. In the numerical investigations described in Section 3, the parameter estimation is carried out using the MATLAB built-in function fmincon.

    Because the exact analytical expressions for the solutions of the models in question are difficult or impossible to find, the blocks of the sensitivity matrix S (Eq. 2.3) are estimated numerically at the time of each clinical measurement. We illustrate the approach first with Model H, Eqs. (1.1)–(1.2), which has 10 parameters.

    Initially, the total tumor cell population (x1+x2+x3) in Model H numerically equals the first clinical measurement of serum PSA level. For parameter identification purposes, we fix the initial cell populations x1(0), x2(0), and x3(0) to be 0.95%, 0.05%, and 0%, respectively, of the initial PSA measurement. For our numerical computations, we use the time series of serum PSA data from Patient 1 in the Vancouver Prostate Center public dataset [27] (any patient will do). Clinical measurements are taken approximately at monthly intervals; P(t0) is the initial clinical measurement of PSA level at the start of treatment, and P(tn), the nth subsequent measurement. We select the parameters to minimize E(x0,p), Eq. (2.14), using the built-in MATLAB function fmincon.

    All 10 parameters are minimized simultaneously using the time series of PSA data from the first 1.5 cycles of treatment (i.e., the initial on-off-on treatment intervals). The allowed parameter ranges for Model H, used as constraints in fmincon, are taken from [20]. The output is an estimated parameter vector,

    p=(w11,1,w12,1,w12,2,w13,1,w13,2,w13,3,w01,1,w01,2,w02,2,w03,3)T,

    for which a model integration starting from the initial condition above produces observables (i.e., predicted serum PSA levels, Eq. 1.3), that minimize E(x0,p).

    An analogous procedure is performed for the 21 parameters of Model P, using the initial condition x1(0)=99, x2(0)=1, Q1(0)=0.5, Q2(0)=0.5. The initial PSA level, P(0), is taken as the first clinical measurement of PSA. The process is similar for Model T using the initial conditions in Section 3.2.

    After the parameter estimates are obtained from fmincon, we estimate the uncertainty in each parameter as follows. At time t0, i.e., at the time of the first clinical measurement of PSA after treatment begins, we perturb the first parameter, p1=w11,1, to generate two new values, w1,±1,1=(1±0.01)w11,1, from which we integrate the model to the first measurement time, t1. Then we approximate

    Aj1(t1)=xj(t1,p)p1

    (cf. Eq. 2.3) by finite differencing the two solutions at t1 with respect to the jth component of the model state vector, j=1,2,3. We proceed likewise for w12,1 through w13,3 to estimate the first block A(t0) of the sensitivity matrix. During the on-treatment intervals, the elements of A corresponding to parameters w01,1 through w03,3 are zero.

    This process begins anew at t1, starting with the state vector x(t1,p), to estimate A(t1). When we reach the end of the first on-treatment interval at tn1, the process continues with components w01,1, w01,2, w02,2, and w03,3 of p through the first off-treatment interval at tn2; the elements of A corresponding to w11,1 through w13,3 are zero. Finally, we proceed as above with parameters w11,1 through w13,3 for the second on-treatment interval through tn3. The numerical estimates of A(tn), n=1,,N3 yield the sensitivity matrix S, from which we compute the Fisher information matrix M=STS and the Cramér-Rao bound covariance matrix, C=M1.

    One naïve way to estimate the uncertainty in model parameters is to run MATLAB's fmincon function to determine a vector of parameters for which the model's observables of PSA agree closely with a given patient's clinical time series. For this purpose, we have used Model T-13 and minimized Eq. 2.14 subject to the biological constraints in Table 1. Such a calculation was run for 28 different patients over their first on-off-on treatment intervals.

    Table 2 shows the sample means and variances of the parameters returned by fmincon for this set of patients. The "optimal" parameter values are about the same for each patient for the most part. One may wonder whether the sample variances obtained in this manner are realistic. The analyses in Sections 3.1–3.2 suggest that they are not, and, in fact, that the actual uncertainties are much larger than the results in Table 2 might suggest.

    Table 2.  Sample mean and variance of parameter vector components of Model T-13 from running fmincon on a set of 28 patients from the Vancouver Prostate Center over their first on-off-on treatment intervals.
    Parameterμmq1q2bd1R1
    mean0.07100.61300.19710.03790.06871.2499
    variance0.00060.01110.00910.00080.00050.4059
    Parameterγ1σϵA0d2R2m
    mean0.37420.86670.056511.630.06332.73510.7188
    variance0.09280.06800.000623.690.00061.25270.0604

     | Show Table
    DownLoad: CSV

    In this section, we present the results obtained from studying the identifiability and uncertainty quantification of parameters and predictions from Models H, P, T-13, and T-5. We show that, based on the available time series data, the parameters of models H, P, and T-13 are not practically identifiable. The method of profile likelihood shows that the parameters of Model T-5 are identifiable under the biological constraints given in Table 1.

    We also use the ensemble Kalman filter on the associated augmented systems to assess the short- and long-term predictive capability of Model T-5. Although we obtain some promising initial results for many patients, there is evidence of significant model error.

    Table 3 shows the rank of the Fisher information matrix M obtained using the numerical procedure described in Section 2.5. In particular, the rank of M for Models H and P is 9 and 15, respectively. Because the rank is less than the number of model parameters, we conclude that neither model has a parameter vector that can be identified uniquely from the available observations.

    Table 3.  Rank of the numerically estimated Fisher information matrix for a fixed set of initial conditions. Only the parameters in Model T-5 are identifiable from the available data.
    ModelHPT-13T-5
    # fitted parameters1021135
    FIM rank915125

     | Show Table
    DownLoad: CSV

    To illustrate one of the implicit functional relationships between some of the model parameters, Figure 2 shows a plot of the profile likelihood (Eq. 2.6) between a selected pair of parameters from each of Models H and P. The negative log likelihood function can be minimized by multiple choices of the two selected parameters within the ranges specified on each graph. The two parameters are related by an implicit function within and beyond the indicated bounds on the axes, and it is not possible to associate a confidence interval to them. These results suggest that the parameters in Models H and P are structurally unidentifiable when the only observables are time series of PSA levels.

    Figure 2.  Profile likelihoods from selected pairs of parameters in Models H and P. The color coding represents the values of the profile likelihood, Eq. 2.6, as a function of each parameter. (a) Profile likelihood obtained from Model H during an off-treatment interval when fixing w01,2 and w03,3 (cf. Eq. 1.2). (b) Similarly for Model P (Eqs. 1.4–1.8), when all parameters are fixed except for μm and c1.

    Next we turn to the uncertainties in the parameters returned by fmincon, as estimated from the Crámer-Rao covariance matrix, M1. Table 4 shows the resulting coefficients of variation (Eq. 2.4). The coefficient of variation of all parameters except w11,1 in Model H and μm in Model P is greater than unity (i.e., the uncertainty is at least as large as the estimated parameter value). Thus, the parameters are practically unidentifiable given the available clinical time series.

    The numerical results presented here are based on time series from one representative patient (Patient 39) in the Vancouver Prostate Center clinical dataset; analogous results are obtained for other patients. Although it is possible to find estimates of the parameter vector for each of Model H and P that provide reasonably good agreement with clinical observations, such estimates are not unique, and they may be subject to large relative errors. The uncertainties in the parameters lead to unreliable predictions for the future clinical course of individual patients (Figure 1).

    The numerical estimation of the Fisher information matrix for these models proceeds in the same manner as described in Section 2.5. We fix the initial values of x1 and x2 to 0.01 and 0.0001, respectively, and A(0) and P(0) are taken as the first value of recorded androgen and PSA data; we assume that Q(0)=0.90A(0). The results shown in Table 2 indicate that Model T-13 has 2 unidentifiable parameters, as the rank of the Fisher information matrix is 11. In contrast, the rank of the Fisher information matrix is 5 for Model T-5, so all of the adjustable parameters are identifiable.

    The coefficients of variation in Table 4 indicate that estimates of all but two of the parameters in Model T-13 are subject to large uncertainties. In contrast, 4 of the 5 adjustable parameters in Model T-5 may be practically identifiable.

    Table 4.  Coefficients of variation (CV) for Models H, P, T-13, and T-5. A parameter for which CV1 has an uncertainty that is at least as large as its estimated value. The estimated uncertainties do not account for biological constraints on the parameter values.
    Hw111w121w122w131w132w133w011w012w022w033
    value-0.0550.002-0.0000.001-0.0030.0020.0030.1750.008-0.056
    CV21017100210211011101210121046.10521011101
    Pμmq1q2d1d2c1K1c2K2nqm
    value0.0180.2070.1930.0050.0050.0000.5240.0001.4992.2264.446
    CV41017103210331037104710421061108210811052104
    vmvhbσ1σ2ρ1ρ1mσ0ϵ
    value0.1444.4840.0870.0030.1171.0921.1934.4960.0020.070
    CV2103510371024105510441052105710251029101
    T-13μmq2d1γ1A0q1bR1σϵd2
    value0.0600.2400.0890.01211.0000.5000.0503.0000.5000.0500.009
    CV21013104210161021102210371001102610171007103
    R2m
    value4.0000.500
    CV11044101
    T-5μmq2d1γ1A0
    value0.0600.2400.0890.01211.000
    CV21012101510141011102

     | Show Table
    DownLoad: CSV

    The computations reported in Table 4 do not account for biologically realistic bounds on the parameters. To assess the practical identifiability of model T-5 with respect to the biological constraints in Table 1, we use the clinical time series from Patient 39 in the Vancouver database as a representative case to compute the profile likelihood for all 5 model parameters, shown in Figure 3. Each black curve represents the value of the negative log likelihood, Eq. 2.5, when all parameters are set to the values returned by fmincon (Section 2.4) except for the one that is varied within the interval shown on the horizontal axis.

    Figure 3.  The negative log likelihood (vertical axes), Eq. 2.6, of all adjustable parameters in Model T-5. Intervals where the black curves lie below the red dotted line are the 95% confidence intervals, which in some cases lie outside of the biologically realistic ranges indicated in Table 1.

    Also shown in the graphs is the 95% confidence threshold, Eq. 2.7, as a dotted red line. The corresponding confidence interval in the parameter consists of those parameter values for which the profile likelihood lies below the dotted line. In some cases, the confidence interval lies outside a biologically realistic range. For example, only nonnegative values of parameter d1 (Figure 3(c)) are biologically meaningful, yet the left endpoint of the 95% confidence interval is a negative value. The parameter q2 corresponds to the cell quota for the androgen-independent call subpopulation, and, by hypothesis, is less than q1. That the right endpoint of the 95% confidence interval extends beyond the biologically meaningful upper limit for q2 suggests that q2 may not be practically identifiable from the available observational data.

    The ensemble Kalman filter (EnKF) provides another way to quantify the uncertainties in model predictions and parameters, provided that the latter are identifiable. As outlined in Section 2.3, the Kalman filter updates the model state vector whenever new observations become available, and the updated state vector becomes the new initial condition for the subsequent time interval. We use an augmented system, Eq. 2.13, for the vector field F defined by Model T, Eqs. 1.9–1.13, in the results described here. We consider only version T-5 of this model, as the 5 adjustable parameters form an identifiable subset. Of these 5 parameters, we fix μm and d1, as their values do not vary significantly between patients. Only A0, γ1, and q2 are estimated as part of the augmented state vector. In general, a smaller state vector helps to reduce the ensemble variance, all other factors being equal. In this case, however, including q2 as a parameter to be estimated proves problematic, as we discuss in more detail below.

    An open question is a suitable choice of the starting ensemble. The sample variance of the initial conditions should reflect that of the underlying state vector, but we know neither the initial fraction of treatment-resistant cells nor the amount of intracellular androgen in each patient's tumor. Furthermore, the uncertainty in "patient-specific" model parameters may be large. As a guess, we form the ensemble by choosing 100 initial values of the initial state vector from 0.7 to 1.6 times the value of each component of the initial condition x0 estimated for each patient in [24]. The initial ensemble mean values of the parameters A0 and γ1 are given by the output of fmincon to minimize Eq. 2.14 for each respective patient, and the standard errors are approximated from the respective 95% confidence intervals in Figure 3. Altogether, the augmented state vector is x=(x1,,x5,A0,γ1,q2)T. We assume that the errors in the observations of serum PSA and androgen levels are independent and have variance 1.

    We integrate each initial condition of the 100-member ensemble to the first observation time t1, then compute the linear combination of ensemble members that minimizes Eq. 2.10, which becomes the new initial condition. The integration continues to the second observation time t2, and the process is repeated until the end of the dataset or until another prespecified ending time. Altogether, we ran the EnKF on time series from 26 patients from the Vancouver Prostate Center database. This set of patients is disjoint from the set used to estimate model parameters in Sections 3.1–3.2.

    Figure 4 shows some representative results for the PSA component of the ensemble state vectors for 3 different patients using the EnKF as a data assimilation system. The red curve in each plot shows the ensemble mean value of the PSA level, and the gray curves show each ensemble realization of the PSA levels. (Analogous plots can be made of the other components of the model state vector.)

    Figure 4.  Representative ensemble Kalman filtering results from slightly different initial ensembles for three patients. Black dots: clinical observations. Gray curves: individual ensemble solutions. Red curves: ensemble mean. Left panels, (a)–(c): the Kalman update is run for the first on-off-on treatment intervals for each patient, marked "filtered, " and the last analysis update ensemble is used as initial conditions for a free run of the model for the next off-on treatment interval, marked "predicted." Right panels, (d)–(f): the ensemble Kalman filter is run for the patient's entire clinical course.

    To return to one of the questions asked in the introduction, we consider what happens when the EnKF is run for the first on-off-on treatment intervals to compute an ensemble estimate of the augmented state vector. Using the analysis at the end of the second on-treatment interval, we run the model for a period of 400 to 800 days, depending on the patient, to predict the serum PSA levels for the next cycle, that is, over the next treatment hiatus and subsequent on-treatment interval. The ensemble predictions are shown for each patient to the right of the vertical dotted line in panels (a)–(c).

    The panels on the right, Figure 4(d)(f), show the outputs of the filter as it is run over the entire time series of clinical PSA observations (shown as round circles). At each observation time tn, we compute the mean of the predicted PSA level from the analysis at tn1 over all 100 ensemble members, i.e.,

    ˉyn=1100100k=1H(xb(k)(tn)),

    and, at the end of the time series, the root mean square (RMS) of the one-step ahead prediction error of serum PSA level averaged over all N observation time points is

    ˆP=(1NNn=1(ˉynyobs(tn))2)1/2.

    The respective values of ˆP for panels (d)–(f) are 1.80, 2.38, and 2.84 μg/L.

    In the case of Patient 15, the EnKF produces good agreement with observations throughout the clinical data record. When the forecast/update cycle is halted shortly after day 800 and the model is run freely for the next 500 days or so, the model provides a good prediction of the patient's clinical course during the next off- and on-treatment cycle. When the EnKF is run continuously throughout Patient 15's clinical course, the ensemble variance increases notably toward the end of the second off-treatment interval, and the filter takes 5 or so update cycles for the ensemble mean to settle near the observed clinical PSA values. As mentioned above, the RMS error of the ensemble mean's one-step ahead prediction of serum PSA is about 1.80 μg/L.

    The results are similar for Patient 39, panels (b) and (e), although the ensemble mean tends to underestimate the observed serum PSA levels towards the end of each on-treatment interval. As with Patient 15, the EnKF produces an augmented state vector at the end of the second on-treatment interval that yields a reasonable 450-day prediction of Patient 39's clinical course.

    Serious problems are evident in the case of Patient 29. It is not clear whether the filter is even convergent after more than a handful of observation time points. How can a filter initialized in the same way as for the other patients produce such poor results? The answer turns on the inclusion of Model T-5's adjustable parameter q2, the cell quota for the treatment-resistant tumor cell subpopulation, in the augmented state vector. The analysis in Section 3.2 suggests that q2 may not be practically identifiable; in any case, its 95% confidence interval as estimated from the profile likelihood includes values that are not biologically meaningful. A similar problem bedevils the augmented ensemble for this patient case.

    Figure 5 shows the time evolution of the EnKF's estimates of the adjustable parameters A0, γ1, and q2 for Patient 15 (Figure 4(a)). The green curve shows the ensemble mean value of each parameter as the forecast/update cycle proceeds, and the blue starred curves at the bottom and top represent the ensemble minimum and maximum values at each analysis time point. In particular, the EnKF's estimate of q2 remains within the biologically realistic interval given in Table 1 and is less than the value of q1=0.613 that is fixed for Model T-5.

    Figure 5.  Augmented ensemble Kalman filter parameter estimates of Patient 15 from Model T-5. Green curve: ensemble mean; blue curves: minimum and maximum ensemble values.

    In the case of Patient 29, however, the ensemble estimates of q2 are much larger. After a few analysis cycles, some of the ensemble estimates of q2 exceed q1 (not shown in the figure). At this point, the dynamical model no longer makes sense, because for these ensemble members, the model includes a treatment-resistant cell subpopulation with a greater cell quota for androgen than the treatment-sensitive one. The forecast ensemble quickly diverges from reality, and of course the model's predictions are not believable.

    We include this last example as a cautionary note in the application of ensemble Kalman filters. One must be careful that the parameters being estimated have a sufficiently small variance so that their ensemble updates remain within a sensible range for the model under consideration. It is possible to modify the EnKF to add a suitable penalty term to Eq. 2.10 to further constrain the parameter values, but the resulting minimization problem becomes nonlinear and is more expensive to solve numerically.

    Prostate cancer has attracted the development of numerous mathematical models to study various aspects of the cancer dynamics under different treatment scenarios, especially in the last two decades. However, deterministic ODE models tend to contain many rate parameters [12,15,22,24]. Because clinical data used for parameter calibration typically is limited to time series of serum PSA and possibly also androgen levels, parameter identifiability is a significant issue. It is not possible to quantify the uncertainty in unidentifiable parameters. Furthermore, the ability of the model to fit prior clinical data with suitable choices of parameters does not imply that the model's predictions of the future clinical course can be trusted.

    In this paper, we investigate three existing models (H, P, and T) for prostate cancer, each of which has been validated with clinical data [12,15,24]. All of the models divide the tumor cell population into two or more classes that differ in their sensitivity to androgen deprivation therapy and presume an autonomous relationship between the size of the tumor cell classes and the rate of serum PSA production.

    The limited set of observables leads to unidentifiable sets of parameters in each of Models H, P, and T. By restricting the set of adjustable parameters in Model T using a sensitivity analysis in [24], we show that a 5-parameter version of Model T is identifiable (Table 2) given only clinical time series of PSA levels. Nevertheless, given the available time series, 3 of the 5 identifiable parameters are subject to relative uncertainties of at least 40 percent (Table 4). Models H and P have only one practically identifiable parameter, i.e., only one of the parameters is estimable from the available data with a relative uncertainty of less than unity.

    Past is not prologue. Although it may be possible to find a set of parameters so that the model observables agree well with previous clinical measurements, the model's predictions of future time series of PSA need not be reliable.

    In some cases, fixing many of the parameters in a complex model may yield an identifiable version that can produce reasonable predictions of PSA dynamics. This is the approach taken with Model T, which contains 21 parameters, all but 5 of which can be fixed to yield an identifiable subset and for which confidence intervals can be constructed using the method of profile likelihood (Section 3.2). However, the confidence intervals may not lie within biologically realistic bounds. Conversely, however, biological constraints may help to reduce the uncertainty in numerical estimates of a given parameter.

    The ensemble Kalman filter is a useful way to assimilate data and quantify the uncertainty in model predictions. The EnKF provides a systematic and consistent method to account for new observations and update the associated model state vector accordingly. Nevertheless, for best results, one needs good estimates of the initial uncertainty in the state vector, which are not available for the models considered here. Systematic model errors (biases) can be accounted for in EnKF schemes [41], but we have not pursued this approach here.

    Improved models are needed. Resistance to androgen deprivation therapy, or, alternatively, the androgen cell quota, probably exists on a continuum in a typical prostate tumor. The division of the tumor cell population into two or three distinct subpopulations may be too much of an oversimplification for clinically useful predictions. Although the notion of a nutrient-limiting mechanism to tumor growth is biologically reasonable, the dynamics of androgen metabolism probably are not well characterized in Models P and T. More work is needed to improve them.

    The development of clinically applicable dynamical models of prostate cancer treatment, and of many other cancers, may benefit from the application of observing-system simulation experiments. OSSEs have been widely used in numerical weather prediction for over half a century. Broadly speaking, in an OSSE, synthetic data is assimilated into a complex dynamical model to assess measures like prediction accuracy (forecast skill) as a function of the accuracy, sparsity, and types of observations. An historical overview may be found in [42]; one early example was an attempt to test whether the wind field of an atmospheric model could be estimated from measurements of temperature alone. Another important application concerns "the potential improvements in climate analysis and weather prediction to be gained by augmenting the present atmospheric observing system with additional envisioned types of observations that do not yet exist" [43]. In the present context, OSSEs might be used to quantify errors in parameter estimates from data with various noise levels or to assess the impact of a new assay or other tumor diagnostic on the predictive ability of a given model. In the authors' view, such efforts will be necessary to develop clinically validated mathematical models that will be useful tools for physicians and patients.

    The research of this work was partially supported by NSF grants (to YK) DMS-1615879, DMS-1518529 and a grant (to EK and YK) by the Arizona Biomedical Research Commission. ZW was partially supported by a block grant from the Arizona State University Graduate College. We are grateful to the late Dr. Nicholas Bruchovsky for the clinical data set.

    The authors declare no conflict of interest.

    Kalman filter algorithm

    Here we outline the implementation of the ensemble transform Kalman filter used in the simulations described in Section 3.3. Additional details concerning the motivation and derivation of the filter are given in [40].

    At time tn, we begin with an ensemble of initial conditions, which can be regarded as a sample drawn from a normal distribution about a mean state that we wish to estimate. A forecast (often called the background) is made by integrating the model of interest from each ensemble member to time tn+1, at which point additional measurements have become available. The EnKF seeks a linear combination of the forecasts that best fits the data in a weighted least-squares sense; the linear combination serves as a new set of initial conditions, called the analysis, at tn+1 from which the model is integrated to time tn+2, when the process is repeated. The mean of the analysis ensemble is the updated estimate of the "true" state of the system, and the sample covariance represents the uncertainty in the analysis mean.

    All the computations below are done at tn+1, so we omit the time dependence to simplify the notation. Let {xb(k)}Kk=1 be the set of K state vectors (each of dimension m) that represent the background ensemble, and let ¯xb denote the ensemble mean, Eq. 2.11. Let Xb be the m×K matrix whose kth column is xb(k)¯xb, k=1,,K. If w is a Gaussian K-vector with mean 0 and covariance (K1)1I, then x=¯xb+Xbw has mean ¯xb and covariance (K1)XbXTb, which, in model space, is the ensemble estimate of the forecast uncertainty.

    Let y obs be the s-vector of observations, which is assumed to be related to the "true" model state vector xt by

    yobs=H(xt)+ϵ, (.1)

    where ϵ is a Gaussian random s-vector of mean 0 and covariance R (cf. Eq. 2.9). The ensemble transform Kalman filter minimizes the cost function given in Eq. 2.10, where, as necessary, we approximate a nonlinear forward operator H as

    H(¯xb+Xbw)¯yb+Ybw, (.2)

    where the s-vector ¯yb is the mean of the background observation ensemble

    yb(k)=H(xb(k)),k=1,,K (.3)

    and the kth column of the s×K matrix Yb is yb(k)¯yb. The minimizer of Eq. 2.10 is found from

    ¯wa=˜PaYTbR1(yobs¯yb) (.4)
    ˜Pa=[(K1)I+YTbR1Yb]1 (.5)

    which in model space yields the analysis mean and covariance

    ¯xa=¯xb+Xb¯wa (.6)
    Pa=Xb˜PaXTb. (.7)

    As mentioned above, the analysis ensemble is constructed from a linear combination of the background ensemble members. The first step in this procedure is to compute the symmetric square root

    Wa=[(K1)˜Pa]1/2. (.8)

    One rationale for the symmetric square root in Eq..8 is that the elements of the resulting matrix vary continuously with the observational data. The analysis ensemble {wa(k)}Kk=1 is formed by adding ¯wa to each column of Wa; in model space, the analysis ensemble is

    xa(k)=¯xb+Xbwa(k),k=1,,K. (.9)

    Further details on the derivation of the ensemble transform Kalman filter are given in [40].

    In the results described in the paper, we apply the ensemble transform Kalman filter to the augmented dynamical system defined by Eq. 2.13. Let ˜F=(F,G)T denote the augmented vector field (F is the vector field representing the ODE model in question, and G is the model for the time evolution of the parameters). The (m+q)-dimensional state vector of the augmented system is x=(x,p)T.

    The forecast-update cycle may be expressed in the following steps:

    1. Begin with the analysis ensemble at tn, {xna(k)}kk=1 with the associated analysis covariance matrix Pna, and set each xa(k), k=1,,K as the initial condition for the next forecast.

    2. Integrate the model ˜F(t,x) from tn to tn+1 for each initial ensemble member xna(k), k=1,,K. The K solutions now become the background ensemble at time tn+1: {xn+1b(k)}Kk=1.

    3. Interpolate between the background ensemble and observations at time tn+1 by applying the forward operator H to each of the background ensemble member and form a new ensemble yb(k)=H(xb(k)), k=1,,K.

    4. Minimize the Kalman filter cost function, Eq. 2.10, of the current time step to produce the analysis at time tn+1: {xn+1a(i)}ki=1 associated with analysis covariance matrix Pn+1a.

    5. Use {xn+1a(k)}Kk=1 as the initial conditions for the next background ensemble and repeat the steps.



    [1] C. Huggins and C. V. Hodges, Studies on Prostatic Cancer: I. The Effect of Castration, of
    [2] W. G. Nelson, Commentary on Huggins and Hodges: "Studies on Prostatic Cancer", Cancer Res., 76 (2016), 186–187.
    [3] S. Kumas, M. Shelley, C. Harrison, et al., Neo-adjuvant and adjuvant hormone therapy for localized and locally advanced prostate cancer, The Cochrane Database Syst. Rev., 18 (2006), CD006019.
    [4] B. J. Feldman and D. Feldman, The development of androgen-independent prostate cancer, Nat. Rev. Cancer, 1 (2001), 34–45.
    [5] R. Siegel, E. Ward, O. Brawley, et al., Cancer statistics, 2011: The impact of eliminating socioeconomic and racial disparities on premature cancer deaths, CA: A cancer journal for clinicians, 61 (2011) , 212–236.
    [6] N. A. Spry, L. Kristjanson, B. Hooton, et al., Adverse effects to quality of life arising from treatment can recover with intermittent androgen suppression in men with prostate cancer, Eur. J. Cancer, 42 (2006), 1083–1092.
    [7] N. D. Shore and E. D. Crawford, Intermittent Androgen Deprivation Therapy: Redefining the Standard of Care? Rev. Urol., 12 (2010), 1–11.
    [8] M. C. Eisenberg and H. V. Jain, A confidence building exercise in data and identifiability: Modeling cancer chemotherapy as a case study, J. Theor. Biol., 431 (2017), 63–78.
    [9] T. Phan, H. Changhan, A. Martinez, et al., Dynamics and implications of models for intermit- tent androgen suppression therapy, Math. Biosci. Eng., 16 (2019).
    [10] T. L. Jackson, A mathematical model of prostate tumor growth and androgen-independent re- lapse, Discrete Continuous Dyn. Syst. Ser. B, 4 (2003), 187–201.
    [11] A. M. Ideta, G. Tanaka, T. Takeuchi, et al., A mathematical model of intermittent androgen suppression for prostate cancer, J. Nonlinear Sci., 18 (2008), 593–614.
    [12] T. Portz, Y. Kuang and J. D. Nagy, A clinical data validated mathematical model of prostate cancer growth under intermittent androgen suppression therapy, AIP Adv., 2 (2012), 0–14.
    [13] H. Vardhan-Jain and A. Friedman, Modeling prostate cancer response to continuous versus intermittent androgen ablation therapy, Discrete & Continuous Dynamical Systems-Series B, 18 (2013) .
    [14] Y. Hirata, K. Akakura, C.S. Higano,et al., Quantitative mathematical modeling of PSA dy- namics of prostate cancer patients treated with intermittent androgen suppression, J. Mol. Cell. Biol., 4 (2012), 127–132.
    [15] Y. Hirata, N. Bruchovsky and K. Aihara, Development of a mathematical model that predicts the outcome of hormone therapy for prostate cancer, J. Theor. Biol., 264 (2010), 517–527.
    [16] Y. Hirata, G. Tanaka, N. Bruchovsky, et al., Mathematically modelling and controlling prostate cancer under intermittent hormone therapy, Asian J. Androl., 14 (2012), 270–277.
    [17] Q. Guo, Z. Lu, Y. Hirata, et al., Parameter estimation and optimal scheduling algorithm for a mathematical model of intermittent androgen suppression therapy for prostate cancer, Chaos, 23 (2013), 43125.
    [18] Y. Hirata, S. I. Azuma and K. Aihara, Model predictive control for optimally scheduling inter- mittent androgen suppression of prostate cancer, Methods, 67 (2014), 278–281.
    [19] Y. Hirata, K. Morino, K. Akakura, et al., Personalizing Androgen Suppression for Prostate Cancer Using Mathematical Modeling, Sci. Rep., 8 (2018), 2673.
    [20] R. A. Everett, A. M. Packer and Y. Kuang, Can Mathematical Models Predict the Outcomes of Prostate Cancer Patients Undergoing Intermittent Androgen Deprivation Therapy? Biophys. Rev. Lett., 9 (2014), 173–191.
    [21] M. Droop, Some thoughts on nutrient limitation in algae. J. Phycol., 9 (1973), 264–272.
    [22] J. Baez and Y. Kuang, Mathematical Models of Androgen Resistance in Prostate Cancer Pa- tients under Intermittent Androgen Suppression Therapy. Appl. Sci., 6 (2016), 352.
    [23] J. D. Morken, A. Packer, R. A. Everett, et al., Mechanisms of resistance to intermittent andro- gen deprivation in patients with prostate cancer identified by a novel computational method, Cancer Res., (2014).
    [24] T. Phan, K. Nguyen, P. Sharma, et al., The Impact of Intermittent Androgen Suppression Ther- apy in Prostate Cancer Modeling, Appl. Sci., 9 (2019), 36.
    [25] A. Zazoua and W. Wang, Analysis of mathematical model of prostate cancer with androgen deprivation therapy, Commun. Nonlinear Sci. Numer. Simul., 66 (2019), 41–60.
    [26] T. Hatano, Y. Hirata, H. Suzuki, et al., Comparison between mathematical models of intermit- tent androgen suppression for prostate cancer, J. Theor. Biol., 366 (2015), 33–45.
    [27] N. Bruchovsky, L. Klotz, J. Crook, et al., Final results of the Canadian prospective Phase II trial of intermittent androgen suppression for men in biochemical recurrence after radiotherapy for locally advanced prostate cancer: Clinical parameters, Cancer, 107 (2006), 389–395.
    [28] C. Cobelli and J. J .DiStefano-III, Parameter and structural identifiability concepts and ambigu- ities: a critical review and analysis, Am. J. Physiol. Regul. Integr. Comp. Physiol., 239 (1980), R7–24.
    [29] A. Raue, C. Kreutz, T. Maiwald, et al., Structural and practical identifiability analysis of partially observed dynamical models by exploiting the profile likelihood, Bioinformatics, 25 (2009), 1923–1929.
    [30] S. Audoly, G. Bellu, L. D'Angio, et al., Global identifiability of nonlinear models of biological systems, IEEE Trans. Biomed. Eng., 48 (2001), 55–65.
    [31] M. C. Eisenberg, S. L. Robertson and J. H. Tien, Identifiability and estimation of multiple transmission pathways in cholera and waterborne disease. J. Theor. Biol., 324 (2013), 84–102.
    [32] M. P. Saccomani and L. D'angiò, Examples of testing global identifiability with the DAISY software, IFAC Proc. Vol., 42 (2009), 48–53.
    [33] H. Wu, H. Zhu, H. Miao, et al., Parameter identifiability and estimation of HIV/AIDS dynamic models, Bull. Math. Biol., 70 (2008), 785–799.
    [34] M. C. Eisenberg and M. A. L. Hayashi, Determining identifiable parameter combinations using subset profiling, Math. Biosci., 256 (2014), 116–126.
    [35] H. Miao, X. Xia, A. S. Perelson, et al., On Identifiability of Nonlinear ODE Models and Ap- plications in Viral Dynamics, SIAM Rev. Soc. Ind. Appl. Math., 53 (2011), 3–39.
    [36] T. Quaiser and M. Monnigmann, Systematic identifiability testing for unambiguous mechanis- tic modeling - application to JAK-STAT, MAP kinase, and NF-κB signaling pathway models, BMC Syst. Biol., 3 (2009), 50.
    [37] C. Kreutz, A. Raue, D. Kaschek, et al., Profile likelihood in systems biology, FEBS J., 280 (2013), 2564–2571.
    [38] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res. Oceans, 99 (1994), 10143– 10162.
    [39] G. Evensen, The ensemble Kalman filter: Theoretical formulation and practical implementa- tion, Ocean Dyn., 53 (2003), 343–367.
    [40] B. R. Hunt, E. J. Kostelich and I. Szunyogh, Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter, Physica D, 230 (2007), 112–126,
    [41] S. J. Baek, B. R. Hunt, E. Kalnay, et al., Local ensemble Kalman filtering in the presence of model bias, Tellus A, 58 (2006), 293–306.
    [42] C. P. Arnold and C. H. Day, Observing-Systems Simulation Experiments: Past, Present, and Future. Bull. Am. Meteorol. Soc., 67 (1986), 687–695.
    [43] R. M. Errico, R. Yang, N. C. Privé, et al., Development and validation of observing-system simulation experiments at NASA's Global Modeling and Assimilation Office, Q. J. R. Meteorol. Soc., 139 (2013), 1162–1178.
  • This article has been cited by:

    1. Regina Padmanabhan, Nader Meskin, Ala-Eddin Al Moustafa, 2021, Chapter 7, 978-981-15-8639-2, 135, 10.1007/978-981-15-8640-8_7
    2. Roberta Coletti, Andrea Pugliese, Luca Marchetti, Modeling the effect of immunotherapies on human castration-resistant prostate cancer, 2021, 509, 00225193, 110500, 10.1016/j.jtbi.2020.110500
    3. Hope Murphy, Gabriel McCarthy, Hana M. Dobrovolny, Olga A Sukocheva, Understanding the effect of measurement time on drug characterization, 2020, 15, 1932-6203, e0233031, 10.1371/journal.pone.0233031
    4. Tin Phan, Sharon M. Crook, Alan H. Bryce, Carlo C. Maley, Eric J. Kostelich, Yang Kuang, Review: Mathematical Modeling of Prostate Cancer and Clinical Application, 2020, 10, 2076-3417, 2721, 10.3390/app10082721
    5. Trevor Reckell, Kyle Nguyen, Tin Phan, Sharon Crook, Eric J. Kostelich, Yang Kuang, Modeling the synergistic properties of drugs in hormonal treatment for prostate cancer, 2021, 514, 00225193, 110570, 10.1016/j.jtbi.2020.110570
    6. Masud M A, Md Hamidul Islam, Khondaker A. Mamun, Byul Nim Kim, Sangil Kim, COVID-19 Transmission: Bangladesh Perspective, 2020, 8, 2227-7390, 1793, 10.3390/math8101793
    7. Regina Padmanabhan, Nader Meskin, Ala-Eddin Al Moustafa, 2021, Chapter 2, 978-981-15-8639-2, 15, 10.1007/978-981-15-8640-8_2
    8. Kate L. Wootton, Alva Curtsdotter, Tomas Jonsson, H. T. Banks, Riccardo Bommarco, Tomas Roslin, Amanda N. Laubmeier, Peter Eklöv, Beyond body size—new traits for new heights in trait-based modelling of predator-prey dynamics, 2022, 17, 1932-6203, e0251896, 10.1371/journal.pone.0251896
    9. Christine Pho, Madison Frieler, Giri R. Akkaraju, Anton V. Naumov, Hana M. Dobrovolny, Using mathematical modeling to estimate time-independent cancer chemotherapy efficacy parameters, 2021, 10, 2193-9616, 10.1007/s40203-021-00117-7
    10. William Meade, Allison Weber, Tin Phan, Emily Hampston, Laura Figueroa Resa, John Nagy, Yang Kuang, High Accuracy Indicators of Androgen Suppression Therapy Failure for Prostate Cancer—A Modeling Study, 2022, 14, 2072-6694, 4033, 10.3390/cancers14164033
    11. Tin Phan, Changhan He, Irakli Loladze, Clay Prater, Jim Elser, Yang Kuang, Dynamics and growth rate implications of ribosomes and mRNAs interaction in E. coli, 2022, 8, 24058440, e09820, 10.1016/j.heliyon.2022.e09820
    12. Tin Phan, Justin Bennett, Taylor Patten, Practical Understanding of Cancer Model Identifiability in Clinical Applications, 2023, 13, 2075-1729, 410, 10.3390/life13020410
    13. Samara Sharpe, Hana M. Dobrovolny, Predicting the effectiveness of chemotherapy using stochastic ODE models of tumor growth, 2021, 101, 10075704, 105883, 10.1016/j.cnsns.2021.105883
    14. Emmanuel Fleurantin, Christian Sampson, Daniel Paul Maes, Justin Bennett, Tayler Fernandes-Nunez, Sophia Marx, Geir Evensen, A study of disproportionately affected populations by race/ethnicity during the SARS-CoV-2 pandemic using multi-population SEIR modeling and ensemble data assimilation, 2021, 3, 2639-8001, 479, 10.3934/fods.2021022
    15. Rong Yan, Aili Wang, Xueying Zhang, Jingmin He, Duo Bai, Dynamics of a non-smooth model of prostate cancer with intermittent androgen deprivation therapy, 2022, 442, 01672789, 133522, 10.1016/j.physd.2022.133522
    16. Tin Phan, Samantha Brozak, Bruce Pell, Anna Gitter, Amy Xiao, Kristina D. Mena, Yang Kuang, Fuqing Wu, A simple SEIR-V model to estimate COVID-19 prevalence and predict SARS-CoV-2 transmission using wastewater-based surveillance data, 2023, 857, 00489697, 159326, 10.1016/j.scitotenv.2022.159326
    17. Kyle Nguyen, Kan Li, Kevin Flores, Georgia D. Tomaras, S. Moses Dennison, Janice M. McCarthy, Parameter estimation and identifiability analysis for a bivalent analyte model of monoclonal antibody-antigen binding, 2023, 679, 00032697, 115263, 10.1016/j.ab.2023.115263
    18. Tin Phan, Allison Weber, Alan H. Bryce, Yang Kuang, The prognostic value of androgen to PSA ratio in predictive modeling of prostate cancer, 2023, 176, 03069877, 111084, 10.1016/j.mehy.2023.111084
    19. Aili Wang, Rong Yan, Haixia Li, Xiaodan Sun, Weike Zhou, Stacey R. Smith, A joint-threshold Filippov model describing the effect of intermittent androgen-deprivation therapy in controlling prostate cancer, 2024, 377, 00255564, 109301, 10.1016/j.mbs.2024.109301
    20. Adriana M De Mendoza, Soňa Michlíková, Paula S Castro, Anni G Muñoz, Lisa Eckhardt, Steffen Lange, Leoni A Kunz-Schughart, Generalized, sublethal damage-based mathematical approach for improved modeling of clonogenic survival curve flattening upon hyperthermia, radiotherapy, and beyond, 2025, 70, 0031-9155, 025022, 10.1088/1361-6560/ada680
    21. Krasimira Tsaneva-Atanasova, Giulia Pederzanil, Marianna Laviola, Decoding uncertainty for clinical decision-making, 2025, 383, 1364-503X, 10.1098/rsta.2024.0207
    22. Nadine Kuehle Genannt Botmann, Hana M. Dobrovolny, Assessing the role of model choice in parameter identifiability of cancer treatment efficacy, 2025, 11, 2297-4687, 10.3389/fams.2025.1542617
  • Reader Comments
  • © 2019 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(6141) PDF downloads(817) Cited by(22)

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog