
Citation: Gianniantonio Petruzzelli, Francesca Pedron, Irene Rosellini. Bioavailability and bioaccessibility in soil: a short review and a case study[J]. AIMS Environmental Science, 2020, 7(2): 208-225. doi: 10.3934/environsci.2020013
[1] | Igor V. Bezmenov . Fast algorithm for cleaning highly noisy measurement data from outliers, based on the search for the optimal solution with the minimum number of rejected measurement data. Metascience in Aerospace, 2024, 1(1): 110-129. doi: 10.3934/mina.2024005 |
[2] | Yuan Hu, Zhihao Jiang, Xintai Yuan, Xifan Hua, Wei Liu . Isometric mapping algorithm based GNSS-R sea ice detection. Metascience in Aerospace, 2024, 1(1): 38-52. doi: 10.3934/mina.2024002 |
[3] | James H. Page, Lorenzo Dorbolò, Marco Pretto, Alessandro Zanon, Pietro Giannattasio, Michele De Gennaro . Sensitivity analysis of mixed analysis-synthesis flight profile reconstruction. Metascience in Aerospace, 2024, 1(4): 401-415. doi: 10.3934/mina.2024019 |
[4] | O.N. Korsun, Om Moung Htang . The practical rules for aircraft parameters identification based on flight test data. Metascience in Aerospace, 2024, 1(1): 53-65. doi: 10.3934/mina.2024003 |
[5] | Jiaqing Kou, Tianbai Xiao . Artificial intelligence and machine learning in aerodynamics. Metascience in Aerospace, 2024, 1(2): 190-218. doi: 10.3934/mina.2024009 |
[6] | Pasynok Sergey . Cumulative STF coefficients evaluation and validation. Metascience in Aerospace, 2024, 1(4): 371-378. doi: 10.3934/mina.2024017 |
[7] | Dulce M Graciano, Fernando Z Sierra-Espinosa, Juan C García . Numerical simulation of vortex-induced vibration in a bladeless turbine: Effects of separation distance between tandem harvesters. Metascience in Aerospace, 2024, 1(3): 309-328. doi: 10.3934/mina.2024014 |
[8] | Grégoire Dannet, Olivier Boucher, Nicolas Bellouin . Features and evolution of civil aviation CO2 emissions based on ADS-B data for the period between 2019–2024. Metascience in Aerospace, 2024, 1(4): 346-370. doi: 10.3934/mina.2024016 |
[9] | Jubilee Prasad Rao, Sai Naveen Chimata . Machine learning-based surrogates for eVTOL performance prediction and design optimization. Metascience in Aerospace, 2024, 1(3): 246-267. doi: 10.3934/mina.2024011 |
[10] | Wenchao Cai, Yadong Zhou . Preventing aircraft from wildlife strikes using trajectory planning based on the enhanced artificial potential field approach. Metascience in Aerospace, 2024, 1(2): 219-245. doi: 10.3934/mina.2024010 |
A significant problem in achieving the required level of accuracy of modern computing complexes and systems is the detection of coarse measurements (outliers) in the time series of data at the preprocessing stage. The proper detection and removal of outliers from measurement data is a necessary step in any scientific application to obtain the most accurate final result. This problem inevitably arises when solving problems in which the initial data is measurement data. Examples of such problems are transmitting time over long distances [1] and comparing remote time scales [2] using satellites, positioning using global navigation satellite systems (GNSS) [3], determining the Earth's rotation parameters [4], very long baseline radio interferometry (VLBI) signal processing [5], and others. The sequential least-squares wavelet and cross-wavelet analyses of VLBI time series performed in [6] make it possible to obtain accurate instantaneous frequency information along with phase differences in the time-frequency domain, as well as to investigate the components of temperature time series at antennae locations.
It should also be noted that the Global Geodetic Observing System (GGOS) provides measurements of the time-varying gravity, rotation, and shape of the Earth using geodetic and gravimetric instruments located on the ground and in space. These measurements need to be accurate to better than a part per billion in order to advance our understanding of the underlying processes that are causing the Earth's rotation, gravity, and shape to change. Mass transport in the global water cycle, sea level and climate change, and crustal deformation associated with geohazards are examples of particularly demanding applications of geodetic and gravimetric measurements with sub-mm accuracy [7]. On the one hand, it is impossible to achieve such accuracy without equipping the colocation sites with new-generation measuring instruments of VLBI, GNSS, and satellite laser ranging systems (SLR). On the other hand, even the availability of state-of-the-art equipment does not relieve the necessity of improved models and methods for both post- and pre-processing of the measurement data received from GNSS receivers forming a global network [8].
As noted in [9], outlier detection and trend detection are two sides of the same coin. The relationship between these two problems is obvious: an incorrectly found trend leads to incorrectly found outliers, and vice versa. The problem of detecting outliers is closely related to the problem of searching for an unknown trend in the data series (see, e.g., [9]). The detection of outliers depends on the solution to this problem: if the trend is determined incorrectly, knowingly accurate data can be rejected while inaccurate one can be left for further processing.
In measurement data processing theory, an outlier is defined as an observation that is at an abnormal distance from other values in a random sample from the population. The uncertainty of this definition lies in how to distinguish anomalous observations from the general data set. The well-known definition of outliers by V. Barnett and T. Lewis states them as being "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data" [10]. However, the concepts of "inconsistent" and "remainder" are vague. Various algorithms have been proposed to detect outliers in accordance with our intuition (see [9]). An overview of the different approaches together with the classification of outlier types is given in [11].
The trend detection problem has long been known and is often solved by choosing a polynomial of a given degree that approximates the measurement data in RMS (root-mean-square) norm. Polynomial coefficients are found by the least squares (LS) method as a solution to the system of linear equations, and detection of rough measurements is carried out on the basis of the subsequent analysis of deviations of measured data from the values of the found polynomial. However, with this approach to the problem, the data remaining after the removal of the found outliers, as a rule, do not coincide with the original data set by which the trend was built, whereas the iterative process of sequential redefinition of the trend at a predetermined deviation level may lead to an infinite computational loop. To avoid this looping, it is common practice to reduce the outlier detection level, which can result in the loss of knowingly "good" data.
Some trend-building methods rely on regression techniques and machine learning. An example is the least absolute shrinkage and selection operator (LASSO) used for modeling the relationship between a dependent variable (which may be a vector) and one or more explanatory variables, by fitting the regularized least squares model (see [12]). Trained LASSO model can produce sparse coefficients due to the use of the regularization term. LASSO regression is widely used in feature selection tasks. For example, in the field of compressed sensing, it is used to effectively identify relevant features associated with the dependent variable from a few observations with a large number of features. LASSO regression is also used to overcome multicollinearity of feature vectors in the training data set. For more details, see [13]. The application of the method in medicine can be seen, for example, in [14], as well as in references ibid.
Among the numerous trend-detecting methods are those applied to irregularly distributed data, such as in seismology and marine seismic data. Least-squares spectral analysis (LSSA) was introduced by Vaniček in 1969 [15] to analyze unequally spaced time series. It estimates a frequency spectrum based on the least-squares fit of sinusoids to the entire time series by accounting for measurement errors, trends, and constituents of known forms. Subsequently, scholars have developed various versions of this method. One of them, the anti-leakage least-squares spectral analysis (ALLSSA), is an iterative method based on LSSA and proposed in [16]. Further development of this method to regularize data series was presented in articles [17,18] which also showed its robust performance on synthetic and marine seismic data examples (for details and additional references see ibid.).
A large number of works are devoted to the study of the spatial movement of the Earth's surface in order to identify land subsidence that threatens the human environment. Compared with the traditional leveling and global positioning system (GPS) measurement methods, the interferometric synthetic aperture radar (InSAR) technique can obtain a wide spatial range of information about the deformation of the Earth's surface with millimeter accuracy [19]. The output of InSAR processing is a collection of time series representing the ground displacements of measurement points (MPs) on the surface during the observed period. InSAR has been used for monitoring natural hazards such as landslides, earthquakes, volcanic activity, and ground subsidence [20], as well as for monitoring man-made structures such as dams, buildings [21], and bridges [22], and anthropogenic activities such as the pumping or injection of fluids. Various methods are used to process InSAR time series. For example, in [23] a processing InSAR time series method was proposed, using a convolutional neural network that can distinguish different ground motion features and detect nonlinear deformation signals on a large scale without human intervention. In [24], the authors performed a spatiotemporal analysis of land subsidence in Beijing using the InSAR method based on 47 TerraSAR-X SAR images from 2010 to 2015. Distinct variations in land sediment were identified in the study areas. Numerous examples of the application of InSAR time series processing can be found in the references of the articles above.
Overviews of trend-building methods applied to different research areas can also be found in [25,26,27,28]. These describe approaches and methods for modeling trends in geodetic time series, in time series of observations of the chemical composition of the atmosphere, in the series of physiological data in the medical field, as well as systematized methods for building a trend based on data in the social sphere in order to identify extraordinary events and incidents.
In [29], a new absolutely convergent algorithm for detection of coarse measurements in data with removed trends was proposed. A modification of this algorithm, leading to an optimal solution with a minimum number of rejected data, was proposed in [30], and a fast algorithm with O(NlogN) arithmetic operations performed by it to find the optimal solution was given in [31]. The author in [32] proposed a new trend-detection method in which the trend is searched in the power polynomial class, while the polynomials in the L2 norm are not fitted to all observational data but to their subset, called reference values, which are selected from these data through a convergent iterative process. One of the advantages of the proposed method is that the trend search procedure does not require a priori threshold values, which are usually used to eliminate outliers; all measurement data remain in processing until the trend is determined. The method was approved on the data obtained from GNSS receivers, as well as on the SLR measurement data obtained from the AJISAI geodetic satellite (EGS) [31]. The purpose of this article is to extend the applicability of the previously proposed trend detection method into two additional functional classes, which will make it possible to extend further investigations into many physical processes based on observed data.
The article includes examples of building a trend for noisy data with outliers generated using computer simulation. In the examples below, the trend search is carried out for three functional classes: power polynomials (see Section 6), trigonometric functions with a given set of frequencies (Section 9), and harmonic functions with unknown frequencies, amplitudes, and phases (Section 10). The peculiarity of the trend search in the last class of harmonic signals is that the desired harmonics depend nonlinearly on unknown frequencies and, for their search, the traditional least squares method does not lead to a system of linear equations. In this case, the values of unknown frequencies are sought by the conjugate gradients method combined with the MS method (see below). The results of numerical calculations are given.
The results yj=y(tj) of observations (measurements) of a certain random value Y(t), formed by measuring devices at time points {tj}Nj=1, in many cases can be represented in the form of a one-dimensional time series:
yj=fj+z+ξ j, j=1,…,N, | (1) |
where fj=f(tj) is a trend function dictated by the physical process under consideration and usually unknown; z – is an unknown average; ξ j=ξ (tj) is a centered random variable with an unknown distribution, which may contain outliers, resulting from adverse effects of external factors on the measurements [33]. The challenge is to find fj.
It is clear that this problem does not have a unique solution due to the lack of clear criteria when searching for a trend and the impact of outliers on the result of its search, as well as due to the absence of a strict definition of the concept "rough measurement" or "outlier". The trend detection strategy presented in this paper is based on the MS method, developed by the author, as well as on the search for optimal solution algorithms.
As mentioned above, trend and outlier detection problems are tightly coupled. To find a trend, we first need to find outliers and eliminate them from the values of series (1); after that, we need to find the desirable trend by adjusting some functions to the remaining values treated as reference ones. On the other hand, to detect outliers, it is necessary to know the trend. Initially, neither trend nor outliers are known. Both trend and outlier problems are found through iterations as described below. This Section describes the statement of the outlier detection problem for detrended data series, which is solved during search trend iterations (see Section 6).
If the trend fj is known, we can remove it from the time series (1):
ˆyj=z+ξ j; j=1,…,N, | (2) |
with ˆyj=yj−fj. The task of detecting outliers in series (2) is usually to find a set YL={ˆyj1,...,ˆyjL} of L numbers (length L) for which the following conditions are met:
σ YL=((L−1)−1∑j∈{j1,..,jL}(ˆyj−z)2)1/2⩽σmax, | (3) |
|ˆyj−z|⩽Δ;j∈{j1,..,jL}, | (4) |
L⩾Lmin, | (5) |
where σ YL and σmax are SD (standard deviation) and its preset threshold value, respectively; Δ is a parameter defining an outlier detection threshold (e.g., Δ=3σmax); Lmin is a specified parameter that limits the length of the search set from below (for example, Lmin=10); below we consider 1<Lmin<N. Values ˆyj not included in YL are considered outliers in series (2), and corresponding values yj – as outliers in series (1). The latter are removed from further processing. Often, conditions (3)–(5) are the only ones when searching coarse measurements in a series (2), while no additional conditions are imposed on the parameter L. Problem (2)–(5) is usually solved by iterations, which, as mentioned above, can lead to endless looping or loss of knowingly "good" data (for examples, see [29]).
In [30,31,32], the set YL search problems were formulated, where the conditions (3)–(5) are supplemented by two selection conditions: Of all possible sets satisfying (3)–(5), the set of maximum length Λ (the number of outliers is minimal) is selected as the desired set, and if there are several such sets, the set with the lowest SD value is selected. This solution was called optimal and designated YΛ,opt. The algorithms described in the above-listed works are guaranteed to lead to an optimal solution, if only it exists. However, their application to data containing a not-excluded trend can lead to an unreasonably large number of detected outliers or to the absence of a solution at all, when all measurement data will be detected as outliers. Therefore, our task is to build a suitable approximation ˜fj of an unknown trend and subtract it from the series (1). For this, we associate the concept of "suitable" with the requirement to maximize the number of data of series (1) remaining after the removal of outliers. Commonly, the trend is searched among the functions from some manifold chosen by the researcher depending on the physical problem being solved. In some cases, it can be power polynomials, in others it can be a class of trigonometric, rational functions, etc. Outliers in the measurement data can essentially distort the trend found. Below, we set out an approach to the trend-detection problem in noisy data, based on the minimizing sets method, which allows us to minimize the impact of outliers. We give a sequential exposition of the method for the class of power polynomials. Also, we will demonstrate the capabilities of the method when searching for a trend in other functional classes.
In the case of power polynomials, the trend is approximated by functions of the form
Pn,j(→a)=an(xj)n+an−1(xj)n−1+...+a0;j=1,…,N, | (6) |
where n∈N and →a={a0,...,an−1,an}∈Rn+1 are the degree and coefficients of the polynomial, respectively; xj=(tj−t1)/(tN−t1). The trend detection, in this case, is based on the fitting to reference values of series (1) of a polynomial (6), the degree and coefficients of which are sought through iterations (see Section 6).
Trend fitting is performed to L reference values, which are also searched through iterations. The L value is set in advance to satisfy the inequality L⩽N−−Nout, which is a necessary condition for outliers not to be included in the final set of reference values at the end of the iterative procedure.1 Because the number Nout of outliers is unknown, the parameter L in the below discussed algorithm (see Section 6) is set to be guaranteed to be smaller than the possible value of the quantity N−−Nout so that the above inequality is fulfilled.
1 In Section 7, we will waive this restriction.
Let us define a minimizing set consisting of L numbers of series (2), which plays an important role in the trend detection process.
Let YL={ˆyj1,...,ˆyjL} be a set of L values of the series (2). Denote zYL and σ YL the average and SD values for it, respectively. We have:
zYL=L−1∑j∈{j1,...,jL}ˆyj, | (7) |
σ 2YL=(L−1)−1∑j∈{j1,...,jL}(ˆyj−zYL)2. | (8) |
Definition 1. For a given value L and a given sequence {ˆyj}Nj=1, the set of values
YL,min={ˆyj∗1,...,ˆyj∗L}, |
at which the value σ 2YL determined by (8) reaches a minimum, we call the minimizing set of length L. The corresponding values of the average and SD for this set are denoted by zYL,min and σYL,min, respectively.
According to this definition, we have
zYL,min=L−1∑j∈{j∗1,...,j∗L}ˆyj, | (9) |
σ2YL,min=(L−1)−1∑j∈{j∗1,...,j∗L}(ˆyj−zYL,min)2=minYL{σ2YL}. | (10) |
The minimum in (10) is searched throughout all possible sets of length L composed of the numerical series {ˆyj}Nj=1. Note that the number of possible sets of length L that can be selected from the series (2) is equal to the number CLN of combinations of N elements taken L at a time. It follows that the algorithm for finding the minimizing set YL,min, based on the enumeration of all kinds of sets, as a rule, is not feasible in a reasonable time on any modern computer, even with relatively small values of N. For example, with N = 100, the number of sets of length L = 80 is of the order of 1021. To avoid global brute force when searching the minimizing set YL,min, we note that its composition is determined up to a permutation of the numbers {ˆyj∗1,...,ˆyj∗L} and does not depend on the arrangement of the numbers in the series (2). This means that when searching the minimizing set, we can arrange the numbers of the series (2) in any order. Let us arrange them in ascending order so that after renumbering (keeping the same notation for the numbers of series (2) and considering them different), the inequalities will be fulfilled:
ˆy1<ˆy2<...<ˆyN. | (11) |
Let us formulate an assertion that radically reduces the number of sets to be examined.
Assertion 1. Let YL,min={ˆyj∗1,...,ˆyj∗L} be a minimizing set of length L for a given sequence {ˆyj}Nj=1 and
ymin=min{ˆyj∗1,...,ˆyj∗L},ymax=max{ˆyj∗1,...,ˆyj∗L}. |
Then, the interval (ymin,ymax) does not contain values ˆyj that do not belong to the set YL,min.
See [32] for proof.
Without limiting generality, we consider the numbers ˆyj∗1,...,ˆyj∗L to be in ascending order, that is ˆyj∗1<...<ˆyj∗L. By virtue of (11), the indices j∗1,…,j∗L also follow in ascending order: j∗1<...<j∗L.
The following assertion is true.
Assertion 2. The ascending set of indices j∗1,...,j∗L in the minimizing set YL,min={ˆyj∗1,...,ˆyj∗L} contains all consecutive integers from j∗1 to j∗L.
Proof. Suppose the opposite: Let there be some index j0 such that j∗1<j0<j∗L, but j0∉{j∗1, ..., j∗L}. Then, the inequalities ˆyj∗1<ˆyj0<ˆyj∗L are satisfied for the number ˆyj0 but ˆyj0∉YL,min. This contradicts Assertion 1. It follows that the set of indices j∗1,j∗2,...,j∗L in YL,min coincides with the set of indices j∗1,j∗1+1,...,j∗1+L−1:
YL,min={ˆyj∗1,ˆyj∗1+1,...,ˆyj∗1+L−1}. |
Therefore, instead of a global enumeration of all possible sets (with a total number of order CLN), it is enough to search YL,min among the sets {ˆyk, ..., ˆyk+L−1} consisting of segments of length L of an ordered series (2), varying only one parameter k, subject to the condition 1⩽k⩽N−L+1. The total number of such sets is (N−L+1).
Thus, to find the minimizing set of length L, it is enough to consider N−−L+1 sets
{ˆy1,...,ˆyL},{ˆy2,...,ˆyL+1},…,{ˆyN−L+1,...,ˆyN} |
and select from them the one for which the SD value is minimal. The formulas below minimize the number of arithmetic operations in the search of YL,min.
Denote Y(k,L)={ˆyk, ..., ˆyk+L−1} a set of numbers corresponding to a pair (k,L), and z(k,L) and σ (k,L) as the arithmetic mean and SD value of the set Y(k,L), respectively:
z(k,L)=L−1k+L−1∑j=kˆyj; | (12) |
σ 2(k,L)=(L−1)−1k+L−1∑j=k[ˆyj−z(k,L)]2. | (13) |
The values z(k,L) and σ2(k,L) during the search YL,min can be calculated according to the diagram shown in Figure 1 (in which z(k,L) are shown only):
First, using formulas (12) and (13), the values z(1,L) and σ2(1,L) are calculated; then, by formulas
z(k+1,L)=z(k,L)+L−1(ˆyk+L−ˆyk); | (14) |
σ 2(k+1,L)=σ 2(k,L)+ |
+L−1(ˆyk+L−ˆyk){ˆyk+L−z(k,L)+L+1L−1(ˆyk−z(k,L))} | (15) |
the values {z(2,L) and σ2(2,L)} →, …, →{z(N−L+1,L) and σ2(N−L+1,L)} are successively calculated with the simultaneous selection of the smallest from the numbers:
σ2(m,L)=min1⩽k⩽N−L+1{σ2(k,L)}. |
The set YL,min={ˆym,ˆym+1,...,ˆym+L−1}, for which the SD is minimal, will be the desired minimizing set of length L for the ascending series {ˆyj}. As mentioned above, the same set will also be the minimizing set for the disordered series (2). Let us estimate the number of elementary operations needed to find it. Since the calculation of the values z(1,L) and σ2(1,L) requires 4L + 2 operations, and for each (N−L) the transitions in → direction (see Figure 1)–10, the total number of arithmetic operations when searching for the set YL,min in the ordered series (2) is estimated by the value 4L+2 +10(N−−L) ⩽ 10N.
In order to estimate the number of operations required to find the minimizing set in the unordered series (2), it is necessary to take into account the ordering costs, which are estimated by the value O(Nlog N) (see [34]).
Thus, we have proved the validity of the following assertion.
Assertion 3. For any (unordered) time series (2), a minimizing set YL,min of length L can be found in most O(Nlog N) arithmetic operations.
This section describes the trend-search algorithm using the MS method. Our immediate goal is to find a set of reference values, as a subset of the numbers of the series (1), used to fit the desired trend to them. Suppose that the subset we are looking for and denoted as YL,ref consists of L reference values: YL,ref = {yj1,...,yjL}, where L is not yet defined. As mentioned above, in order to obtain a proper approximation of the trend, the set of reference values should not contain outliers. This requires that the inequality L⩽N−−Nout be fulfilled, in which the parameter Nout, the number of outliers in series (1), is unknown.
Suppose that the number of outliers in series (1) does not exceed some value Nmaxout known a priori2. (For example, if it is known in advance that the number of outliers cannot exceed 10% of the total number of measurements, then we can set Nmaxout=0,1N.) Then, the number of "correct" values in a series yj that can be considered as reference ones is not lower than the value N−Nmaxout, and the problem is to find them.
2 In the next Section 7, we will abandon this assumption.
Below, we consider L to be fixed and associated with the number of reference values of series (1) used to build the trend.
Consider the following algorithm ([32]), containing two iterative cycles: external, by n, and internal, by s. Iterations over s will be marked with the superscript "s" in parentheses. Reference values chosen from the series (1) are marked with the flagj function, which takes values 1, if yj is considered as a reference value, or 0 otherwise.
Step 0. n = 0. Assume L=N−Nmaxout.
Step 1. n++3; Assume s = 0; (0)j=1, j=1,…,N. (Reference values are all numbers of the series (1): Y(0)N,ref = {y1,...,yN}.)
3 n incremented by 1.
Step 2. Using the least squares method, we find the coefficients →a(s)={a(s)0,...,a(s)n} of the polynomial of degree n:
→a(s)=argmina0,...,an−1,anΦ(s)(→a), | (16) |
Φ(s)(→a)≜N∑j=1(yj−Pn,j(→a))2⋅flag(s)j, | (17) |
where Pn,j(→a) is the polynomial defined by equation (6). The functional Φ(s)(→a) defined by equation (17) is calculated on L reference values Y(s)L,ref of series (1) found in the previous iteration4.
4 If s = 0, then reference values are all numbers of the series (1).
Step 3. We form the differences ˆy(s)j:
ˆy(s)j=yj−Pn,j(→a(s)), | (18) |
where Pn,j(→a(s)) is a polynomial with coefficients found in Step 2.
Step 4. For the series {ˆy(s)j}Nj=1 defined in (18), we find a minimizing set Y(s)L,min={ˆy(s)j∗(s)1,...,ˆy(s)j∗(s)L} of length L using the algorithm described in Section 5; using formulas (9) and (10), we can calculate the corresponding values of mean zY(s)L,min and SD σY(s)L,min.
Step 5. We redefine the reference points to find the polynomial at the next iteration (Step 2), marking them with the function flag(s+1)j:
flag(s+1)j={1,ifj∈{j∗(s)1,...,j∗(s)L}0,otherwise. | (19) |
At that Y(s+1)L,ref = {yj∗(s)1,...,yj∗(s)L}.
Step 6. s++. Increment the iteration counter s by 1 and go to Step 2.
We perform the sequence of Steps 2 to 6 until the convergence of the sequence σY(s)L,min, s = 0, 1, 2, ..., which always takes place (see Assertion 4).
After achieving convergence of the s-iterations (Steps 2–6), we proceed to Step 7 to find the optimal solution.
Step 7. Let the convergence of s-iterations be achieved at s=smax. In this case, the indices in both sets Y(smax)L,min and Y(smax)L,ref coincide up to permutation. The desired trend approximation ˜fn,j corresponding to the degree n of the polynomial is equal to ˜fn,j=Pn,j(→a(smax)).
Next, we find the optimal solution YΛ,opt={ˆyk1,...,ˆykΛ} for the series ˆy(smax)j=yj−Pn,j(→a(smax)), j= 1,…,N, using any of the algorithms described in [30,31,32]. As a result, the outliers will be the numbers yj of the series (1), for which j∉{k1,...,kΛ}. The resulting number of outliers detected is equal to Nout=N−−Λ.
If it turns out that Nout≤Nmaxout, then the solution is considered to be found, and the search process ends; at that, the desired trend is ˜fj=Pn,j(→a(smax)). Otherwise, if Nout>Nmaxout, then we transit to Step 1, at which we increase the polynomial degree by 1. The process (Steps 1–7) continues until a solution is found or n exceeds a preset value (e.g., 10). In the latter case, it will probably be necessary to select another functional class to search for a trend.
Example 1. We will generate synthetic noisy data using a computer simulation by the formula:
yj=5×ln(5+j)+2×randomj+{40×(0.6−randomj),if{j≡0(mod13)orj≡0(mod43)orj≡0(mod113)0,otherwise. | (20) |
Here, j=1...N, randomj are pseudo-random numbers evenly distributed on the segment [0, 1]. The first term on the right side (20) models the trend, the second models the white noise, and the third models outliers at points j, multiples of 13, 43, or 113 (Figure 2). Suppose N=1250, then the number of modeled outliers is 134. Next, we set σ max=0.6,Δ=1.8, Lmin=10 and will search for the trend and outliers in accordance with the algorithm described above, putting Nmaxout=140; at L=N−Nmaxout=1110.
The calculation results for this example are presented in Table 1, in which n is the degree of the polynomial; smax is the number of s-iterations performed until convergence; Λ is the length of the optimal solution found; and Nout is the number of outliers detected.
n — polynomial order | smax | Λ | Nout |
1 | 2 | 688 | 562 |
2 | 9 | 931 | 319 |
3 | 6 | 1049 | 201 |
4 | 6 | 1087 | 163 |
5 | 5 | 1105 | 145 |
6 | 7 | 1109 | 141 |
7 | 4 | 1116 | 134 |
8 | 4 | 1116 | 134 |
As can be seen from Table 1, the minimum polynomial order at which the maximum length Λ of the optimal solution is reached equals 7. Therefore, a 7th-degree polynomial can be chosen as a suitable trend approximation.
The values P7,j(→a(0)) of the 7th-degree polynomial, constructed over all reference values without performing the above iterations (s = 0), are shown in Figure 2 (solid blue line). The corresponding polynomial differences ˆyj=yj−P7,j(→a(0)) are shown in Figure 3. It can be seen that the values ˆyj are distributed unevenly relative to 0, which indicates the inaccuracy of the constructed polynomial trend, since the reference values contain outliers.
The optimal solution found for the obtained series ˆyj has the length Λ = 1067 (number of outliers = 183). Trend refinement occurs as a result of multiple repeats of Steps 2–6 of the above iterative procedure.
Figure 4 shows the values P7,j(→a(4)) of the found polynomial of the 7th-degree (blue line), approximating the data set over L = 1110 reference values found after achieving convergence at the 4th iteration (smax = 4) of the above iteration process (Step 2–6). The corresponding differences ˆyj=yj−˜fj, where ˜fj=P7,j(→a(4)), are shown in Figure 5. It can be seen that the values ˆyj are evenly distributed relative to 0. The optimal solution found for the obtained series ˆyj has a length Λ = 1116, while the number of detected outliers = 134, their positions being marked with white circles.
Note that in the example considered, the trend was modeled by a function that does not belong to the class of power polynomials.
As mentioned above, in order for the trend detection strategy described in the previous section to give an adequate result, it is necessary that the inequality L⩽N−Nout is fulfilled. The task of this section is to choose an L at which this inequality holds true. We cannot choose an L either too small or too large; otherwise, this will result in a distorted trend approximation ˜fj. Since Nout is unknown a priori, we cannot specify L to run a trend-search algorithm. To overcome this obstacle, we will use the L selection strategy proposed in [35]. Note that each value L corresponds to the optimal solution of length Λ depending on L (Λ = Λ(L)) obtained after removing from the series (1) trend, built on the L reference values. The L selection strategy proposed in [35] is based on searching the maximum of the length function Λ(L). By applying the algorithm discussed in Section 6, we can find a length sequence Λ(N), Λ(N−1),..., Λ(Lmin) for optimal solutions corresponding to the sequence of different L, starting with L = N and further decreasing L by 1. Then, we can find the maximum of this sequence denoted as Λ∗:
Λ∗=maxLmin⩽L⩽NΛ(L). | (21) |
Next, from all possible values of L for which the equality Λ(L)=Λ∗ is fulfilled, we choose the maximum that we designate as L*:
L∗=max{L:Λ(L)=Λ∗}. | (22) |
It corresponds to the optimal solution YΛ∗,opt={ˆyj1,...,ˆyjΛ∗} of length Λ∗, at which the number of outliers detected is equal to N – Λ∗.
Let us test this strategy presented by equations (21) and (22) on the data from Example 1, expressed by equation (20). We will approximate the trend with a 7th-degree polynomial (n = 7). As a result, we find the sequence Λ(N), Λ(N−1), … plotted in Figure 6. The maximum value Λ∗=1118; the values of L at which this maximum is achieved are all numbers in the range L∈[1109,1129], the maximum of them is equal to L∗=1129. The reference values of the original series found at L∗=1129 can be used as reference ones for building a polynomial trend.
The question of convergence of iterations described in the trend search algorithm (see Section 6) is formulated in the following statement.
Assertion 4. The sequence σY(s)L,min of SD values calculated in the trend-search algorithm decreases monotonically at s = 0.1, ...:
σY(s)L,min⩾σY(s+1)L,min⩾... | (23) |
and therefore converges.
For proof, see [32].
Remark. The monotonic decreasing of the sequence σY(s)L,min of SD values calculated in the trend-search algorithm may be violated during real calculations due to insufficient calculation accuracy. It may happen that calculations with 64 bits do not provide the necessary calculation accuracy. The situation is usually corrected by switching to a 128-bit grid.
The idea of the method described for finding a trend in a power polynomial class can also be implemented for other functional classes. Consider a periodic with period N function defined on a homogeneous grid Ω={xj=hj,j=0,±1,±2,..,Nh=l} that describes the sequence of measurements of a signal coming from a technical device:
f(j)=f(j+N),j=0,±1,..,. |
As known, the function f(j) can be expanded into a discrete Fourier series into a sum of trigonometric functions:
cos2kπ jN,sin2kπ jN,j=0,…,N−1. |
The trend search for such a signal can be carried out in the form of an n-th-order polynomial
˜fj=Тn,j(→a,→b)=n∑k=0akcos2kπjN+bksin2kπjN, | (24) |
in which the coefficients ak, bk are searched by the least squares method from the condition of the minimum functional, similar to (17), where Pn,j(→a) should be replaced by Тn,j(→a,→b). The trend-search algorithm, in this case, is no different from the polynomial trend-search algorithm described above.
The convergence of s-iterations (see Assertion 4) for a given class of functions also takes place, as proved by literally word-for-word repetition of the proof of Assertion 4.
Example 2. Consider a computer simulation of the wave packet in the form of the product of the Gaussian function by a sinusoid:
yj=1,5⋅cos(10×2π (j/N−0.5))×exp(−25.0×(j/N−0.5)2)+0.15×randomj+ |
+{3×(randomj−0.5),if{j≡0(mod13)orj≡0(mod17)orj≡0(mod43)0,otherwise, | (25) |
where j = 1...N. The first two terms on the right side simulate a trend in the form of a wave packet and additive white noise, respectively. The third term simulates outliers at points j that are multiples of 13, 17, and 43.
Figure 7 shows the values of this function at N = 500 (orange diamonds) and the values T12,j(→a(0),→b(0)) of trigonometric polynomial (24) of the 12th order, plotted over all reference points, i.e., without performing the s-iterations (s = 0) (blue circles). Figure 8 shows the differences ˆyj=yj−T12,j(→a(0),→b(0)). It can be seen that the residuals contain a non-deleted trend component, which leads to an unreasonably large number of rejected measurement results when finding the optimal solution.
Figure 9 shows the values T12,j(→a(2),→b(2)) of the trigonometric polynomial of the 12th order (blue circles), constructed over L = 450 reference values found after achieving convergence at the second iteration (smax=2) of the above iteration process (see Section 6).
The differences ˆyj=yj−T12,j(→a(2),→b(2)) shown in Figure 10 are evenly distributed relative to 0, which indicates a satisfactory approximation of the trend.
Consider the problem of finding a trend in data representing a superposition of unknown harmonic signals, on which noise and outliers are superimposed:
yj=fj+z+ξ j,j=1,…,N, |
fj=Hn,j(→a, →b, →ν)=a0+n∑k=1akcos(νkj)+bksin(νkj). | (26) |
Here, ak,bk,νk are the unknown parameters to be defined. An unknown parameter is also the number n of harmonics representing the signal under study.
The fundamental difference between this representation and decomposition (24) is its nonlinear dependence on unknown parameters νk, the search for which cannot be reduced to solving a system of linear equations. The search for reference values is carried out in accordance with the algorithm described above when searching for a polynomial trend, while it is necessary to replace Pn,j(→a) by Hn,j(→a, →b, →ν) in the expression (17) for the functional. The search for the minimum of the functional (see Step 2, Section 6), obtained as a result of such a replacement, is carried out by the conjugate gradients method.
Example 3. Figure 11 shows the result of a computer simulation of the harmonic signal generated by the formula
yj=4.4sin(0,009123j+2.42)+3.0×(randomj−0.5)+ |
+{25×(0.5−randomj),if{j≡0(mod13)orj≡0(mod23)orj≡0(mod113)0,otherwise, | (27) |
where j=1,...,N=5700. On the signal represented by the first term on the right side of equation (27), additive white noise (the second term) and outliers (the third term added to the previous two at points j, multiples of 13, 23, or 113) are superimposed.
A fragment of this signal in enlarged form is shown in Figure 12.
As a result of the search, the frequency of the signal was determined (see Table 2).
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.0091230 | 0.0091240 | < 1.1×10−4 |
The detected trend is shown in Figure 13. The maximum value of the relative error of the found trend compared to the exact one is 0.008.
Example 4. Figure 14 shows the result of a computer simulation of a superposition of five harmonics:
\begin{align} {y_j} = &{\text{10}}{\text{.8}} \times {\text{sin}}\;{\text{(0}}{\text{.002500}} \cdot j{\text{)}}\;{\text{ + }}\;{\text{7}}{\text{.0}} \times {\text{sin}}\;{\text{(0}}{\text{.004555}} \cdot j{\text{ + 7}}{\text{.32)}}\;{\text{ + }} \hfill \\ +&{\text{8}}{\text{.1}} \times {\text{sin}}\;{\text{(0}}{\text{.007123}} \cdot j{\text{ + 4}}{\text{.51)}}\;{\text{ + }}\;{\text{4}}{\text{.4}} \times {\text{sin}}\;{\text{(0}}{\text{.009123}} \cdot j{\text{ + 2}}{\text{.42)}}\;{\text{ + }} \hfill \\ +&{\text{5}}{\text{.6}} \times {\text{sin}}\;{\text{(0}}{\text{.012345}} \cdot j{\text{ + 0}}{\text{.59)}}\; + \hfill \\ \end{align} |
+ {\text{3}}{\text{.0}} \times {\text{rando}}{{\text{m}}_j} + \left\{ {\begin{array}{*{20}{l}} {25 \times ({\text{random}}{}_j - \;0.5), \quad if}&{\left\{ \begin{array}{l} j \equiv 0\;(\bmod \;13)\;or \hfill \\ j \equiv 0\;(\bmod \;23)\;or \hfill \\ j \equiv 0\;(\bmod \;113)\; \hfill \\ \end{array} \right.} \\ {\quad \quad \quad 0, }&{otherwise} \end{array}} \right., | (28) |
where j = 1, ..., N = 5700. On the signal represented by the first five terms on the right side of equation (28), additive white noise (the penultimate term) and outliers (the last term added to the previous ones at j, multiples of 13, 23, and 113) are superimposed.
Table 3 shows the frequencies found by the conjugate gradients method implemented in the trend-search algorithm described above. Figure 15 shows the found trend (blue line). The maximum value of the relative error of the found trend compared to the exact one is 0.006. The lower value of the relative bias found in the case of five harmonics compared to the case of one harmonic is explained by different levels of modeled additive white noise. In the case of one harmonic, it is 34% of the carrier level; in the case of five harmonics, it is 6% [see formulae (27), (28)].
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_k}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.002500 | 0.0025010 | 4.0 × 10−4 |
2 | 0.004555 | 0.0045550 | < 2.0 × 10−5 |
3 | 0.007123 | 0.0071210 | 2.8 × 10−4 |
4 | 0.0091230 | 0.0091260 | 3.3 × 10−4 |
5 | 0.012345 | 0.0123500 | 4.0 × 10−4 |
The developed method of minimizing sets allows us to find the appropriate approximations of the trend in the numerical series of noisy data containing outliers. The trend search can be carried out in various functional classes, such as power polynomials, trigonometric functions with a given set of frequencies, and harmonic functions with unknown frequencies, amplitudes, and phases, which is confirmed by the considered numerical examples. High efficiency of the trend detection method in noisy series of measurement data has been demonstrated. In the case of harmonic noisy signals, harmonic frequencies were found using the conjugate gradients method combined with the minimizing sets method. The relative error of the frequencies found in the considered examples did not exceed the magnitude {\text{4}} \times {\text{1}}{{\text{0}}^{ - {\text{4}}}} . After constructing a trend and subtracting it from the values of the original data series, the robust algorithms are used to search for outliers, which are guaranteed to lead to an optimal solution with a minimum number of rejected measurements.
The method can be effectively applied in solving numerous applied and fundamental problems in various scientific and technical fields such as space geodesy, gravimetry, and astrometry, as well as in the tasks of exploring near and deep space, Earth's remote sensing, and others. The choice of the functional class for the trend search is chosen by the researcher, taking into account the specifics of the problem being solved.
Finding a trend in other functional classes, such as the class of fractional rational functions used in Padé approximations, requires more research. Some problems concerning the prediction of stationary processes can be solved using the harmonic functions discussed above. At the same time, additional studies are also required to predict quasi-stationary time series, such as satellite GNSS orbits and Earth rotation parameters. In the future, the priority area of research is the use of the proposed method to solve problems with real measurement data.
The author thanks the anonymous reviewers for their comments that significantly improved the presentation of the paper.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The author declares no conflict of interest.
[1] |
Oliver MA, Gregory PJ (2015) Soil, food security and human health: a review. Eur J Soil Sci 66: 257-276. doi: 10.1111/ejss.12216
![]() |
[2] |
Zhao FJ (2018) Soil and human health. Eur J Soil Sci 69: 158. doi: 10.1111/ejss.12528
![]() |
[3] | Petruzzelli G, Gorini F, Pezzarossa B, et al. (2010) The fate of pollutants in soil. CNR Environment and Health Inter-departmental Project, 1-25. |
[4] |
McLaren L, Hawe P (2005) Ecological perspectives in health research. J Epidemiol Community Health 59: 6-14. doi: 10.1136/jech.2003.018044
![]() |
[5] | NRC National Research Council. Bioavailability of contaminants in soils and sediments: processes, tools and applications; National Academies: Washington, DC, 2002. |
[6] | Petruzzelli G, Pedron F, Rosellini I, et al. (2015) The Bioavailability Processes as a Key to Evaluate Phytoremediation Efficiency, In: Ansari, A.A., Gill, S.S., Gill, S., Lanza, G.R., Newman, L. Editors, Phytoremediation: Management of Environmental Contaminants, Switzerland: Springer International Publishing. 1: 31-43. |
[7] |
Petruzzelli G, Pedron F, Rosellini I, et al. (2013) Phytoremediation towards the future: focus on bioavailable contam+inants, In: Gupta, D.K. Editor, Plant-based remediation processes. Soil biology, Berlin: Springer, 35: 273-289. doi: 10.1007/978-3-642-35564-6_13
![]() |
[8] |
Guney M, Zagury GJ, Dogan N, et al. (2010) Exposure assessment and risk characterization from trace elements following soil ingestion by children exposed to playgrounds, parks and picnic areas. J Hazard Mater 182: 656-664. doi: 10.1016/j.jhazmat.2010.06.082
![]() |
[9] |
Luo XS, Ding J, Xu B, et al. (2012) Incorporating bioaccessibility into human health risk assessments of heavy metals in urban park soils. Sci Total Environ 424: 88-96. doi: 10.1016/j.scitotenv.2012.02.053
![]() |
[10] |
Ng JC, Juhasz A, Smith E, et al. (2015) Assessing the bioavailability and bioaccessibility of metals and metalloids. Environ Sci Pollut Res 22: 8802-8825. doi: 10.1007/s11356-013-1820-9
![]() |
[11] |
Wilson R, Mitchell I, Richardson GM (2016) Estimation of dust ingestion rates in units of surface area per day using a mechanistic hand-to-mouth model. Hum Ecol Risk Assess 22: 874-881. doi: 10.1080/10807039.2015.1115956
![]() |
[12] |
Van Wijnen JH, Clausing P, Brunekreef B (1990) Estimated soil ingestion by children. Environ Res 51: 147-162. doi: 10.1016/S0013-9351(05)80085-4
![]() |
[13] | Abrahams PW (2002) Soils: their implications to human health. Sci Total Environ 219: 1-32. |
[14] | Plumlee GS, Morman SA, Zeigler TL (2006) The toxicological geochemistry of earth materials: An overview of processes and the interdisciplinary methods used to understand them, In: Sahai, N., Schoonen, M.A.A. Editors, Medical Mineralogy and Geochemistry Reviews in Mineralogy and Geochemistry, Washington DC: Mineralogical Society of America. 64: 5-57. |
[15] | Plumlee GS, Ziegler TL (2007) The medical geochemistry of dusts, soils, and other earth materials, In: Lollar BS, Holland HD, Turekian, KK. Editors, Treatise on Geochemistry, Oxford: Elsevier. 9: 1-61. |
[16] | Selinus O, Alloway B, Centeno J, et al. (2005) Essentials of Medical Geology: Impacts of the Natural Environment on Public Health, London, UK: Elsevier Academic Press. |
[17] | Purakayastha TJ, Chhonkar PK (2010) Phytoremediation of Heavy Metal Contaminated Soils, In: Sherameti, I, Varma, A. Editors, Soil Heavy Metals, Soil Biology, Berlin Heidelberg: Springer-Verlag. 389-429. |
[18] |
Li YM, Chaney R, Brewer E, et al. (2003) Development of a technology for commercial phytoextraction of nickel: economic and technical considerations. Plant Soil 249: 107-115. doi: 10.1023/A:1022527330401
![]() |
[19] | Chaney RL, Angle JS, McIntosh MS, et al. (2005) Using hyperaccumulator plants to phytoextract soil Ni and Cd. Z Naturforsch C 60: 190-198. |
[20] | Abdullah S, Sarem SM (2010) The potential of Chrysanthemum and Pelargonium for phytoextraction of lead-contaminated soils. J Civ Eng 4: 409-416. |
[21] | Pezzarossa B, Petruzzelli G (2001) Selenium contamination in soil: sorption and desorption processes, In: Selim, M.H., Sparks, D.L. Editors, Heavy metals release in soils, Boca Raton, CRC Press, 197-212. |
[22] |
Wang Q, Li Z, Cheng S, et al. (2010) Effects of humic acids on phytoextraction of Cu and Cd from sediment by Elodea nuttallii. Chemosphere 78: 604-608. doi: 10.1016/j.chemosphere.2009.11.011
![]() |
[23] |
Evans LJ (1989) Chemistry of metal retention by soils. Environ Sci Technol, 23: 1046-1056. doi: 10.1021/es00067a001
![]() |
[24] |
Cherlatchka R, Cambier P (2000) Influence of reducing conditions on solubility of trace metals in contaminated soils. Water Air Soil Pollut 118: 143-167. doi: 10.1023/A:1005195920876
![]() |
[25] |
Fitz WJ, Wenzel WW (2002) Arsenic transformations in the soil-rhizosphere-plant system: fundamentals and potential application to phytoremediation. J Biotechnol 99: 259-278. doi: 10.1016/S0168-1656(02)00218-3
![]() |
[26] |
Alexander M (2000) Aging, bioavailability, and overestimation of risk from environmental pollutants. Environ Sci Technol 34: 4259-4265. doi: 10.1021/es001069+
![]() |
[27] | Harmsen JW, Rulkens W, Eijsakers H (2005) Bioavailability: concept for understanding or tool to predicting. Land Cont Recl 13: 161-171. |
[28] | Petruzzelli G, Pedron F (2006) Bioavailability at heavy metal contaminated sites: a tool to select remediation strategies, In: Proceedings of the International conference on the remediation of polluted sites (BOSICON), Rome, Italy. 1-8. |
[29] |
Harmsen J (2007) Measuring bioavailability: from a scientific approach to standard methods. J Environ Qual 36: 1420-1428. doi: 10.2134/jeq2006.0492
![]() |
[30] |
Semple KT, Doick KJ, Wick LY (2007) Microbial interactions with organic contaminants in soil: definitions, processes and measurement. Environ Pollut 150: 166-176. doi: 10.1016/j.envpol.2007.07.023
![]() |
[31] | USEPA (2008), Standard operating procedure for an in vitro bioaccessibility assay for lead in soil. EPA 9200, US Environmental Protection Agency. 1-86. |
[32] |
Hu X, Zhang Y, Luo J, et al. (2011) Bioaccessibility and health risk of arsenic, mercury and other metals in urban street dusts from a mega-city, Nanjing, China. Environ Pollut 159: 1215-1221. doi: 10.1016/j.envpol.2011.01.037
![]() |
[33] |
Ruby MV, Davis A, Schoof R, et al. (1996) Estimation of lead and arsenic bioavailability using a physiologically based extraction test. Environ Sci Technol 30: 422-430. doi: 10.1021/es950057z
![]() |
[34] |
Rodriguez R, Basta N, Casteel SW, et al. (1999) An in vitro gastrointestinal method to estimate bioavailable arsenic in contaminated soils and solid media. Environ Sci Technol 33: 642-649. doi: 10.1021/es980631h
![]() |
[35] |
Oomen AG, Tolls J, Sips AJAM, et al. (2003) In vitro intestinal lead uptake and transport in relation to speciation. Arch Environ Contam Toxicol 44: 116-124. doi: 10.1007/s00244-002-1226-z
![]() |
[36] |
Drexler JW, Brattin WJ (2007) An in vitro procedure for estimation of lead relative bioavailability: with validation. Human Ecol Risk Assess 13: 383-401. doi: 10.1080/10807030701226350
![]() |
[37] |
Isikli B, Demir TA, Ürer SM, et al. (2003) Effects of chromium exposure from a cement factory. Environ Res 91: 113-118. doi: 10.1016/S0013-9351(02)00020-8
![]() |
[38] |
Schuhmacher M, Domingos JL, Garreta J (2004) Pollutants emitted by a cement plant: health risks for the population living in the neighbourhood. Environ Res 95: 198-206. doi: 10.1016/j.envres.2003.08.011
![]() |
[39] |
Schuhmacher M, Nadal M, Domingo JL (2009) Environmental monitoring of PCDD/Fs and metals in the vicinity of a cement plant after using sewage sludge as a secondary fuel. Chemosphere 74: 1502-1508. doi: 10.1016/j.chemosphere.2008.11.055
![]() |
[40] | Sparks DL (1996) Methods of Soil Analysis, Part 3-Chemical Method. Madison, USA: Soil Science Society of America Inc. |
[41] |
Takáč P, Szabová T, Kozáková Ľ, et al. (2009) Heavy metals and their bioavailability from soils in the long-term polluted Central Spiš region of SR. Plant Soil Environ 55: 167-172. doi: 10.17221/21/2009-PSE
![]() |
[42] | Italian Government (2006) Official Gazette No. 88 of the Italian Republic of 14-04-2006. Ordinary Supplement No. 96 (in Italian). Istituto Poligrafico e Zecca dello Stato, Rome. |
[43] |
Chandrasekaran A, Ravisankar R, Harikrishnan N, et al. (2015) Multivariate statistical analysis of heavy metal concentration in soils of Yelagiri Hills, Tamilnadu, India Spectroscopical approach. Spectrochim. Acta A. 137: 589-600. doi: 10.1016/j.saa.2014.08.093
![]() |
[44] |
Nolan AL, McLaughlin MJ, Mason SD (2003) Chemical speciation of Zn, Cd, Cu, and Pb in pore waters of agricultural and contaminated soils using Donnan dialysis. Environ Sci Technol 37: 90-98. doi: 10.1021/es025966k
![]() |
[45] |
Rasmussen PE, Beauchemin S, Nugent M, et al (2008) Influence of matrix composition on the bioaccessibility of copper, zinc, and nickel in urban residential dust and soil. Human Ecol Risk Assess 14: 351-371. doi: 10.1080/10807030801934960
![]() |
[46] |
Rieuwerts JS (2007) The mobility and bioavailability of trace metals in tropical soils: a review. Chem Spec Bioavail 19: 75-85. doi: 10.3184/095422907X211918
![]() |
[47] |
Ayoubi S, Jababri M, Khademi H (2018) Multiple linear modelling between soil properties, magnetic susceptibility and heavy metals in various land uses. Model Earth Syst Environ 4: 579-589. doi: 10.1007/s40808-018-0442-0
![]() |
[48] |
Zhao Z, Nie T, Yanga Z, et al. (2018) The role of soil components in the sorption of tetracycline and heavy metals in soils. RSC Adv 8: 32178-32187. doi: 10.1039/C8RA06631K
![]() |
[49] |
Siciliano SD, James K, Zhang GY., et al. (2009) Adhesion and enrichment of metals on human hands from contaminated soil at an Arctic urban brownfield. Environ Sci Technol 43: 6385-6390. doi: 10.1021/es901090w
![]() |
[50] |
Luo CL, Liu CP, Wang Y, et al. (2011) Heavy metal contamination in soils and vegetables near an e-waste processing site, south China. J Hazard Mater 186: 481-490. doi: 10.1016/j.jhazmat.2010.11.024
![]() |
[51] |
Ljung K, Oomen A, Duits M et al. (2007) Bioaccessibility of metals in urban playground soils. J Environ Sci Hlth A 42: 1241-1250. doi: 10.1080/10934520701435684
![]() |
[52] |
Caboche J, Denys S, Feidt C, et al. (2010) Modelling Pb bioaccessibility in soils contaminated by mining and smelting activities. J Environ Sci Hlth A 45: 1264-1274. doi: 10.1080/10934529.2010.493818
![]() |
[53] |
Pelfrêne C, Waterlot M, Mazzuca C, et al. (2011) Assessing Cd, Pb, Zn human bioaccessibility in smelter-contaminated agricultural topsoils (northern France). Environ Geochem Health 33: 477-493. doi: 10.1007/s10653-010-9365-z
![]() |
[54] | Baars AJ, Theelen RMC, Janssen PJ, et al. (2004) Re-evaluation of human-toxicological maximum permissibile risk levels. Bilthoven, The Netherlands: National Institute for Public Health and the Environment (RIVM), Report No. 711701025. |
[55] |
Calabrese EJ, Stanek EJ, James RC, et al. (1997) Soil ingestion: A concern for acute toxicity in children. Environ Health Persp 105: 1354-1358. doi: 10.1289/ehp.971051354
![]() |
[56] |
Ferreri SJ, Tamm L, Wier KG (2006) Using food aversion to decrease severe pica by a child with autism. Behav Modif 30: 456-471. doi: 10.1177/0145445504272970
![]() |
n — polynomial order | {s_{\max }} | Λ | {N_{{\text{out}}}} |
1 | 2 | 688 | 562 |
2 | 9 | 931 | 319 |
3 | 6 | 1049 | 201 |
4 | 6 | 1087 | 163 |
5 | 5 | 1105 | 145 |
6 | 7 | 1109 | 141 |
7 | 4 | 1116 | 134 |
8 | 4 | 1116 | 134 |
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.0091230 | 0.0091240 | < 1.1×10−4 |
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_k}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.002500 | 0.0025010 | 4.0 × 10−4 |
2 | 0.004555 | 0.0045550 | < 2.0 × 10−5 |
3 | 0.007123 | 0.0071210 | 2.8 × 10−4 |
4 | 0.0091230 | 0.0091260 | 3.3 × 10−4 |
5 | 0.012345 | 0.0123500 | 4.0 × 10−4 |
n — polynomial order | {s_{\max }} | Λ | {N_{{\text{out}}}} |
1 | 2 | 688 | 562 |
2 | 9 | 931 | 319 |
3 | 6 | 1049 | 201 |
4 | 6 | 1087 | 163 |
5 | 5 | 1105 | 145 |
6 | 7 | 1109 | 141 |
7 | 4 | 1116 | 134 |
8 | 4 | 1116 | 134 |
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.0091230 | 0.0091240 | < 1.1×10−4 |
\boldsymbol k | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (exact) | {{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} (found) | |\Delta {{\mathbf{\mathtt{ν}}}_k}|/{{\mathbf{\mathtt{ν}}}_{\boldsymbol k}} |
1 | 0.002500 | 0.0025010 | 4.0 × 10−4 |
2 | 0.004555 | 0.0045550 | < 2.0 × 10−5 |
3 | 0.007123 | 0.0071210 | 2.8 × 10−4 |
4 | 0.0091230 | 0.0091260 | 3.3 × 10−4 |
5 | 0.012345 | 0.0123500 | 4.0 × 10−4 |