Research article

Multivariate polynomial regression by an explainable sigma-pi neural network

  • Received: 17 September 2024 Revised: 14 October 2024 Accepted: 17 October 2024 Published: 30 October 2024
• Over the years, data-driven regression of univariate functions has been studied extensively. However, fast, effective, and stable algorithms for multivariate function fitting are still lacking. Recently, Kolmogorov-Arnold networks have garnered significant attention due to their superior accuracy and interpretability compared with multi-layer perceptrons. In this paper, we demonstrate that the sigma-pi neural network, a form of Kolmogorov-Arnold network, can efficiently fit multivariate polynomial functions, including fractional-order multivariate polynomials. Three examples illustrate the regression performance of the designed neural networks. The explainable sigma-pi neural network lays the groundwork for further development of general tools for multivariate nonlinear function regression problems.

    Citation: Xiaoxiang Guo, Zuolin Shi, Bin Li. Multivariate polynomial regression by an explainable sigma-pi neural network[J]. Big Data and Information Analytics, 2024, 8: 65-79. doi: 10.3934/bdia.2024004




With the rapid development of big data and artificial intelligence, data-driven predictions now permeate nearly every aspect of modern science [1,2]. Regression analysis, a key component of data-driven science, has wide applications across various domains, including new material design, stock prediction, medical diagnosis, and geological exploration [3,4,5,6,7,8,9]. In addition, machine learning offers numerous methods for regression analysis, such as support vector regression (SVR), decision trees (DTs), multi-layer perceptrons (MLPs), and deep neural networks. Although these methods can achieve high accuracy, the "black box" nature of neural networks limits their interpretability.

According to the Weierstrass approximation theorem, continuous functions defined on closed intervals can be uniformly approximated by polynomial functions [10]. Moreover, a polynomial function has a simpler form than other complex functions, making polynomial regression a natural choice for practical applications [11,12]. Polynomial regression can be divided into univariate function fitting and multivariate function regression. While univariate function fitting is well supported by popular data analysis software such as Origin, MATLAB, and SPSS, based on the least-squares algorithm, there remains a lack of fast, effective, and stable algorithms for multivariate function regression.

The aim of this paper is to design a fast, efficient, and accurate algorithm for multivariate polynomial regression problems of the form of Eq (1.1):

$$y = f(x_1, x_2, \ldots, x_n) = \sum_{i=1}^{n} \alpha_i x_i^{\beta_i} + \sum_{j,k=1,\, j \neq k}^{n} \gamma_{jk}\, x_j^{\mu_j} x_k^{\nu_k} + \sum_{s,p,q=1,\, s \neq p \neq q}^{n} \eta_{spq}\, x_s^{\sigma_s} x_p^{\lambda_p} x_q^{\tau_q} + \cdots \tag{1.1}$$

In the regression process, SVR introduces a kernel function to construct the nonlinear relation. The MLP method employs the weighted summation of variables ($\sum_i \omega_{ji} x_i + \theta_j$) to achieve the aggregation of information. As the weighted summation only passes a linear relationship between variables, nonlinear knowledge, for example the product relation $x_j^{\mu_j} x_k^{\nu_k}$, must be approximated using an activation function, such as the sigmoid function, $f(x) = 1/(1 + e^{-x})$, or the tanh function, $f(x) = (e^x - e^{-x})/(e^x + e^{-x})$. However, the kernel functions in SVR and the activation functions in MLPs make the resulting machine learning models difficult to interpret.

Recently, Kolmogorov-Arnold networks (KANs) have gained attention due to their superior accuracy and interpretability compared with multi-layer perceptrons [13]. MLPs have fixed activation functions on nodes, while KANs have learnable activation functions on edges. The Kolmogorov-Arnold representation theorem states that if $f$ is a multivariate continuous function on a bounded domain, then $f$ can be written as a finite composition of continuous functions of a single variable and the binary operation of addition. The regression model of KANs has the form

$$y = f(x_1, x_2, \ldots, x_n) = \sum_{i_{L-1}=1}^{n_{L-1}} \phi_{L-1,\, i_L,\, i_{L-1}} \left( \sum_{i_{L-2}=1}^{n_{L-2}} \phi_{L-2,\, i_{L-1},\, i_{L-2}} \left( \cdots \sum_{i_1=1}^{n_1} \phi_{1,\, i_2,\, i_1} \left( \sum_{i_0=1}^{n} \phi_{0,\, i_1,\, i_0}(x_{i_0}) \right) \cdots \right) \right) \tag{1.2}$$

where $L$ represents the number of network layers and $\phi_{k,\, i_{k+1},\, i_k}$ is a continuous function of a single variable. That is to say, KANs train and select suitable single-variable continuous functions in each network layer from a given library of basic functions, including polynomial, exponential, and logarithmic functions. Although KANs have higher accuracy and interpretability, the regression of product relations in multivariate polynomial functions must still be transformed into a coupling of univariate functions, which undoubtedly increases the complexity of the regression.

The traditional artificial neural network uses nodes whose information transmission is via a sigma unit, i.e., a linear weighted sum of inputs, $a = \sum_i \omega_i x_i$. A pi unit is constructed by replacing the weighted summation in the activation with a weighted product, $a = \prod_i x_i^{\omega_i}$. Units of this type are also designated "higher order" in that they contain polynomial terms of order greater than one (linear). The simplest way of modeling such a product relation in multivariate polynomial functions is to increase the complexity of the node by a pi unit. Moreover, a sigma-pi unit can formulate a sum of product terms,

$$y = \sum_{j=1}^{m} c_j a_j = \sum_{j=1}^{m} c_j \prod_{i=1}^{n} x_i^{\omega_{ji}}. \tag{1.3}$$
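For concreteness, evaluating Eq (1.3) amounts to only two array operations. The following minimal NumPy sketch (the function name is ours, and the inputs are assumed positive so that real-valued exponents are well defined) computes a sigma-pi unit:

```python
import numpy as np

def sigma_pi(x, c, w):
    """Evaluate Eq (1.3): y = sum_j c_j * prod_i x_i^{w_ji}.

    x : (n,)    input signals, assumed positive
    c : (m,)    weights of the sigma (summation) stage
    w : (m, n)  exponents of the pi (product) stage
    """
    a = np.prod(x ** w, axis=1)  # pi units: a_j = prod_i x_i^{w_ji}
    return c @ a                 # sigma unit: weighted sum of the products

# For the polynomial y = 2*x1^2*x2 + 3*x1^3*x2 + 4*x1^2*x2^2 at (x1, x2) = (2, 3):
x = np.array([2.0, 3.0])
c = np.array([2.0, 3.0, 4.0])
w = np.array([[2.0, 1.0], [3.0, 1.0], [2.0, 2.0]])
print(sigma_pi(x, c, w))  # 24 + 72 + 144 = 240.0
```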

Inspired by this architecture of a sigma-pi unit, we find that the high-order sigma-pi neural network (SPNN) [14,15] can efficiently achieve the regression of multivariate polynomial functions. The coefficients and exponents of the multivariate polynomial can be determined through the trained neural network weight parameters ($c_j$ and $\omega_{ji}$).

    Performing the regression of multivariate polynomial functions by SPNN has the following advantages compared with that of MLPs and KANs.

    ● Polynomial regression by the SPNN model has better interpretability than that of MLPs, due to the fact that high-order terms are generated by pi units instead of complex activation functions.

    ● Polynomial regression by the SPNN model has faster learning efficiency compared with that of KANs, owing to the application of a fixed activation function in polynomial form.

● The optimization and control of the parameters in the SPNN model are more convenient than in MLPs and KANs. Moreover, the form of the SPNN model is better suited to practical applications.

In an MLP neural network, information from different input signals is transmitted through weighted summation. For example, in a two-layer feed-forward neural network (see Figure 1(a)), the hidden layer

$$h_i = \sigma_1\left( \sum_{j=1}^{n} \alpha_{ij} x_j + b_i \right), \tag{2.1}$$
    Figure 1.  Different structure units in neural networks: (a) sigma-sigma units in the MLP network; (b) sigma-pi units in the high-order network.

    and the output signal

$$y = \sigma_2\left( \sum_{i=1}^{m} \beta_i h_i + \theta \right), \tag{2.2}$$

where $\sigma_1$ is the activation function of the hidden layer, $\sigma_2$ is the activation function of the output layer, $\alpha$ is the weight of the first layer, $\beta$ is the weight of the second layer, and $b$ and $\theta$ are the biases. The information transmission from $x_j$ to $h_i$, and from $h_i$ to $y$, is in both cases by weighted summation, so this kind of structural unit in MLP neural networks can be called a sigma-sigma unit.

To embed high-order coupled information into a neural network, high-order neural networks have been designed. For instance, the sigma-pi neural network (SPNN) enables the embedding of product information into the network. In the SPNN, the hidden layer

$$g_i = \sigma_1\left( \prod_{j=1}^{n} x_j^{\alpha_{ij}} + b_i \right), \tag{2.3}$$

    and the output layer

$$y = \sigma_2\left( \sum_{i=1}^{m} \beta_i g_i + \theta \right). \tag{2.4}$$

The information transmission from $x_j$ to $g_i$ is by weighted product, and from $g_i$ to $y$ by weighted summation; see Figure 1(b).

    As previously mentioned, the basic unit of SPNN is the power function of input signals. We aim to achieve multivariate polynomial regression using the explainable SPNN. In this model, if the activation functions, σ1 and σ2, are linear, the output signal

$$y = \sum_{i=1}^{m} \beta_i g_i + \theta = \sum_{i=1}^{m} \beta_i \left( \prod_{j=1}^{n} x_j^{\alpha_{ij}} + b_i \right) + \theta = \sum_{i=1}^{m} \beta_i \prod_{j=1}^{n} x_j^{\alpha_{ij}} + \sum_{i=1}^{m} \beta_i b_i + \theta. \tag{2.5}$$

Clearly, Eq (2.5) is a multivariate polynomial function, where $\beta_i$ is the polynomial coefficient, $\alpha_{ij}$ is the polynomial exponent, and $\sum_{i=1}^{m} \beta_i b_i + \theta$ is the constant term. Thus, this explainable SPNN is essentially a type of Kolmogorov-Arnold network.

    To illustrate the regression process of a multivariate polynomial function more clearly, we present a simple regression problem as an example:

$$y = f(x_1, x_2, \ldots, x_n) = a_1 x_1^{b_1} + a_2 x_2^{b_2} + \cdots + a_n x_n^{b_n} + \sum_{i,j=1,\, i \neq j}^{n} c_{ij}\, x_i^{p_i} x_j^{q_j}. \tag{2.6}$$

Here, $\{x_1, x_2, \ldots, x_n\}$ represents the state variables, and $y$ is the target variable. Since Eq (2.6) lacks a constant term, bias parameters are not used in the designed network. A two-layer SPNN is employed to solve this regression problem. The number of neurons in the input layer equals $n$, while the hidden layer contains $n + n(n-1) = n^2$ neurons. The neural connections from the input layer to the hidden layer primarily capture the regression of power functions, such as $x_i^{b_i}$ and $x_i^{p_i} x_j^{q_j}$, corresponding to each term in the polynomial function. The network then performs a weighted summation of all terms from the hidden layer to the output layer.

Additionally, to accelerate the parameter optimization, we convert the above sigma-pi neural network into a sigma-sigma neural network and use the back-propagation algorithm to update the parameters. The input signals are first transformed using logarithmic functions,

$$x_i \to u_i = \begin{cases} \log(x_i), & x_i > 0; \\ -\infty, & x_i = 0; \\ \log(-x_i), & x_i < 0, \end{cases} \tag{2.7}$$

i.e., $u_i = \log(|x_i|)$.

    Next, a sigma unit operates on the ui,

$$v_j = \sum_{i=1}^{n} \alpha_{ji} u_i = \sum_{i=1}^{n} \log\left( |x_i|^{\alpha_{ji}} \right). \tag{2.8}$$

Here, $\alpha_{ji}$ represents the weight from the $i$th neuron in the input layer to the $j$th neuron in the hidden layer. A signed exponential function is then applied to $v_j$,

$$h_j = \mathrm{sig}(x_i)\exp(v_j) = \mathrm{sig}(x_i)\exp\left( \sum_{i=1}^{n} \log\left( |x_i|^{\alpha_{ji}} \right) \right) = \prod_{i=1}^{n} x_i^{\alpha_{ji}}. \tag{2.9}$$

    Therefore, the conversion from the pi unit to the sigma unit can be achieved using logarithmic and exponential operators. The output signal, y, is then obtained by the sigma unit from the hidden layer to the output layer.

$$y = \sum_{j=1}^{m} \beta_j h_j = \sum_{j=1}^{m} \beta_j \prod_{i=1}^{n} x_i^{\alpha_{ji}} = \sum_{j=1}^{m} \beta_j\, \mathrm{sig}(x_i)\exp\left( \sum_{i=1}^{n} \alpha_{ji} \log(|x_i|) \right). \tag{2.10}$$

Here, $\beta_j$ is the weight from the $j$th neuron in the hidden layer to the output signal. The model error is defined as

$$\mathrm{Err} = \frac{1}{2}\left( y - \hat{y} \right)^2, \tag{2.11}$$

where $y$ is the model output and $\hat{y}$ is the real value. For parameter updates, the gradient descent method is employed,

$$\beta_j = \beta_j - \eta \Delta\beta_j, \qquad \alpha_{ji} = \alpha_{ji} - \eta \Delta\alpha_{ji}. \tag{2.12}$$

Here, $\eta$ is the learning rate,

$$\Delta\beta_j = \frac{\partial\, \mathrm{Err}}{\partial \beta_j} = \frac{\partial\, \mathrm{Err}}{\partial y} \frac{\partial y}{\partial \beta_j} = (y - \hat{y})\, h_j, \tag{2.13}$$

    and

$$\Delta\alpha_{ji} = \frac{\partial\, \mathrm{Err}}{\partial \alpha_{ji}} = \frac{\partial\, \mathrm{Err}}{\partial v_j} \frac{\partial v_j}{\partial \alpha_{ji}} = \frac{\partial\, \mathrm{Err}}{\partial y} \frac{\partial y}{\partial h_j} \frac{\partial h_j}{\partial v_j} \frac{\partial v_j}{\partial \alpha_{ji}} = (y - \hat{y})\, \beta_j \exp(v_j)\, \mathrm{sig}(x_i) \log(|x_i|). \tag{2.14}$$

The optimized parameters $\beta$ and $\alpha$ are the corresponding multivariate polynomial regression coefficients and exponents.
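To make the update rules concrete, the following is a minimal NumPy sketch of the conversion and training loop of Eqs (2.7)-(2.14). It assumes strictly positive inputs (so the sign factor is 1 and $u_i = \log x_i$), plain per-sample gradient descent, and illustrative hyperparameters; it is a sketch of the scheme, not the authors' exact implementation.

```python
import numpy as np

def train_spnn(X, y, m, eta=1e-4, epochs=200, seed=0):
    """Fit y ~ sum_j beta_j prod_i x_i^{alpha_ji} via Eqs (2.7)-(2.14).

    X : (N, n) strictly positive inputs; y : (N,) targets;
    m : number of hidden (pi) units, one per polynomial term.
    """
    rng = np.random.default_rng(seed)
    N, n = X.shape
    alpha = rng.normal(scale=0.1, size=(m, n))     # polynomial exponents
    beta = rng.normal(scale=0.1, size=m)           # polynomial coefficients
    U = np.log(X)                                  # Eq (2.7) for positive inputs
    for _ in range(epochs):
        for u, target in zip(U, y):
            v = alpha @ u                          # Eq (2.8): sigma unit on u
            h = np.exp(v)                          # Eq (2.9): recover the products
            out = beta @ h                         # Eq (2.10): output sigma unit
            err = out - target                     # dErr/dy for Eq (2.11)
            d_beta = err * h                       # Eq (2.13)
            d_alpha = err * np.outer(beta * h, u)  # Eq (2.14): [j,i] = err*beta_j*h_j*u_i
            beta -= eta * d_beta                   # Eq (2.12)
            alpha -= eta * d_alpha
    return alpha, beta
```

In practice the learning rate, initialization, and number of epochs need tuning; in the examples below, the paper uses the Adam optimizer with a one-cycle schedule rather than plain gradient descent.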

    The pseudo code of multivariate polynomial regression by our designed neural network is presented in Algorithm 1 below.

    Algorithm 1 Multivariate polynomial regression by the explainable sigma-pi neural network
Inputs: the corresponding multi-variables, $x_1, x_2, \ldots, x_n$, the target variable, $y$, and the desired multivariate polynomial form, for example, $y = a_1 x_1^{b_1} + a_2 x_2^{b_2} + \cdots + a_n x_n^{b_n} + \sum_{i,j=1,\, i \neq j}^{n} c_{ij} x_i^{p_i} x_j^{q_j}$.
Outputs: the regression coefficients and exponents, such as $a_i, b_i, c_{ij}, p_i, q_j$.
1: Divide the data into the training set, validation set, and test set.
2: Construct the sigma-pi neural network based on the desired empirical formula.
3: Variable conversion, $x_i \to u_i$.
4: Let the sigma unit act on $u_i$, s.t. $u_i \to v_j$.
5: Apply the signed exponentiation to $v_j$, s.t. $v_j \to h_j$.
6: Let the sigma unit act on $h_j$, s.t. $h_j \to y$; compute the regression error, $\mathrm{Err} = \frac{1}{2}(y - \hat{y})^2$; set the error threshold as $\varepsilon$; and define the total number of epochs as $N$.
7: while epoch $< N$ do
8:   Update the parameters, $\beta_j = \beta_j - \eta \Delta\beta_j$, $\alpha_{ji} = \alpha_{ji} - \eta \Delta\alpha_{ji}$.
9:   if $\mathrm{Err} < \varepsilon$ then
10:     output the weight parameters, $\alpha_{ji} \to b_i, p_i, q_j$ and $\beta_j \to a_i, c_{ij}$;
11:     break;
12:   else
13:     continue;
14:   end if
15: end while

    In this section, we present two examples to demonstrate the feasibility and effectiveness of the proposed model. Moreover, we illustrate the application of our method in analyzing multivariate correlated regression, specifically concerning the maximum stress on concrete pipes under traffic loads.

Example 1 considers the integer-order polynomial function

$$y = 2x_1^{2} x_2 + 3x_1^{3} x_2 + 4x_1^{2} x_2^{2}. \tag{3.1}$$

In this example, the model is used to determine the regression coefficients and exponents of Eq (3.1) from generated data. The variable $x_1$ is linearly sampled from the interval $[1, 3]$, $x_2$ is randomly sampled from the interval $[1, 4]$, and $y$ is calculated according to Eq (3.1). The generated data are presented in Figure 2: Figure 2(a) gives the time series of $x_1$, $x_2$, and $y$; Figure 2(b),(c) display scatter plots of $y$ versus $x_1$ and $y$ versus $x_2$, respectively.

Figure 2.  Generated data for the integer-order polynomial regression: (a) the time series of $x_1$, $x_2$, and $y$; (b) the scatter plot of $y$ versus $x_1$; (c) the scatter plot of $y$ versus $x_2$.

The number of neurons in the hidden layer equals 3, i.e., the number of terms of the polynomial function in Eq (3.1). Let $u = \log(x_1)$ and $v = \log(x_2)$; the signals $u$ and $v$ are fed into the sigma unit as inputs. It should be noted that the variables $x_1$ and $x_2$ in this example are restricted to positive values. We use the linear activation function in each layer; thus, the neurons in the hidden layer satisfy

$$\begin{aligned} h_1 &= \alpha_{11} u + \alpha_{12} v = \log\left( x_1^{\alpha_{11}} \right) + \log\left( x_2^{\alpha_{12}} \right), \\ h_2 &= \alpha_{21} u + \alpha_{22} v = \log\left( x_1^{\alpha_{21}} \right) + \log\left( x_2^{\alpha_{22}} \right), \\ h_3 &= \alpha_{31} u + \alpha_{32} v = \log\left( x_1^{\alpha_{31}} \right) + \log\left( x_2^{\alpha_{32}} \right). \end{aligned}$$

    So, by the exponential operation,

$$\begin{aligned} e^{h_1} &= e^{\log\left( x_1^{\alpha_{11}} \right) + \log\left( x_2^{\alpha_{12}} \right)} = x_1^{\alpha_{11}} x_2^{\alpha_{12}}, \\ e^{h_2} &= e^{\log\left( x_1^{\alpha_{21}} \right) + \log\left( x_2^{\alpha_{22}} \right)} = x_1^{\alpha_{21}} x_2^{\alpha_{22}}, \\ e^{h_3} &= e^{\log\left( x_1^{\alpha_{31}} \right) + \log\left( x_2^{\alpha_{32}} \right)} = x_1^{\alpha_{31}} x_2^{\alpha_{32}}. \end{aligned}$$

The output signal is the weighted sum of $e^{h_1}$, $e^{h_2}$, and $e^{h_3}$, according to the sigma unit from the hidden layer to the output layer:

$$y = \beta_1 e^{h_1} + \beta_2 e^{h_2} + \beta_3 e^{h_3} = \beta_1 x_1^{\alpha_{11}} x_2^{\alpha_{12}} + \beta_2 x_1^{\alpha_{21}} x_2^{\alpha_{22}} + \beta_3 x_1^{\alpha_{31}} x_2^{\alpha_{32}}.$$

Here, $\beta_i$ represents the corresponding polynomial regression coefficient, and $\alpha_{ij}$ denotes the related polynomial regression exponent. Since we are addressing an integer-order polynomial regression problem, the parameters $\alpha_{ij}$ are constrained to integer values during the parameter iteration process. The training is conducted for 500 epochs, with a maximum learning rate of 0.01. Compared with other optimizers, such as BGD, SGD, and MBGD, the combination of the Adam optimizer with a one-cycle scheduler has a faster convergence speed and better modeling performance, so we chose this optimization algorithm for the training process. The regression performance is evaluated using the mean squared error (MSE) and the coefficient of determination ($R^2$),

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2, \qquad R^2 = 1 - \frac{\sum_{i=1}^{n}\left( y_i - \hat{y}_i \right)^2}{\sum_{i=1}^{n}\left( y_i - \bar{y} \right)^2}.$$

Here, $y_i$ is the real value, $\hat{y}_i$ is the regression value, and $\bar{y}$ is the mean value of $y$. The modeling results are shown in Figure 3: the training and validation losses are measured by the MSE, and the coefficient of determination $R^2 = 1.0$.
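For reference, this training configuration can be reproduced with a standard PyTorch loop. This is a hedged sketch: the module, its initialization, and the data tensors `X_train` and `y_train` are our illustrative stand-ins, and the integer constraint on the exponents is imposed here simply by rounding after training, a detail the paper does not spell out.

```python
import torch

class SPNN(torch.nn.Module):
    """Sigma-pi network computed in log space: y = sum_j beta_j exp(alpha_j . log x)."""
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.alpha = torch.nn.Parameter(0.1 * torch.randn(n_hidden, n_in))
        self.beta = torch.nn.Parameter(0.1 * torch.randn(n_hidden))

    def forward(self, x):                                # x > 0, as in this example
        h = torch.exp(torch.log(x) @ self.alpha.T)       # pi units, Eqs (2.7)-(2.9)
        return h @ self.beta                             # sigma unit, Eq (2.10)

model = SPNN(n_in=2, n_hidden=3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.004)
# OneCycleLR drives the schedule up to the maximum learning rate of 0.01
scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, total_steps=500)

for epoch in range(500):                                 # 500 epochs, as in Example 1
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(X_train), y_train)
    loss.backward()
    optimizer.step()
    scheduler.step()

exponents = model.alpha.detach().round()                 # integer-order constraint (our choice)
coefficients = model.beta.detach()
```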

Figure 3.  The modeling results: (a) the training loss and validation loss measured by MSE; (b) the plot of real values versus regression values; the coefficient of determination $R^2 = 1.0$.

The weight parameters of the proposed neural network are listed in Table 1. The parameter $\alpha_{ij}$ is related to the polynomial exponent, and the parameter $\beta_i$ corresponds to the polynomial coefficient.

Table 1.  The values of the weight parameters of the designed SPNN.

| Parameters | $\alpha_{11}$ | $\alpha_{12}$ | $\alpha_{21}$ | $\alpha_{22}$ | $\alpha_{31}$ | $\alpha_{32}$ | $\beta_1$ | $\beta_2$ | $\beta_3$ |
|---|---|---|---|---|---|---|---|---|---|
| Values | 2 | 1 | 3 | 1 | 2 | 2 | 2.0 | 3.0 | 4.0 |


    From the results above, it is evident that our designed SPNN effectively solves the integer-order multivariate polynomial regression problem. The coefficients and exponents of the integer-order multivariate polynomial function align with the weight parameters of the designed SPNN. In the next example, we use the SPNN to solve a fractional-order multivariate polynomial regression problem.

Example 2 considers the fractional-order polynomial function

$$y = 2x_1^{2.2} x_2^{1.5} + 3x_1^{3.4} x_2^{1.0} + 4x_1^{2.8} x_2^{2.0}. \tag{3.2}$$

In this example, we use the designed SPNN to solve the fractional-order polynomial regression problem. The values of $x_1$ and $x_2$ are generated in the same manner as in Example 1, and the output signal is computed according to Eq (3.2). The dataset contains 3000 samples, and the training is conducted for 10,000 epochs. The Adam optimizer is employed in combination with a one-cycle scheduler; the maximum learning rate is set to 0.009, while the optimizer's learning rate is initialized at 0.004. The corresponding time series are plotted in Figure 4(a), and the regression result is shown in Figure 4(b).

    Figure 4.  The fractional-order polynomial regression: (a) the generated data of the fractional-order polynomial function; (b) the plot of real values versus regression values.

The values of the mean squared error (MSE), the coefficient of determination ($R^2$), and the corresponding weight parameters are listed in Table 2.

Table 2.  The values of MSE, $R^2$, and the weight parameters.

| Parameters | MSE | $R^2$ | $\alpha_{11}$ | $\alpha_{12}$ | $\alpha_{21}$ | $\alpha_{22}$ | $\alpha_{31}$ | $\alpha_{32}$ | $\beta_1$ | $\beta_2$ | $\beta_3$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Values | $6.2\times10^{-5}$ | 1.0 | 2.21 | 1.54 | 2.81 | 2.01 | 3.39 | 1.00 | 2.05 | 3.87 | 3.07 |
| Approximation |  |  | 2.2 | 1.5 | 2.8 | 2.0 | 3.4 | 1.0 | 2 | 4 | 3 |


    From the regression results, we observe small discrepancies between the actual polynomial coefficients (or exponents) and the weight parameters. However, the parameters closely align with the polynomial coefficients and exponents when considering their approximate values. Thus, we conclude that the designed SPNN can effectively solve fractional-order multivariate polynomial regression problems.

In this part, we apply the designed SPNN to analyze the maximum stress of concrete sewage pipelines under the combined influence of traffic load, earth pressure, and groundwater level, with a consideration of 12 physical parameter variables. These variables encompass the corrosion depth ($C_d$), corrosion width ($C_w$), corrosion length ($C_l$), void width ($V_w$), void length ($V_l$), burial depth ($H$), traffic load ($P$), pipe diameter ($D$), wall thickness ($t$), bedding modulus ($E_b$), backfill soil modulus ($E_s$), and groundwater level ($h_w$). The detailed parameter ranges of each variable are provided in Table 3.

Table 3.  The parameter ranges of each physical variable.

| Physical variable | Minimum | Maximum | Physical variable | Minimum | Maximum |
|---|---|---|---|---|---|
| $C_d$ (cm) | 5 | 90 | $P$ (MPa) | 0.5 | 1.5 |
| $C_w$ (°) | 0 | 180 | $D$ (mm) | 300 | 1200 |
| $C_l$ (m) | 0 | 10 | $t$ (mm) | 40 | 120 |
| $V_w$ (°) | 0 | 120 | $E_b$ (MPa) | 6 | 580 |
| $V_l$ (m) | 0 | 3 | $E_s$ (MPa) | 5 | 65 |
| $H$ (m) | 0.5 | 3 | $h_w/H$ | 0.31 | 8.85 |


    These physical variables are randomly generated within the specified ranges, resulting in a dataset of 250 samples. The maximum stress signals are formulated by finite element simulation based on these datasets. Figure 5 illustrates the dependence of the maximum stress on each physical variable. As the maximum stress of concrete sewage pipelines is determined by the 12 physical variables, it is challenging to analyze the relationship between individual variables and stress through simple data fitting. As shown in Figure 5, the relationship between maximum stress and each variable is nonlinear and complex, indicating that this is a multivariate nonlinear regression problem. Existing software and algorithms are unable to efficiently and accurately address this challenge; however, our designed SPNN method is capable of handling it effectively.

Figure 5.  The dependence of the maximum stress signal ($y$) on the corrosion depth ($C_d$), corrosion width ($C_w$), corrosion length ($C_l$), void width ($V_w$), void length ($V_l$), burial depth ($H$), traffic load ($P$), pipe diameter ($D$), wall thickness ($t$), bedding modulus ($E_b$), backfill soil modulus ($E_s$), and groundwater level over burial depth ($h_w/H$).

    According to the univariate empirical formula [7,9], we define the multivariate regression equation of the maximum stress as

$$y = \alpha_1 C_d^{\beta_1} + \alpha_2 C_w^{\beta_2} + \alpha_3 C_l^{\beta_3} + \alpha_4 V_w^{\beta_4} + \alpha_5 V_l^{\beta_5} + \alpha_6 H^{\beta_6} + \alpha_7 P^{\beta_7} + \alpha_8 D^{\beta_8} + \alpha_9 t^{\beta_9} + \alpha_{10} E_b^{\beta_{10}} + \alpha_{11} E_s^{\beta_{11}} + \alpha_{12} (h_w/H)^{\beta_{12}}. \tag{3.3}$$

The data set is first normalized to eliminate the influence of dimensionality and then randomly split into training, validation, and test sets at ratios of 64%, 20%, and 16%, respectively. Dividing the data set at other ratios, for example, 70%, 15%, and 15%, makes almost no difference because of the excellent model performance. The regression model is developed using the explainable SPNN. During the training process, a total of 1000 epochs are set, and the Adam optimizer with a learning rate of 0.006 is employed together with a one-cycle scheduler with a maximum learning rate of 0.01. Figure 6(a) shows the training loss and validation loss measured by MSE, and Figure 6(b) presents a comparison between the real and predicted maximum stress values.
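The preprocessing just described can be scripted as follows. This sketch assumes min-max normalization (the paper does not name the scheme) and uses scikit-learn's `train_test_split` to realize the 64/20/16 partition; `X` and `y` stand for the 250-sample dataset.

```python
from sklearn.model_selection import train_test_split

# X: (250, 12) physical variables in the order of Table 3; y: (250,) maximum stress
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # assumed min-max scaling

# 64% train, then split the remaining 36% into 20% validation and 16% test
X_train, X_rest, y_train, y_rest = train_test_split(
    X_norm, y, train_size=0.64, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=20 / 36, random_state=0)
```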

Figure 6.  The multivariate polynomial regression of the maximum stress: (a) the training loss and validation loss measured by MSE; (b) the plot of real values versus regression values.

The regression error MSE $= 3.36\times10^{-5}$, and the coefficient of determination $R^2 = 0.9998$. Rounding the model parameters to two decimal places, the regression equation is

$$\begin{aligned} y =\;& 0.35 C_d^{0.51} + 0.24 C_w^{0.32} + 0.05 C_l^{0.62} + 0.25 V_w^{0.06} + 0.28 V_l^{0.38} + 0.15 H^{0.19} + 0.92 P^{1.20} \\ &+ 0.72 D^{1.16} + 0.46 t^{1.13} + 0.37 E_b^{0.24} - 0.04 E_s^{0.63} - 0.04 (h_w/H)^{0.43}. \end{aligned} \tag{3.4}$$

    It should be noted that the values of coefficients and exponents in Eq (3.4) may exhibit minor fluctuations due to the broad feasible region of the solutions. However, the contributions of each physical variable to the maximum stress derived from Eq (3.4) can be well interpreted. It is evident that the corrosion depth, corrosion width, corrosion length, void width, void length, burial depth, traffic load, pipe diameter, wall thickness, and bedding modulus have positive effects on the maximum stress, while the backfill soil modulus and groundwater level exhibit negative effects. These findings align with existing empirical knowledge.

Additionally, the SHAP approach [16,17,18] is utilized to analyze the impact of the input features on the model's output. The SHAP value of each sample $x$ for feature $f_i$ is computed using Eq (3.5),

$$\mathrm{SHAP}_{f_i}(x) = \sum_{f \ni f_i} \left[ |f| \times \binom{F}{|f|} \right]^{-1} \times \left[ P_f(x) - P_{f \setminus f_i}(x) \right], \tag{3.5}$$

where $f_i$ represents the $i$th feature (i.e., the $i$th physical variable), $F$ is the total number of features, and $f$ is a feature subset that includes $f_i$. $|f|$ denotes the number of elements in subset $f$, and $\binom{F}{|f|}$ is the number of combinations of $F$ items taken $|f|$ at a time. $P_f(x)$ is the predicted value when all the physical variables in $f$ are selected, and $P_{f \setminus f_i}(x)$ is the predicted value with all the variables in $f$ except $f_i$. The SHAP values of the samples for each physical variable are shown in Figure 7.
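The paper does not state which SHAP implementation was used. With the `shap` Python package, a model-agnostic workflow along the following lines would produce the per-sample values of Eq (3.5); `predict_fn` and `X_data` are our placeholders for the trained SPNN and the 250-sample dataset.

```python
import numpy as np
import shap

def predict_fn(X):
    """Placeholder wrapper: maps an (N, 12) array of the normalized physical
    variables to the SPNN-predicted maximum stress."""
    return spnn_model(X)  # assumed trained-model forward function

background = shap.sample(X_data, 50)            # background set for the expectations
explainer = shap.KernelExplainer(predict_fn, background)
shap_values = explainer.shap_values(X_data)     # shape (N, 12): per sample, per feature

# The mean |SHAP| per feature yields the importance ranking reported below
importance = np.abs(shap_values).mean(axis=0)
ranking = np.argsort(-importance)
```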

    Figure 7.  The feature analysis of 12 physical variables on the maximum stress by SHAP values.

In Figure 7, if a higher feature value corresponds to larger SHAP values, it indicates a positive effect of that feature on the model output. Conversely, if a higher feature value results in smaller SHAP values, it suggests a negative effect. Thus, we conclude that the features $P$, $D$, $t$, $C_d$, $V_l$, $E_b$, $C_w$, $C_l$, $H$, and $V_w$ have a positive effect on the maximum stress, while $E_s$ and $h_w/H$ have a negative effect. These conclusions align with the regression analysis results from Eq (3.4). Additionally, the importance ranking of the input features is $P > D > t > C_d > V_l > E_b > C_w > C_l > H > V_w > E_s > h_w/H$.

    According to the regression results, we can further analyze the evolution of the maximum stress of concrete sewage pipelines with respect to the univariate input. By selecting a sample data point and employing the control variable method, the maximum stress signal can be calculated using Eq (3.4), with only one variable being varied. Figure 8 shows the evolution of maximum stress with respect to each physical variable. These univariate evolutionary trends provide theoretical guidance for strategies aimed at repairing and improving the properties of concrete sewage pipelines.
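Such a one-at-a-time sweep can be scripted directly from Eq (3.4). In this sketch the coefficient and exponent arrays are transcribed from the fitted equation (inputs in normalized units and in the Table 3 variable order), and the baseline sample point is illustrative:

```python
import numpy as np

# Coefficients and exponents of Eq (3.4); variables ordered as in Table 3:
# Cd, Cw, Cl, Vw, Vl, H, P, D, t, Eb, Es, hw/H (normalized values)
coef = np.array([0.35, 0.24, 0.05, 0.25, 0.28, 0.15,
                 0.92, 0.72, 0.46, 0.37, -0.04, -0.04])
expo = np.array([0.51, 0.32, 0.62, 0.06, 0.38, 0.19,
                 1.20, 1.16, 1.13, 0.24, 0.63, 0.43])

def max_stress(x):
    """Evaluate Eq (3.4) for one sample x of the 12 normalized variables."""
    return float(np.sum(coef * x ** expo))

def sweep(baseline, k, grid):
    """Control-variable method: vary feature k over grid, hold the rest fixed."""
    ys = []
    for val in grid:
        x = baseline.copy()
        x[k] = val
        ys.append(max_stress(x))
    return np.array(ys)

baseline = np.full(12, 0.5)                      # illustrative sample point
stress_vs_P = sweep(baseline, k=6, grid=np.linspace(0.1, 1.0, 50))  # traffic load P
```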

    Figure 8.  The evolution of maximum stress with respect to each physical variable by the control variable method.

In this manuscript, we propose an explainable sigma-pi neural network to address multivariate nonlinear polynomial regression problems. The coefficient and exponent parameters of the polynomials are effectively represented by the corresponding weight parameters in the SPNN. To accelerate the regression process, the back-propagation algorithm is employed for parameter optimization. The examples show that the designed SPNN can efficiently and accurately solve both integer-order and fractional-order multivariate polynomial regression problems. In a practical application, the SPNN provides a high-precision fit of the maximum stress in concrete sewage pipelines under the combined influence of 12 physical parameter variables. Furthermore, feature importance ranking and additional analyses of the relationship between the maximum stress and these variables can be conducted based on this explainable machine learning model.

    Building upon the framework for solving multi-variable polynomial regression, future algorithms for addressing more complex nonlinear regression problems could be developed through higher-order neural networks, such as sigma-pi or sigma-pi-sigma networks. This study lays the theoretical foundation for developing generalized tools to solve multi-variable nonlinear regression problems.

    The authors declare that this manuscript is the authors' original work, and they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the China Postdoctoral Science Foundation Funded Project (Grant No. 2022M712902), the Natural Science Foundation of Henan (Grant No. 232300421345), and the Zhongyuan Youth Top Talent Plan (Zhongyuan Youth Postdoctoral Innovative Talents).

    The authors declare no conflict of interest with respect to the research, authorship, and/or publication of this article.



[1] Clauset A, Larremore D, Sinatra R, (2017) Data-driven predictions in the science of science. Science 355: 477–480. https://doi.org/10.1126/science.aal4217
[2] Subrahmanian VS, Kumar S, (2017) Predicting human behavior: The next frontiers. Science 355: 489. https://doi.org/10.1126/science.aam7032
[3] Wang Z, Sun ZH, Yin H, Liu XH, Wang JL, Zhao HT, et al. (2022) Data-driven materials innovation and applications. Adv Mater 34: 2104113. https://doi.org/10.1002/adma.202104113
[4] Chen X, Yan CC, Zhang XT, Zhang X, Dai F, Yin J, et al. (2016) Drug-target interaction prediction: Databases, web servers and computational models. Briefings Bioinf 17: 696–712. https://doi.org/10.1093/bib/bbv066
[5] Zhao Y, Yin J, Zhang L, Zhang Y, Chen X, (2024) Drug-drug interaction prediction: Databases, web servers and computational models. Briefings Bioinf 25: bbad445. https://doi.org/10.1093/bib/bbad445
[6] Guo XX, Sun YT, Ren JL, (2020) Low dimensional mid-term chaotic time series prediction by delay parameterized method. Inf Sci 516: 1–19. https://doi.org/10.1016/j.ins.2019.12.021
[7] Li B, Guo XX, Fang HY, Ren JL, Yang KJ, Wang F, et al. (2020) Prediction equation for maximum stress of concrete drainage pipelines subjected to various damages and complex service conditions. Constr Build Mater 264: 120238. https://doi.org/10.1016/j.conbuildmat.2020.120238
[8] Guo XX, Xiong NN, Wang HY, Ren JL, (2022) Design and analysis of a prediction system about influenza-like illness from the latent temporal and spatial information. IEEE Trans Syst Man Cybern Syst 52: 66–77. https://doi.org/10.1109/TSMC.2020.3048946
[9] Li B, Fang HY, Yang KJ, Zhang XJ, Du XM, Wang NN, et al. (2022) Impact of erosion voids and internal corrosion on concrete pipes under traffic loads. Tunnelling Underground Space Technol 130: 104761. https://doi.org/10.1016/j.tust.2022.104761
[10] Rudin W, (1976) Principles of Mathematical Analysis, McGraw-Hill Companies.
[11] Guo XX, Han WM, Ren JL, (2023) Design of a prediction system based on the dynamical feed-forward neural network. Sci China Inf Sci 66: 112102. https://doi.org/10.1007/s11432-020-3402-9
[12] Yu LP, Guo XX, Wang G, Sun BA, Han DX, Chen C, et al. (2022) Extracting governing system for the plastic deformation of metallic glasses using machine learning. Sci China Phys Mech Astron 65: 264611. https://doi.org/10.1007/s11433-021-1840-9
[13] Liu Z, Wang Y, Vaidya S, Ruehle F, Halverson J, Soljacic M, et al. (2024) KAN: Kolmogorov-Arnold networks. preprint, arXiv: 2404.19756. https://doi.org/10.48550/arXiv.2404.19756
[14] Gurney KN, (1992) Training nets of hardware realizable sigma-pi units. Neural Networks 5: 289–303. https://doi.org/10.1016/S0893-6080(05)80027-9
[15] Penny WD, Stonham TJ, (1995) Generalization in multi-layer networks of sigma-pi units. IEEE Trans Neural Networks 6: 506–508. https://doi.org/10.1109/72.363490
[16] Lundberg SM, Lee SI, (2017) A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 30: 4765–4774.
[17] Wu LL, Wei GY, Wang G, Wang HY, Ren JL, (2022) Creating win-wins from strength-ductility trade-off in multi-principal element alloys by machine learning. Mater Today Commun 32: 104010. https://doi.org/10.1016/j.mtcomm.2022.104010
[18] Xiao L, Wang G, Long WM, Liaw PK, Ren JL, (2024) Fatigue life prediction of the FCC-based multi-principal element alloys via domain knowledge-based machine learning. Eng Fract Mech 296: 109860. https://doi.org/10.1016/j.engfracmech.2024.109860
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)