
Periodic solutions to symmetric Newtonian systems in neighborhoods of orbits of equilibria

  • The aim of this paper is to prove the existence of periodic solutions to symmetric Newtonian systems in any neighborhood of an isolated orbit of equilibria. Applying equivariant bifurcation techniques we obtain a generalization of the classical Lyapunov center theorem to the case of symmetric potentials with orbits of non-isolated critical points. Our tool is an equivariant version of the Conley index. To compare the indices we compute cohomological dimensions of some orbit spaces.

    Citation: Anna Gołȩbiewska, Marta Kowalczyk, Sławomir Rybicki, Piotr Stefaniak. Periodic solutions to symmetric Newtonian systems in neighborhoods of orbits of equilibria[J]. Electronic Research Archive, 2022, 30(5): 1691-1707. doi: 10.3934/era.2022085




    Variable importance and model selection are nuanced concepts that are relevant in statistics, data science, and many other areas of the scientific literature (see Kruskal et al. [1]). Perhaps the simplest example of a metric for variable importance in regression may be found in introductory textbooks (see Achen [2]): when all variables have been standardized, the magnitudes of the regression coefficients are taken as measures of the importance of the associated variables. A slew of variable importance measures has been developed over the years, including t-statistics and stepwise elimination of variables on the basis of statistical significance or measures such as AIC. Most of these have been superseded by the LASSO [3] and its variations, including the elastic net [4].

    A different and axiomatic approach was taken by Pratt [5]. Starting with exchangeable, standardized predictor variables, Pratt showed that the importance of the jth predictor variable may be defined as:

    $\mathrm{VarImp}_j = \hat{\beta}_j \times r_{y,x_j}$ (1.1)

    where $\hat{\beta}_j$ is the regression coefficient for the $j$th predictor variable and $r_{y,x_j}$ is the marginal (Pearson) correlation between the response ($y$) and the $j$th predictor variable, $x_j$.
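
    For illustration, a minimal sketch of Eq (1.1) in Python follows (our own example; NumPy and the helper name pratt_importance are assumptions, not part of the original study):

```python
import numpy as np

def pratt_importance(X, y):
    """Pratt's variable importance (Eq 1.1): the OLS coefficient for each
    standardized predictor times its marginal Pearson correlation with y."""
    # Standardize the predictors and center the response
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()

    # OLS coefficients (no intercept is needed after centering)
    beta_hat, *_ = np.linalg.lstsq(Xs, yc, rcond=None)

    # Marginal correlations r_{y, x_j}
    r = np.array([np.corrcoef(Xs[:, j], yc)[0, 1] for j in range(Xs.shape[1])])

    return beta_hat * r
```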

    In regression, the approach of using t-statistics for assessing variable importance may be shown to be equivalent to removing a variable and looking at the difference in mean squared errors for the models with and without the variable of interest. Complex machine learning methods, such as random forests, do not employ regression variable diagnostics such as estimated coefficients or t-statistics. Leo Breiman [6] introduced a variation of this idea for random forests in which the so-called out-of-bag (OOB) values of a variable of interest are permuted-and-predicted (PaP) so that the change in accuracy can be observed and aggregated over all observations. This approach and related permutation methods have become standard in machine learning over the last 20 years; but see Strobl et al. [7] and Bladen [8] for a discussion of the impact of collinearity on permutation variable importance. Hooker et al. [9] pointed out that the variable permutation of Breiman's [6] original algorithm leads to a form of potentially problematic extrapolation and suggested that re-learning additional models is required to handle this problem. They pointed to the work done by Lei et al. [10] involving the technique of leaving one covariate out (LOCO) of the dataset and re-learning the model of interest. Barber et al. [11] and Candès et al. [12] chose to handle this via a technique they call knockoffs, which involves switching original variables for random replacements that are sampled conditionally on the remaining variables. Each of the LOCO and knockoff techniques involves fitting a new model after removing, permuting, or otherwise altering a training variable and comparing the new model to the original untainted model.

    Variable importance assessments for random forests have been explored and furthered in various ways. In their work, Hooker et al. [9] expanded upon the LOCO technique and utilized a similar concept for work with random forests. Ye et al. [13] created the SOIL technique, designed for sparse linear modeling and high-dimensional regression, and showed its proficiency in model selection for random forests. Strobl et al. [7] attempted to navigate feature importance issues caused by collinearity via a conditional variable importance algorithm. Many researchers have developed plots relating changes in the predicted response to changes in a particular variable while holding other variables constant. Apley et al. [14] contributed the concept and visualization known as accumulated local effects (ALE) plots. Goldstein et al. [15] and Greenwell et al. [16] developed individual conditional expectation (ICE) plots and averaged partial dependence plots (PDPs), respectively. PDPs, in particular, have been used to assess the importance of a variable by taking the standard deviation of the response predictions from the PDP for the desired variable [16].

    In this paper, we highlight how some variations of the PaP and LOCO methods for assessing variable importance in linear modeling and machine learning relate to regression metrics. By doing so, we hope to illustrate which variable importance methods should be employed to best approximate specific regression metrics of interest.

    In the remainder of this section, we introduce, define, and provide notations for standard regression metrics and several machine learning variable importance computations. In Section 2, we outline our simulated data for comparing these importance values. In Section 3, we show comparative plots of our importance metrics and comment on observed results within them. Finally, in Section 4, we offer our conclusions about the different variable importance metrics we analyze.

    In ordinary regression, there are three different metrics provided in standard summary tables: estimated coefficients, t-statistics, and p-values. If variables have been standardized, then coefficients with larger magnitudes are often interpreted as indicating greater importance [2]. The same holds for t-statistics, which are unaffected by standardization. While p-values communicate information similar to t-statistics, we recognize that p-values are rarely optimal for interpreting variable importance because they have an inverse and nonlinear relationship with t-statistics.

    If variables are orthogonal, then t-statistics are proportional to the estimated coefficients. Let $\hat{\beta}_j$ be the estimate for the $j$th regression coefficient. These two metrics have the following relationship for the $j$th variable:

    $t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \propto \hat{\beta}_j$ (1.2)

    if the predictor variables have been standardized so that the standard error $\mathrm{SE}(\hat{\beta}_j)$ is the same regardless of which variable we are considering. However, these conditions rarely hold, which suggests a potential disconnect between the estimated coefficients and the t-statistics. A common issue occurs when variables are collinear: the standard errors of their estimated coefficients are then inflated, which diminishes their t-statistics. Thus, the effect or relationship between a predictor and the response remains the same, but we are less certain of this relationship.
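
    To make the proportionality in Eq (1.2) concrete, the following sketch (our own illustration; the simulated design mirrors Eq (2.1) and is an assumption here) fits OLS to standardized, nearly orthogonal predictors and checks that $t_j / \hat{\beta}_j$ is roughly constant across $j$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 6

# Standardized, (nearly) orthogonal predictors and a linear response
X = rng.standard_normal((n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0]) + rng.standard_normal(n)

# OLS with an intercept column; standard errors from (X'X)^{-1}
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
resid = y - X1 @ coef
sigma2 = resid @ resid / (n - p - 1)
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X1.T @ X1)))

beta_hat = coef[1:]
t_stats = beta_hat / se[1:]
print(t_stats / beta_hat)  # entries are roughly equal, so t_j is proportional to beta_hat_j
```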

    In more complex machine learning algorithms, a technique for assessing variable importance was introduced by Breiman [6]. Commonly called the permute-and-predict (PaP) method, it involves randomly scrambling the values of a chosen variable and accumulating predictions for the newly altered data. This method was originally conceived for random forests and was performed on out-of-bag (OOB) data, a type of validation dataset for individual trees. We will use the OOB PaP technique when assessing random forest importance metrics, but we will also perform the procedure manually on a separate validation set to compare to regression metrics and the OOB PaP.

    Our research is motivated by earlier work we performed assessing how random forest variable importance metrics depend on collinearity and mtry, a tunable hyper-parameter giving the number of randomly selected variables to consider for a given split in a tree of the forest [8]. In that work, we derived a heuristic suggesting a relationship between the regression t-statistics and random forest variable importance metrics. This heuristic submits that $\mathrm{importance}_j \propto t_j^2$, which implies that a square root transformation is helpful for relating random forest importances back to regression metrics. In this paper, we will explore the relationship between regression metrics and PaP importances. We will also look at the relationships these metrics have with model refitting (LOCO importances) and with the true equation coefficients.

    If the predictions using the raw data and the permuted data yield similar accuracies or errors, then that variable is not particularly important. Alternatively, if the original predictions clearly outperform the permutation predictions, then the variable is important.

    Mathematically, our permutation variable importance may be expressed using notation in which the prefix indicates the data for which we generate predictions, here the validation set. Thus,

    $\mathrm{PaP}_j = \frac{vMSE_j - MSE}{MSE}$, (1.3)

    where $MSE$ is the validation mean squared error from the original model and $vMSE_j$ is the MSE generated from that same model when the validation values of the $j$th predictor have been permuted.

    Alternatively, the variable importance can be assessed with a drop-and-predict (DaP) method. In our work, we accomplish this dropout by first rescaling each variable so that the center of the distribution is 0 and then setting all values of a given variable to the central value of 0. The structure is fundamentally equivalent to Eq (1.3), with one simple difference:

    $\mathrm{DaP}_j = \frac{vMSE_j^{0} - MSE}{MSE}$, (1.4)

    where $vMSE_j^{0}$ is the MSE generated from the original model when the validation values of the $j$th predictor have all been set to the central value of 0.

    These importance metrics are therefore equivalent if $vMSE_j^{0} = vMSE_j$.
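
    A minimal sketch of how Eqs (1.3) and (1.4) might be computed for any fitted model with a predict method follows (our own example; the function and variable names are assumptions):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def pap_dap(model, X_val, y_val, j, rng=None):
    """Validation-set PaP (Eq 1.3) and DaP (Eq 1.4) importances for predictor j.
    Assumes the columns of X_val are centered at 0, so dropping a variable
    means setting its values to the central value of 0."""
    rng = rng or np.random.default_rng()
    mse = mean_squared_error(y_val, model.predict(X_val))

    # PaP: permute column j of the validation data and re-predict
    X_perm = X_val.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    v_mse = mean_squared_error(y_val, model.predict(X_perm))

    # DaP: set column j to 0 and re-predict
    X_drop = X_val.copy()
    X_drop[:, j] = 0.0
    v_mse0 = mean_squared_error(y_val, model.predict(X_drop))

    return (v_mse - mse) / mse, (v_mse0 - mse) / mse
```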

    Another method for assessing variable importance is called re-learning (see Hooker et al. [9]) or LOCO (see Lei et al. [10]). When performing this technique, we randomly permute the values of a chosen variable in the training data rather than the validation data. We then build a new regression model using the training data containing the permuted feature. The difference of MSEs between the permuted data model and the original model is again computed. Just as before, if the original predictions clearly outperform the permutation predictions, then the variable is important.

    The LOCO variable importance will be expressed in a similar fashion to the PaP and DaP methods in Section 1.2. In this notation, however, the prefix carries information about both the model used to generate predictions and the training data from which that model was built. Thus,

    $\mathrm{perm\_LOCO}_j = \frac{tMSE_j - MSE}{MSE}$, (1.5)

    where $MSE$ is the validation mean squared error from the original model, just as in Section 1.2. The only difference between this and Eq (1.3) is the substitution of $vMSE_j$ with $tMSE_j$. Here, $tMSE_j$ is the MSE generated with the original validation data but from a new regression model in which the training values of the $j$th predictor have been permuted.

    Just like DaP techniques, LOCO variable importances can be assessed by rescaling each variable so the distribution center is 0 and then setting the values of the variable to 0. The structure is identical to Eq (1.5), with the exception of substituting notation for dropping a variable rather than permuting it:

    $\mathrm{drop\_LOCO}_j = \frac{tMSE_j^{0} - MSE}{MSE}$, (1.6)

    where $tMSE_j^{0}$ is the MSE generated with the original validation data but from a new regression model in which the training values of the $j$th predictor have all been set to the central value of 0.

    Similar to Section 1.2, these importance metrics are equivalent if $tMSE_j^{0} = tMSE_j$.
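
    The LOCO variants of Eqs (1.5) and (1.6) refit the model on altered training data; a hedged sketch using scikit-learn estimators follows (our own example; clone and the names below are assumptions):

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error

def loco(model, X_train, y_train, X_val, y_val, j, rng=None):
    """perm_LOCO (Eq 1.5) and drop_LOCO (Eq 1.6) importances for predictor j.
    A clone of the already-fitted model is retrained on altered training data
    and scored on the untouched validation data."""
    rng = rng or np.random.default_rng()
    mse = mean_squared_error(y_val, model.predict(X_val))

    # perm_LOCO: permute column j of the *training* data and refit
    X_perm = X_train.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    t_mse = mean_squared_error(y_val, clone(model).fit(X_perm, y_train).predict(X_val))

    # drop_LOCO: set column j of the training data to 0 and refit
    X_drop = X_train.copy()
    X_drop[:, j] = 0.0
    t_mse0 = mean_squared_error(y_val, clone(model).fit(X_drop, y_train).predict(X_val))

    return (t_mse - mse) / mse, (t_mse0 - mse) / mse
```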

    Using the standard regression metrics and the machine learning importance definitions listed above, we designed several simulations to assess the relationships among these metrics. The general architecture is to build both a training and a validation dataset of identical sizes and structures with established coefficients and variable relations. In our subsequent methods and analyses, we use only the training data for model creation, while we use the validation data to assess variable importance. We then collect regression metrics and compute PaP and LOCO metrics. Finally, we plot pairwise scatterplots and correlations among these importance values to assess the relationships between them. Higher correlations between metrics provide an empirical foundation for showing proportionality and agreement between variable importance assessments. We utilize the definitions in Sections 1.2 and 1.3 when computing permutation and dropout variable importance metrics [8].

    For our first simulation, we let six predictor variables be independent and identically distributed (iid) from a N(0,1) distribution; their mutual independence makes the features orthogonal (uncorrelated). We generated 1000 training observations and 1000 validation observations for each of these variables and then created the response variable values using the following equation:

    $y = 5v_1 + 4v_2 + 3v_3 + 2v_4 + 1v_5 + 0v_6 + \epsilon$, (2.1)

    where initially $\epsilon \overset{iid}{\sim} N(0,1)$ and then, in a second iteration, $\epsilon \overset{iid}{\sim} N(0,10)$.
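
    A sketch of this data-generating process follows (our own reconstruction of Eq (2.1); the seed is an assumption):

```python
import numpy as np

rng = np.random.default_rng(1)
beta = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])

def simulate(n=1000, noise_sd=1.0):
    """Six iid N(0,1) predictors and the linear response of Eq (2.1)."""
    X = rng.standard_normal((n, 6))
    y = X @ beta + rng.normal(0.0, noise_sd, size=n)
    return X, y

X_train, y_train = simulate()        # first iteration: noise sd = 1
X_val, y_val = simulate()
# the second iteration uses noise_sd=10.0
```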

    For our next simulation, we utilize a data structure from Strobl et al. [7] that allows for high collinearity among some of the predictor variables. In this architecture, we use a linear equation with twelve predictor variables and 1000 observations in the training and validation sets.

    The regression coefficients for the predictor variables are chosen as follows:

    $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_{8\text{-}12} + \epsilon$, (2.2)

    where initially $\epsilon \overset{iid}{\sim} N(0,1)$ and then, in a second iteration, $\epsilon \overset{iid}{\sim} N(0,10)$.

    The predictor variables are sampled from a multivariate normal distribution:

    $Cor_1, \ldots, Cor_4, v_5, \ldots, v_{12} \sim N(0, \Sigma)$.

    The covariance structure $\Sigma$ is chosen such that all variables have unit variance ($\sigma_{i,i} = 1$) and the first four predictor variables are block-correlated with $\sigma_{i,j} = 0.95$ for $i \neq j \leq 4$, while the rest are orthogonal with $\sigma_{i,j} = 0$.
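
    A sketch of this covariance structure and the corresponding draw of the predictors and response of Eq (2.2) follows (our own reconstruction; the seed is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 1000, 12

# Unit variances, a 0.95-correlated block for the first four predictors,
# and zero correlation everywhere else
Sigma = np.eye(p)
Sigma[:4, :4] = 0.95
np.fill_diagonal(Sigma, 1.0)

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Coefficients of Eq (2.2): 5, 5, 2, 0 on the correlated block,
# 5, 5, 2 on three orthogonal predictors, and 0 on the remainder
beta = np.array([5, 5, 2, 0, 5, 5, 2, 0, 0, 0, 0, 0], dtype=float)
y = X @ beta + rng.normal(0.0, 1.0, size=n)   # noise sd = 10 in the second iteration
```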

    We expand the previous simulation to create nonlinear datasets. We again use 12 predictor variables and 1000 observations. We initially sample the predictors from the exact multivariate normal distribution described in Section 2.2 to impose correlations. We then apply the cumulative distribution function to each predictor to convert it to a standard uniform distribution, $U(0,1)$. We multiply these values by 2 and subtract 1 to give each predictor a $U(-1,1)$ distribution.

    The regression equation matches the one found in Section 2.2, except we square each variable as follows:

    $y = 5Cor_1^2 + 5Cor_2^2 + 2Cor_3^2 + 0Cor_4^2 + 5v_5^2 + 5v_6^2 + 2v_7^2 + 0v_{8\text{-}12}^2 + \epsilon$, (2.3)

    where $\epsilon \overset{iid}{\sim} N(0,0.1)$.
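
    The transformation to $U(-1,1)$ predictors and the quadratic response of Eq (2.3) might be sketched as follows (our own reconstruction; using SciPy's normal CDF is an assumption):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, p = 1000, 12

# Correlated normal draws, exactly as in Section 2.2
Sigma = np.eye(p)
Sigma[:4, :4] = 0.95
np.fill_diagonal(Sigma, 1.0)
Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Probability integral transform to U(0,1), then rescale to U(-1,1)
U = 2.0 * norm.cdf(Z) - 1.0

# Quadratic response of Eq (2.3); noise sd of 0.1 per the text
beta = np.array([5, 5, 2, 0, 5, 5, 2, 0, 0, 0, 0, 0], dtype=float)
y = (U ** 2) @ beta + rng.normal(0.0, 0.1, size=n)
```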

    Leaning heavily on the architecture in Section 2.3, we now create a response using cosine relations with the predictors. We sample the predictors with the same technique described in Section 2.3 to impose correlations and create standard uniform distributions, $U(0,1)$. Instead of converting these to $U(-1,1)$, we convert them to $U(-4\pi, 4\pi)$ by multiplying the values by $8\pi$ and subtracting $4\pi$.

    The regression equation matches that found in Section 2.2, except we remove the error term and take the cosine of each variable as follows:

    $y = 5\cos(Cor_1) + 5\cos(Cor_2) + 2\cos(Cor_3) + 0\cos(Cor_4) + 5\cos(v_5) + 5\cos(v_6) + 2\cos(v_7) + 0\cos(v_{8\text{-}12})$. (2.4)

    Finally, we generate interaction datasets of 1000 observations for 8 predictor variables and an error (ϵ) term. Each of the predictors is orthogonal and identically distributed from a N(0,1) distribution. We then create the response with this equation:

    $y = 4v_1v_2 + 2v_3v_4 + 1v_5v_6 + 0v_7v_8 + \epsilon$, (2.5)

    where $\epsilon \overset{iid}{\sim} N(0,0.1)$.
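
    A sketch of the interaction data of Eq (2.5) follows (our own reconstruction; the seed is an assumption):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Eight orthogonal N(0,1) predictors and a pairwise-interaction response
V = rng.standard_normal((n, 8))
y = (4 * V[:, 0] * V[:, 1]
     + 2 * V[:, 2] * V[:, 3]
     + 1 * V[:, 4] * V[:, 5]
     + 0 * V[:, 6] * V[:, 7]
     + rng.normal(0.0, 0.1, size=n))
```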

    For each linear equation, we fit a linear regression model on the training data. The regression model summary provides estimates for the true equation coefficients and t-statistics. We collect and compare these to several machine learning assessments of importance and to the true coefficients.

    For all datasets, except the orthogonal data, we fit a random forest regression model on the training data. In this case, we compare machine learning variable importances to the true coefficients, regression metrics (if the equation is linear), and the default random forest variable importance technique of OOB PaP discussed in Section 1.2.

    The machine learning assessments that we will collect are the PaP, DaP, perm_LOCO, and drop_LOCO metrics discussed in Sections 1.2 and 1.3. For each metric, large values suggest higher variable importance, while small values indicate low variable importance.

    We now provide pairwise scatterplots and correlations between each of the variable importance metrics calculated for our simulated datasets. We also show results relating to random forest models and compare their default importance metric to other PaP metrics as functions of the dominant hyper-parameter mtry.

    We begin with our dataset where variables are all orthogonal. The importance values and their pairwise plots are provided in Figure 1. For this analysis, the regression $R^2 \approx 0.98$.

    Figure 1.  Comparative plots of importance metrics for the linear regression model. The equation for the response is $y = 5v_1 + 4v_2 + 3v_3 + 2v_4 + v_5 + 0v_6 + \epsilon$, where $\epsilon \sim N(0,1)$. See also Eq (2.1).

    From Figure 1, we observe the following when variables are orthogonal:

    ● All variable importance metrics have a near perfect correlation with an average of 1.00.

    ● While this example is trivial, these results do not deviate from expectations and illustrate agreement between all importance metrics for orthogonal predictors such as principal components.

    We will now use an identical regression structure but increase the standard deviation of the noise so that $\epsilon \sim N(0,10)$. This yields a massive drop in the regression fit, to $R^2 \approx 0.35$. The plots of these results are in Figure 2.

    Figure 2.  Comparative plots of importance metrics for the linear regression model. The equation for the response is $y = 5v_1 + 4v_2 + 3v_3 + 2v_4 + v_5 + 0v_6 + \epsilon$, where $\epsilon \sim N(0,10)$. See also Eq (2.1).

    From Figure 2, we observe the following when increasing the error for the response variable:

    ● All variable importance metrics have a strong association with an average correlation of 0.98.

    ● These results illustrate agreement between all importance metrics for orthogonal predictors.

    We move to the dataset where some variables possess high collinearity. The pairwise plots of importance assessments for a linear regression model are provided in Figure 3. For this analysis, $\epsilon \sim N(0,1)$ and the regression $R^2 \approx 0.995$.

    Figure 3.  Comparative plots of importance metrics taken from or performed on the linear regression model. Plot symbols denote the status of a given variable: whether it was part of the correlated or orthogonal group of features. The equation for the response is $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_8 + \epsilon$, where $\epsilon \sim N(0,1)$. See also Eq (2.2).

    From Figure 3, we observe the following when variables are highly collinear:

    ● Coefficients have perfect correlation with estimates but only moderate correlation with t-statistics.

    ● There are two blocks of high similarity metrics.

    ● The first block involving the true coefficients, the regression estimates, and PaP techniques has an average correlation of 1.00.

    ● The second block, involving t-statistics and LOCO techniques, has an average correlation of 1.00.

    ● In both blocks, the respective permute and drop methods have a perfect correlation of 1.00.

    We now analyze this same dataset utilizing a random forest model. The pairwise plots of importance are provided in Figure 4. We keep the linear regression estimates and t-statistics from Figure 3, but now we perform the PaP and LOCO methods on the random forest model.

    Figure 4.  Comparative plots of importance metrics: regression model estimates and t-statistics and PaP and LOCO methods performed on a random forest model. Plot symbols denote the status of a given variable: whether it was part of the correlated or orthogonal group of features. The equation for the response is $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_8 + \epsilon$, where $\epsilon \sim N(0,1)$. See also Eq (2.2).

    From Figure 4, we observe the following for high-collinearity data:

    ● The same two blocks of high-similarity metrics found in Figure 3 are observed here.

    ● The first block, involving the true coefficients and PaP techniques, has an average correlation of 0.99.

    ● The second block involving the regression t-statistics, the OOB PaP, and both LOCO techniques has an average correlation of 0.99.

    ● In both blocks, the permute and drop methods have a near perfect correlation of 1.00.

    ● The default importance method for random forests, OOB PaP, aligns with the t-statistics, while the other PaP methods align with the coefficients.

    We now use an identical regression structure, but increase the error so that $\epsilon \sim N(0,10)$. This yields a large drop in the regression fit, to $R^2 \approx 0.67$. These results are plotted in Figure 5.

    Figure 5.  Comparative plots of importance metrics taken from or performed on the linear regression model. Plot symbols denote the status of a given variable: whether it was part of the correlated or orthogonal group of features. The equation for the response is $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_8 + \epsilon$, where $\epsilon \sim N(0,10)$. See also Eq (2.2).

    From Figure 5, we observe the following when increasing the error for the response variable:

    ● High and low correlation patterns align with those identified in Figure 3.

    ● The first block involving the true coefficients, the regression estimates, and both PaP techniques has an average correlation of 0.98.

    ● The second block, involving t-statistics and LOCO techniques, has an average correlation of 0.98.

    ● In each block, the respective permutation and dropout methods have nearly perfect correlations of 0.99.

    We now analyze this same dataset utilizing a random forest model. The pairwise plots of importance metrics are provided in Figure 6. We keep the linear regression estimates and t-statistics from Figure 5, but now we perform the PaP and LOCO methods on the random forest model.

    Figure 6.  Comparative plots of importance metrics: regression model estimates and t-statistics and PaP and LOCO methods performed on a random forest model. Plot symbols denote the status of a given variable: whether it was part of the correlated or orthogonal group of features. The equation for the response is $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_8 + \epsilon$, where $\epsilon \sim N(0,10)$. See also Eq (2.2).

    From Figure 6, we observe the following for the high-collinearity data:

    ● The same blocks of high-similarity metrics are observed again.

    ● The first block involving the true coefficients, the regression estimates, and both PaP techniques has an average correlation of 0.95.

    ● The second block involving the t-statistics, the OOB PaP, and both LOCO techniques has an average correlation of 0.89, but it moves up to 0.98 when OOB PaP is removed.

    ● In both blocks, the respective permutation and dropout methods have a near perfect correlation of 0.98.

    ● The default importance method for random forests, OOB PaP, is the least stable metric in terms of alignment with other metrics.

    We now offer results exclusively focused on our random forest models. In Section 1.2, we mention the hyper-parameter mtry and prior research assessing how variable importance fluctuates based on mtry [8]. Moving forward, we discuss plots that show how a few importance metrics change across mtry. These metrics include the default OOB PaP, the PaP and DaP metrics (Section 1.2), and a variation of PaP done on the training data instead of our validation data.

    We start with our simulation from Section 2.2, with $\epsilon \sim N(0,1)$. In Figure 4, we show importance metrics for a default random forest, where mtry = 4. Here, we utilize a Monte Carlo simulation of 20 replicates to build random forests for mtry = 1, 2, ..., 12.

    From Figure 7, we observe the following:

    Figure 7.  Panel of plots comparing four different variable importance metrics across mtry. The equation for the response is $y = 5Cor_1 + 5Cor_2 + 2Cor_3 + 0Cor_4 + 5v_5 + 5v_6 + 2v_7 + 0v_8 + \epsilon$, where $\epsilon \sim N(0,1)$. See also Eq (2.2).

    ● The OOB plot experiences a distinct shift or bias between the variables possessing coefficients of 5, for all mtry.

    ● The Train PaP experiences a distinct bias such that the importance of a pure noise variable, V8, is well above 0 for all mtry, especially for lower values of mtry.

    ● As mtry increases, the plots generally trend toward importances that are more proportional with the true coefficients.

    ● The Validation DaP spread is a bit smaller than the Validation PaP, suggesting slightly more variance in the PaP technique.

    ● The Validation DaP and PaP plots appear superior to the others due to their lack of bias and shifts in the importance values.

    We repeat this assessment for our nonlinear datasets, beginning with the simulation where each feature has a quadratic relation with the response. The simulation is described in Section 2.3. We again utilize a Monte Carlo simulation of 20 replicates to build random forests for all values of mtry.

    From Figure 8, we observe the following:

    Figure 8.  Panel of plots comparing four different variable importance metrics across mtry. The equation for the response is $y = 5Cor_1^2 + 5Cor_2^2 + 2Cor_3^2 + 0Cor_4^2 + 5v_5^2 + 5v_6^2 + 2v_7^2 + 0v_{8\text{-}12}^2 + \epsilon$, where $\epsilon \sim N(0,0.1)$. See also Eq (2.3).

    ● Each plot experiences a shift between the variables possessing coefficients of 5, for all mtry.

    ● The OOB metrics contain a much stronger shift between the variables possessing coefficients of 5 than the other importance metrics.

    ● The Train PaP again experiences a distinct bias such that the pure noise variable, V8, is well above 0 for all mtry.

    ● The plots again trend toward importances that are more proportional with the true coefficients as mtry increases.

    ● The Validation DaP spread is again slightly smaller than the Validation PaP, suggesting slightly more variance in the PaP technique.

    ● The Validation DaP and PaP plots again appear superior to the others due to their reduced bias and shifts in the importance values.

    We then expand to the structure where each feature has a cosine relation with the response, as described in Section 2.4.

    From Figure 9, we observe the following:

    Figure 9.  Panel of plots comparing four different variable importance metrics across mtry. The equation for the response is $y = 5\cos(Cor_1) + 5\cos(Cor_2) + 2\cos(Cor_3) + 0\cos(Cor_4) + 5\cos(v_5) + 5\cos(v_6) + 2\cos(v_7) + 0\cos(v_{8\text{-}12})$. See also Eq (2.4).

    ● All plots experience a shift between the correlated variables and the orthogonal variables across mtry.

    ● The OOB metrics contain a much stronger shift between the correlated and orthogonal variables than the other importance metrics.

    ● The Train PaP experiences an enormous positive bias in the importance of V8.

    ● Each of the four importance metrics has at least a slight bias in the importance of V8, but the Validation DaP consistently has the lowest bias.

    ● The Validation DaP metric appears superior, especially when compared with the OOB and Train PaP.

    Finally, we explore the situation where each variable relates to the response through an interaction. The data generation process is shown in Section 2.5.

    From Figure 10, we observe the following:

    Figure 10.  Panel of plots comparing four different variable importance metrics across mtry. The equation for the response is $y = 4v_1v_2 + 2v_3v_4 + 1v_5v_6 + 0v_7v_8 + \epsilon$, where $\epsilon \sim N(0,0.1)$. See also Eq (2.5).

    ● The OOB plot experiences the greatest spread in importance values.

    ● The Train PaP experiences a distinct positive bias in the importance values of V7 and V8.

    ● As mtry increases, each of the plots generally trends toward importances that are more proportional with the true coefficients.

    ● The gap between coefficients of 2 and 1 is much larger for the OOB plot than the other plots.

    ● The Validation DaP spread is just slightly smaller than the Validation PaP.

    Our results highlight some striking findings about permutation and variable-deletion techniques. Most notably, they provide valuable empirical evidence of a very high association between permuting a variable and dropping that variable when the two importance measures are otherwise constructed identically.

    This work suggests that t-statistics have high alignment with the LOCO technique for permutation and dropout. Mathematically, $t_j \propto \mathrm{perm\_LOCO}_j \approx \mathrm{drop\_LOCO}_j$. If surrogate t-statistic importances are desired in machine learning, then utilizing a LOCO technique appears to be a reasonable approach to approximate them. However, it is noteworthy that this can often be computationally expensive, especially for large datasets.

    Meanwhile, the true coefficients have strong alignment with the regression estimated coefficients and the PaP and DaP methods. This can be expressed as $\beta_j \approx \hat{\beta}_j \propto \mathrm{PaP}_j \approx \mathrm{DaP}_j$. If interest is predominantly in predictor relationships with the response rather than marginal predictive capacity, then these metrics would be preferred. Recognizing this relationship, PaP or DaP techniques could be employed to obtain meaningful surrogates for the functional equation coefficients. We especially note from our results that these should be calculated using validation data rather than training data, due to biases in variables with lower importance. If no reasonable validation set is available, the same process might be done using a k-fold cross-validation technique.

    Our work also suggests a clear difference between the random forest OOB importance and the validation PaP and DaP importances. This contradicts common intuition that they should be equal or proportional. Further consideration led us to realize that OOB importance assesses the variables in the individual trees and aggregates them, while Validation PaP and DaP assess the variables in the entire forest. If precision of estimating importance values matters, then the Validation PaP or DaP might be preferred over the OOB PaP.

    In summary, we have shown that permutation techniques and dropout techniques are approximately equal. We also illustrate that manual PaP and DaP methods are not equal to the OOB PaP. Even more noteworthy, our work shows that equation coefficients and regression estimates have strong associations with manual PaP metrics, while t-statistics have strong associations with LOCO metrics. This implies that some excellent future theoretical work may be available to relate these approaches to mathematically well-defined regression metrics. Ultimately, if a machine learning project desires a certain regression metric, our work suggests which variable importance techniques should be used to obtain an appropriate surrogate.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    The authors declare there are no conflicts of interest.

    The code used to generate the data and figures in this report can be found at:

    https://github.com/KelvynBladen/reg_Permute_Importance



    [1] A. M. Lyapunov, Problème général de la stabilite du mouvement, Ann. Fac. Sci. Univ. Toulouse, 9 (1907), 203–474.
    [2] A. Weinstein, Normal modes for nonlinear Hamiltonian systems, Invent. Math., 20 (1973), 47–57. https://doi.org/10.1007/BF01405263 doi: 10.1007/BF01405263
    [3] J. Moser, Periodic orbits near an equilibrium and a theorem by Alan Weinstein, Comm. Pure Appl. Math., 29 (1976), 724–747. https://doi.org/10.1002/cpa.3160290613 doi: 10.1002/cpa.3160290613
    [4] E. Fadell, P. H. Rabinowitz, Generalized cohomological index theories for Lie group actions with an application to bifurcation questions for Hamiltonian systems, Invent. Math., 45 (1978), 139–174. https://doi.org/10.1007/BF01390270 doi: 10.1007/BF01390270
    [5] J. A. Montaldi, R. M. Roberts, I. N. Stewart, Periodic solutions near equilibria of symmetric Hamiltonian systems, Phil. Trans. R. Soc. Lond. A, 325 (1988), 237–293. https://doi.org/10.1098/rsta.1988.0053 doi: 10.1098/rsta.1988.0053
    [6] T. Bartsch, A generalization of the Weinstein-Moser theorems on periodic orbits of a Hamiltonian system near an equilibrium, Ann. Inst. H. Poincaré Anal. Non Linéaire, 14 (1997), 691–718. https://doi.org/10.1016/S0294-1449(97)80130-8 doi: 10.1016/S0294-1449(97)80130-8
    [7] E. N. Dancer, S. Rybicki, A note on periodic solutions of autonomous Hamiltonian systems emanating from degenerate stationary solutions, Differ. Integral Equ., 12 (1999), 147–160.
    [8] A. Szulkin, Bifurcation for strongly indefinite functionals and a Liapunov type theorem for Hamiltonian systems, Differ. Integral Equ., 7 (1994), 217–234.
    [9] E. Pérez-Chavela, S. Rybicki, D. Strzelecki, Symmetric Liapunov center theorem, Calc. Var. Partial Differ. Equ., 56 (2017), 1–23. https://doi.org/10.1007/s00526-017-1120-1 doi: 10.1007/s00526-017-1120-1
    [10] E. Pérez-Chavela, S. Rybicki, D. Strzelecki, Symmetric Liapunov center theorem for minimal orbit, J. Differ. Equ., 265 (2018), 752–778. https://doi.org/10.1016/j.jde.2018.03.009 doi: 10.1016/j.jde.2018.03.009
    [11] D. Strzelecki, Periodic solutions of symmetric Hamiltonian systems, Arch. Ration. Mech. Anal., 237 (2020), 921–950. https://doi.org/10.1007/s00205-020-01522-6 doi: 10.1007/s00205-020-01522-6
    [12] M. Kowalczyk, E. Pérez-Chavela, S. Rybicki, Symmetric Lyapunov center theorem for orbit with nontrivial isotropy group, Adv. Differ. Equ., 25 (2020), 1–30.
    [13] M. Izydorek, Equivariant Conley index in Hilbert spaces and applications to strongly indefinite problems, Nonlinear Anal., 51 (2002), 33–66. https://doi.org/10.1016/S0362-546X(01)00811-2 doi: 10.1016/S0362-546X(01)00811-2
    [14] T. tom Dieck, Transformation groups, Walter de Gruyter & Co., Berlin, 1987. https://doi.org/10.1515/9783110858372
    [15] T. Kawasaki, Cohomology of twisted projective spaces and lens complexes, Math. Ann., 206 (1973), 243–248. https://doi.org/10.1007/BF01429212 doi: 10.1007/BF01429212
    [16] A. Hatcher, Algebraic topology, Cambridge University Press, Cambridge, 2002. https://doi.org/10.1017/S0013091503214620
    [17] K. H. Mayer, G-invariante Morse-funktionen, Manuscripta Math., 63 (1989), 99–114. http://dx.doi.org/10.1007/bf01173705 doi: 10.1007/bf01173705
    [18] J. Fura, A. Ratajczak, S. Rybicki, Existence and continuation of periodic solutions of autonomous Newtonian systems, J. Differ. Equ., 218 (2005), 216–252. https://doi.org/10.1016/j.jde.2005.04.004 doi: 10.1016/j.jde.2005.04.004
    [19] A. Gołȩbiewska, S. Rybicki, Equivariant Conley index versus degree for equivariant gradient maps, Discrete Contin. Dyn. Syst. Ser. S, 6 (2013), 985–997. http://dx.doi.org/10.3934/dcdss.2013.6.985 doi: 10.3934/dcdss.2013.6.985
    [20] Z. Balanov, W. Krawcewicz, H. Steinlein, Applied Equivariant Degree, AIMS Series on Differential Equations & Dynamical Systems, 1, Springfield, 2006.
    [21] A. Gołȩbiewska, P. Stefaniak, Global bifurcation from an orbit of solutions to non-cooperative semi-linear Neumann problem, J. Differ. Equ., 268 (2020), 6702–6728. https://doi.org/10.1016/j.jde.2019.11.053 doi: 10.1016/j.jde.2019.11.053
    [22] J. P. Serre, Linear representations of finite groups, Graduate Texts in Mathematics, 42 (1977), Springer-Verlag, New York-Heidelberg. https://doi.org/10.1007/978-1-4684-9458-7 doi: 10.1007/978-1-4684-9458-7
    [23] T. Bartsch, Topological methods for variational problems with symmetries, Lect. Notes Math., 1560, Springer-Verlag, Berlin, 1993. https://doi.org/10.1007/BFb0073859
    [24] K. Gȩba, Degree for gradient equivariant maps and equivariant Conley index, Topological nonlinear analysis II, Birkhäuser, (1997), 247–272. https://doi.org/10.1007/978-1-4612-4126-3_5
    [25] C. Conley, Isolated invariants sets and the Morse index, CBMS Regional Conference Series in Mathematics, 38, American Mathematical Society, Providence, R. I., 1978. https://doi.org/10.1090/cbms/038
    [26] J. Smoller, A. Wasserman, Bifurcation and symmetry-breaking, Invent. Math., 100 (1990), 63–95. https://doi.org/10.1007/BF01231181 doi: 10.1007/BF01231181
    © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)