Research article

Forecasting crude oil price using LSTM neural networks

  • Received: 20 February 2022 Revised: 09 May 2022 Accepted: 18 May 2022 Published: 07 July 2022
  • JEL Codes: C53, C81, C82, E37

  • As a key input factor in industrial production, the price volatility of crude oil often brings about economic volatility, so forecasting the crude oil price has long been a pivotal issue in economics. In this study, we constructed an LSTM (Long Short-Term Memory) neural network model to conduct this forecasting based on data from February 1986 to May 2021. An ANN (Artificial Neural Network) model and a typical ARIMA (Autoregressive Integrated Moving Average) model are used as benchmark models. The results show that, first, the LSTM model has strong generalization ability, with stable applicability in forecasting crude oil prices at different timescales. Second, compared with the other models, the LSTM model generally achieves higher forecasting accuracy for crude oil prices at different timescales. Third, for the LSTM model, a shorter forecast price timescale corresponds to lower forecasting accuracy. Therefore, given a longer forecast crude oil price timescale, other factors may need to be included in the model.

    Citation: Kexian Zhang, Min Hong. Forecasting crude oil price using LSTM neural networks[J]. Data Science in Finance and Economics, 2022, 2(3): 163-180. doi: 10.3934/DSFE.2022008




    Surveillance videos are integral to maintaining public safety and upholding social order by serving as valuable sources of forensic evidence. However, the current state of intelligent surveillance systems, while capable of real-time detection of abnormal video actions [1], is predominantly characterized by intricate, multi-stage pipelines. These pipelines involve processes like object detection, tracking, identification, and action analysis. Unfortunately, these approaches heavily rely on handcrafted features and struggle to adapt to the diverse landscape of monitoring data.

    Video content analysis is indispensable in digital forensics [2], as human motions contain vital biological characteristics for suspect action analysis and case resolution. This research aims to introduce an end-to-end, trainable deep learning approach to automatically detect and precisely recognize abnormal actions in multi-object surveillance videos. This development is poised to empower public safety agencies, granting them the ability to perform efficient forensic video analysis.

    This paper confronts the challenge of detecting abnormal actions in surveillance videos, a domain often characterized by multiple objects exhibiting diverse action categories. Traditional action detection methods fall into two categories: temporal and spatio-temporal. Temporal methods focus on identifying actions and their temporal boundaries, while spatio-temporal methods incorporate spatial information into the analysis. This study delves into the domain of spatio-temporal action detection methods, with a specific emphasis on enhancing the effectiveness of action detection in the context of abnormal actions.

    Numerous studies have delved into abnormal action recognition in surveillance, largely depending on action detection to identify actions. However, the complexity of action detection is not merely about identifying actions; it is also about precisely localizing them. This intricacy makes training more demanding and balancing recognition and localization accuracy a formidable task. In light of this challenge, this paper advocates for the division of abnormal action recognition in surveillance into two distinctive stages: coarse and precise detection. For untrimmed surveillance videos, the initial stage involves conducting coarse detection using spatio-temporal action detection to identify action categories and temporal boundaries of each object, consequently creating action tubes. Subsequently, precise detection comes into play, utilizing action recognition methods with higher classification accuracy to classify the action tubes.

    This study introduces an innovative approach for spatio-temporal action detection and recognition. This work integrates the advanced YOWOv2E model with the ViTSN model, enhancing the precision of multi-object abnormal action recognition while maintaining a rapid inference speed. In response to the practical requirements of public security applications, this research also presents a novel surveillance action dataset, ensuring adaptability of the model to a wide range of real-world scenarios.

    The contributions of this paper can be succinctly summarized as follows:

    1) In the coarse detection phase, this paper introduces the YOWOv2E algorithm, which builds upon YOWOv2 by incorporating a channel attention mechanism and a novel joint loss function to enhance coarse detection accuracy while maintaining speed.

    2) For precise detection, this paper proposes a spatio-temporal two-stream network model based on the Vision Transformer (ViT), utilizing transfer learning to mitigate overfitting and incorporating the Simple Attention Mechanism (SimAM) to reduce background interference. Segmented sampling effectively manages lengthy time sequences, while the integration of optical and RGB data enhances model accuracy.

    3) This paper creates a public security surveillance abnormal action dataset (PSA-Dataset) to meet practical forensic demands and validate the generalization performance of the proposed model using this dataset.

    This paper divides the process of recognizing abnormal actions into two distinct tasks: action detection and action recognition. Action detection primarily involves the identification and localization of human actions in untrimmed videos, while action recognition focuses on classifying these actions. This research explores the challenges and methods associated with this dual objective.

Action detection: Action detection seeks to identify and localize specific actions within videos. Early methods [3,4] heavily relied on manually crafted features and simplistic classifiers, limiting their effectiveness due to the absence of high-level semantic understanding of complex dynamic actions. The introduction of Convolutional Neural Networks (CNNs) revolutionized video action detection with deep learning [5,6]. Further advancements, such as 3D CNNs [7,8], improved the ability of models to extract temporal characteristics. In multi-object scenarios, existing methods often entail object detection followed by action localization and classification, posing challenges in end-to-end training, processing efficiency, and deployment. The YOWO series [9,10] introduced a groundbreaking approach that enables simultaneous localization of action boundaries, action recognition, and actor identification, facilitating end-to-end training while addressing timeliness concerns in action detection. One of the publicly available datasets for this task is UCF101-24, which offers rich scenarios for behavior analysis and covers a broad spectrum of behavioral categories. JHMDB-21 facilitates more in-depth behavior analysis by providing comprehensive annotations of joint positions in every video. The AVA dataset encompasses a wide range of interpersonal behavior exchanges and is sourced from movie and TV program clips. UCF-Crime and RWF-2000, sourced from surveillance videos, include real-world anomalous behavior events, contributing to the development and testing of anomaly detection systems applicable in real-world settings.

Action recognition: The primary goal of action recognition is to accurately classify pre-trimmed video actions. Initially, CNNs [11] extracted spatial features directly from video frames for classification but did not surpass traditional methods such as improved dense trajectories (iDT) in recognition accuracy. The optical flow method, which calculates object motion information between frames, proved effective in extracting temporal characteristics. The two-stream network [12], combining video frames and stacked optical flow maps for spatial and temporal feature extraction, enhanced temporal feature extraction. To capture long-term features, Temporal Segment Networks (TSN) [13] introduced a sparse sampling strategy. Recent developments in this field focus on the Transformer architecture [14,15,16,17], renowned for its self-attention mechanism that gathers global information directly in both the spatial and temporal dimensions. Nevertheless, challenges in action recognition persist, including the need for extensive annotated data, training with small datasets, managing background interference, and addressing computational constraints when processing lengthy time sequences. UCF101, which covers a wide range of action categories from sports to musical instrument performance to daily activities, is one of the public datasets used for this task. HMDB-51, which spans a wide spectrum of human motions from basic gestures to intricate movements, is gathered from YouTube and motion pictures. Large-scale datasets like Kinetics make it easier to train more intricate and effective models.

    This paper introduces a two-step approach to digital forensics, encompassing coarse detection and precise detection, to facilitate the identification and detection of abnormal actions in surveillance videos. The overall detection framework is graphically depicted in Figure 1. In the coarse detection phase, the YOWOv2E algorithm builds upon YOWOv2 by introducing a channel attention mechanism and a novel joint loss function, enhancing coarse detection accuracy while maintaining speed. For precise detection, the paper proposes a spatio-temporal two-stream network model based on the Vision Transformer (ViT) [18]. Transfer learning is employed to minimize overfitting, the SimAM [19] reduces background interference, and segmented sampling manages lengthy time sequences. Integrating optical and RGB data within the two-stream network, coupled with the self-attention mechanism, reduces computational complexity and improves accuracy. Constructed action tubes are subsequently classified to ensure robust detection accuracy. To address practical forensic requirements, a public security surveillance abnormal action dataset has been created and employed to validate the performance of the model.

    Figure 1.  The framework of multi-target abnormal action detection.

This research introduces the YOWOv2E model, as depicted in Figure 2. At its core, YOWOv2E features two enhanced single-stage networks, 2DShuffleNetv2E and 3DShuffleNetv2E, forming dual branches dedicated to the extraction of 2D spatial information and 3D spatio-temporal information, respectively. A fusion network integrates these components through a sequence of encodings and generates detection boxes for human actions. These boxes include confidence scores and categories, enabling precise localization and recognition of actions.

    Figure 2.  The structure of the improved YOWOv2.

    In the YOWOv2E model, the 2D branch focuses on capturing spatial details within video frames, encompassing the shapes, appearances, positions, and postures of individuals. This branch utilizes 2D convolutional networks, drawing inspiration from the YOLOv7 [20] model in object detection. YOWOv2E treats object detection as a regression problem, directly predicting bounding boxes and action class probabilities from image pixels.

    In contrast, the 3D branch is responsible for capturing spatio-temporal information, including action types, speeds, and individual motion directions. This branch predominantly utilizes 3D convolutional networks. The design of the 3D branch equips the YOWOv2E model to comprehensively extract temporal features from actions, enhancing its understanding of actions within videos. YOWOv2E effectively integrates spatial and temporal features, leveraging the ECA [22] module to enhance its feature representation capability.

    The PSA-Dataset, designed for surveillance scenarios, presents unique challenges characterized by complex multi-person interactions and a diverse array of action categories. To better capture features within these intricate scenes, this paper introduces an enhanced ShuffleNetv2 structure, referred to as ShuffleNetv2E. The original ShuffleNetv2 [21] is an efficient network tailored for mobile devices, employing techniques such as channel shuffling and group convolution to enhance computational efficiency. To further optimize its performance in complex surveillance scenarios, this work integrates an Efficient Channel Attention (ECA) [22] attention mechanism at the conclusion of each branch within every Shuffle Block in the ShuffleNetv2 architecture.

The ECA attention mechanism helps the model focus on the channels that contribute most to classification. The input features first undergo global average pooling and then pass through a one-dimensional convolution layer whose kernel size is adaptively adjusted according to the channel dimension of the feature map. This captures the relationships between channels without noticeably increasing the model's parameters or computational cost. The convolved features are then passed through a Sigmoid function to produce channel attention weights for the original feature map. These weights suppress irrelevant features and sharpen the focus on features useful for anomaly identification; adaptive weighting across channels is achieved through element-wise multiplication of the weights with the original features. By highlighting the dependencies between channels, the model's capacity to describe category-specific features is improved, further increasing the accuracy of abnormal action identification.
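As a concrete illustration, the following is a minimal PyTorch sketch of the ECA-style channel attention described above (global average pooling, a 1D convolution with an adaptively sized kernel, a Sigmoid gate, and channel-wise re-weighting). The adaptive kernel-size rule and its constants gamma = 2 and b = 1 follow the original ECA paper and are assumptions here, as are all layer and variable names.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention, sketched under the assumptions above."""
    def __init__(self, channels, gamma=2, b=1):
        super().__init__()
        t = int(abs(math.log2(channels) / gamma + b / gamma))
        k = t if t % 2 else t + 1          # force an odd kernel size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                  # x: (B, C, H, W)
        y = self.avg_pool(x)               # (B, C, 1, 1) global channel descriptor
        y = y.squeeze(-1).transpose(1, 2)  # (B, 1, C) so the 1D conv runs across channels
        y = self.conv(y)                   # local cross-channel interaction
        y = self.sigmoid(y).transpose(1, 2).unsqueeze(-1)  # (B, C, 1, 1) attention weights
        return x * y                       # re-weight channels element-wise
```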

    ShuffleNetv2E centers around the Shuffle Block as its core component, with its operation contingent on the stride value. In the case of a stride value set to 1, as illustrated in Figure 3(a), the input is divided into two components. One portion forms a "direct connection" or "shortcut" to maintain information continuity and mitigate gradient vanishing issues. The other undergoes a 1 × 1 convolution for channel transformation, followed by a 3 × 3 depthwise separable convolution, extracting spatial information while preserving computational efficiency. This is followed by another 1 × 1 convolution, coupled with the ECA mechanism. The result is then concatenated with the "direct connection" and subjected to a channel shuffle operation, facilitating information exchange between features and yielding the final output.

    Figure 3.  The structure of the improved Shuffle Block. (a) The structure of the Shuffle Block 1. (b) The structure of the Shuffle Block 2.

    In the case of a Shuffle Block with a stride of 2, as depicted in Figure 3(b), the input is initially bifurcated into two components. The first component undergoes a 3 × 3 depthwise separable convolution with a stride of 2, reducing spatial dimensions while capturing spatial information. It is followed by a 1 × 1 convolution for channel transformation. Concurrently, the second component commences with a 1 × 1 convolution for channel transformation, followed by a 3 × 3 depthwise separable convolution with a stride of 2 to reduce spatial dimensions. This part also undergoes another 1 × 1 convolution for channel transformation and benefits from the ECA mechanism to enhance feature distinctiveness.

Following these operations, the two feature map components are combined. To promote effective information exchange in the channel dimension between these components, they are subjected to a channel shuffle operation. This yields the output of the Shuffle Block, which differs depending on whether the stride is set to 1 or 2.
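The stride-1 variant of this block can be sketched as follows. This is an illustrative PyTorch sketch, not the authors' exact implementation: it reuses the ECA class from the previous sketch, and the layer widths and normalization choices are assumptions consistent with the Figure 3(a) description.

```python
import torch
import torch.nn as nn

def channel_shuffle(x, groups=2):
    """Interleave channels from the two branches so information mixes."""
    b, c, h, w = x.size()
    x = x.view(b, groups, c // groups, h, w)
    x = x.transpose(1, 2).contiguous()
    return x.view(b, c, h, w)

class ShuffleBlockS1(nn.Module):
    """Stride-1 Shuffle Block with ECA on the transformed branch (cf. Figure 3(a))."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        self.branch = nn.Sequential(
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            # 3x3 depthwise convolution keeps spatial feature extraction cheap
            nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False), nn.BatchNorm2d(half),
            nn.Conv2d(half, half, 1, bias=False), nn.BatchNorm2d(half), nn.ReLU(inplace=True),
            ECA(half),                                   # channel attention before re-merging
        )

    def forward(self, x):
        shortcut, transformed = x.chunk(2, dim=1)        # channel split
        out = torch.cat([shortcut, self.branch(transformed)], dim=1)
        return channel_shuffle(out)                      # exchange information across the halves
```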

In the context of modeling abnormal actions, a common and challenging problem is class imbalance. Abnormal events are rare, so the dataset contains significantly more normal action samples than abnormal ones, and this imbalance can bias the model toward the majority class. To address this issue, this paper employs the Sigmoid Focal Loss [23] as the classification loss function, defined in Eq (1), where $p_t$ represents the predicted probability for positive samples, $\alpha$ denotes the weight coefficient, and $\gamma$ regulates the weight distribution. Designed for settings in which each category is treated as an independent binary classification problem, especially multi-category or multi-label tasks, the Sigmoid Focal Loss assigns lower weights to normal actions and higher weights to abnormal actions, encouraging the model to prioritize learning abnormal actions during training.

$L_{\text{focal}}(p_t) = -\alpha (1 - p_t)^{\gamma} \ln(p_t)$ (1)
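A minimal sketch of Eq (1) in its per-class binary form (each category treated as an independent binary problem, as described above). The values alpha = 0.25 and gamma = 2.0 are common defaults, not values reported in the paper, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def sigmoid_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Down-weight easy (mostly normal-action) samples so training focuses on rare abnormal ones."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```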

When conducting a detailed analysis of abnormal actions, a common challenge lies in the ambiguity of action boundaries. Many actions, such as walking and running or sitting and falling, lack clear-cut boundaries, making it difficult for the model to provide accurate predictions near these ambiguous boundaries. To address this challenge and enhance the resilience of the proposed model to such fuzzy boundaries, this paper introduces label smoothing [24] into the Binary Cross-Entropy (BCE) with Logits Loss, using it as the confidence loss function. Label smoothing converts hard labels (e.g., 0 or 1) into softer labels between 0 and 1, as shown in Eqs (2) and (3). By introducing a small smoothing parameter $\varepsilon$, label smoothing offers a more gradual learning target, preventing the model from making overly confident predictions and adding flexibility that helps it handle boundary samples between different actions.

$\hat{y} = (1 - \varepsilon)\, y + \dfrac{\varepsilon}{2}$ (2)
$L_{\text{smooth}}(\hat{y}, p) = -\hat{y}\ln(p) - (1 - \hat{y})\ln(1 - p)$ (3)
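A short sketch of Eqs (2) and (3) as a label-smoothed BCE on logits. The smoothing value eps = 0.1 is an assumed example; the paper only requires a small smoothing parameter.

```python
import torch
import torch.nn.functional as F

def smoothed_bce_with_logits(logits, targets, eps=0.1):
    """Soften hard 0/1 confidence targets so the model is less certain near ambiguous boundaries."""
    soft_targets = (1.0 - eps) * targets + eps / 2.0                  # Eq (2)
    return F.binary_cross_entropy_with_logits(logits, soft_targets)   # Eq (3), applied to logits
```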

In this paper, the training loss function, given in Eq (4), combines three components: $L_{conf}$ (BCE with Logits Loss with label smoothing), $L_{cls}$ (Sigmoid Focal Loss), and $L_{reg}$ (GIoU Loss). This combination aims to enhance the model's overall performance. In the equation, $a_{x,y}$, $b_{x,y}$, and $c_{x,y}$ denote the classification, regression, and confidence predictions, respectively, while $\hat{a}_{x,y}$, $\hat{b}_{x,y}$, and $\hat{c}_{x,y}$ are the corresponding labels. $N_{pos}$ is the number of positive samples, and $\alpha$, $\beta$, and $\theta$ are the weights assigned to each loss component. $\mathbb{I}_{\{\hat{a}_{x,y}>0\}}$ is an indicator function that equals 1 when $\hat{a}_{x,y} > 0$ and 0 otherwise.

$L(\{a_{x,y}\}, \{b_{x,y}\}, \{c_{x,y}\}) = \dfrac{\theta}{N_{pos}} \sum_{x,y} L_{conf}(\hat{c}_{x,y}, c_{x,y}) + \dfrac{\alpha}{N_{pos}} \sum_{x,y} \mathbb{I}_{\{\hat{a}_{x,y}>0\}} L_{cls}(\hat{a}_{x,y}, a_{x,y}) + \dfrac{\beta}{N_{pos}} \sum_{x,y} \mathbb{I}_{\{\hat{a}_{x,y}>0\}} L_{reg}(\hat{b}_{x,y}, b_{x,y})$ (4)
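The combination in Eq (4) can be sketched as below. This reuses the focal-loss and label-smoothed BCE sketches above; `giou_loss` is a hypothetical helper standing in for the GIoU regression term, the weights default to 1.0 as placeholders, and the mean reductions inside the helpers stand in for the explicit 1/N_pos normalization of Eq (4).

```python
def yowov2e_joint_loss(cls_pred, reg_pred, conf_pred,
                       cls_gt, reg_gt, conf_gt,
                       alpha=1.0, beta=1.0, theta=1.0):
    """Sketch of Eq (4): confidence + classification + box-regression terms."""
    pos_mask = cls_gt.sum(dim=-1) > 0                      # indicator I{a_hat_{x,y} > 0}
    l_conf = smoothed_bce_with_logits(conf_pred, conf_gt)  # label-smoothed BCE sketch above
    l_cls = sigmoid_focal_loss(cls_pred[pos_mask], cls_gt[pos_mask])  # focal loss sketch above
    l_reg = giou_loss(reg_pred[pos_mask], reg_gt[pos_mask])           # hypothetical GIoU helper
    return theta * l_conf + alpha * l_cls + beta * l_reg
```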

Precise detection is carried out to reduce the false alarm rate of the algorithm after constructing action tubes from untrimmed surveillance videos. The enhancement of action detection in surveillance videos leverages the ViTSN model, composed of four key modules: an attention module based on SimAM, spatio-temporal feature extraction using the pretrained ViT, a temporal self-attention module, and a decision fusion module. Furthermore, improvements have been made to the loss function. The process involves preprocessing action tubes to obtain optical flow frames, extracting features with the SimAM-based attention module and the pretrained ViT, extracting temporal information with the temporal attention module, and assigning weights to RGB and optical flow images using the decision fusion module. The significance and role of each module in reducing false alarms and increasing detection accuracy are detailed in Figure 4.

    Figure 4.  Spatio-temporal two-stream network model based on ViT.

The primary innovation of the model lies in the introduction of a 3D parameter-free attention mechanism into the backbone network to enhance interference resistance. This mechanism focuses on information-rich neurons and their impact on neighboring neurons, particularly regarding spatial suppression effects. It defines an energy function for each neuron that quantifies the linear separability between the target neuron and the others; the importance of individual neurons is assessed, and their weights are adjusted accordingly, as given in Eq (5). In Eq (5), $t$ denotes the target neuron, $e_t$ its energy, $\hat{\mu}$ and $\hat{\sigma}^2$ the mean and variance of all other neurons in the same channel (excluding the target neuron), and $\lambda$ a hyperparameter. In practice, this allows the model to emphasize distinctive neurons without adding any learnable parameters.

$e_t = \dfrac{4(\hat{\sigma}^2 + \lambda)}{(t - \hat{\mu})^2 + 2\hat{\sigma}^2 + 2\lambda}$ (5)

Equation (6) implements the assessment of neurons through this linearly separable energy measure, where $E$ is the matrix composed of the $e_t$ values and sigmoid is the activation function that introduces nonlinearity. The importance of a target neuron is inversely related to its energy: a low-energy neuron is well separated from its neighboring neurons and is therefore emphasized, and its weight is set according to this significance. This evaluation requires no additional learnable parameters while still assigning each neuron an individual attention weight.

$\tilde{X} = \mathrm{sigmoid}\left(\dfrac{1}{E}\right) \odot X$ (6)

    In contrast to the one-dimensional squeeze-and-excitation network (SENet) and the two-dimensional convolutional attention mechanism (CBAM), the three-dimensional SimAM is proficient in assessing the significance of all neurons. This distinctive capability enables SimAM to focus on both the channel and spatial dimensions. This streamlined approach not only diminishes background interference but also augments action features.
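The parameter-free weighting of Eqs (5) and (6) can be sketched as follows; this mirrors the reference SimAM formulation, with lambda = 1e-4 assumed (the value suggested in the SimAM paper) and the class name chosen for illustration.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free 3D attention: neurons that stand out from their channel's mean get higher weight."""
    def __init__(self, lam=1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):                          # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1            # number of "other" neurons per channel
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)
        v = d.sum(dim=(2, 3), keepdim=True) / n    # per-channel variance estimate
        e_inv = d / (4 * (v + self.lam)) + 0.5     # inverse of the minimal energy in Eq (5)
        return x * torch.sigmoid(e_inv)            # Eq (6): re-weight every neuron
```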

    Both two-stream feature extraction networks are built upon the ViT model but use distinct input data sources. The spatial network takes a sequence of n RGB image frames as input, facilitating action classification through the extraction of static image features, involving feature fusion from multiple frames along the temporal dimension. In contrast, the temporal network is fed n consecutive sets of optical images, each comprising 10 frames, which encode motion information, contributing to the extraction of action-related data.

This paper introduces a temporal self-attention mechanism, enabling the model to detect changes in temporal features and enhance their extraction. The spatial features extracted by ViT are reshaped into a matrix and undergo temporal self-attention operations. This reshaped matrix is linearly projected to yield the query matrix $Q$, the key matrix $K$, and the value matrix $V$. The scaled dot-product attention operation is given in Eq (7).

$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}\left(\dfrac{QK^{T}}{\sqrt{d_K}}\right)V$ (7)

The complete temporal self-attention process applied to the input matrix $X$ is given in Eq (8). In this equation, LayerNorm() represents a normalization layer designed to stabilize the feature distribution, MLP() denotes a multi-layer perceptron responsible for scaling feature vectors to extract more abstract and meaningful feature representations, and Dropout() is a regularization technique that selectively deactivates certain neurons to mitigate the risk of model overfitting.

$X_1 = \mathrm{Attention}(\mathrm{LayerNorm}(X))$
$X_2 = X_1 + \mathrm{Dropout}(X_1)$
$X_3 = \mathrm{MLP}(\mathrm{LayerNorm}(X_2))$
$\mathrm{Output} = X_3 + \mathrm{Dropout}(X_3)$ (8)
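A compact sketch of Eqs (7) and (8) as a temporal self-attention block over per-segment ViT features. The feature size of 768, the MLP ratio, and the dropout rate are assumed values, and the residual pattern follows Eq (8) exactly as printed above.

```python
import torch
import torch.nn as nn

class TemporalAttentionBlock(nn.Module):
    """Temporal self-attention over per-frame ViT features (Eqs (7)-(8), sketched)."""
    def __init__(self, dim=768, mlp_ratio=4, p=0.1):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.qkv = nn.Linear(dim, dim * 3)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                 nn.Linear(dim * mlp_ratio, dim))
        self.drop = nn.Dropout(p)
        self.scale = dim ** -0.5

    def forward(self, x):                              # x: (B, T, dim), T = video segments
        q, k, v = self.qkv(self.norm1(x)).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)  # Eq (7)
        x1 = attn @ v
        x2 = x1 + self.drop(x1)                        # residual pattern as written in Eq (8)
        x3 = self.mlp(self.norm2(x2))
        return x3 + self.drop(x3)
```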

The loss function of the network model emphasizes classification precision while minimizing feature disparities across the multiple video segments. It is calculated as follows:

$loss_{CE} = -\sum_{i=1}^{n} y_i \ln \hat{y}_i$
$loss_{cos} = \max\left[1 - \cos(x_i, x_j)\right]$
$Loss_{all} = \lambda \, loss_{CE} + \eta \, loss_{cos}$ (9)

As shown in Eq (9), the total loss combines the cross-entropy loss, which quantifies the disparity between the model-generated probability distribution and the true label distribution in classification, with a cosine similarity loss that assesses the similarity between samples in the feature space, pulling frames from the same category together and pushing frames from different categories apart. Used together, these loss functions have a synergistic optimization effect that enhances the feature representation capability of the proposed model.
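A hedged sketch of Eq (9) is given below. Pairing consecutive segment features for the cosine term is an illustrative choice (the paper does not specify which pairs $x_i$, $x_j$ are used), and the weights follow the 0.60/0.40 setting for $\lambda$ and $\eta$ reported in the experimental setup.

```python
import torch
import torch.nn.functional as F

def vitsn_joint_loss(logits, labels, feats, lam=0.6, eta=0.4):
    """Eq (9), sketched: cross-entropy plus a cosine term that aligns segment features.
    logits: (B, num_classes); labels: (B,); feats: (B, T, D) per-segment features."""
    loss_ce = F.cross_entropy(logits, labels)
    # cosine dissimilarity between neighbouring segment features (illustrative pairing)
    cos = F.cosine_similarity(feats[:, :-1, :], feats[:, 1:, :], dim=-1)
    loss_cos = (1.0 - cos).clamp(min=0).max()
    return lam * loss_ce + eta * loss_cos
```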

UCF101-24 and JHMDB-21 primarily focus on sports and fitness activities. Although they include anomalous behaviors, each video in these datasets features only one individual performing actions, without multi-person interactions. The AVA dataset can be categorized into three major types: human actions, object manipulation, and person interactions; each frame may involve multiple actors, each possibly performing several actions. These datasets, however, are not intended for surveillance use. They may therefore lack behaviors and features common in surveillance footage, such as vehicle movement and pedestrian flow, as well as realistic everyday surveillance settings such as streets or malls, which makes them less suitable for training models for surveillance environments. Although pertinent to surveillance settings, UCF-Crime and RWF-2000 offer only video-level labels. This limitation hinders the model's ability to learn the temporal and spatial positioning of actions and limits its practical application in real-world settings.

    To facilitate comprehensive performance assessment and align with practical application scenarios, this study introduces the Public Security Abnormal Action Dataset (PSA-Dataset). Curated from 118 original videos from the UCF-Crime and RWF-2000, the PSA-Dataset notably integrates frame-level spatial annotations, ensuring comprehensive coverage in pivotal frames. It is divided into PSAD for spatio-temporal action detection and PSAR for action recognition. This dual division minimizes false negatives during model training and enhances accuracy. Each action in the dataset is endowed with two distinct labels: one designates it as an abnormal action, while the other specifies the specific category, thus evaluating the competence of the model in both abnormality detection and multi-class classification.

    PSAD undergoes data preprocessing, which includes cropping, frame sampling, spatial annotations for each individual in frames, and labeling. The dataset includes annotated abnormal actions displayed in Figure 5. Within the images, green bounding boxes indicate normal actions, while red bounding boxes highlight abnormal actions.

    Figure 5.  Partial images in the PSAD.

As for PSAR, action tubes are constructed from the original video using the PSAD labels. Each keyframe is cropped based on the ground-truth bounding box of each actor, with careful extensions to prevent the loss of action backgrounds that could impact model training. Images are resized to 224 × 224 for model input. Cropped segments of the same actor from each keyframe are merged to form the action tube, as depicted in Figure 6. After constructing the action tube, the TV-L1 optical flow algorithm computes the optical flow images corresponding to each action tube.

    Figure 6.  The production process of PSAR.
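As an illustration of the last step above, the sketch below computes TV-L1 optical flow for the frames of one action tube. It assumes the opencv-contrib package, which exposes the TV-L1 estimator under cv2.optflow; the clipping bound of ±20 and the 0–255 rescaling are illustrative choices consistent with the discretization described later in the experimental setup, and the function name is hypothetical.

```python
import cv2
import numpy as np

# TV-L1 estimator from opencv-contrib's optflow module (assumed available)
tvl1 = cv2.optflow.createOptFlow_DualTVL1()

def flow_images(frames):
    """frames: list of BGR keyframe crops from one action tube; returns per-pair flow maps."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    flows = []
    for prev, curr in zip(grays[:-1], grays[1:]):
        flow = tvl1.calc(prev, curr, None)                   # (H, W, 2) dense flow field
        flow = np.clip(flow, -20, 20)                        # assumed clipping bound
        flow = ((flow + 20) / 40 * 255).astype(np.uint8)     # linear rescale to 0-255
        flows.append(flow)
    return flows
```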

    The experiments in this study were conducted on a 64-bit Windows 10 operating system, utilizing an Intel Xeon Gold 5118 CPU running at 2.30 GHz, GPU acceleration provided by an NVIDIA Tesla V100, and 32 GB of available memory.

    The performance evaluation of the model in spatio-temporal action detection employs the UCF101-24 public dataset for ablation and comparative experiments. For the action detection tasks involving the UCF101-24 dataset, the study maintains the configuration prescribed by the authors of YOWOv2 to ensure the objectivity of the experimental results. The training and testing sets are divided in an 8: 2 ratio for the PSAD. A weight decay of 0.0005 is used in conjunction with an initial learning rate of 0.0004 during training. The model is trained for a total of 50 epochs with an image batch size of 64. At the second, fourth, sixteenth, and twenty-fourth epochs, the learning rate is halved. The batch size is kept at 64 throughout the testing process.

In the context of action recognition, ablation and comparative experiments rely on the UCF101 and HMDB51 public datasets, for which this paper adheres to the official configurations. For PSAR, the study employs an RGB image batch size of 30 and an optical image batch size of 10. RGB images have 3 channels, while optical flow images have 10 channels. The initial learning rate is set at 0.001, and the model is trained for 40 epochs, with the learning rate halved every 2 epochs to mitigate overfitting. Also to mitigate overfitting, the momentum is set to 0.9 and the weight decay rate to 0.0005. Stochastic gradient descent is employed as the optimizer to enhance the model's robustness. To determine the optimal value of $\lambda$, this paper conducted relevant ablation experiments, which are included in the supplementary materials. The joint loss function is utilized, with a weight of 0.60 for $\lambda$ and 0.40 for $\eta$. Input images are resized, center-cropped to 224 × 224, and randomly flipped horizontally with a 50% probability. Subsequently, images are normalized and regularized in accordance with the specified parameters.

    To address potential overfitting, this study employs cross-modal pretraining techniques. In the experiments, model weights pretrained on the ImageNet dataset were loaded into the baseline model to achieve thorough optimization of network parameters. For optical flow data, linear transformation is used to discretize the optical flow data into the 0–255 range, aligning it with the value domain of a single RGB channel. The weights of the first convolutional layer of the ViT are averaged and then replicated according to the channel count of the optical flow data. The input channel number of ViT's first convolutional layer is modified, and the averaged weights are loaded. Regarding regularization techniques, on top of layer normalization and Dropout in ViT, an additional Dropout layer is added after the final layer normalization to further reduce the impact of overfitting, with a Dropout rate set at 0.5. For data augmentation techniques, scale jittering, corner cropping, and random horizontal flipping are included. These techniques help enhance the model's ability to recognize data from different angles and positions. During testing, both RGB and optical images are processed with a batch size set to 1. Each video is evenly divided into 25 segments, from which a single frame is randomly selected to produce 25 frames. All images are scaled, center-cropped to 224 × 224, normalized, and standardized, yielding 25 images of 224 × 224 each. The primary metric for assessing the performance of the action recognition algorithms in this paper is Accuracy.
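The cross-modality initialization of the flow stream described above can be sketched as follows. The use of timm and the model name "vit_base_patch16_224" are assumptions (the paper only specifies an ImageNet-pretrained ViT), and the attribute names (patch_embed.proj) are those of timm's implementation.

```python
import torch
import timm  # assumed source of an ImageNet-pretrained ViT

def adapt_vit_for_flow(flow_channels=10):
    """Average the ViT patch-embedding weights over RGB and replicate them for flow channels."""
    model = timm.create_model("vit_base_patch16_224", pretrained=True)  # assumed backbone
    conv = model.patch_embed.proj                        # first convolutional layer of the ViT
    w = conv.weight.data.mean(dim=1, keepdim=True)       # (D, 1, 16, 16) channel-averaged filters
    new_conv = torch.nn.Conv2d(flow_channels, conv.out_channels,
                               kernel_size=conv.kernel_size, stride=conv.stride)
    new_conv.weight.data = w.repeat(1, flow_channels, 1, 1)  # copy averaged weights to every flow channel
    new_conv.bias.data = conv.bias.data.clone()
    model.patch_embed.proj = new_conv                    # flow stream now accepts 10-channel input
    return model
```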

This paper rigorously validates the performance of the YOWOv2E model in the domain of general action detection tasks, utilizing comparisons with contemporary leading action detection models on the UCF101-24 dataset, as outlined in Table 1. YOWOv2E emerges as a standout performer in terms of detection accuracy on this widely recognized public dataset while maintaining exceptional computational efficiency. Notably, in contrast to its counterparts, including ACT [26], MOC [28], and YOWO [9], YOWOv2E operates with significantly fewer floating-point operations, enabling expedited inference. This attribute greatly enhances its applicability in real-world scenarios and substantially mitigates computational complexity, addressing a pressing concern in the field of action detection. Furthermore, YOWOv2E surpasses its predecessor, YOWOv2, showing substantial enhancements in general action detection tasks, including improvements in target localization and classification, while preserving the original computational efficiency. The YOWOv2E model strikes an admirable equilibrium between performance and computational cost, giving it a distinct practical advantage, particularly in resource-constrained scenarios encountered in real-world contexts such as public safety surveillance.

    Table 1.  Performance comparison of different methods on UCF101-24.
    Method Frame-mAP/% GFLOPs (× 109)
    T-CNN [25] 41.4 -
    ACT [26] 67.1 256.2
    STEP [27] 75 125.7
    MOC [28] 77.8 > 93.2
    YOWO [9] 77.8 39.3
    YOWOv2 [10] 77.14 1.28
    YOWOv2E (Ours) 78.52 1.28


    Additionally, the YOWOv2E approach demonstrates remarkable robustness and versatility when contrasted with other leading models. Its consistent, stable performance extends across diverse scenarios and a spectrum of abnormal actions, making it an attractive choice for a wide range of applications. In essence, Table 1 effectively substantiates the superior performance of the YOWOv2E model on the UCF101-24 dataset. Not only does it surpass its competitors in terms of accuracy, but it also unambiguously exhibits clear advantages in computational efficiency, thereby forming a solid foundation for the extensive deployment of this approach in various public safety monitoring applications.

    Abnormal actions often present distinctive characteristics which significantly intensify the challenge of accurately distinguishing them from normal ones. In response to this intricate classification challenge, this paper introduces the YOWOv2E model, an extension of the YOWOv2. This extended model introduces innovative components, namely the ShuffleNetv2E feature extraction network and a joint loss function. By capitalizing on advanced techniques like the ECA mechanism, Focal Loss, and label smoothing, the model greatly amplifies its capabilities in feature extraction and classification, with a particular focus on abnormal actions.

Table 2 provides an insightful view of the experimental results for binary classification, precisely discriminating between normal and abnormal actions using both YOWOv2 and YOWOv2E. A careful review of Table 2 underscores the exceptional proficiency of YOWOv2E in locating and categorizing actions across both abnormal and normal action detection scenarios. Notably, YOWOv2E achieves a remarkable 5.77% improvement in Frame-mAP for abnormal action recognition compared to its predecessor, YOWOv2. This outcome underscores the heightened capacity of YOWOv2E to extract features associated with abnormal actions and refine the classification boundary between abnormal and normal actions, ultimately enhancing the precision of abnormal action detection.

    Table 2.  Comparison of binary classification results on PSAD.
    Method Abnormal/% Normal/% Total/%
    YOWOv2 62.10 78.59 70.34
    YOWOv2E (ShuffleNetv2E) 63.99 75.34 69.67
    YOWOv2E (Improved Loss) 65.19 77.81 71.50
    YOWOv2E 67.87 81.86 74.87


Furthermore, Table 3 presents in depth the multi-class classification experiments executed on the PSAD dataset. YOWOv2E delivers a substantial 2.73% enhancement in Frame-mAP when compared to YOWOv2. This success can be attributed to the integration of the ShuffleNetv2E module combined with the ECA mechanism. This strategic combination empowers the model to focus intently on critical information within video frames, dynamically adjusting the weights assigned to each input. The relevance of this capability in video analysis is apparent, as it accommodates the diverse and varied action information embedded in different frames.

    Table 3.  Comparison of multiple classification results on PSAD.
    Method Frame-mAP/%
    YOWOv2 49.45
    YOWOv2E (ShuffleNetv2E) 50.96
    YOWOv2E (Improved Loss) 51.50
    YOWOv2E 52.18


    Moreover, the joint loss function plays a pivotal role in bolstering the classification capabilities of the model across a spectrum of actions. This function effectively expands the boundaries of action classification, instills greater confidence in action detection, and systematically enhances the capability to address class imbalances.

    The paper further delves into a comparative analysis of the proposed models, scrutinizing their performance in detecting specific categories. Visual results of abnormal action detection by YOWOv2 and YOWOv2E are vividly presented in Figure 7. Action detection grapples with a unique challenge – the blurred boundary that distinguishes abnormal from normal behaviors. This intrinsic ambiguity can lead to classification errors, including both missed detections and false alarms, as illustrated in Figure 7(a), (d). YOWOv2, for instance, misclassifies actions such as theft and fighting as normal activities like walking and standing. In stark contrast, YOWOv2E, harnessing the potential of ShuffleNetv2E and the joint loss function, proficiently identifies abnormal actions while imbuing a heightened level of confidence in its predictions.

    Figure 7.  Display of the detection results. (a)–(c) detection results of YOWOv2; (d)–(f) detection results of YOWOv2E.

The challenge of distinguishing abnormal actions from normal ones, often complicated by blurred boundaries, is vividly depicted in Figure 7(a). Notably, YOWOv2 incorrectly categorizes a fall as a fight action, highlighting its classification shortcomings. In stark contrast, YOWOv2E excels by precisely recognizing abnormal actions, effectively addressing the intricacies of boundary definition. YOWOv2E goes a step further by intensifying feature extraction across the entire image, prioritizing image details, and significantly reducing the chance of overlooking actions, as showcased in Figure 7(e). This enhancement is attributed to the incorporation of the ECA mechanism and careful loss function optimization, which together elevate the understanding of complex scenes. Consequently, the model adeptly captures spatio-temporal features associated with abnormal actions in surveillance videos. This progress not only facilitates early anomaly detection but also supports the extraction of crucial evidence, presenting substantial potential for enhancing law enforcement efficiency and bolstering public safety efforts.

Figure 8 shows the training loss of the different baselines on UCF101. When the baseline is replaced with the ViT, the model converges more quickly and the post-convergence loss decreases significantly. Modifying the loss function and adding SimAM further accelerate convergence and reduce the loss.

    Figure 8.  Comparison of loss of different baselines on UCF101.

    For action recognition tasks, important factors include the actor's appendages and related objects, whereas irrelevant factors include the action's background. The more important factors and the fewer irrelevant factors the model learns, the better it performs. As depicted in Figure 9, embedding SimAM increases the model's attention to the appendages of people and objects related to actions and reduces the interference of action backgrounds, thereby enhancing the model's ability to represent features.

    Figure 9.  Visual feature heat maps.

    The ViT introduces a self-attention mechanism that effectively identifies long-range dependencies among image regions, effectively bypassing the constraints associated with the local receptive fields in CNN. By integrating transfer learning techniques and capitalizing on pretrained weights designed for image classification tasks, ViT mitigates the impact of the lack of inductive bias, yielding substantial performance improvements in action recognition tasks. The comparative analysis presented in Table 4 underscores the superior performance of ViT when contrasted with two CNN-based neural networks, SE-ResNet152 [29] and NFNet-F1 [30]. The latter networks excel primarily in image classification tasks but exhibit limitations stemming from their inductive biases, such as translation invariance and local sensitivity, which hinder their ability to comprehensively capture global image information and evaluate feature interdependencies. As a consequence, their understanding of the overall image context remains incomplete.

    Table 4.  Comparison of results of different baselines on UCF101.
    Baseline Spatial/% Temporal/% Two-Stream/% Param (× 106) GFLOPs (× 109)
    SE-ResNet152 83.06 72.55 89.65 65 148.06
    NFNet-F1 85.97 83.22 93.19 129.87 226.8
    ViT 87.28 83.37 94.59 85.7 228.36
    ViT + SimAM 88.58 84.97 95.35 85.7 228.36
    ViT (Joint Loss) 88.79 84.21 95.1 85.7 228.36
    ViT (Joint Loss) + SimAM 89.67 85.33 95.63 85.7 228.36


    In contrast, the self-attention mechanism of ViT empowers the network to concentrate on relationships among any image blocks, allowing it to effectively utilize contextual information in modeling global dependencies within images. This pivotal feature compensates for the limitations inherent in convolutional architectures.

In order to comprehensively extract temporal features, this paper introduces a temporal self-attention module into the proposed model and undertakes a series of ablation experiments to evaluate its optimal placement. As depicted in Table 5, when the temporal self-attention module is positioned at the beginning of the ViT, it prematurely amalgamates temporal and visual information, making it harder for the model to grasp inter-frame relationships. Conversely, when the temporal self-attention module is located at the end of the ViT, the model first acquires sufficient abstract visual semantic information before incorporating temporal data. This positioning allows the temporal self-attention mechanism to enhance the model's understanding of relationships among distinct time frames within the input sequence and thus to capture crucial information in the sequence more accurately. Given that action recognition fundamentally relies on a profound comprehension of inter-frame relationships, placing temporal self-attention at the end of the ViT consistently outperforms the alternative placements.

    Table 5.  Comparison of RGB results of different segments and whether to add temporal attention on UCF101.
    Num of segments Head/% End/% None/% GFLOPs (× 109)
    1Frame (RGB) 85.03 17.58
    3Frames (RGB) 88.23 90.12 89.67 52.74 (+ 4.16)
    6Frames (RGB) 85.72 88.71 86.55 105.42 (+ 8.41)
    9Frames (RGB) 84.85 87.39 86.36 158.13 (+ 12.63)
    3Frames (Optical) 85.49 86.27 85.33 175.62 (+ 41.69)
    3Frames (RGB + Optical) 95.46 96.12 95.63 228.36 (+ 45.85)


    Moreover, this study has conducted experiments employing 3, 6 and 9 segments to investigate the impact of segment count on model accuracy. As indicated in Table 5, video segmentation surpasses the performance of non-segmented data, emphasizing that segmenting videos augments the efficiency of the model in handling prolonged sequential data. However, it is imperative to note that an excessive number of segments can impede the effective extraction of temporal features and result in heightened computational demands.

Table 6 provides an overview of the experimental results at each stage of the precise detection pipeline. With each successive enhancement, accuracy increases consistently, affirming the effectiveness of the improvements made.

    Table 6.  Results of the ablation validation experiment on UCF101.
    Model Accuracy/%
    ViT (RGB) 86.35
    ViT (RGB + Optical) 94.23
    ViT + Decision Fusion (0.6) 94.59
    ViT + SimAM 95.35
    ViT + Joint Loss 95.63
    ViT + Temporal Attention 96.12

    Table 7.  Comparison of accuracy of different models on UCF101 and HMDB51.
    Model UCF101/% HMDB51/%
    iDT + FV 85.93 57.2
    iDT + HSV 87.92 61.11
    Two-Stream 88.02 59.4
    ResNext-101 + SE + SA (16f) [31] 92.5 68.0
    TSN 92.64 65.74
    HAR-depth [32] 93.0 69.7
    spatio-temporal STFT [33] 94.7 71.5
    HME-Net [34] 94.8 72.2
    BifurcatedNet [35] 94.9 72.1
    ViTSN 96.12 73.5


To ascertain the superiority of the proposed method, this work conducted a rigorous comparative analysis of its recognition performance against other widely adopted methods. As shown in Table 7, the proposed model outperforms the prevailing algorithms currently in use. Compared to TSN, the proposed approach demonstrates substantial improvements in accuracy, achieving a noteworthy enhancement of 3.48% on UCF101 and 7.76% on HMDB51. This underscores the ability of the method to significantly elevate accuracy in action recognition and highlights the advantages inherent in the approach presented in this paper.

ViTSN undergoes a rigorous comparative evaluation against the baseline TSN framework on the PSAR dataset to assess its generalization capabilities. As detailed in Table 8, the results affirm the suitability of the proposed model for real-world applications, particularly in the context of public security. Compared with TSN, the method proposed in this paper first embeds a parameter-free attention module and a temporal attention module at the beginning and end of the network; this more thorough and effective attention mechanism for evaluating feature weights improves the model's capacity to handle background noise and long temporal sequences. Second, it fully captures the spatial relationships between image block features by utilizing a spatio-temporal feature extraction module based on the pretrained ViT, producing more representative image feature vectors. To further enhance the model's classification capabilities, the training loss function has been adjusted. These results highlight the aptitude of the model for fine-grained differentiation of abnormal actions and the extraction of specific video evidence from extensive monitoring data. Consequently, this approach provides tangible benefits by increasing accuracy in action recognition on this dataset.

    Table 8.  Comparison of accuracy of different models on PSAR.
    Model RGB/% Flow/% Two Stream/%
    TSN 32.04 35.4 42.99
    ViTSN 51.35 40.81 51.79


    Additionally, in Figure 10, this work presents the confusion matrices for both models, encompassing the recognition of seven distinct action classes within the test dataset. A noteworthy distinction emerges when comparing the proposed model to the baseline. The method excels in the recognition of abnormal actions and exhibits superior performance in identifying abnormal action classes, significantly enhancing its applicability to the practical demands of public security work.

    Figure 10.  Comparison of confusion matrices of different models on PSAR.

    This paper presents YOWOv2E, an advanced lightweight model for video abnormal action detection tailored for forensic science. YOWOv2E integrates ShuffleNetv2E for feature extraction, is augmented with the ECA mechanism, and employs a joint loss function to significantly enhance the accuracy of detecting abnormal actions in complex surveillance scenarios. Additionally, this paper introduces a spatio-temporal two-stream network model based on ViT, incorporating the SimAM attention mechanism to enhance the model's resilience to interference. To handle long sequential data, segmental sampling strategies are implemented, and decision layer fusion is employed to improve accuracy. The effectiveness of the proposed model is further validated on the PSA-Dataset, a novel surveillance abnormal action dataset developed in this study, highlighting its robust generalization performance. This research aligns with the practical requirements of public security work and offers valuable support to agencies involved in electronic evidence investigations.

    However, the sample size for some categories in this dataset is still insufficient because publicly available surveillance footage of abnormal actions is scarce, which may have hindered the model's detection capabilities. Future research could concentrate on a number of areas, including expanding the dataset's sample size and scene variety, investigating techniques based on weak supervision or self-supervision to lessen the labor-intensive nature of manual labeling, and creating more effective and lightweight model structures for simpler deployment. All of these directions merit additional research.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported by the Fundamental Research Funds for the Central Universities, China No. 2023JKF01ZK05.

    The authors declare there is no conflict of interest.



  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
