Aerial images object detection method based on cross-scale multi-feature fusion

Yang Pan; Jinhua Yang; Lei Zhu; Lina Yao; Bo Zhang

doi:10.3934/mbe.2023721

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 9: 16148-16168. doi: 10.3934/mbe.2023721

Research article

School of Electronics and Information, Xi'an Polytechnic University, Xi'an 710048, China

Academic Editor: Jong Hyuk Park

Received: 07 June 2023 Revised: 17 July 2023 Accepted: 27 July 2023 Published: 09 August 2023

Aerial image target detection technology has essential application value in navigation security, traffic control and environmental monitoring. Compared with natural scene images, the background of aerial images is more complex, and there are more small targets, which puts higher requirements on the detection accuracy and real-time performance of the algorithm. To further improve the detection accuracy of lightweight networks for small targets in aerial images, we propose a cross-scale multi-feature fusion target detection method (CMF-YOLOv5s) for aerial images. Based on the original YOLOv5s, a bidirectional cross-scale feature fusion sub-network (BsNet) is constructed, using a newly designed multi-scale fusion module (MFF) and cross-scale feature fusion strategy to enhance the algorithm's ability, that fuses multi-scale feature information and reduces the loss of small target feature information. To improve the problem of the high leakage detection rate of small targets in aerial images, we constructed a multi-scale detection head containing four outputs to improve the network's ability to perceive small targets. To enhance the network's recognition rate of small target samples, we improve the K-means algorithm by introducing a genetic algorithm to optimize the prediction frame size to generate anchor boxes more suitable for aerial images. The experimental results show that on the aerial image small target dataset VisDrone-2019, the proposed method can detect more small targets in aerial images with complex backgrounds. With a detection speed of 116 FPS, compared with the original algorithm, the detection accuracy metrics mAP_0.5 and mAP_0.5:0.95 for small targets are improved by 5.5% and 3.6%, respectively. Meanwhile, compared with eight advanced lightweight networks such as YOLOv7-Tiny and PP-PicoDet-s, mAP_0.5 improves by more than 3.3%, and mAP_0.5:0.95 improves by more than 1.9%.

Citation: Yang Pan, Jinhua Yang, Lei Zhu, Lina Yao, Bo Zhang. Aerial images object detection method based on cross-scale multi-feature fusion[J]. Mathematical Biosciences and Engineering, 2023, 20(9): 16148-16168. doi: 10.3934/mbe.2023721

1. Introduction

With the rapid development of companies such as Didi, Uber, and Grab in the global ride-hailing service sector, ride-hailing has become one of the primary modes of transportation. According to statistics, there are currently 322 ride-hailing platform companies in China^[1], and the number of users has reached 472 million^[2]. In addition, the number of ride-hailing drivers reached a high of 5,976,000 as of July 2023, an increase of 1,376,000 compared to the previous year^[3]. In July, 821 million orders were recorded. These data clearly show that the ride-hailing industry is in a stage of rapid growth. However, as the ride-hailing market expands, balancing the distribution of supply and demand remains an urgent problem for ride-hailing platforms ^[4,5,6]. The imbalance shows up in two ways. From the passenger's perspective, because travel is uncertain and passengers cluster in time and space, waiting times can be long during peak hours or in specific areas. From the driver's perspective, drivers tend to offer services in areas where they believe passenger demand is higher, which can lead to oversupply in some areas and undersupply in others^[7]. Given the large number of drivers and user requests, it has become crucial for ride-hailing platforms to fully utilize their existing operational data for effective demand forecasting and scheduling. This can enhance service quality, improve user experience, and increase vehicle utilization^[8,9].

The prediction of ride-hailing demand shares many similarities with traditional taxi and traffic flow forecasting. Previous research in the field of transportation has laid the foundation for predicting ride-hailing demand. As early as 1978, Yang et al.^[10] considered factors such as the number of taxis, taxi fares, and disposable income as endogenous variables in their study aimed at improving service levels. In 1972, Douglas ^[11] indicated that a reasonable number of taxis and pricing could enhance the service quality for passengers. With the widespread application of Global Positioning System (GPS) in taxis, a foundation was laid for research based on GPS data. In 2010, Bazzani et al.^[12] utilized GPS data to analyze complex social systems. Asmundsdottir et al.^[13] through the analysis of taxi GPS data, extracted travel characteristics of taxi passengers. However, predicting the demand for ride-hailing orders is a complex task. Its complexity is not only dependent on GPS data ^[14], but is also influenced by various factors such as time ^[15,16], space ^[17,18,19], and the environment ^[20,21]. This can be regarded as a complex spatiotemporal data prediction problem. Currently, spatiotemporal data prediction encompasses various aspects such as taxi demand ^[7], traffic flow ^[16], shared bicycle demand ^[22], etc. These share similarities with ride-hailing demand, demonstrating continuous spatial distribution and interconnectedness between areas. To address spatial distribution challenges, it is essential to partition them into grids ^[23,24,25] or structures based on road networks ^[26], thereby transforming spatial issues into graph model processing. To investigate the impact of region partitioning methods, Davis et al.^[27] analyzed the impact of different spatial partitioning strategies on taxi demand prediction and proposed an efficient hybrid surface subdivision algorithm. 
In addition, external environmental factors such as weather conditions, holidays, and the distribution of Points of Interest (POI) have a significant impact on the demand for ride-hailing services. For instance, during rainy or extremely hot weather, individuals may be more inclined to use ride-hailing services, leading to an increase in demand. Similarly, during holiday periods, people might prefer choosing ride-hailing as their mode of transportation for travel or social activities, resulting in a potential surge in demand. Furthermore, individuals tend to seek ride-hailing services more frequently in commercial areas, tourist attractions, or event centers. Taking into account these factors, a comprehensive analysis of external spatiotemporal elements enhances our understanding and prediction of fluctuations in ride-hailing demand. References ^[7,24] confirm that studies considering time and weather conditions are promising. In such a complex and dynamic predictive environment, designing an accurate prediction model is crucial for enhancing the quality of ride-hailing services.

Researchers in the field of transportation have already accumulated rich and in-depth achievements, covering aspects such as traffic flow prediction, taxi order prediction, and ride-hailing order prediction. We can categorize the research methods into the following three types:

Prediction models based on statistical methods. Ride-hailing demand prediction is similar to other transportation prediction tasks and can be viewed as a time series prediction problem^[28]. Representative models in this field include the historical average model (HA)^[29], the autoregressive integrated moving average model (ARIMA) ^[30], and its variants. Williams et al. ^[31] proposed and demonstrated in 2003 that the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is capable of capturing the seasonality in time series data. Moreira-Matias et al. ^[32] validated the feasibility of the ARIMA model in predicting taxi passenger demand using GPS trajectory data from Porto. Singh et al. ^[33] demonstrated the superiority of the ARIMA model by predicting the performance of virtual machines. However, these traditional models impose strict linear assumptions, insufficiently consider spatiotemporal correlations and the influence of external factors, and cannot handle nonlinear features. Their predictive performance is therefore suboptimal when demand is influenced by external factors.

Prediction models based on traditional machine learning. In recent years, machine learning methods have gradually become the primary methods for demand prediction^[34,35,36], and they can achieve higher prediction accuracy and more sophisticated data modeling. For example, Yang and Gonzales ^[37] mined the factors of taxi demand from the number of cab users and socio-economic and employment data in New York. They used a multiple linear regression model to analyze passenger flow prediction in a particular area and verified its validity. Jiang et al. ^[38] proposed a least-squares support vector machine (LS-SVM) based method for ride-hailing short-term prediction and demonstrated its excellent performance. Peñalvo et al.^[39] proposed a machine learning framework for predicting the fluctuation of stock prices. Lippi et al. ^[40] constructed a Support Vector Regression (SVR) model with seasonal identification capability to extract the seasonality of traffic flow. Castro-Neto et al. ^[41] proposed the Online-SVR (OL-SVR) prediction model, considering both typical and atypical conditions, thereby enhancing predictive capabilities under atypical conditions. However, when machine learning is utilized for complex data prediction, challenges such as poor predictive accuracy and overfitting may arise, representing limitations inherent in machine learning.

Deep learning-based prediction modeling. With the rapid development of deep learning methods in various fields such as computer vision ^[42,43], natural language processing ^[44], and recommendation systems ^[45], the application scope continues to expand. Traffic prediction ^[46,47] is a crucial domain where deep learning methods excel in capturing the nonlinearity and dynamic trends of data for modeling ^[48,49,50]. Demand prediction for ride-hailing is a typical time-series prediction problem. In the early stages, researchers commonly utilized Recurrent Neural Networks (RNN) for time-series data prediction. However, RNN faces challenges such as vanishing and exploding gradients, limiting its ability to capture long-term dependencies. Conversely, variants of RNN, such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), demonstrate certain advantages in capturing temporal dependencies and are frequently utilized for extracting time-dependent features in time-series prediction data ^[15,16]. Dogan ^[51] demonstrated that expanding the dataset of traffic flow can enhance the predictive performance of LSTMs. Kouziokas ^[52] optimized unidirectional LSTM and proposed Bidirectional LSTM (Bi-LSTM) to improve prediction accuracy. Dai et al. ^[53] proved in their research on traffic flow prediction that GRU outperforms LSTM in terms of performance. Additionally, spatial relationships are also a crucial factor that needs to be fully considered in this research field, contributing to extracting the spatial variations of transportation systems. Huang et al. ^[54] utilized a Convolutional Neural Network (CNN) model for regional partitioning to predict the demand for ride-hailing trips. However, CNN, when dealing with regional connectivity graphs, represents the regional network in the form of a two-dimensional image, limiting its applicability in non-Euclidean topology regional networks. 
Therefore, in recent years, many researchers have addressed the limitations of regional topology structures by employing Graph Convolutional Networks (GCNs) for processing ^[18]. Compared to CNNs, GCNs are better suited for capturing the spatial dependencies of regional networks ^[55]. Hence, Geng et al.^[56] in their study of non-Euclidean regional structures, utilized GCN as a graph convolutional module and proposed the Spatiotemporal Multi-Graph Convolutional Network (ST-MGCN) model for demand prediction.

To better adapt to various complex environments and fully leverage the advantages of different algorithms in extracting spatiotemporal correlations, researchers are increasingly applying composite models ^[46,57,58]. In 2009, Tsai et al. ^[59] demonstrated the superiority of composite models through Parallel Ensemble Neural Networks (PENN). Li and Zhu ^[60] enhanced traffic flow prediction performance by integrating graph modules and gated convolutional modules. Ke et al. ^[8], considering the temporal, spatial, and exogenous dependencies of ride-hailing demand, proposed a Fusion Convolutional Long Short-Term Memory Network (FCL-Net) by combining Conv-LSTM, LSTM, and CNN, showing strong adaptability in predictions. In 2018, Li et al. ^[47] proposed the Diffusion Convolutional Recurrent Neural Network (DCRNN) to address the complex spatial characteristics of road networks and the non-linear temporal dynamics of road condition changes. The model uses bidirectional random walks on the graph to capture spatial dependencies and an encoder-decoder architecture with scheduled sampling to capture temporal dependencies. Zhao et al. ^[61] introduced the Temporal Graph Convolutional Network (T-GCN) model, which combines GCN with GRU. This model uses GCN for spatial information extraction and GRU for capturing dynamic temporal relationships to predict traffic flow, producing predictions close to the real dataset values.

According to the above analysis, despite the current capability of many studies in extracting spatiotemporal relationships for traffic prediction, there is still a deficiency in capturing the impact of external spatiotemporal factors. On one hand, a majority of studies either neglect external spatiotemporal factors or insufficiently extract key information during the extraction process. On the other hand, the use of a single model is often susceptible to the influence of data complexity, resulting in suboptimal predictive accuracy. To address these issues, this study proposes a Spatiotemporal Information-Enhanced Graph Convolutional Network model (EST-GCN) that effectively tackles both of these challenges. Our main contributions to the work are as follows:

(1) The paper introduces an innovative model for predicting ride-hailing demand, named EST-GCN. It utilizes correlation analysis to extract essential information from external factors and integrates it with a spatiotemporal graph convolutional model. This is designed to accurately capture the spatiotemporal features of ride-hailing demand and the influence of external spatiotemporal factors.

(2) The EST-GCN model can adapt to the effects of weather conditions, date attributes, and the distribution of POIs, enabling more accurate prediction of ride-hailing demand in different environments.

(3) We evaluated the model using actual operational data, and the experimental results show that the EST-GCN model outperforms the baseline methods in prediction and has strong portability.

2. Problem definition and analysis

2.1. Problem definition

Definition 1: Spatial Gridding

Based on the latitude and longitude of the city, the size and shape of each hexagon are determined to partition the entire area. As illustrated in Figure 1, the city is partitioned into a $P\times Q$ hexagonal spatial grid, with each grid cell corresponding to an area ${{S}_{ij}}$ $(i\in \{1, \ldots, P\},\ j\in \{1, \ldots, Q\})$.

Figure 1. Spatial division of the city into hexagonal grids.

Definition 2: Demand Characterization Matrix $X$

The demand for ride-hailing refers to the users' need for ride-hailing services during a specific period, typically measured by the number of orders placed. In this paper, ${{x}_{t}}$ represents the demand at the $t$-th time step.

Definition 3: Area network $G$

We approximate the spatial grid as a transportation network and use the graph structure $G = (V, E)$ to represent the connectivity between areas. $V = \{v_1, v_2, \ldots, v_n\}$ denotes the set of spatial grid areas, $n$ the number of grids, $E = \{e_1, e_2, \ldots, e_m\}$ the set of edges denoting the connectivity between two areas, and $m$ the number of edges. The adjacency matrix $A$ is then used to represent the connectivity of the area network.
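As a minimal sketch of Definition 3, the adjacency matrix $A$ can be built from an edge list. The function name `build_adjacency`, the zero-based area indices, and the toy edge list are assumptions made for illustration; the edges are treated as undirected, which the text implies but does not state.

```python
import numpy as np

def build_adjacency(n, edges):
    """Return the symmetric n x n adjacency matrix A of an undirected area graph."""
    A = np.zeros((n, n), dtype=float)
    for i, j in edges:
        A[i, j] = 1.0  # area i is connected to area j
        A[j, i] = 1.0  # undirected: connectivity is mutual (assumption)
    return A

# Hypothetical toy network: 4 grid areas, m = 3 edges
edges = [(0, 1), (0, 2), (2, 3)]
A = build_adjacency(4, edges)
```

Each edge contributes two symmetric entries, so the number of nonzero entries in `A` equals $2m$.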

Definition 4: External attribute matrix $H$

We assemble factors such as time periodicity, POIs, weather, and date attributes into a feature matrix $H = \{h_1, h_2, \ldots, h_c\}$, where $c$ is the number of categories of external spatiotemporal factors. The time-varying information for the $j$-th class of factors is represented as $H_j = \{j_1, j_2, \ldots, j_t\}$, while for factors that do not vary with time, $j_t$ remains a fixed value.

2.2. Demand prediction

Ride-hailing demand prediction is a spatiotemporal data prediction problem in which demand varies continuously over time across different areas. This type of problem requires modeling both temporal and spatial relationships^[26,62]. Figure 2 illustrates the spatiotemporal correlation of ride-hailing demand. In the spatial dimension (Figure 2(a)), neighboring areas form a network graph in which the state of each vertex represents the ride-hailing demand of that area. In the temporal dimension (Figure 2(b)), the ride-hailing demand of each area changes continuously over time. In short, ride-hailing demand is strongly dynamic in both the spatial and temporal dimensions.

Figure 2. Spatial and temporal variation of ride-hailing demand.

Building upon the exploration of spatiotemporal features, this paper further incorporates external spatiotemporal factors into the model, thereby enhancing the model's ability to perceive the impact of external factors.

To summarize, the ride-hailing demand prediction problem can be understood as predicting the most probable demand result in the following $T$ time steps given the topological network $G$ , the demand feature matrix $X$ , and the external attribute matrix $H$ , combined with the given $n$ historical demand measurement values. The mapping relationship for this problem can be defined and represented as

$\begin{equation} f({{X}_{t-n:t}}|H,G)\to {{X}_{(t+1):(t+T)}} \end{equation}$

(2.1)
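The mapping in Eq (2.1) is usually trained on sliding windows cut from the historical demand series. The sketch below is one way to build such input/target pairs. It assumes the demand is stored as an array of shape (time steps, areas) and that each input holds exactly $n$ past steps; `make_windows` is a hypothetical helper name, and the graph $G$ and attributes $H$ are omitted for brevity.

```python
import numpy as np

def make_windows(X, n, T):
    """Split a demand series X of shape (steps, areas) into supervised pairs.

    Each input holds n consecutive historical steps and each target holds
    the T steps that immediately follow, mirroring X_{t-n:t} -> X_{t+1:t+T}.
    """
    inputs, targets = [], []
    for t in range(n, X.shape[0] - T + 1):
        inputs.append(X[t - n:t])   # n historical observations
        targets.append(X[t:t + T])  # next T observations to predict
    return np.stack(inputs), np.stack(targets)

# Hypothetical toy series: 10 time steps, 2 areas
X = np.arange(20, dtype=float).reshape(10, 2)
inp, tgt = make_windows(X, n=4, T=2)
```

With 10 time steps, `n=4`, and `T=2`, this yields 5 samples of shapes `(4, 2)` and `(2, 2)`.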

2.3. External spatial and temporal factors

To comprehensively account for the external spatiotemporal factors affecting ride-hailing demand, we divide these into two main categories: dynamic factors that change over time, and static factors that do not change with short-term fluctuations in time.

(1) Static Factors

Static factors are factors whose values do not change over a short time horizon but still affect demand. For example, the POI distribution of an area and date attributes do not vary over short time scales, yet the characteristics they imply can influence the movement and aggregation of people within an area. As shown in Figure 3, the number of POIs and the demand differ between Area 1 and Area 2. In Area 1, the number of POIs is higher, and order volume is higher during the hours of the day when activity occurs. Within a week, the demand on weekends is significantly higher than on weekdays. These observations indicate that POI distribution and date attributes affect ride-hailing demand.

Figure 3. Static attribute impacts.

(2) Dynamic Factors

Dynamic factors change over time and can impact ride-hailing demand. For example, weather conditions can significantly affect travel, which directly affects the demand for ride-hailing. Figure 4 shows the variation of ride-hailing demand in the same area under different weather conditions. Specifically, during the rainy period, the demand surges and deviates far from the order quantity during regular hours. The analysis shows that weather has an enormous impact on ride-hailing demand.

Figure 4. Dynamic attribute impact.

3. Methods

3.1. Framework

This method integrates the features of ride-hailing demand in the area with external factor features. It employs correlation analysis to extract the main features, utilizes a GRU layer to capture temporal features, and incorporates a GCN layer to extract spatial features, enhancing the accuracy of ride-hailing predictions. We present the framework of our work in Figure 5, comprising four main components: data preprocessing, integration of external attributes, modeling spatiotemporal dependencies, and prediction.

Figure 5. EST-GCN modeling framework. The P module is used for correlation extraction, and the C module is used for combining all the data.

In the data preprocessing phase, we conducted cleaning and feature engineering on the original dataset. For the extraction of spatiotemporal features, we employed the Pearson correlation coefficient to analyze the correlation. Features with correlations greater than the threshold $\alpha$ were selected for model training. Subsequently, we performed encoding and fusion processing on the selected external features and the ride-hailing demand data.

To effectively model spatiotemporal dependencies, we have chosen a combination of GRU and GCN models. These two models are used to extract the temporal and spatial features of ride-hailing demand data, enhancing the overall prediction accuracy. The GRU model is responsible for capturing temporal changes, while the GCN model focuses on modeling the spatial relationships between different locations in the transportation network. Through this combination, we expect to comprehensively consider spatiotemporal factors and improve the accurate prediction of ride-hailing demand.

3.2. Spatio-temporal factor feature extraction and enhancement methods

This study conducts experiments by analyzing the correlation between external factors and short-term demand for ride-hailing. It extracts highly correlated attributes to mitigate the impact of the specificity of numerical values on experimental results.

(1) Feature extraction

We use the Pearson correlation coefficient, denoted as $r$, to characterize the strength of linear correlation between two attributes. We calculate $r$ using Eq (3.1) and select features with an $r$ value greater than the threshold $\alpha$ for experimentation.

$\begin{equation} r = \frac{\sum_{i = 1}^n\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i = 1}^n\left(x_i-\bar{x}\right)^2} \sqrt{\sum_{i = 1}^n\left(y_i-\bar{y}\right)^2}} \end{equation}$

(3.1)

where $\bar{x}$ and $\bar{y}$ are the sample means of the two features, respectively, and $r$ takes values in $[-1, 1]$.
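Eq (3.1) and the threshold rule can be sketched as follows. The helper names `pearson_r` and `select_features`, the toy series, and the value $\alpha = 0.8$ are illustrative assumptions; the paper does not state the threshold at this point.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series, as in Eq (3.1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / (np.sqrt((dx ** 2).sum()) * np.sqrt((dy ** 2).sum())))

def select_features(features, demand, alpha):
    """Keep the names of features whose correlation with demand exceeds alpha."""
    return [name for name, values in features.items()
            if pearson_r(values, demand) > alpha]

# Hypothetical toy data: demand rises with rainfall, not with the noise feature
demand = [10, 12, 15, 20, 26]
features = {"rainfall": [0, 1, 2, 4, 6], "noise": [3, 1, 4, 1, 5]}
selected = select_features(features, demand, alpha=0.8)
```

Because the rule as stated compares $r$ itself with $\alpha$, strongly negative correlations are discarded; comparing $|r|$ instead would be a different design choice.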

Given that strongly correlated features typically yield more information about the relationships between data, the selection of such features can provide data with higher information content, thereby enhancing the reliability and predictive capability of experiments.

(2) Static factor extraction and enhancement

Since the values of the static factors do not change over time, we use correlation analysis to extract the $p$ static factors that correlate strongly with demand and form the matrix $S$. Specifically, the matrix after fusing the static factors at time $t$ is

$\begin{equation} C_{s}^{t} = \left[ {{X}^{t}},S \right],C_{s}^{t}\in {{R}^{n\times (p+l)}} \end{equation}$

(3.2)

(3) Dynamic factor extraction and enhancement

Because dynamic factors vary with time, we use correlation analysis to extract $m+1$ strongly correlated time slices from the continuous time series, i.e., we select $D_w^{t-m, t} = \left[D_w^{t-m}, D_w^{t-m+1}, \ldots, D_w^t\right]$ as the submatrix for each dynamic factor ${{D}_{w}}$.

Finally, through the incorporation of relevant attribute enhancement units, we create an enhancement matrix containing all external spatiotemporal factors and demand characteristic information at time $t$ . This enhancement matrix minimizes the loss of feature information during model training, thereby enhancing the model's perceptiveness to various factors.

$\begin{equation} {{C}^{t}} = \left[ {{X}^{t}},S,D_{1}^{t-m,t},D_{2}^{t-m,t},\ldots ,D_{w}^{t-m,t} \right] \end{equation}$

(3.3)

where ${{C}^{t}}\in {{R}^{n\times (p+l+w\times(m+1))}}$.
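The construction of the enhancement matrix in Eq (3.3) amounts to a column-wise concatenation. The sketch below checks the stated dimension with random placeholder data. All sizes are hypothetical, and $l$ is assumed to be the number of demand feature columns in $X^t$, which the text does not define explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: n areas, l demand columns, p static factors,
# w dynamic factors, each with m + 1 selected time slices
n, l, p, w, m = 5, 1, 3, 2, 4

X_t = rng.random((n, l))                          # demand features at time t
S = rng.random((n, p))                            # selected static factors
D = [rng.random((n, m + 1)) for _ in range(w)]    # D_1^{t-m,t}, ..., D_w^{t-m,t}

# C^t = [X^t, S, D_1^{t-m,t}, ..., D_w^{t-m,t}], joined along the feature axis
C_t = np.concatenate([X_t, S, *D], axis=1)
```

Here `C_t` has `p + l + w * (m + 1) = 14` columns, matching the dimension given for $C^t$.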

3.3. Spatial dependency modeling

The demand for ride-hailing orders exhibits connectivity and fluidity between neighboring areas, resulting in mutual influence. In the transportation field, GCN is currently widely used ^[18], which can handle non-Euclidean spatial data and is very suitable for transportation data analysis and prediction tasks^[9]. Therefore, we utilize GCNs to model the spatial relationships between different areas in the transportation network. Through graph convolution, GCNs can learn the connectivity patterns between different areas and the impact of external spatial factors, thereby enhancing model understanding and prediction of spatial features. The GCN model can be represented as

$\begin{equation} {{O}^{(l+1)}} = \sigma ({{\widetilde{D}}^{-\frac{1}{2}}}\widetilde{A}{{\widetilde{D}}^{-\frac{1}{2}}}{{O}^{(l)}}{{W}^{(l)}}) \end{equation}$

(3.4)

where $\sigma$ is the activation function, $\widetilde{A}$ the adjacency matrix with added self-connections (in the standard GCN formulation, $\widetilde{A} = A + I$), $\widetilde{D}$ the corresponding degree matrix, ${{W}^{(l)}}$ the weight matrix of the $l$-th convolutional layer, and ${{O}^{(l)}}$ the output of the $l$-th layer. The architecture of the GCN model is shown in Figure 6.

Figure 6. GCN model architecture.

In this study, we will use a 2-layer GCN model for training. The model can be represented as

$\begin{equation} f(X|H,G) = \sigma (\widehat{A}\,\mathrm{ReLU}(\widehat{A}X{{W}_{0}}){{W}_{1}}) \end{equation}$

(3.5)

where $\widehat{A} = {\widetilde{D}^{-\frac{1}{2}}\widetilde{A}{\widetilde{D}^{-\frac{1}{2}}}}$ .
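A forward pass of the 2-layer GCN in Eq (3.5) can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' implementation: self-connections are added as in the standard GCN formulation, the outer activation $\sigma$ is taken to be a sigmoid, and the weights are random placeholders rather than trained parameters.

```python
import numpy as np

def normalize_adjacency(A):
    """Compute A_hat = D~^(-1/2) A~ D~^(-1/2), with A~ = A + I (assumed)."""
    A_tilde = A + np.eye(A.shape[0])   # add self-connections
    degree = A_tilde.sum(axis=1)       # diagonal of D~, at least 1 per node
    d_inv_sqrt = np.diag(degree ** -0.5)
    return d_inv_sqrt @ A_tilde @ d_inv_sqrt

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gcn_two_layer(A, X, W0, W1):
    """Eq (3.5): sigma(A_hat ReLU(A_hat X W0) W1), with sigma assumed to be a sigmoid."""
    A_hat = normalize_adjacency(A)
    hidden = np.maximum(A_hat @ X @ W0, 0.0)  # first layer with ReLU
    return sigmoid(A_hat @ hidden @ W1)       # second layer with outer activation

# Hypothetical toy graph: 3 areas on a path, 4 input features per area
rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
X = rng.normal(size=(3, 4))
W0 = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 2))
out = gcn_two_layer(A, X, W0, W1)
A_hat = normalize_adjacency(A)
```

Each row of `out` is the output for one area, computed from its own features and those of areas up to two hops away.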

3.4. Time-dependent modeling

Temporal dependency is also a vital issue in demand prediction. RNNs are widely used for processing time series data. However, during backpropagation they can suffer from vanishing or exploding gradients^[63]. LSTM^[64] and GRU^[65] are two RNN variants that mitigate this problem by introducing gating mechanisms. Relative to LSTM, GRU replaces the forget gate and the input gate with a single update gate, which results in fewer parameters and lower computational complexity and thus faster training. We therefore choose the GRU model to capture the temporal dependence of the demand.

As shown in Figure 7, GRU combines a reset gate and an update gate: ${{r}_{t}}$ denotes the reset gate, which determines how much information from the previous time step the candidate hidden state at the current time step ignores; ${{u}_{t}}$ denotes the update gate, which controls how much of the previous hidden state is carried over to the current time step; ${{c}_{t}}$ denotes the candidate hidden state of the current time step, which combines the current input with the information of the previous time step; $\sigma$ and $\tanh$ refer to the sigmoid and tanh activation functions; ${{C}^{t}}$ denotes the feature information of the demand at time $t$; and ${{h}_{t}}$ is the output state at time $t$.

Figure 7. GRU model structure.

In the ride-hailing demand prediction model, GRU effectively captures the temporal dependencies in the time series data, such as hourly, daily, and weekly patterns, through its gating mechanism. This capability enables the model to capture the dynamic relationship between demand and external factors during training.

3.5. EST-GCN

This section introduces the formation process of the EST-GCN unit.

As shown in Figure 8, taking the input at time $t$ as an example, we represent the attributes related to dynamic factors as a sequence ${D}^{t-m}, \ldots, {D}^{t-1}, {D}^{t}$, which includes time periodicity and weather conditions. Meanwhile, $p$ attributes related to the target variable are extracted from static factors, denoted as ${s}^{1}, \ldots, {s}^{p-1}, {s}^{p}$. These static factors include POI information and date attributes. Subsequently, one-hot encoding is applied to these attribute values, transforming categorical variables into numerical values, thereby reducing training errors.

Figure 8. Architecture of the EST-GCN unit.

We integrate these external attributes with the historical demand sequence ${{X}^{t-m}, ..., {X}^{t-2}, {X}^{t-1}, {X}^{t}}$ available at time $t$ to obtain the attribute-enhanced unit ${{C}^{t}}$, and then feed the fused feature units sequentially into the GRU to capture the temporal dependencies of ride-hailing demand. The output of the GRU serves as the input to the GCN, which uses graph convolution to learn the spatial correlations of ride-hailing demand across areas. The objective is to capture spatiotemporal features systematically by training the GRU and GCN, so that the final prediction accounts for ride-hailing demand in both the temporal and spatial dimensions.
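The fusion step can be illustrated with a minimal NumPy sketch for a single area. The array shapes, the category vocabularies (`WEATHER`, `DAY_TYPE`) and the helper names (`one_hot`, `build_enhanced_unit`) are assumptions made for illustration and are not taken from the authors' implementation; the sketch only shows one-hot encoding of descriptive attributes and their concatenation with the demand history.

```python
import numpy as np

def one_hot(index: int, num_classes: int) -> np.ndarray:
    """Return a binary indicator vector with a single 1 at `index`."""
    vec = np.zeros(num_classes, dtype=np.float32)
    vec[index] = 1.0
    return vec

def build_enhanced_unit(x_hist, dynamic_hist, static_attrs):
    """Concatenate demand history, dynamic attributes and static attributes.

    x_hist:       (m + 1,)    demand X^{t-m}, ..., X^t for one area
    dynamic_hist: (m + 1, d)  encoded dynamic attributes D^{t-m}, ..., D^t
    static_attrs: (p,)        encoded static attributes s^1, ..., s^p
    Returns an array of shape (m + 1, 1 + d + p), one row per time step.
    """
    steps = x_hist.shape[0]
    static_tiled = np.tile(static_attrs, (steps, 1))  # repeat the static part per step
    return np.concatenate([x_hist[:, None], dynamic_hist, static_tiled], axis=1)

# Hypothetical vocabularies, assumed for this example only.
WEATHER = ["sunny", "cloudy", "rain"]
DAY_TYPE = ["workday", "non-workday", "holiday"]

x_hist = np.array([12.0, 15.0, 9.0], dtype=np.float32)  # m = 2
weather_seq = ["sunny", "rain", "rain"]
dynamic_hist = np.stack([one_hot(WEATHER.index(w), len(WEATHER)) for w in weather_seq])
static_attrs = one_hot(DAY_TYPE.index("workday"), len(DAY_TYPE))

c_t = build_enhanced_unit(x_hist, dynamic_hist, static_attrs)
print(c_t.shape)  # (3, 7)
```

Each row holds the demand value followed by the encoded dynamic and static attributes for one time step, which is the form in which a recurrent unit would consume the sequence.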

The specific calculation process is shown below.

$\begin{align} {u_t} & = \sigma ({W_u}[{C^t},{h_{t-1}}]+{b_u} ) \end{align}$

(3.6)

$\begin{align} {r_t} & = \sigma ( {{W}_{r}}[ {{C}^{t}},{{h}_{t-1}} ]+{{b}_{r}} ) \end{align}$

(3.7)

$\begin{align} {c_t} & = \tanh ( {{W}_{c}}[ {{C}^{t}},( {{r}_{t}}*{{h}_{t-1}} )]+{{b}_{c}} ) \end{align}$

(3.8)

$\begin{align} {h_t} & = {u_t}*{h_{t-1}}+( 1-{u_t})*{c_t} \end{align}$

(3.9)

$\begin{align} {\hat{x_t}} & = gc[{A},{Y_t}] \end{align}$

(3.10)

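where $gc$ denotes the graph convolution process, $A$ is the adjacency matrix of the area graph, ${Y_t}$ is the GRU output fed to the GCN, $*$ denotes element-wise multiplication, and $W$ and $b$ represent the weights and biases learned during training.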
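Equations (3.6)–(3.10) can be traced step by step in a small NumPy sketch. The sizes, the random parameters and the function names are hypothetical, and the symmetric normalization $D^{-1/2}(A+I)D^{-1/2}$ used for the graph convolution is an assumed (commonly used) form that is not specified in the equations above. The sketch uses a single GRU layer and a single linear graph convolution, so it illustrates the data flow rather than the full stacked model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(c_t, h_prev, params):
    """One GRU step following Eqs. (3.6)-(3.9).

    c_t:    (N, F) enhanced features C^t for N areas
    h_prev: (N, H) previous hidden state h_{t-1}
    params: W_u, W_r, W_c of shape (F + H, H) and b_u, b_r, b_c of shape (H,)
    """
    concat = np.concatenate([c_t, h_prev], axis=1)           # [C^t, h_{t-1}]
    u_t = sigmoid(concat @ params["W_u"] + params["b_u"])    # Eq. (3.6)
    r_t = sigmoid(concat @ params["W_r"] + params["b_r"])    # Eq. (3.7)
    gated = np.concatenate([c_t, r_t * h_prev], axis=1)      # [C^t, r_t * h_{t-1}]
    cand = np.tanh(gated @ params["W_c"] + params["b_c"])    # Eq. (3.8)
    return u_t * h_prev + (1.0 - u_t) * cand                 # Eq. (3.9)

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (assumed form)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(adj, y_t, weight):
    """Single linear graph convolution standing in for gc[A, Y_t] in Eq. (3.10)."""
    return normalize_adjacency(adj) @ y_t @ weight

# Toy run with hypothetical sizes: N areas, F input features, H hidden units.
rng = np.random.default_rng(0)
N, F, H = 4, 7, 8
params = {f"W_{g}": rng.normal(scale=0.1, size=(F + H, H)) for g in "urc"}
params.update({f"b_{g}": np.zeros(H) for g in "urc"})
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)  # ring of four areas

h = np.zeros((N, H))
for _ in range(3):  # three time steps of random stand-in features
    h = gru_step(rng.normal(size=(N, F)), h, params)
x_hat = graph_conv(adj, h, rng.normal(scale=0.1, size=(H, 1)))
print(x_hat.shape)  # (4, 1)
```

Because Eq. (3.9) is a convex combination of $h_{t-1}$ and $c_t$, the update gate directly sets how much of the previous state survives each step.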

3.6. Loss function

During training, the goal is to minimize the error between the actual and predicted regional demand. We add an L2 regularization term to the loss function, which helps to avoid overfitting. The loss function of the model can be expressed as

$\begin{equation} Loss = ||{{X}_{t}}-\widehat{{{X}_{t}}}||+\lambda {\sum\limits_{i = 1}^n({X}_{t}-\widehat{{X}_{t}})^2} \end{equation}$

(3.11)

where ${X_t}$ is the actual demand, ${\widehat{X_t}}$ is the predicted demand, and $\lambda$ is a hyperparameter that weights the second term.
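As a quick numerical check, Eq. (3.11) can be transcribed as printed. The function name `est_gcn_loss` and the value of `lam` are hypothetical, and taking $||\cdot||$ to be the Euclidean (Frobenius) norm is an assumption, since the equation does not state which norm is meant.

```python
import numpy as np

def est_gcn_loss(x_true, x_pred, lam=1e-3):
    """Loss of Eq. (3.11): norm of the residual plus lam times its squared sum."""
    residual = np.asarray(x_true, dtype=float) - np.asarray(x_pred, dtype=float)
    norm_term = np.linalg.norm(residual)      # Euclidean / Frobenius norm (assumed)
    squared_term = np.sum(residual ** 2)      # sum of squared differences
    return norm_term + lam * squared_term

# Residual (3, 4): norm 5, squared sum 25, so the loss is 5 + 0.1 * 25.
print(est_gcn_loss([3.0, 4.0], [0.0, 0.0], lam=0.1))  # 7.5
```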

4. Experiment

4.1. Data description

To validate the effectiveness of the EST-GCN model, we use a real-world dataset from ride-hailing operations in Chengdu. The dataset covers two complete weekly cycles and contains detailed fields, making it well suited for experimentation.

● MeiC Taxi: This dataset contains ride-hailing information in Chengdu from June 3rd to June 17th, 2023, covering two weeks so that the impact of periodicity on future demand can be mined. Demand is counted at five-minute intervals, totaling 730,000 demand records.

● Areas: The study region is divided into hexagonal areas of 0.7373 square kilometers each. The 169 key areas within the Chengdu city bypass are selected, and each area is regarded as a vertex of the graph, from which an adjacency matrix is constructed.

● Weather: This data was obtained from the Weather Query API (https://lbs.amap.com/api/webservice/guide/api/weatherinfo/), which obtains real-time weather conditions in the study areas every five minutes. The weather data contains weather conditions from June 3rd to June 17th, 2023.

● POIs: This dataset is the POI distribution information within the selected study area obtained through the API (https://lbs.amap.com/api/webservice/guide/api/search). When selecting POIs, we chose six indicators based on travel demand and study purpose: life, healthcare, tourism, transportation, residential, and companies and enterprises.

● Time Attribute: This dataset contains weekday, non-workday, and holiday attributes from June 3rd to June 17th, 2023, in Chengdu.
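The five-minute counting described for the demand data can be sketched as follows. The input layout (order times in seconds since the start of the study period, integer area indices) and the function name `count_demand` are assumptions for illustration; the actual field names of the dataset are not given here.

```python
import numpy as np

INTERVAL_S = 5 * 60  # five-minute bins, as in the data description

def count_demand(order_times_s, order_areas, num_steps, num_areas):
    """Count orders per (time bin, area).

    order_times_s: order times in seconds since the start of the study period
    order_areas:   integer area index of each order, in [0, num_areas)
    Returns an integer array of shape (num_steps, num_areas).
    """
    bins = (np.asarray(order_times_s) // INTERVAL_S).astype(np.int64)
    areas = np.asarray(order_areas, dtype=np.int64)
    demand = np.zeros((num_steps, num_areas), dtype=np.int64)
    keep = (bins >= 0) & (bins < num_steps)            # drop orders outside the period
    np.add.at(demand, (bins[keep], areas[keep]), 1)    # accumulates repeated indices
    return demand

# Five toy orders in two areas over three five-minute bins.
demand = count_demand([10, 299, 300, 650, 650], [0, 0, 0, 1, 1], num_steps=3, num_areas=2)
print(demand.tolist())  # [[2, 0], [1, 0], [0, 2]]
```

`np.add.at` is used instead of plain fancy-index assignment because several orders can fall into the same bin and area, and each must be counted.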

4.2. Experimental setting and baseline model

In this paper, we compare the proposed EST-GCN with the widely used temporal prediction baseline models:

● Historical Average (HA)^[29]: Predicts future demand as the average demand over a past period;

● Autoregressive Integrated Moving Average (ARIMA)^[66]: Analyzes trends, seasonality, and randomness in demand data to predict future demand;

● Support Vector Regression (SVR)^[67]: Maps input demand features to continuous output values;

● Graph Convolutional Network (GCN)^[18]: Learns graph network information through convolution operations to predict demand;

● Gated Recurrent Unit (GRU)^[16]: Predicts demand by learning its temporal dependencies through gating mechanisms;

● Spatio-Temporal Graph Convolutional Model (ST-GCN)^[61]: An extension of GCN specialized for processing graph data with a temporal dimension;

● Spatio-Temporal Attention Network (ST-GAT)^[46]: Combines graph neural networks and attention mechanisms to learn representations and relationships of nodes in spatiotemporal graph data;

● Coupled Layer-wise Graph Convolution (CCRNN)^[68]: A GCN with a layered coupling mechanism.

Each baseline model was trained with the hyperparameters reported in its original paper.
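As a point of reference for the simplest baseline, a historical-average predictor can be sketched in a few lines. The window length and the function name are hypothetical; the averaging period used in the experiments is not stated here.

```python
import numpy as np

def historical_average(demand, window):
    """Predict the next step for each area as the mean of the last `window` steps.

    demand: array of shape (T, N) with T time steps and N areas.
    """
    demand = np.asarray(demand, dtype=float)
    if not 0 < window <= demand.shape[0]:
        raise ValueError("window must be between 1 and the number of time steps")
    return demand[-window:].mean(axis=0)

history = np.array([[1, 10], [3, 20], [5, 30]])
print(historical_average(history, window=2).tolist())  # [4.0, 25.0]
```

Because the prediction depends only on past averages and not on how far ahead the forecast reaches, such a baseline yields the same error at every prediction horizon.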

4.3. Evaluation criteria

To validate the EST-GCN model's capability in perceiving external spatiotemporal factors, we have selected the following four criteria for evaluation:

1. Root Mean Squared Error (RMSE). Measures the deviation between the predicted and actual demand values; a smaller value indicates higher accuracy.

$\begin{equation} RMSE = \sqrt{\frac{1}{TN}\sum\limits_{t = 1}^{T}{\sum\limits_{i = 1}^{N}{{{(x_{i}^{t}-\widehat{x_{i}^{t}})}^{2}}}}} \end{equation}$

(4.1)

2. Mean Absolute Error (MAE). The mean of the absolute errors between the predicted and actual demand values; a smaller value indicates higher accuracy.

$\begin{equation} MAE = \frac{1}{TN}\sum\limits_{t = 1}^{T}\sum\limits_{i = 1}^{N}|x_{i}^{t}-\widehat{x_{i}^{t}}| \end{equation}$

(4.2)

3. Coefficient of Determination ( $R^2$ ). Measures how much of the variability in the actual values the model explains. Its maximum is 1, and a value closer to 1 indicates a better fit; it becomes negative when the model fits worse than predicting the mean of the actual values.

$\begin{equation} {R}^{2} = 1-\frac{\sum\limits_{t = 1}^{T}\sum\limits_{i = 1}^{N}{(x_{i}^{t}-\widehat{x_{i}^{t}})}^{2}}{\sum\limits_{t = 1}^{T}\sum\limits_{i = 1}^{N}{(x_{i}^{t}-\bar{x})}^{2}} \end{equation}$

(4.3)

4. Explained variance score (var). Measures the proportion of the variance of the actual demand that is accounted for by the predictions, where $\mathrm{Var}(\cdot)$ is the variance taken over all areas and time periods; a value closer to 1 indicates better predictions.

$\begin{equation} var = 1-\frac{\mathrm{Var}(x-\widehat{x})}{\mathrm{Var}(x)} \end{equation}$

(4.4)

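where $x_{i}^{t}$ and $\widehat{x_{i}^{t}}$ denote the actual and predicted demand in the $i$th area during the $t$th time period, $\bar{x}$ is the mean of the actual demand, $N$ is the number of areas, and $T$ is the number of time periods.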
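The four criteria can be computed together in a short NumPy sketch. RMSE, MAE and $R^2$ follow Eqs. (4.1)–(4.3); for the explained variance score, the sketch assumes the common definition $1-\mathrm{Var}(x-\widehat{x})/\mathrm{Var}(x)$. The function name `evaluate` and the toy arrays are hypothetical.

```python
import numpy as np

def evaluate(x_true, x_pred):
    """Return RMSE, MAE, R2 and explained variance for arrays of shape (T, N)."""
    x_true = np.asarray(x_true, dtype=float)
    x_pred = np.asarray(x_pred, dtype=float)
    err = x_true - x_pred
    rmse = np.sqrt(np.mean(err ** 2))                                     # Eq. (4.1)
    mae = np.mean(np.abs(err))                                            # Eq. (4.2)
    r2 = 1.0 - np.sum(err ** 2) / np.sum((x_true - x_true.mean()) ** 2)   # Eq. (4.3)
    var = 1.0 - np.var(err) / np.var(x_true)   # explained variance (assumed definition)
    return {"RMSE": rmse, "MAE": mae, "R2": r2, "Var": var}

# Toy example: T = 2 time periods, N = 2 areas, one prediction off by 2.
metrics = evaluate([[1, 2], [3, 4]], [[1, 2], [3, 6]])
print({k: round(float(v), 3) for k, v in metrics.items()})
# {'RMSE': 1.0, 'MAE': 0.5, 'R2': 0.2, 'Var': 0.4}
```

The two scores differ only when the errors have a non-zero mean: $R^2$ penalizes a systematic bias, while the explained variance ignores it, which is why $R^2$ is lower in this toy example.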

4.4. Parameter setting

During training, the EST-GCN model requires several parameters to be set, including the training set ratio, learning rate, number of training epochs, and batch size. In the spatiotemporal dependency extraction stage, we stack two layers of GRU and GCN. The GRU is configured with 32 hidden units, and the GCN is configured with 64 hidden units. We perform a grid search to select the optimal parameters.

In our experiments, to assess the impact of the number of training epochs on the model's performance, we record the results for each number of training epochs, as shown in Figure 9, where the horizontal axis represents the number of training epochs and the vertical axis represents the value of each metric. One panel shows the trend of RMSE and MAE as the number of epochs increases, and the other shows the variation of $R^2$ and $Var$. The prediction results are best when the number of training epochs is set to 70.

Figure 9. Trends in indicators at different training epochs.

5. Results

5.1. Prediction performance comparison

This experiment compares EST-GCN with the baseline methods on 15-, 30-, and 45-minute prediction tasks. The results are shown in Table 1, where * denotes a negative value, which indicates that the model predicts poorly. Our EST-GCN model outperforms the baseline models on almost all evaluation metrics, demonstrating the effectiveness of external factors in predicting ride-hailing demand.

Table 1. Comparison of performance under different prediction time frames.

Model	RMSE (15 / 30 / 45 min)	MAE (15 / 30 / 45 min)	$R^2$ (15 / 30 / 45 min)	Var (15 / 30 / 45 min)
HA	9.44	5.76	0.65	0.65
SVR	$7.88 / 9.12 / 11.32$	$4.54 / 5.66 / 6.94$	$0.81 / 0.81 / 0.80$	$0.81 / 0.81 / 0.80$
ARIMA	$8.77 / 9.71 / 10.56$	$6.39 / 6.82 / 7.02$	$*$	$0.0012 / 0.0035 / 0.0033$
GRU	$6.35 / 6.52 / 6.87$	$4.12 / 4.43 / 4.75$	$0.83 / 0.81 / 0.80$	$0.83 / 0.81 / 0.80$
GCN	$7.32 / 7.65 / 8.56$	$5.22 / 5.68 / 6.33$	$0.65 / 0.65 / 0.65$	$0.65 / 0.65 / 0.65$
ST-GCN	$6.15 / 6.29 / 6.55$	$3.85 / 3.90 / 4.01$	$0.85 / 0.84 / 0.83$	$0.85 / 0.84 / 0.83$
ST-GAT	$6.09 / 6.21 / 6.52$	$3.78 / 3.89 / 3.96$	$0.86 / 0.84 / 0.83$	$0.86 / 0.84 / 0.83$
CCRNN	$6.01 / 6.15 / 6.50$	$3.75 / 3.85 / 3.95$	$0.86 / 0.84 / 0.83$	$0.86 / 0.84 / 0.83$
EST-GCN	$\mathbf{5.93 / 6.10 / 6.39}$	$\mathbf{3.72 / 3.81 / 3.93}$	$\mathbf{0.86 / 0.84 / 0.83}$	$\mathbf{0.86 / 0.84 / 0.83}$

(1) Excellent Prediction Performance. Methods based on deep neural networks achieve high predictive accuracy by modeling spatiotemporal features. The EST-GCN model has the lowest RMSE at every prediction horizon. Taking the 15-minute horizon as an example, it reduces RMSE by 37.2, 24.7, and 32.4% compared with the HA, SVR, and ARIMA models, respectively. Compared with the GRU and GCN models, the EST-GCN model, which leverages the strengths of both, reduces RMSE by 6.6 and 19%, respectively. While ensemble models such as ST-GCN, ST-GAT, and CCRNN perform well in transportation prediction, they do not account for the influence of external spatiotemporal factors. By integrating features of external spatiotemporal factors, EST-GCN reduces RMSE by 3.5, 2.6, and 1.3%, respectively, compared with these ensemble models.

(2) Effective External Spatiotemporal Factors. To validate the impact of external spatiotemporal factors on ride-hailing demand, we compared the EST-GCN model with the ST-GAT and CCRNN models, which do not consider external spatiotemporal factors. As shown in Figure 10, taking the 15-minute ride-hailing demand prediction as an example, EST-GCN reduces RMSE by 2.6 and 1.3%, respectively.

Figure 10. Comparing metrics of different models across various prediction ranges.

(3) Predictive Capability across Different Time Horizons. For various prediction ranges (15 minutes, 30 minutes, and 45 minutes), EST-GCN demonstrates superior performance. In the 15-minute prediction range, the EST-GCN model reduces RMSE errors by 2.6 and 1.3% compared to the ST-GAT and CCRNN models, respectively. Within the 30-minute prediction range, the EST-GCN model exhibits RMSE errors 1.7 and 1.0% lower than those of the ST-GAT and CCRNN models, respectively. In the 45-minute prediction range, the EST-GCN model achieves RMSE errors 2.0 and 1.6% lower than those of the ST-GAT and CCRNN models, respectively.
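The percentage reductions quoted in this comparison are relative differences, (baseline − model) / baseline. The short sketch below recomputes them from the 15-minute RMSE values in Table 1; the function and variable names are hypothetical, and because the table entries are rounded to two decimals, recomputed figures can differ from the quoted ones in the last digit.

```python
def relative_reduction(baseline: float, model: float) -> float:
    """Relative error reduction of `model` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - model) / baseline

# 15-minute RMSE values copied from Table 1.
rmse_15min = {"HA": 9.44, "SVR": 7.88, "ARIMA": 8.77, "GRU": 6.35,
              "GCN": 7.32, "ST-GAT": 6.09, "CCRNN": 6.01}
est_gcn_rmse = 5.93

for name, value in rmse_15min.items():
    print(f"{name}: {relative_reduction(value, est_gcn_rmse):.1f}%")
```

For example, the HA baseline gives (9.44 − 5.93) / 9.44 ≈ 37.2%.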

These results have significant implications for applying EST-GCN to ride-hailing demand prediction. First, EST-GCN performs well across the tested prediction horizons, indicating its reliability in handling demand variations over different time scales. This provides ride-hailing platforms with more flexible and accurate demand predictions, contributing to the optimization of resource allocation and the improvement of service efficiency. Second, by effectively capturing external spatiotemporal factors, EST-GCN adapts better to the complex changes in ride-hailing demand. This underscores the model's sensitivity to environmental and external factors, enabling it to remain robust when dealing with dynamic urban changes and special events, which is crucial for ride-hailing platforms that must offer reliable services in complex urban environments.

5.2. Ablation experiments

To verify how strongly each external factor influences ride-hailing demand prediction, we set up an ablation experiment that compares different combinations of factors. The external spatiotemporal factors in the experiment include weather condition information, relevant POI information, and date attribute information. The experimental results are shown in Table 2.

Table 2. Results of ablation experiments.

Model	Attributes	RMSE	MAE	R2	Var
EST-GCN	Weather	5.96	3.74	0.85	0.85
	POIs	5.99	3.77	0.84	0.84
	Date	6.02	3.79	0.84	0.84
	Weather+POIs	5.94	3.73	0.85	0.85
	Weather+Date	5.94	3.73	0.85	0.85
	POIs+Date	5.96	3.74	0.85	0.85
	Weather+POIs+Date	5.93	3.72	0.86	0.86
ST-GCN	None	6.15	3.85	0.85	0.85

The comparison shows that, among the single factors, weather condition information yields the best results, indicating that weather conditions affect demand more than date attributes and POIs. In addition, the model performs at least as well with multiple external factors as with any single factor, and best with all three. Specifically, adding weather, POI, or date information alone reduces the RMSE of the EST-GCN model by 3.0, 2.6, and 2.1%, respectively, compared with the ST-GCN model. With multiple external factors, namely weather conditions with POI information (Weather+POIs), weather with date attributes (Weather+Date), POI information with date attributes (POIs+Date), and weather conditions with POI information and date attributes (Weather+POIs+Date), the EST-GCN model reduces the RMSE by 3.4, 3.4, 3.0, and 3.5%, respectively, compared with the ST-GCN model.

In summary, the experimental results show that external spatiotemporal factors are effective in improving the accuracy of the demand prediction task. Both a single factor and a combination of factors can significantly improve the performance of prediction models.

5.3. Portability experiments

In this study, we conducted portability experiments on the EST-GCN model to verify its generalization ability. We chose four different geographic regions, shown as (a)–(d) in Figure 11, as the experimental areas. These experiments aim to evaluate the prediction ability of the EST-GCN model in new, unseen geographic areas. The dataset and parameters are consistent with those used for model training. To ensure portability, we apply a transfer learning strategy: the model is trained in one area and then transferred to another, which helps it converge more rapidly in the new area. In the model design, we also incorporate data standardization and an adaptive adjacency matrix to enhance adaptability to diverse environments and datasets.
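How the standardization is implemented is not detailed here. One common choice, shown below as an assumption rather than as the authors' procedure, is z-score scaling with statistics estimated separately for each area's data, so that a model trained on one area receives inputs on a comparable scale in another. The class name `Standardizer` and the synthetic counts are hypothetical.

```python
import numpy as np

class Standardizer:
    """Z-score scaling with statistics estimated from the data passed to `fit`."""

    def fit(self, demand):
        demand = np.asarray(demand, dtype=float)
        self.mean_ = demand.mean()
        std = demand.std()
        self.std_ = std if std > 0 else 1.0  # avoid division by zero for constant data
        return self

    def transform(self, demand):
        return (np.asarray(demand, dtype=float) - self.mean_) / self.std_

    def inverse_transform(self, scaled):
        return np.asarray(scaled, dtype=float) * self.std_ + self.mean_

rng = np.random.default_rng(0)
target_area = rng.poisson(lam=20.0, size=(288, 5))  # synthetic counts, not real data
scaler = Standardizer().fit(target_area)
scaled = scaler.transform(target_area)              # zero mean, unit variance
restored = scaler.inverse_transform(scaled)         # back to demand counts
```

Predictions made on the scaled inputs are mapped back with `inverse_transform` before errors such as RMSE and MAE are computed, so that the metrics remain in units of demand.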

Figure 11. Schematic diagram of the distribution of the experimental area.

The prediction metrics are shown in Table 3. The prediction errors of the EST-GCN model are very similar across the four experimental areas. This result indicates that the EST-GCN model generalizes well when dealing with the spatially heterogeneous features of ride-hailing demand, which implies good adaptability and transferability across areas.

Table 3. Prediction results for different experimental regions.

Experimental area	RMSE	MAE
Original study area (a)	5.9312	3.7235
Experimental area (b)	5.9335	3.7240
Experimental area (c)	5.9302	3.7231
Experimental area (d)	5.9289	3.7203

5.4. EST-GCN interpretation

(1) Prediction capacity analysis.

To comprehensively evaluate the model's prediction ability, we select commercial areas in Chengdu with concentrated pedestrian flow and visualize the actual demand values of the test set together with the predictions of the EST-GCN model. The demand predictions for the next 15, 30, and 45 minutes are shown in Figure 12, which covers June 15th to June 17th, 2023. From the visualization, we draw the following conclusions:

Figure 12. Visualization of prediction results for different prediction horizons.

1. For all prediction ranges, the predictions of the EST-GCN model follow the overall trend of the actual values but are poor at local extremes. We hypothesize that unexpected events and the randomness of crowd movement, which lie beyond the external factors considered in this study, cause unpredictable fluctuations in demand.

2. Short-term predictions are closer to the actual values. The EST-GCN model captures short-term trends more easily; as the prediction range increases, the influence of external factors may become complex and unstable, leading to relatively poorer long-term predictions.

(2) Effect of external spatial and temporal factors. To analyze the effect of external spatiotemporal factors in depth, we use data from the commercial area with concentrated pedestrian flow in the Chengdu ride-hailing dataset to conduct the ablation experiment. The data contain different date attributes and weather conditions. We predict demand under different experimental conditions and visualize the results in Figures 13–15:

Figure 13. Comparison of prediction with and without external factors.

Figure 14. Comparison between predictions incorporating different external information.

Figure 15. Comparison of prediction residuals in different prediction environments.

1. The external dynamic factor (weather) significantly improves ride-hailing demand prediction at peaks and turning points. In particular, in the early morning of June 17th, under heavy rain, the prediction that incorporates weather condition information is closer to the actual value than predictions that incorporate other factors, as shown in Figure 15.

2. External static factors, including date attributes and POI information, contribute significantly to the prediction accuracy of ride-hailing. In particular, the effect of date attributes is more significant during the transition between weekdays and days off.

3. The EST-GCN model can capture the various factors affecting ride-hailing more comprehensively by adding multiple external factors, including weather conditions, POI information, and date attributes, making the prediction model more flexible and adaptive, thus achieving better performance in the prediction task.

4. In different environments, the model's performance may excessively rely on external factors, potentially leading to a decrease in prediction accuracy in certain scenarios. For example, if the model relies heavily on POI information (such as popular events at specific locations), its predictive capability may be compromised in unconventional circumstances.

6. Conclusions

To address the problem that traditional ride-hailing prediction models do not comprehensively consider external spatiotemporal factors, we introduce the EST-GCN model for modeling the dependence of external spatiotemporal factors in ride-hailing prediction. We coded demand in each area with relevant external spatial and temporal factors to form area characterization units. Combining GCN and GRU models, spatiotemporal information is extracted from the feature units in different areas to explore the potential relationship between external spatiotemporal factors and demand. We conducted experiments on the Chengdu City operations dataset. The experimental results show that the spatiotemporal graph convolution model incorporating external factors can better adapt to changes in the external environment, and the overall prediction effect is better than that of the advanced baseline method, which proves the importance of external spatiotemporal factors in ride-hailing prediction. The model is essential for improving urban transportation systems' efficiency and intelligent scheduling. By accurately predicting ride-hailing demand and integrating external spatiotemporal factors, the model can assist ride-hailing companies in optimizing vehicle scheduling, improving operational efficiency, and reducing passenger wait times. Additionally, the model's insights into spatiotemporal dependencies can promote more effective urban traffic management, potentially reducing congestion and enhancing city resource allocation.

As future work, our planned research includes (1) considering using more external spatiotemporal information data to evaluate the model, (2) optimizing rules for dividing areas, and (3) applying this model to other cities to validate its applicability and effectiveness in different urban environments, thereby expanding the model's scope of application.

Use of AI tools declaration

The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

Acknowledgments

This work was supported by the Scientific Research and Innovation Team Program of Sichuan University of Science and Engineering under Grant SUSE652A006, the Key Research Base of Intelligent Tourism in Sichuan Province under Grant No.ZHZJ22-02 and No.ZHYR23-03, the Graduate Innovation Fund of Sichuan University of Science and Engineering under Grant No.Y2023109, and the Key Laboratory of Philosophy and Social Sciences of Sichuan Province – Key Laboratory of Liquor Intelligent Management and Ecological Decision Optimization in the Upper Reaches of Yangtze River under Grant No.zdsy-12. Computational support was provided by the High Performance Computing Center, School of Computer Science and Engineering, Sichuan University of Science and Engineering.

Conflict of interest

The authors declare there is no conflict of interest.

References

[1]	D. Christine, A. P. S. Chen, H. J. Christanto, Deep learning for highly accurate hand recognition based on YOLOv7 model, Big Data Cogn. Comput., 7 (2023), 53. https://doi.org/10.3390/bdcc7010053
[2]	Y. Zhang, J. Chu, L. Leng, J. Miao, Mask-Refined R-CNN: A network for refining object details in instance segmentation, Sensors, 20 (2020), 1010. https://doi.org/10.3390/s20041010
[3]	M. Rostami, S. Forouzandeh, K. Berahmand, M. Soltani, Integration of multi-objective PSO based feature selection and node centrality for medical datasets, Genomics, 112 (2020), 3943–3950. https://doi.org/10.1016/j.ygeno.2020.07.027
[4]	L. A. Varga, B. Kiefer, M. Messmer, A. Zell, SeaDronesSee: A maritime benchmark for detecting humans in open water, in 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), (2022), 3686–3696. https://doi.org/10.1109/WACV51458.2022.00374
[5]	W. Li, J. Qiang, X. Li, P. Guan, Y. Du, UAV image small object detection based on composite backbone network, Mobile Inf. Syst., 2022 (2022), 11. https://doi.org/10.1155/2022/7319529
[6]	Y. Cheng, H. Xu, Y. Liu, Robust small object detection on the water surface through fusion of camera and millimeter wave radar, in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), (2021), 15243–15252. https://doi.org/10.1109/ICCV48922.2021.01498
[7]	J. Ding, N. Xue, G. S. Xia, X. Bai, W. Yang, M. Y. Yang, et al., Object detection in aerial images: A large-scale benchmark and challenges, IEEE Trans. Pattern Anal. Mach. Intell., 44 (2022), 7778–7796. https://doi.org/10.1109/TPAMI.2021.3117983
[8]	S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell., 36 (2017), 1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
[9]	J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real time object detection, in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 779–788. https://doi.org/10.1109/CVPR.2016.91
[10]	M. Liu, X. Wang, A. Zhou, X. Fu, Y. Ma, C. Piao, UAV-YOLO: Small object detection on unmanned aerial vehicle perspective, Sensors, 20 (2020), 2238. https://doi.org/10.3390/s20082238
[11]	X. Liang, J. Zhang, L. Zhuo, Y. Li, Q. Tian, Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis, IEEE Trans. Circuits Syst. Video Technol., 30 (2020), 1758–1770. https://doi.org/10.1109/TCSVT.2019.2905881
[12]	X. Liu, J. Huang, T. Yang, Q. Wang, Improved small object detection for UAV acquisition based on CenterNet, Comput. Eng. Appl., 58 (2022), 96–104.
[13]	Y. Huang, H. Cui, J. Ma, Y. Hao, Research on an aerial object detection algorithm based on improved YOLOv5, in 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), (2022), 396–400. https://doi.org/10.1109/CVIDLICCEA56201.2022.9825196
[14]	G. Xu, G. Mao, Aerial image object detection of UAV based on multi-level feature fusion, J. Front. Comput. Sci. Technol., 17 (2023), 635–645. https://doi.org/10.3778/j.issn.1673-9418.2205114
[15]	Z. Liu, X. Zhang, C. Liu, H. Wang, C. Sun, B. Li, et al., RelationRS: Relationship representation network for object detection in aerial images, Remote Sens., 14 (2022), 1862. https://doi.org/10.3390/rs14081862
[16]	J. Chu, Z. Guo, L. Leng, Object detection based on multi-layer convolution feature fusion and online hard example mining, IEEE Access, 6 (2018), 19959–19967. https://doi.org/10.1109/ACCESS.2018.2815149
[17]	R. Sheikhpour, K. Berahmand, S. Forouzandeh, Hessian-based semi-supervised feature selection using generalized uncorrelated constraint, Knowledge-Based Syst., 269 (2023), 110521. https://doi.org/10.1016/j.knosys.2023.110521
[18]	T. Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 936–944. https://doi.org/10.1109/CVPR.2017.106
[19]	S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2018), 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
[20]	M. Tan, R. Pang, Q. V. Le, EfficientDet: Scalable and efficient object detection, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), 10781–10790. https://doi.org/10.1109/CVPR42600.2020.01079
[21]	G. Jocher, A. Chaurasia, New YOLOv5 Classification Models, 2022. Available from: https://github.com/ultralytics/yolov5/tree/v6.2.
[22]	S. Liu, D. Huang, Y. Wang, Learning spatial fusion for single-shot object detection, preprint, arXiv: 1911.09516.
[23]	J. Redmon, A. Farhadi, YOLO9000: Better, faster, stronger, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 6517–6525. https://doi.org/10.1109/CVPR.2017.690
[24]	Z. Tian, C. Shen, H. Chen, T. He, FCOS: Fully convolutional one-stage object detection, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 9626–9635. https://doi.org/10.1109/ICCV.2019.00972
[25]	D. Du, P. Zhu, L. Wen, X. Bian, H. Lin, Q. Hu, et al., VisDrone-DET2019: The vision meets drone object detection in image challenge results, in 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), (2019), 213–226. https://doi.org/10.1109/ICCVW.2019.00030
[26]	T. Y. Lin, M. Maire, S. J. Belongie, J. Hays, P. Perona, D. Ramanan, Microsoft COCO: Common objects in context, in 13th European Conference on Computer Vision, (2014), 740–755. https://doi.org/10.1007/978-3-319-10602-1_48
[27]	Z. Zhang, H. Yi, J. Zheng, Focusing on small objects detector in aerial images, Acta Electron. Sin., 51 (2023), 944–955. https://doi.org/10.12263/DZXB.20220313
[28]	M. Tan, Q. Le, EfficientNet: Rethinking model scaling for convolutional neural networks, in International Conference on Machine Learning, PMLR, (2019), 6105–6114.
[29]	A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, MobileNets: Efficient convolutional neural networks for mobile vision applications, preprint, arXiv: 1704.04861.
[30]	J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, preprint, arXiv: 1804.02767.
[31]	Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, YOLOX: Exceeding YOLO series in 2021, preprint, arXiv: 2107.08430.
[32]	G. Yu, Q. Chang, W. Lv, C. Xu, C. Cui, W. Ji, et al., PP-PicoDet: A better real-time object detector on mobile devices, preprint, arXiv: 2111.00902.
[33]	C. Y. Wang, A. Bochkovskiy, H. Y. M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2023), 7464–7475.
[34]	S. Xu, X. Wang, W. Lv, Q. Chang, C. Cui, K. Deng, et al., PP-YOLOE: An evolved version of YOLO, preprint, arXiv: 2203.16250.

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
