1. Introduction
Currently, taxis in major cities are generally equipped with a global positioning system (GPS). The GPS used by taxis offers high data accuracy, extensive coverage, and real-time dynamic positioning and timing information. On-board GPS equipment not only helps the taxi supervision and dispatch system track the status of taxis (vacant, busy, or at rest) and the traffic flow conditions of the road network [1], but also helps the taxi management center supervise operational behavior and manage taxis efficiently by analyzing vehicle operating data [2]. Taking Nanjing as an example, approximately 15,000 taxis are in operation every day, and each taxi uploads its location through the on-board GPS once every 60 s on average, generating approximately 300 million trajectories every day. Analyzing the operating characteristics and indicators derived from these data provides a decision-making basis for vehicle operation supervision.
Scholars worldwide have conducted various studies based on taxi travel trajectory data, which are rich in traffic information. The majority of research focuses on discovering the travel characteristics of urban transportation [3,4,5,6], identifying areas of interest [7], discovering behavior patterns [8], exploring the laws of mobility interactions [9], and analyzing urban road conditions and traffic accessibility [10]. However, alongside regular patterns, the data also contain interesting information that usually involves anomalous behavior patterns associated with events [11,12]. For example, anomaly detection has been used to identify hacking from abnormal network data, drunk or reckless driving from abnormal traffic flow data, bank fraud from abnormal credit card transaction data, and hazards from abnormal medical images.
In the domain of transportation, a small fraction of outliers detected in taxi datasets may contain abnormal behavior patterns [13], which may be due to detours, events (such as concerts, fairs, and gatherings), unlawful pricing, or even taxi hijacking. Among these, detours deserve the most attention. They are a serious problem for urban transportation systems and negatively affect passengers' perceptions of the city. Some unscrupulous taxi drivers profit by purposefully detouring to extend their driving distance [14,15]. Such illegal operation not only violates the legitimate rights of passengers but also disrupts the normal transport order, damages the operating interests of passenger transport enterprises, and lowers the standard of service provided by the taxi system.
According to studies, the detection problem of abnormal taxi behaviors (such as detours for no reason and illegal gatherings) can be viewed as a specialization of the general problem of identifying data with different patterns [16,17,18,19,20]. The main task of detecting abnormal data is to identify data containing behavioral differences. More specifically, detecting abnormal data involves finding datasets that are distinct from most of the datasets [21]. Most studies on abnormal taxi behavior detection have focused on a series of points representing taxi trajectories. Generally, there are two ways to identify abnormal datasets: identifying datasets that differ from the global behavior (i.e., anomalous behavior detection for a single trajectory), and looking for data groups that differ from neighborhoods (especially generalized local neighborhoods) after grouping all datasets.
The feature-based abnormal behavior detection method is used to establish a global feature model based on the existing trajectory data [22], and the trajectory data that deviates significantly from the global feature model is classified as abnormal behavior. For example, by combining geographic features with semantic descriptions of each trajectory site, Palma et al. extracted anomalous information from trajectory data [23]. Grady and Schwartz used vehicle position, speed, and corner direction as discriminative features and used comprehensive discriminant indicators to identify abnormal vehicle behaviors [24]. Combining mathematical and statistical methods, Zhao et al. established a feature matrix of target trajectories and regarded trajectories with unsatisfactory confidence as abnormal [25].
Meanwhile, time-series analysis can be used to analyze behavioral changes in trajectories over time. By visualizing the trajectory data, the shape of the trajectories can be recognized and their similarity can be identified [26]. For example, Ibrahim used Euclidean distance and Dynamic Time Warping (DTW) to perform hierarchical division and similarity discrimination of taxi trips, group similar trips, and study their trends over time [27]. Clustering methods are one of the most popular techniques for grouping data into homogeneous clusters and aim to minimize the distance between individual data groups within a cluster [23,28,29].
In general, the detour behavior detection of taxis based on trajectory data has achieved several advances. However, there are several problems that persist. First, taxi detour behavior detection is commonly conducted by experienced staff based on the feedback from passengers. However, this method is inefficient, time-consuming, and labor-intensive [30]. Meanwhile, it is usually difficult for passengers who are not familiar with local roads to identify detours, and abnormal data are usually eliminated as "noisy data" in practical applications. Second, existing anomaly detection systems have shortcomings, such as single discrimination criteria and high false alarm ratios in complex urban road network environments. There is a lack of research on the combination of anomaly detection algorithms and the joint use of spatio-temporal data to identify detours. Therefore, appropriate detour behavior detection algorithms that meet the needs of intelligent supervision services are urgently required. With the current advancement of smart and off-site law enforcement, this study proposes a detour behavior detection model based on Nanjing taxi trajectory data. This method uses grid-based abnormal trajectory detection algorithms to detect abnormal data and analyze the microscopic characteristics of detour trajectories.
2. Problem description
2.1. Dataset
The raw taxi trajectory data were obtained from nearly 20,000 taxis in Nanjing in 2021. The dataset includes location, speed, direction, status, and other attributes. The data features are SERIAL, ORDER_ID, VEHICLE_NO, POSITION_TIME, LON, LAT, SPEED, DIRECTION, MILEAGE, VEH_STATUS (vehicle status), ACC, and operating status, recorded at least once every 60 s. The individual features are described as follows. ACC represents the engine status: a value of 1 indicates that the engine is running, and a value of 0 indicates that it is off. For the order number, a value of 0 represents non-operational status. The direction angle ranges over 0–359 and is measured clockwise. MILEAGE is given in km. There are four operating statuses (1: passenger, 2: order, 3: empty, and 4: out-of-service). Table 1 lists the features and examples included in the car-hailing positioning information data.
2.2. Data preprocessing
2.2.1. Grid meshing
Taxi trajectory data are used as spatiotemporal data, and grid meshing is a basic method for spatiotemporal big-data analysis. In this study, the advantages are mainly reflected in the following points: First, the trajectory data were originally continuous points scattered on the map. Meshing can discretize the geographic space into small areas one by one, making the analysis of point data easier. Second, the grids had the same size, which ensured that their properties could be compared. This method can also control the grid size and the accuracy of the analysis. The last advantage is that the GPS point trajectory is transformed into a grid sequence that can achieve rapid data correspondence and effectively improve the efficiency of anomaly detection.
As shown in Figure 1, the trajectory points, grid cells, and map data were first connected and corresponded. After the GPS data were gridded, each data point contained the corresponding grid information. When the grid is used to express the distribution of the data, the distribution situation represented by it is close to the real situation. Meanwhile, the GPS trajectory is transformed into a grid sequence, which can realize rapid data correspondence to effectively improve the efficiency.
The grid-based trajectory representation method includes the following steps. First, the geographic boundary of Nanjing is taken as the boundary of the research scope and divided into square grids of 500 m × 500 m. As shown in Table 2, each grid number corresponds to the latitude and longitude of the grid center point. After obtaining the gridding parameters and the corresponding GPS data, a grid can be specified jointly by the columns "LONCOL" and "LATCOL", which can be regarded as its horizontal and vertical coordinates. Thus, a grid is recorded as Gi = (LONCOL, LATCOL) = (xi, yi). For convenience of calculation, a trajectory is recorded as ti = (G1, G2, …, Gi) = [(x1, y1), (x2, y2), …, (xi, yi)]. Each trajectory consists of a series of continuous trajectory points, that is, a continuous sequence of one or more grids. Finally, the trajectory set is recorded as T = (t0, t1, …, tn), and each trajectory set consists of sub-tracks tn with the same start and end points.
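The mapping from GPS points to (LONCOL, LATCOL) cells can be sketched as follows. This is a minimal illustration rather than the procedure used in the study: the southwest corner in the usage note, the spherical Earth radius, and the helper names `make_grid`, `to_grid`, `cell_center`, and `to_grid_sequence` are all assumptions introduced here.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; an assumption of this sketch


def make_grid(lon_min, lat_min, cell_m=500.0):
    """Return (lon_min, lat_min, dlon, dlat) for square cells of cell_m metres.

    The cell width in degrees of longitude is evaluated at lat_min, which is
    adequate for a city-sized study area.
    """
    dlat = math.degrees(cell_m / EARTH_RADIUS_M)
    dlon = math.degrees(cell_m / (EARTH_RADIUS_M * math.cos(math.radians(lat_min))))
    return lon_min, lat_min, dlon, dlat


def to_grid(lon, lat, grid):
    """Map one GPS point to its (LONCOL, LATCOL) cell index."""
    lon_min, lat_min, dlon, dlat = grid
    return math.floor((lon - lon_min) / dlon), math.floor((lat - lat_min) / dlat)


def cell_center(loncol, latcol, grid):
    """Return the (lon, lat) of the centre point of a cell."""
    lon_min, lat_min, dlon, dlat = grid
    return lon_min + (loncol + 0.5) * dlon, lat_min + (latcol + 0.5) * dlat


def to_grid_sequence(points, grid):
    """Convert ordered (lon, lat) points to a grid sequence, merging consecutive repeats."""
    sequence = []
    for lon, lat in points:
        cell = to_grid(lon, lat, grid)
        if not sequence or sequence[-1] != cell:
            sequence.append(cell)
    return sequence
```

For example, `grid = make_grid(118.35, 31.23)` builds a 500 m grid from an illustrative southwest corner (not a value taken from the study), and `to_grid_sequence(points, grid)` turns a list of (LON, LAT) pairs into the sequence [(x1, y1), (x2, y2), …] used above.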
Figure 2 depicts the trajectories after meshing. In addition, the number of trajectory points in each grid was counted. A heat map can also be used to reflect the degree of taxi aggregation in different areas while generating grid geographic graphics. Areas with a larger degree of aggregation are the key research areas for abnormal behavior detection. The numbers in the legend of Figure 2 are order numbers, that is, the orders whose trajectories are drawn in each color. For example, the purple dots correspond to the numbers (1.00, 2.00), indicating that the purple trajectories in Figure 2 visualize the trajectories of order 1 and order 2. The numbers next to the other colored dots can be interpreted in the same way.
2.2.2. OD extraction and grouping statistics
After completing the meshing and correlating trajectory points with grids, as shown in Table 3, the OD (origin-destination) as well as the operating status of the taxi can be extracted based on where the trajectory points are located on the grid. In Table 3, the columns "SLONCOL" and "SLATCOL" represent the grid coordinates of the starting point, and the columns "ELONCOL" and "ELATCOL" represent the grid coordinates of the ending point.
GPS devices typically report data at a low frequency of approximately one record per minute. This results in a less detailed representation of taxi trajectories, as a taxi may traverse multiple consecutive cells without recording a GPS point. Therefore, before OD grouping, trajectory points are densified at a time interval of 1 s, ensuring that a trajectory point is generated every 1 s. After augmenting the trajectory points between each OD pair, all taxi trips that passed through the same pair of origin and destination cells were grouped. The classification process of the start and end points of the same trajectory is illustrated in Figure 3. In this way, the problem of abnormal driving trajectory detection is transformed into the problem of finding abnormal trajectories among all trajectories with the same start-end cell pair.
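The densification and OD grouping steps can be illustrated with the short sketch below. It assumes straight-line interpolation between consecutive GPS fixes, which ignores the road network, and integer timestamps in seconds; the function names `densify` and `group_by_od` are hypothetical.

```python
from collections import defaultdict


def densify(points, step_s=1):
    """Linearly interpolate (t, lon, lat) fixes so one point exists every step_s seconds.

    points must be sorted by t, with t given as integer seconds.
    """
    if not points:
        return []
    dense = [points[0]]
    for (t0, x0, y0), (t1, x1, y1) in zip(points, points[1:]):
        span = t1 - t0
        for t in range(t0 + step_s, t1, step_s):
            f = (t - t0) / span  # fraction of the way from the first fix to the second
            dense.append((t, x0 + f * (x1 - x0), y0 + f * (y1 - y0)))
        dense.append((t1, x1, y1))
    return dense


def group_by_od(trips):
    """Group trips by their (origin cell, destination cell) pair.

    trips maps an order id to its grid sequence, a list of (LONCOL, LATCOL) cells.
    """
    groups = defaultdict(dict)
    for order_id, cells in trips.items():
        if len(cells) >= 2:
            groups[(cells[0], cells[-1])][order_id] = cells
    return dict(groups)
```

Each value of the returned dictionary is one candidate set in which abnormal trajectories are then searched for.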
2.3. Definition of detour trajectory
Taxi trajectory data are a type of spatio-temporal trajectory data generated by moving objects in geographic space and are usually represented by spatial points in temporal order. The formal expression is Trajectory tk = p1 → p2 → … → pn, where pi indicates the location and other properties of the target at the i-th sampling time. Usually, the elements of pi include the ID of the positioning point, the ID of the trajectory, longitude, latitude, altitude, speed, and time. The trajectory dataset is T = {t1, t2, …, tm}, where ti represents the ith trajectory. This study found that not all spatially anomalous trajectories are detour trajectories; some may be well-intentioned choices by drivers, such as taking shortcuts to save time or avoiding road congestion. Therefore, as shown in Figure 4, this study judges whether a trajectory is abnormal in space or time based on its spatial characteristics and travel time, and classifies trajectories into four categories: normal, temporal anomaly, spatial anomaly, and spatiotemporal anomaly. In Figure 4, t2 is a normal trajectory. t1 is similar to a normal trajectory in terms of time but differs in its spatial characteristics. t3 has spatial characteristics similar to those of normal trajectories but abnormal temporal characteristics, which are usually caused by road congestion. Since both the temporal and spatial features of t4 are anomalous, such trajectories are classified as spatiotemporal anomalous trajectories.
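The four categories amount to a two-flag decision, sketched below. How each flag is computed is left open here; the function name and the flag values assigned to t1 to t4 are illustrative assumptions that mirror the description of Figure 4.

```python
def classify_trajectory(spatial_anomaly: bool, temporal_anomaly: bool) -> str:
    """Return one of the four categories from two anomaly flags."""
    if spatial_anomaly and temporal_anomaly:
        return "spatiotemporal anomaly"
    if spatial_anomaly:
        return "spatial anomaly"
    if temporal_anomaly:
        return "temporal anomaly"
    return "normal"


# Flags chosen to mirror the description of t1..t4 in Figure 4.
figure4 = {
    "t1": (True, False),
    "t2": (False, False),
    "t3": (False, True),
    "t4": (True, True),
}
labels = {name: classify_trajectory(*flags) for name, flags in figure4.items()}
```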
The study object is a spatiotemporal anomalous trajectory based on which detour behaviors can be identified. The detour behavior presents abnormal distributions in both time and space, which complicates the determination process. Detour behavior is a manifestation of trajectory abnormality, but it is not the same as trajectory abnormality; it is a subset of or a special type of trajectory abnormality. Combining the actual situation and the spatiotemporal characteristics of the trajectory, as shown in Figure 5, four conditions are supposed to be satisfied: small quantity and deviation from the normal trajectory; graphically abnormal; travel time larger than the threshold value; and locally generated abnormal driving distance. Trajectories satisfying the above conditions are classified as detour trajectories.
3. Technical framework of taxi detour trajectory detection
After preliminary preparations, this study divides the detour trajectory detection technology framework into two parts: preprocessing and detection. This framework was designed according to the definition and characteristics of detour trajectories.
As shown in Figure 6, in the preprocessing stage, the trajectory data are first converted into a grid sequence, and their origin and destination are classified and indexed. The trajectories that passed through the same origin-destination grid pair were separated into groups, and the question was transformed into finding anomalous trajectories for the same origin-destination grid pair. Second, after the classification index, anomaly detection based on the isolation-Based Anomalous Trajectory (iBAT) algorithm is performed on the trajectory grid sequence, that is, to obtain a set of trajectories, including normal and abnormal, and calculate the mean and variance of the normal trajectory travel time to obtain the corresponding threshold.
As shown in Figure 7, in the detection stage, four steps are performed: conversion of trajectory points to grid sequences; selection of the corresponding group according to the OD pair; similarity measurement against normal trajectories; and calculation of the distance matrix and DTW score. The final step compares the travel time with the mean and variance of the normal trajectory travel time and outputs the detour trajectory.
3.1. Abnormal trajectory detection based on iBAT algorithm
In principle, the detection methods for abnormal trajectories can be divided into four categories: classification-based, historical-similarity-based, distance-based, and grid-based. The principles, advantages, and disadvantages of these four categories are as follows:
1) Classification-based detection methods.
The classification-based trajectory anomaly detection method generally involves two steps: one is to label the training set to learn to build a classifier, and the other is to use the classifier to distinguish noise, normal values, and outliers in the test set [31,32]. In practical applications, to obtain higher detection accuracy, it is necessary to manually attach labels to the training set data, which consumes manpower and material resources and hence does not meet the requirements of online anomaly detection [33].
2) Detection methods based on historical similarities.
As it is difficult to obtain a relatively complete dataset, historical data can be used to ensure the integrity of the acquired abnormal trajectory types [34,35]. The trajectory anomaly detection method based on the similarity with historical trajectories is used to extract abnormal features based on the existing trajectory database and assess if it is an abnormal trajectory by calculating the matching degree between the target trajectory and the abnormal features [36].
3) Distance-based detection method.
The trajectory anomaly detection method based on the distance index measures the difference between trajectories using the distance between trajectory points or between whole trajectories [37], identifying abnormal trajectories that deviate significantly from usual trajectories [38,39]. However, the distance-based detection method targets the trajectory characteristics at a certain time point, that is, the difference in spatial position at a specific instant, and ignores the trajectory changes over the entire process [40].
4) Detection method based on grid divisions.
The main idea of the grid-based trajectory anomaly detection method is to divide the area into grids, attach a series of sequences to the trajectory, and identify anomalies according to the sequences [41]. Anomaly detection methods are mainly based on the likelihood ratio and isolation mechanism. The abnormal trajectory detection method based on the likelihood ratio was used to predict the abnormal trajectory by constructing the likelihood ratio detection statistic [42]. When this method is applied to abnormal trajectory recognition, it is necessary to count the maximum deviation from the expected situation at the vehicle level. The main idea of the abnormal trajectory detection algorithm of the isolation mechanism is that the abnormal trajectory distribution is sparse and unique [43]. An anomaly detection algorithm based on the isolation forest is a commonly used algorithm for this type of detection method [44,45,46]. The basic idea of the algorithm is that the outliers are sparser than the normal points; therefore, they are easier to divide, which is completely consistent with the above-mentioned isolation mechanism.
Considering the characteristics of these anomaly detection approaches, this study selected iBAT from among the grid-based detection methods for the preliminary screening of anomalous trajectories. iBAT is an isolation-based mining algorithm for abnormal trajectories that builds decision trees over gridded trajectories. According to the algorithm, trajectories can be divided into two types: normal and abnormal. Normal trajectories are "many and approximate", and abnormal trajectories are "few and special". The latter is the focus of the present study. Because normal trajectories are numerous and similar to one another, they are hard to separate, whereas abnormal trajectories are easier to isolate. iBAT exploits this inherent "few and different" characteristic of abnormal trajectories and applies a data-induced random tree to divide all trajectories until they are isolated. Specifically, iBAT randomly selects a grid, divides the trajectory set into two subtrees according to the presence or absence of this grid, recursively processes the subtrees to obtain a complete decision tree, and determines abnormal data according to the iForest algorithm. The iBAT algorithm used in this study is Algorithm 1, which is summarized as follows:
The iBAT algorithm
For example, Figure 8 shows seven trajectories in a set of trajectories. All the trajectories begin at grid 1 and end at grid 28. t0 and t6 differ markedly from the other trajectories, and the path length needed to isolate t0 is significantly shorter than that of the other trajectories.
t0 : 1→2→3→4→5→6→7→14→21→28
t1 : 1→8→15→22→23→24→25→26→27→28
t2 : 1→8→15→23→24→25→26→27→28
t3 : 1→8→15→16→23→24→25→26→27→28
t4 : 1→8→15→16→24→25→26→27→28
t5 : 1→8→16→23→24→25→26→27→28
t6 : 1→8→15→22→23→16→15→22→23→24→25→26→27→28
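The isolation idea can be tried on these seven trajectories with the simplified sketch below. It is not Algorithm 1 itself: it assumes the lazy form of isolation in which random cells of the tested trajectory are drawn until no other sampled trajectory contains all of them, and it normalises with the isolation-forest term c(n). The names `ibat_score`, `trials`, and `psi` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.57721566


def c(n):
    """Average path length used to normalise the score (isolation-forest form)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def ibat_score(test, others, trials=200, psi=256, rng=None):
    """Anomaly score of `test` against the other trajectories of its OD group."""
    rng = rng or random.Random(0)
    cells = list(dict.fromkeys(test))  # distinct cells of the tested trajectory
    other_sets = [set(t) for t in others]
    psi = min(psi, len(other_sets))  # sub-sample size cannot exceed the group
    total_steps = 0
    for _ in range(trials):
        remaining = rng.sample(other_sets, psi)
        steps = 0
        for cell in rng.sample(cells, len(cells)):  # random order, no repeats
            steps += 1
            remaining = [s for s in remaining if cell in s]
            if not remaining:  # the tested trajectory is isolated
                break
        total_steps += steps
    norm = c(psi)
    return 2.0 ** (-(total_steps / trials) / norm) if norm > 0 else 0.5


trajectories = {
    "t0": [1, 2, 3, 4, 5, 6, 7, 14, 21, 28],
    "t1": [1, 8, 15, 22, 23, 24, 25, 26, 27, 28],
    "t2": [1, 8, 15, 23, 24, 25, 26, 27, 28],
    "t3": [1, 8, 15, 16, 23, 24, 25, 26, 27, 28],
    "t4": [1, 8, 15, 16, 24, 25, 26, 27, 28],
    "t5": [1, 8, 16, 23, 24, 25, 26, 27, 28],
    "t6": [1, 8, 15, 22, 23, 16, 15, 22, 23, 24, 25, 26, 27, 28],
}
scores = {
    name: ibat_score(t, [o for k, o in trajectories.items() if k != name])
    for name, t in trajectories.items()
}
```

In this sketch t0 is isolated after one or two cell draws and receives the highest score, whereas t6 shares most of its cells with the normal trajectories and needs many more draws, so cell-membership isolation alone does not flag it.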
In addition, because each trajectory point selection is random, the final abnormal score should be obtained by comprehensively considering the results of multiple judgments. The calculation formula for the abnormal score is given in Eqs (1) and (2).
where x is the trajectory to be detected and n is the sub-sampling size, that is, the number of trajectories in the same group as x. E(t(x)) is the expected number of trajectory points that must be selected to separate x from the other trajectories in the same group. c(n) is the average number of trajectory points that must be selected in a group of n trajectories to separate one trajectory from the others. H is the harmonic number, approximated by ln(n) + 0.57721566 (Euler's constant). When E(t(x)) → 0, Score(x, n) → 1 and the trajectory is viewed as abnormal. When E(t(x)) > c(n), Score(x, n) < 0.5 and the trajectory can be viewed as normal.
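The definitions above match the standard isolation-forest anomaly score. Assuming that Eqs (1) and (2) take that form, they can be written as follows; this is a reconstruction from the surrounding definitions and limiting cases, not a quotation.

```latex
\begin{align}
\mathrm{Score}(x, n) &= 2^{-\frac{E(t(x))}{c(n)}} \tag{1} \\
c(n) &= 2H(n-1) - \frac{2(n-1)}{n}, \qquad H(i) \approx \ln(i) + 0.57721566 \tag{2}
\end{align}
```

This form reproduces the stated limits: Score(x, n) → 1 as E(t(x)) → 0, Score(x, n) = 0.5 when E(t(x)) = c(n), and Score(x, n) < 0.5 when E(t(x)) > c(n).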
3.2. Abnormal trajectory detection based on iBAT and DTW improved algorithm
This study considers the flaws of iBAT; specifically, the anomaly detection algorithm based on iBAT lacks an accurate judgment of partial trajectory detours. Therefore, in view of the shortcomings of the iBAT algorithm, this study provides several improvements. The trajectory similarity measurement algorithm was used for further anomaly detection on abnormal trajectories processed by the iBAT algorithm, lowering the misjudgment rate.
Commonly used metrics for calculating the spatiotemporal similarity between two trajectories are Euclidean distance, Hausdorff distance, edit distance, and DTW. The Euclidean distance is the distance between two points in Euclidean space. When using Euclidean distance to measure the similarity between trajectories, the two trajectories must first be converted into sequences of the same dimension and length. The Hausdorff distance describes the distance between two point sets and can be computed at the algorithm level by exhaustive search.
Edit distance is a calculation index used to measure the difference between two character sequences. The fundamental idea of using edit distance is to change one string into another by adding, deleting, and replacing. The premise of the change is that the lengths of the strings are equal, and after the lengths of the two strings are the same, they are assessed by measuring the similarity of the longest common sequence between the two character sequences. It is difficult to calculate the distance between sequences of different lengths using the traditional Euclidean distance calculation method, whereas DTW is commonly used to determine the similarity between two time series. Thus, the DTW algorithm was used to calculate the distance between two time series of different lengths. In addition, the DTW algorithm can be executed without reference to time.
Therefore, in this study, the abnormal trajectories initially processed by the iBAT algorithm were further processed by DTW to measure the similarity between trajectories. The DTW algorithm is applied as follows. To find the similarity between two trajectories, a distance matrix dG, which can be considered a multidimensional array, is calculated, in which each point of sub-trajectory t1 is mapped to its actual distance from each point of sub-trajectory t2. To determine the distance between two trajectory points, this study used the Haversine formula shown in the following equation:
where λ1, ϕ1 and λ2, ϕ2 represent the radians of the geographic longitude and latitude of points 1 and 2, respectively.
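For reference, the Haversine distance is d = 2r·arcsin(√(sin²((ϕ2−ϕ1)/2) + cos ϕ1 · cos ϕ2 · sin²((λ2−λ1)/2))). A minimal sketch follows; the function name and the mean Earth radius of 6371 km are assumptions of this example.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Great-circle distance in km between two points given in decimal degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))
```

With this radius, one degree of latitude corresponds to roughly 111.2 km.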
To use DTW to calculate the distance between the subtrajectories t1 and t2, this study defines a function to calculate the ground distance between two points. Subsequently, using the principles of dynamic programming, the matrix is recursively traversed until the final score representing the DTW between the two trajectories is obtained. The formula is as follows:
In this study, the trajectories were initially divided into two categories through iBAT: normal trajectories were marked with label 0, and abnormal trajectories were marked with label 1. The DTW similarity measurement process is illustrated in Figure 9.
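The dynamic-programming traversal of the distance matrix can be sketched as below. It assumes the standard DTW recurrence D(i, j) = d(i, j) + min(D(i−1, j), D(i, j−1), D(i−1, j−1)) with the Haversine distance as ground distance; the function names and the (lon, lat) point layout are hypothetical.

```python
import math


def ground_km(p, q, radius_km=6371.0):
    """Haversine distance in km between two (lon, lat) points in decimal degrees."""
    lam1, phi1, lam2, phi2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def dtw(traj_a, traj_b, dist=ground_km):
    """DTW score between two point sequences of possibly different lengths."""
    n, m = len(traj_a), len(traj_b)
    inf = float("inf")
    # acc[i][j] is the cheapest cost of aligning the first i points of traj_a
    # with the first j points of traj_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(traj_a[i - 1], traj_b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

A trajectory flagged by iBAT (label 1) would be compared with the normal trajectories (label 0) of its OD group, and a small DTW score indicates that it follows essentially the same route.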
3.3. Detour trajectory recognition considering spatiotemporal features
In the preceding two sections, this study detects abnormal trajectories using the iBAT and DTW algorithms, and the main goal of this study is to detect detour behavior. Through related research, it was found that the driving distance and time are used to establish if a taxi's trajectory is a detour behavior. Therefore, in this section, we perform a statistical analysis on the spatiotemporal characteristics of the detected abnormal trajectories and investigate the reasons behind the abnormal trajectories. The intentional detour of the taxi can be more accurately identified if evidence to rule out the driver's detour for special reasons such as traffic accidents is provided.
This study assumes that detour behavior will lead to longer route lengths than normal, increasing both time and cost. The mileage and travel time of the normal trajectory between point S and point D can be determined using the historical trajectory database and previous abnormal detection results. If the mileage and travel time of the trajectory are greater than the corresponding thresholds, the abnormal trajectory is deemed to be a detour trajectory. This study defines Max_D and Min_D as the maximum and minimum travelling lengths in a normal trip, respectively, and Max_T and Min_T as the maximum and minimum travelling times in a normal trip, respectively. These values exhibit great variability considering the different traffic conditions at different time periods.
Therefore, for each S-D pair, the mean travel time μT of the normal trajectory and its standard deviation σT were calculated, and the threshold was defined as Max_T=μT+σT. In addition, in order to account for the influence of different time periods on the same road stretch, this study analyzed all trips between S-D pairs in the same time period to exclude taxi drivers detouring for special reasons. Special reasons for this include road traffic accidents and other special events. An illustrative example is shown in Figure 10. The green dotted line trajectories indicate that traffic interference in this time period is unlikely; hence, the red trajectory may be correctly identified as an intentional detour trajectory.
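The thresholding step can be sketched as follows. It assumes the population standard deviation for σT and takes Max_D as the largest mileage among the normal trips; both choices and all function names are assumptions of this example.

```python
from statistics import mean, pstdev


def time_threshold(normal_times):
    """Max_T = mu_T + sigma_T over the normal trips of one S-D pair and time period."""
    return mean(normal_times) + pstdev(normal_times)


def distance_threshold(normal_mileages):
    """Max_D taken as the longest normal trip (an assumption of this sketch)."""
    return max(normal_mileages)


def is_detour(travel_time, mileage, max_t, max_d):
    """An abnormal trajectory is deemed a detour if it exceeds both thresholds."""
    return travel_time > max_t and mileage > max_d
```

For example, normal travel times of 20, 22, and 24 minutes give Max_T ≈ 23.6 minutes, so a 30-minute abnormal trip is flagged as a detour only if its mileage also exceeds Max_D.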
4. Experimental results and analysis
4.1. Experimental data
In this study, the detour trajectory data from the Nanjing 2021 case list were screened out, indexed by order ID and marked in the experimental data (see Table 4), and then corresponded to the grid sequence.
To verify the algorithm, this study selected three sets of grid trajectory sequences with the same S-D, where the detour trajectory data was obtained from the case list data. Comparing the order ID to the data of the order, the experimental dataset obtained is shown in Table 5.
4.2. Experimental results
4.2.1. Experimental results of iBAT
To improve the effect of the iBAT algorithm, this study investigates the influence of the values of the parameters c(n) and n on the AUC (Area Under Curve) value. c(n) is the number of binary trees, that is, the average number of trajectory points that need to be selected to separate from other trajectories. n is the sample size, which is the number of trajectories in the same group as the trajectory x. In this study, the experiment has been carried out on the T-2 trajectory set with the largest amount of data, where the value of c(n) is [1,150] and the value of n is {2,4,8,16,…,1024}. As shown in Figure 11, the AUC value converges at a smaller c(n).
As shown in Figure 12, we set c(n)=100 and observed the impact of changes in n on performance. If the value of n is too small, more trajectories are isolated. On the other hand, when the amount of data is large, a larger value of n leads to a longer average running time. n=256 was selected in this experiment.
In the preprocessing stage of the detour detection technology framework, this study preliminarily divides all trajectories into abnormal and normal trajectories using the iBAT algorithm and outputs the maximum travel lengths Max_D, minimum travel lengths Min_D, maximum travel times Max_T, and minimum travel times Min_T. The output data are listed in Table 6.
4.2.2. Experimental results of DTW
As shown in Figure 13, DTW threshold experiments were conducted in this study, and the results showed that when the DTW threshold was 0.0006, the false detection rate was the lowest.
4.2.3. Detour trajectory recognition results considering spatiotemporal features
Table 7 shows the distribution of the abnormal trajectories in terms of driving distance and time. It can be seen that more than 60% of abnormal trajectories take longer time and cover more distance than normal trajectories, indicating that intentional detour behavior is one of the main motivations behind abnormal taxi driving behavior.
Each set of trajectories was divided into four categories. The first category is TP (True Positive), indicating that abnormal trajectories are correctly classified as abnormal. The second category is FP (False Positive), indicating that normal trajectories are incorrectly classified as abnormal. The third category is FN (False Negative), indicating that abnormal trajectories are incorrectly classified as normal. The last category is TN (True Negative), indicating that normal trajectories are correctly classified as normal.
In this study, the DR (detection rate) and the FAR (false alarm rate) are used to determine the accuracy of the anomaly detection results. DR is the proportion of abnormal trajectories that are correctly classified as abnormal, that is, the proportion of anomalous trajectories that were successfully detected. FAR is the proportion of normal trajectories that are incorrectly classified as abnormal. In terms of the four categories above, DR = TP / (TP + FN) and FAR = FP / (FP + TN).
The anomaly detection algorithm is more effective when DR is closer to 1 and FAR is closer to 0. This study describes the degree of balance between these two indicators by plotting the ROC curves. This study quantified the trade-off by plotting FAR on the x-axis and DR on the y-axis and measuring AUC value.
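Both indicators and the AUC can be computed as in the sketch below. Labels follow the convention 1 = abnormal and 0 = normal; the AUC is computed here as the fraction of (abnormal, normal) pairs in which the abnormal trajectory receives the higher score, which equals the area under the ROC curve. The function names are hypothetical.

```python
def detection_rates(y_true, y_pred):
    """Return (DR, FAR) with DR = TP / (TP + FN) and FAR = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    dr = tp / (tp + fn) if tp + fn else 0.0
    far = fp / (fp + tn) if fp + tn else 0.0
    return dr, far


def roc_auc(y_true, scores):
    """AUC as the probability that an abnormal item outscores a normal one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of trajectories, which is acceptable for a single OD group but would need a rank-based implementation for very large sets.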
This study compares the misjudgment rates of three methods on a dataset of 8 million taxi trajectories: iBAT, iBAT + DTW, and iBAT + DTW with additional consideration of driving distance and time. The total data were randomly divided into three datasets, and each of the three methods was run three times on each dataset. The final misjudgment rate of each method is the average over the three experiments. The results are presented in Table 8. The second method adds the DTW algorithm to the iBAT algorithm to measure trajectory similarity. The abnormal trajectory is further distinguished based on the difference between the abnormal and normal trajectories, thereby effectively reducing the misjudgment rate. Compared with the first two methods, the third method identifies detour fraud trajectories more accurately owing to the additional consideration of travel time and mileage.
This study analyzed one week of taxi trajectories in Nanjing. The color of each trajectory line represents the speed of the vehicle (blue indicates high speed, and yellow indicates low speed), as shown in Figure 14(a). Based on the spatiotemporal data, this study identified the similarities and trends of abnormal trajectories. The green heat map represents the areas from which abnormal trajectories originate. The origins of abnormal trajectories are mainly concentrated at Nanjing Railway Station, Lukou Airport, Nanjing South Railway Station, and other large transportation hubs, at important urban subway stations, and in areas with a large number of tourists or remote areas with fast-moving vehicles and scattered roads.
To further analyze the microscopic characteristics of detour trajectories, the trajectories in set T-1 from the same period were selected and visualized as an example. Figure 14(b) shows the test trajectories from T-1, where points S and D correspond to the starting and ending points of the trajectory set, respectively. T0 is a normal trajectory, and the detected detour trajectories are represented by dotted lines, which include the partial detour trajectories T1, T2, and T3 and the global detour trajectories T4 and T5.
Matching the T-1 trajectory set with the map showed that the set consists of trajectories from Nanjing Lukou Airport to Confucius Temple. As shown in Figure 14(a), the areas in which the starting and ending points of these trajectories are located have high incidences of detours. Meanwhile, on the normal route, the average speed on the section from point S to point O was high, and the traffic conditions were good.
Therefore, it can be ruled out that the taxi drivers detoured owing to road congestion or a traffic accident in this stretch during this time period, and detours T1, T3, and T5 are judged to be intentional. In the normal trajectory T0, the section between points O and D experienced traffic congestion, so the detour trajectories T2 and T4 might be regarded as detours to avoid congested stretches. In addition, this study generates order feature information corresponding to the detected detour trajectories and provides it to off-site law enforcement managers. The order feature information includes mileage, travel time, driving costs, vehicle information, and driver information, which are compared with those of normal trajectories. For comparison, one order is selected for each group of detour trajectories, as shown in Table 9.
5.
Conclusions
This study proposes a technical framework for detour trajectory anomaly detection, which is divided into preprocessing and detection stages. In the preprocessing stage, historical taxi trajectory data were first converted into grid sequences, and trajectories that passed through the same start-end grid pair were grouped together. Second, iBAT-based anomaly detection was performed on the trajectories, yielding a trajectory set containing normal and abnormal trajectories, and the travel time of all normal trajectories was analyzed to determine the threshold. In the detection stage, each trajectory grid sequence was assigned to the corresponding group according to its S-D point pair, and its similarity to the normal trajectory was measured. The distance matrix was then calculated, the DTW score was obtained, and the trajectory's travel time was compared with the mean and variance of the normal trajectory travel time. Finally, the detour trajectories were output. The innovation of this study is the addition of a distance metric (DTW) and parameters of normal trajectories (average driving distance and time) to the iBAT algorithm to achieve accurate identification of detour trajectories.
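The detection stage summarized above can be illustrated with a minimal sketch of the DTW score between two grid sequences, followed by a check on travel time. Several elements are assumptions made only for illustration and are not taken from this study: grid cells are represented as (row, column) pairs, the local cost is the Euclidean distance between cells, and the decision rule (DTW score above `dtw_threshold` and travel time above the normal mean plus `k` standard deviations) and its default values are hypothetical.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic DTW between two sequences of (row, col) grid cells."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    inf = float("inf")
    # cost[i][j]: best alignment cost of the first i cells of seq_a
    # with the first j cells of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = seq_a[i - 1], seq_b[j - 1]
            d = math.hypot(a[0] - b[0], a[1] - b[1])  # local cell distance
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def is_detour(candidate, normal, travel_time, time_mean, time_std,
              dtw_threshold=3.0, k=2.0):
    """Hypothetical rule: far from the normal route AND unusually long trip."""
    far_from_normal = dtw_distance(candidate, normal) > dtw_threshold
    too_long = travel_time > time_mean + k * time_std
    return far_from_normal and too_long


# Hypothetical grid sequences sharing the same start and end cells.
normal = [(0, 0), (0, 1), (0, 2), (0, 3)]
detour = [(0, 0), (1, 0), (2, 1), (2, 2), (1, 3), (0, 3)]
print(is_detour(detour, normal, travel_time=2400, time_mean=1500, time_std=300))
```

An analogous comparison on driving distance could be added in the same way. Requiring both conditions is one possible design that avoids flagging trips that deviate spatially but are not longer than usual.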
To verify the practical effect, this study labeled the detour trajectory data recorded in the Nanjing 2021 case list, used the corresponding grid sequences as the experimental data, and analyzed the test data with the proposed model. The experimental results show that the proposed method has a low misjudgment rate for taxi detours. The analysis of the test data also showed that the choice of detours is correlated with geographical location and is strongly associated with temporal and spatial factors.
In the future, based on the existing research results, an online detection framework for detecting the abnormal behavior of taxis can be proposed, which can not only detect road segments with abnormal behavior, but also update the route behavior model through newly added trajectories to detect the detour behavior of taxi drivers in real time.
Conflict of interest
The authors declare there is no conflict of interest.