The rising accessibility of gambling products, such as Electronic Gaming Machines (EGM), has increased interest in the effects of gambling, in particular the potential for impulse control disorders such as problem gambling. Nevertheless, empirical research of EGM gambling behaviour is scarce. In this exploratory study, we apply data mining techniques to 46,416 gambling sessions, collected in situ from 288 EGMs. Our research focused on identifying the at-risk behavioural markers of sessions to help distinguish gambling personae. Our data included measures of gambling involvement, out-of-pocket expense of sessions, amount won, and cost of gambling. This research discusses the methodology used to collect and analyze the required gambling measures, explains the criteria used for identifying valid sessions, and combines outlier mining methods to identify instances of heavily involved gambling (i.e., outliers). Our results suggest that sessions can be classified as potential non-problem, potential low-risk, potential moderate-risk, and potential problem gambling sessions. Further, outlier sessions were more heavily involved in terms of gambling intensity and amount redeemed, despite having low duration times. Finally, our methods suggest that the lack of player identification does not prevent one from identifying the potential incidence of problem gambling behaviour.
Citation: Maria Gabriella Mosquera, Vlado Keselj. Identifying electronic gaming machine gambling personae through unsupervised session classification[J]. Big Data and Information Analytics, 2017, 2(2): 141-175. doi: 10.3934/bdia.2017015
[1] | Marco Tosato, Jianhong Wu . An application of PART to the Football Manager data for players clusters analyses to inform club team formation. Big Data and Information Analytics, 2018, 3(1): 43-54. doi: 10.3934/bdia.2018002 |
[2] | Wenxue Huang, Qitian Qiu . Forward Supervised Discretization for Multivariate with Categorical Responses. Big Data and Information Analytics, 2016, 1(2): 217-225. doi: 10.3934/bdia.2016005 |
[3] | Yiwen Tao, Zhenqiang Zhang, Bengbeng Wang, Jingli Ren . Motality prediction of ICU rheumatic heart disease with imbalanced data based on machine learning. Big Data and Information Analytics, 2024, 8(0): 43-64. doi: 10.3934/bdia.2024003 |
[4] | Guojun Gan, Qiujun Lan, Shiyang Sima . Scalable Clustering by Truncated Fuzzy c-means. Big Data and Information Analytics, 2016, 1(2): 247-259. doi: 10.3934/bdia.2016007 |
[5] | Jiaqi Ma, Hui Chang, Xiaoqing Zhong, Yueli Chen . Risk stratification of sepsis death based on machine learning algorithm. Big Data and Information Analytics, 2024, 8(0): 26-42. doi: 10.3934/bdia.2024002 |
[6] | Sayed Mohsin Reza, Md Al Masum Bhuiyan, Nishat Tasnim . A convolution neural network with encoder-decoder applied to the study of Bengali letters classification. Big Data and Information Analytics, 2021, 6(0): 41-55. doi: 10.3934/bdia.2021004 |
[7] | Cai-Tong Yue, Jing Liang, Bo-Fei Lang, Bo-Yang Qu . Two-hidden-layer extreme learning machine based wrist vein recognition system. Big Data and Information Analytics, 2017, 2(1): 59-68. doi: 10.3934/bdia.2017008 |
[8] | Jinyuan Zhang, Aimin Zhou, Guixu Zhang, Hu Zhang . A clustering based mate selection for evolutionary optimization. Big Data and Information Analytics, 2017, 2(1): 77-85. doi: 10.3934/bdia.2017010 |
[9] | M Supriya, AJ Deepa . Machine learning approach on healthcare big data: a review. Big Data and Information Analytics, 2020, 5(1): 58-75. doi: 10.3934/bdia.2020005 |
[10] | Guojun Gan, Kun Chen . A Soft Subspace Clustering Algorithm with Log-Transformed Distances. Big Data and Information Analytics, 2016, 1(1): 93-109. doi: 10.3934/bdia.2016.1.93 |
Due to the increasing accessibility of gambling products and the rising popularity of Internet gambling [13], [26], [32], [45], interest in the effects of gambling has grown over the past decade. Within the gambling industry, Video Lottery Terminals (VLT) and Electronic Gaming Machines (EGM) are considered the dominant segment [3]. EGMs, in particular, can be found in different types of venues, such as bars, restaurants and hotels; they can hold several types of games within a single machine, and can attract a high number of gamblers due to their structural characteristics (e.g., flashing lights, music, fast-paced type of play) [41]. For example, in 2009, Australian gamblers spent $19 billion on gambling products, and 55.3% of that revenue was generated through EGMs in clubs and hotels [3]. In 2010, EGMs generated 32.9% [16] of the Canadian gambling industry's $13.74 billion revenue [33]. By 2012, EGMs generated 65.32% of the Canadian gambling industry's $13.87 billion revenue, and EGMs located in bars and lounges were responsible for 47.8% of that total revenue [42]. These revenue figures point to an increase in the acceptance and accessibility of gambling products. Nevertheless, this growth has also increased the potential for impulse control disorders, such as problem gambling.
Problem gamblers, in particular, have difficulty limiting themselves from gambling excessively, regardless of the detrimental consequences that their actions can cause to themselves or others (e.g., family, friends, colleagues) [9], [32]. However, there is little consensus in regards to the prevalence rate of this disorder. Standardized prevalence rates of problem gambling, according to Williams et al.[52], range from 0.5% to 7.6%, with the lowest prevalence rates occurring in European countries and higher rates in Asian countries. According to Williams et al.[52], Sweden, Switzerland, Canada, Australia, Italy, United States, Estonia, and Finland share a prevalence rate on par with the worldwide average of 2.3%. While the root cause of the disorder is unknown, research suggests problem gambling is likely to occur parallel to other behavioural problems, such as substance abuse, eating disorders, or compulsive shopping [35]. In regards to its incidence, while problem gambling can occur in every demographic group [3], [13], the disorder does seem to be more predominant in men than women [45]; people of lesser means are at a greater risk of problem gambling than individuals in a higher socio-economic status, as they see a greater potential for financial gain and stability in gambling; and due to their increased accessibility to gambling products, casino workers are also at a higher risk for developing this disorder. Problem pathological gamblers, on the other hand, are considered to suffer from a severe type of problem gambling [9].
The American Psychiatric Association (APA) considers problem pathological gambling as an impulse control disorder due to the pleasure that the gambler obtains from the act of gambling [2]. The APA has defined ten criteria to guide in the process of diagnosing problem gambling (i.e., preoccupation, tolerance, withdrawal, escape, chasing, lying, loss of control, illegal acts, risked significant relationship, bailout) [35], some of which are similar to characteristics found in other impulse disorders such as substance abuse (e.g., tolerance, withdrawal, loss of control, preoccupation, mood alteration). However, not all gambling behaviour results in problem gambling, as the disorder can range from at-risk, problem, sub-clinical, pathological, probable pathological, extremely pathological, in-transition, and compulsive gambling [45]. The Canadian Problem Gambling Index (CPGI) [15], which includes questions centered on the factors contributing to gambling addiction [10], [13], is often used to study the social context and predominance of gambling and problem gambling. Though there is a growing need for studies from which to gather a greater understanding of the expression of the disorder [36], research focusing on actual gambling behaviour is limited [46], and empirical research of EGM gambling behaviour is particularly scarce. Thus, in this initial exploratory study we look into the application of data mining techniques on EGM gambling data, with the goal of identifying the at-risk behavioural markers of EGM gambling sessions and distinguishing types of gambling (i.e., gambling personae) based on the behavioural characteristics of gambling sessions.
To identify the gambling involvement measures [5], [6], [26], [55], that can serve as behavioural markers of EGM gambling sessions, our research uses EGM gambling measures collected in situ, over a one-month period (i.e., July 2010), from a single EGM manufacturer. Gambling measures, generated during a session, can provide more information about gambling events, such as game titles, wagered amounts, bet outcomes, bonus rounds activity, out-of-pocket cost of a session, and amount cashed-out at the end of a gambling session. Gambling involvement measures, in particular, have been associated with excessive gambling behaviour [5] and can be measured in regards to financial involvement (e.g., amount wagered), time involvement (e.g., duration), gambling intensity (e.g., bets, bets per minute), and gambling cost (e.g., net loss) [26]. By conducting an analysis of gambling measures, our study aims to provide a better insight into the expression of EGM gambling behaviour and identify potential instances of problem gambling, as individuals suffering from this impulse control disorder may, at times, feel the need to gamble excessively (e.g., larger bets, bigger risks) in order to make up for previous losses [2].
However, the use of EGM data does pose some limitations on this research. For example, as players did not use logins or loyalty cards on the EGMs from which the data was collected, it was difficult to assess whether a single subject generated a single gambling session or multiple sessions. Thus, this research did not assume an independence of sessions, nor did it attempt to identify individual gamblers; instead, it focused on identifying gambling personae. Furthermore, similar to LaBrie et al.'s [26] study of Internet sports gambling behaviour, in this initial exploratory study we used aggregated measures of EGM gambling. While this approach allowed us to identify measures of gambling involvement in EGMs, the simplicity of this data limited our efforts to perform further analysis that could provide more detailed information about each session (e.g., wager variability, trajectory of wagers, bonus round activity). Nevertheless, the use of anonymous EGM gambling data has the potential to provide a better understanding of EGM gambling activity than what can be gathered through surveys, as it minimizes the risk of inaccurate results due to evaluation apprehension, as well as self-presentation and recall bias [26], [29].
The remaining sections of this article are organized as follows: In Section 2, a discussion of the background and related literature is provided with the aim of presenting the reader with a view of the research available in the field of gambling studies; Section 3 presents a methodology for unsupervised EGM gambling session detection, and describes our process for preprocessing the identified EGM sessions. Section 4 provides the clustering methodology for identifying the at-risk behavioural markers of EGM gambling sessions from which to recognize cases with similar behavioural characteristics. Section 5 provides a discussion of the identified types of EGM gambling. Finally, Section 6 describes the differences between normal sessions and outliers (i.e., heavily involved gambling sessions), in order to determine the likelihood of a session being assigned to a particular cluster based on its behavioural characteristics.
The growing accessibility of gambling products [13], [32], [45], and the rising popularity of Internet gambling [26], have increased interest in the effects of gambling over the past decade [13], particularly due to the potential for health risk factors for impulse control disorders such as problem gambling. Though the root cause of the disorder is unknown [35], the likelihood of the disorder occurring parallel to other behavioural problems (e.g., substance abuse, eating disorders, shopping addictions) is quite high [3], [13]. However, despite the growing need for studies that help understand excessive gambling [36], there is little research focusing on the analysis of actual gambling behaviour [46]. For example, the CPGI [15] survey includes questions centred on the factors contributing to gambling addiction, such as the nature of gambling products, gambling experience, accessibility, anonymity, affordability, interactivity, and convenience [10], [13], and is used to study the social context and predominance of gambling and problem gambling.
Individuals, based on their CPGI score, can be classified into one of five categories: non-gambling, non-problem gambling (score=0), low risk gambling (score=1-2), moderate risk gambling (score=3-7), and problem gambling (score=8-27) [3]. Other standards, such as the South Oaks Gambling Screen (SOGS) [3], the National Opinion Research Center Diagnostic and Statistical Manual of Mental Disorders Screen for Gambling Problems (DSM-IV) [2], [32], and the Gambler's Anonymous Scale (GA20) [2], [32], also tend to be used in conjunction with the CPGI survey. However, the use of survey tools as the only means for assessing problem gambling can sometimes produce inaccurate results due to threats to construct validity [26], [29]. Controlled studies have also been done in order to assess the impact of external factors on gambling behaviour, such as alcohol consumption and music tempo [12], [14], [31], [48], with results showing positive correlations between alcohol consumption and gambling time, and between music tempo and gambling intensity. However, the use of laboratory studies for analyzing gambling behaviour may impose limitations on the generalizability of the obtained results, as these studies lack the realism of in situ behaviour.
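The CPGI score bands quoted above can be expressed as a small lookup. The following is a minimal sketch, not part of the CPGI instrument itself: the function name `cpgi_category` and the `gambled` flag (used here to separate the non-gambling category from a score of 0) are assumptions made for illustration.

```python
def cpgi_category(score, gambled=True):
    """Map a CPGI score (0-27) to the category bands quoted in the text.

    `gambled` is a hypothetical flag: the text lists "non-gambling" as a
    category without a score, so it is modelled here as a separate input.
    """
    if not gambled:
        return "non-gambling"
    if not 0 <= score <= 27:
        raise ValueError("CPGI scores range from 0 to 27")
    if score == 0:
        return "non-problem gambling"
    if score <= 2:
        return "low risk gambling"
    if score <= 7:
        return "moderate risk gambling"
    return "problem gambling"
```

For example, `cpgi_category(5)` falls in the 3-7 band and returns `"moderate risk gambling"`.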
Observational studies by Harrigan and Dixon [20], Dixon et al. [11], and Harrigan [21], [22], have focused on analyzing the impact of the structural characteristics of slot machines on gambling activity (e.g., illusion of control, entrapment, frustration, near misses), with results indicating that certain structural characteristics, such as stop buttons, bonus modes, hand-pays, and 'near-misses', can lead to a player's increase in gambling involvement, and the development of inaccurate beliefs in regard to personal skill and win probability. Though these observational studies provide an insight into the impact of external factors on gambling, they do not focus solely on the gambling patterns generated from the observed gambling activity.
While conducting this literature review, it was evident that studies focusing on EGM gambling behaviour are scarce in the field of data mining; perhaps due to the lack of player identification (i.e., player ID) in EGM data, as these machines often require cash rather than logins or loyalty cards, which can be an obstacle when attempting to identify EGM gambling sessions. Nonetheless, the use of data mining techniques for behaviour analysis [4], [43], has, in the past, been successfully applied for facilitating behaviour classification and identification, such as customer classification (e.g., loyal, discount, opportunistic, wandering, need-based, impulse) [43], identification of at-risk behaviours (e.g., at-risk academic performance, credit risk evaluation) [38], [53], and the recognition of negative risk-taking behaviour such as dangerous driving or substance abuse [34].
This exploratory research focuses on the application of data mining techniques on aggregated measures of EGM gambling, with the goal of identifying the measures of gambling involvement that can serve as the behavioural markers of EGM gambling sessions. These gambling involvement measures allow for the behavioural characteristics of sessions to be explored. The use of real EGM gambling data may increase the likelihood of generalizing our results to the general EGM gambling population. However, unlike Internet live-sport gambling data, EGM gambling data does not contain player identification information, which makes it difficult to identify whether a single individual generated a single session or whether they generated multiple sessions. As such, the lack of player IDs, as EGM gamblers do not tend to use logins or loyalty cards, limits our ability to identify individual gamblers as well as assume an independence of sessions, and suggests the need for unsupervised session detection in EGM data. Therefore, in this research, it was of particular importance to define what constitutes an EGM gambling session. While unsupervised session detection methods have been used for web session detection [30], the structure of EGM messages and the communication protocol used in these machines [17], [18] indicated that specifying time thresholds would not be suitable for the purposes of this research. Instead, specific gambling events could help determine a criterion for defining an EGM gambling session. Once sessions are identified, suitable data preprocessing and transformation techniques (e.g., smoothing, normalization, aggregation) can be applied to increase the quality of our results [19] by providing other variables (e.g., bets per session, cost of gambling, ratio of losses) to assist in our analysis.
Measures of gambling involvement have been associated with excessive gambling behaviour [5], i.e., increasing gambling involvement expressed during a session [36], and can be measured in regards to financial involvement (e.g., amount wagered), time involvement (e.g., duration), gambling intensity (e.g., bets, bets per minute), and gambling cost (e.g., net loss) [26]. In this regard, by conducting an analysis of gambling measures, our study could provide a better insight into the expression of gambling behaviour and identify potential instances of problem gambling, as sessions generated by subjects suffering from this impulse control disorder may incur larger bets or take bigger risks in order to make up for previous losses [2].
In the first of a series of longitudinal studies of Internet sport gambling data, and the only study found to be closely related to the research at hand, LaPlante et al. [29] found certain measures of gambling involvement, such as intensity and frequency, could significantly contribute to the incidence of problem gambling behaviour. Later on, LaBrie et al. [45] found gamblers who imposed limits on their gambling activity, incurred longer duration times than the rest of the sample, despite decreasing their total amount wagered. These findings suggested the importance of session duration (i.e., game time) as another measure of gambling involvement. Subsequently, LaBrie et al. [27] found Internet gamblers who played casino-style games (e.g., slots) incurred larger gambling costs (i.e., net loss) despite playing less than sports bettors, and suggested net loss and total amount wagered as important measures for gambling involvement.
Further studies of Internet live sport gambling [7], [55], focusing on analyzing the betting patterns of gamblers, have also identified gambling intensity, gambling frequency, variability of bet sizes, and the trajectory of gambling activity as important variables for analyzing problem gambling. These variables take into consideration the general gambling behaviour of at-risk players, such as overconfidence from early large winnings and increasing bet sizes to achieve the same excitement experienced after their first large win [13], and are consistent with the personality traits of problem pathological gamblers, such as negative urgency and sensation seeking [32], [34]. Thus, the aggregated data used in this exploratory research contained information regarding the length of gambling sessions (i.e., duration), total number of bets (i.e., bets), gambling intensity (i.e., bets per minute), total amount wagered (i.e., redeemed), cost of gambling sessions (i.e., net loss), and the ratio of losses (i.e., %loss). While the lack of player identification limited our ability to explore gender differences in regards to gambling behaviour, the results shown in LaBrie et al. [27] suggested no gender differences in live-sport and casino-type gamblers.
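To make the session-level variables listed above concrete, the sketch below derives them from four raw per-session quantities. The field names and the formulas for net loss (amount redeemed minus amount cashed out) and %loss (net loss as a percentage of the amount redeemed) are assumptions for illustration; the source names these measures but does not spell out how they were computed.

```python
from dataclasses import dataclass


@dataclass
class Session:
    duration_s: float  # session length in seconds
    bets: int          # total number of bets placed
    redeemed: float    # total amount entered into the EGM
    voucher: float     # amount cashed out in voucher form


def involvement_measures(s):
    """Derive per-session involvement measures (assumed definitions)."""
    minutes = s.duration_s / 60.0
    net_loss = s.redeemed - s.voucher  # assumed: money in minus money out
    return {
        "duration_min": minutes,
        "bets": s.bets,
        "bets_per_minute": s.bets / minutes if minutes > 0 else 0.0,
        "redeemed": s.redeemed,
        "net_loss": net_loss,
        # assumed: share of the redeemed amount that was lost
        "pct_loss": 100.0 * net_loss / s.redeemed if s.redeemed > 0 else 0.0,
    }
```

Under these assumed definitions, a ten-minute session with 120 bets, 50 redeemed and 20 cashed out yields 12 bets per minute, a net loss of 30, and a %loss of 60.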
In regards to identifying types of gambling behaviour in EGM gambling sessions, data mining techniques, such as clustering [19], can be used to group together sessions with similar behavioural characteristics (e.g., duration, intensity, frequency of bets). Among the various techniques for clustering data (e.g., partitioning, hierarchical, grid-based, model based, and constraint-based methods) [19], k-means clustering is one of the most widely used partitioning methods [4]. However, there are certain issues that emerge when using k-means clustering in large data sets, such as case order effect [37], selection of suitable evaluation variables [4], [26], [51], data comparability [7], [19], and k-means instability [7], [23], [40]. The results shown in Braverman and Shaffer [7] provide an example of k-means clustering for analyzing the betting patterns of Internet live-sport gamblers.
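As a rough illustration of the partitioning approach and two of the issues named above (data comparability and instability), the sketch below standardizes each variable to z-scores and then runs a plain Lloyd-style k-means several times from random starts, keeping the run with the lowest within-cluster sum of squares. This is a generic textbook procedure written with the Python standard library only; the z-score step, the restart strategy, and all function names are assumptions, not the configuration used in the study.

```python
import random
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so variables on different scales are comparable."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(rows, k, rng, max_iter=100):
    """One Lloyd run from k randomly chosen rows as initial centroids."""
    centroids = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


def kmeans(rows, k, n_init=10, seed=0):
    """Repeat k-means n_init times; keep the lowest-inertia solution."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        labels, centroids = kmeans_once(rows, k, rng)
        inertia = sum(sq_dist(r, centroids[l]) for r, l in zip(rows, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels, centroids)
    return best[1], best[2]
```

Comparing the label assignments obtained from different seeds or from shuffled row orders is one simple way to probe the case order effect and the instability mentioned in the text.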
In their research, Braverman and Shaffer [7] classified gamblers into four clusters. Gamblers within the high-risk sub-group showed intensive and frequent betting, high wager variability, positive gambling trajectory, and were at a much higher-risk for closing their account due to gambling-related issues than the rest of the sample. Furthermore, these results were consistent with those of Xuan and Shaffer [55], who analyzed the gambling patterns of Internet live-sport bettors during their last month of gambling activity; both studies analyzed the same data set used in LaBrie et al. [27]. The results from Braverman and Shaffer [7] highlighted the significance of selecting a stable solution for
There are numerous methods for selecting
To assess the stability of a k-means solution, researchers can define a clustering criteria (e.g.,
Outlier detection methods can be classified into statistical, proximity-based, density based, and clustering-based methods depending on the assumptions they make [19]. Statistical methods assume the data is normal and use a discordancy test [19] to find outliers; however, most statistical methods are only suitable for univariate data, and can at times miss outliers [19]. Density-based approaches assess the degree to which a data object can be an outlier (i.e., Local Outlier Factor) [19], though these methods fail to provide the level of detail that can be obtained by combining proximity-based and clustering-based methods [1], [19]. Proximity-based methods use a distance measure (e.g., standard deviation, median rule, Tukey's outlier labeling method) as a way of assessing the similarity between data points, and avoid excessive efforts associated with discordancy tests. Finally, clustering-based methods focus on exploring the relationship between data objects and their clusters to identify single outliers or a cluster of outliers [19]. Clustering-based and proximity-based outlier detection methods were of particular interest for this research, as the outlier mining methodology consisted of clustering the data points before using a distance measure to identify contextual outliers [19].
Tukey's Outlier Labeling Method (OLM) [44], a commonly used outlier detection method, makes no assumptions of normal distribution, and looks at the bottom (i.e., 25th percentile) and top (i.e., 75th percentile) quartiles of a sample to determine the upper and lower limits (i.e., hinges) of a distribution [24], [25] with data objects beyond these limits labeled as 'outliers'. However, Tukey's OLM is not appropriate for asymmetric data, as the number of outliers tends to increase in skewed data [44]. On the other hand, the standard deviation (SD) method, allows for researchers to examine the presence of data objects at
While the SD method is only appropriate for univariate data, the findings in LaBrie et al. [26] suggested this outlier detection method was well suited for this analysis. In their research, LaBrie et al. [26] showed heavily involved gamblers were discouraged by losses, as an increase in %loss often resulted in other variables decreasing (e.g., frequency, intensity, wagered amount); these findings suggested that heavily involved gamblers tended to assess the risk of a wager and self-moderate their behaviour (e.g., reducing intensity while increasing gambling duration), the latter similar to the controlled behaviour seen in substance abuse [26]. Similarly, Xuan and Shaffer [55] found heavily involved gamblers tended to have an involvement-seeking and risk-averse gambling behaviour. The results from these studies [26], [55] suggested problem gamblers were likely to show heavily involved gambling behaviour on one aspect of gambling rather than across variables.
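The two univariate labelling rules discussed above can be sketched as follows. The fence multiplier of 1.5 is the conventional choice for Tukey's method and the cut-off of three standard deviations is a common default; both are assumptions here, since the source does not state the thresholds it used in this passage. The helper `outliers_by_cluster` is a hypothetical illustration of applying a rule within each cluster, in the spirit of the cluster-then-measure approach described earlier.

```python
from collections import defaultdict
from statistics import mean, quantiles, stdev


def tukey_outliers(values, k=1.5):
    """Label values beyond the Tukey fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]


def sd_outliers(values, n_sd=3.0):
    """Label values more than n_sd sample standard deviations from the mean."""
    mu, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > n_sd * sd]


def outliers_by_cluster(values, labels, rule=sd_outliers):
    """Apply a univariate rule separately within each cluster label."""
    groups = defaultdict(list)
    for v, l in zip(values, labels):
        groups[l].append(v)
    # Clusters with fewer than two members cannot be assessed.
    return {l: rule(vs) for l, vs in groups.items() if len(vs) >= 2}
```

Note that in very small samples a single extreme value inflates the standard deviation enough that it can never exceed a three-SD cut-off, which is one reason the choice of threshold deserves attention.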
Despite the growing need for studies to help understand gambling behaviour, as well as problem gambling behaviour, research of actual gambling data is scarce [46], particularly research focused on EGM gambling data. While analysis done on Internet sport gambling sheds light on the behaviour of on-line problem gamblers, it is not possible to generalize those results to EGM gamblers, a deficiency also found in controlled gambling studies [12], [14], [31], [48], due to the impact that EGM gambler proximity may have on EGM gambling behaviour. Another difference is the lack of anonymity for each gambler, which is an appealing characteristic of online gambling [27], as EGM gambling cannot be done remotely. Nevertheless, the results from longitudinal studies [6], [7], [26], [27], [28], [29], [46], [55] highlighted the importance of analyzing actual gambling data in order to more accurately identify unusual changes in gambling patterns. And while there are a number of factors that can facilitate problem gambling, such as gambling accessibility and availability of help services [10], [13], these aspects and their impact on EGM gambling behaviour are beyond the scope of this research.
Behavioral analysis of EGM gambling data can certainly increase understanding of the expression of the disorder during a gambling session. This initial exploratory study expands on the current literature by defining what constitutes an EGM gambling session, identifying the gambling involvement measures, distinguishing gambling personae (e.g., clusters of sessions) based on the behavioural markers of EGM gambling sessions, and recognizing differences between these clusters as well as between normal sessions and outliers (i.e., heavily involved gambling sessions).
The EGM data used in this exploratory study was collected during the month of July 2010 from a single EGM manufacturer. The purpose of this research was to conduct an analysis of gambling measures in order to identify the gambling involvement measures [5], [6], [26], [55], that could serve as the behavioural markers of EGM gambling sessions and, based on the behavioural characteristics of sessions, distinguish types of gambling. The machines from which the data was collected required cash rather than logins or loyalty cards. While the lack of player IDs can be an obstacle when attempting to identify EGM gambling sessions, as it makes it difficult to assume an independence of sessions, the use of anonymous gambling data allowed us to limit threats to construct validity [8] such as the good subject tendency and evaluation apprehension. As such, this research did not attempt to identify individual gamblers but rather focused on identifying types of gambling (i.e., gambling personae).
Thus, in the following sub-sections we discuss the methodology used to collect and analyze the gambling measures required for this study. First, in Section 3.1, we explain the criteria used for identifying EGM gambling sessions during the data selection process. Second, in Section 3.2, we define the necessary data preprocessing steps to help specify what constitutes a valid EGM gambling session. Third, in Section 3.3, we specify the data transformation tasks needed to increase the overall quality of the mined results.
The EGM gambling data used in this research consisted of a sequence of messages containing information related to gambling events, using the Game to System (G2S) protocol. Among other things, this XML-based protocol, developed by the Gaming Standards Association (GSA) [17], [18], supports real-time calculation of wins, remote EGM configurations, and player tracking. A major benefit of this XML-based standard is its extensibility, as EGM manufacturers are able to develop proprietary extensions in order to customize the implementation of this protocol. Our initial approach to identifying sessions consisted of applying a methodology similar to that used in Liu and Keselj [30] for unsupervised web session detection. Implementing this methodology would have involved a time-lapse-between-events approach: our research would have used gambling events rather than web pages, and a time threshold of fifteen (15) seconds between events to determine which session an event belonged to. In other words, our assumption was that gambling events taking place within fifteen seconds of each other, and on the same EGM, would belong to the same session; messages taking place after the 15-second mark would be assigned to a new session. However, after exploring the dataset, it was clear that a more precise approach could be considered.
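The time-threshold approach that was initially considered can be sketched as below. The event representation, a list of `(egm_id, timestamp_in_seconds)` pairs, and the function name are assumptions made for illustration; only the 15-second gap and the per-EGM grouping come from the text.

```python
def sessions_by_time_gap(events, gap_s=15.0):
    """Group (egm_id, timestamp_s) events into sessions per EGM.

    An event joins the open session on its EGM when it occurs within
    gap_s seconds of that EGM's previous event; otherwise it starts a
    new session.
    """
    sessions = []
    last_seen = {}  # egm_id -> (last timestamp, index of its open session)
    for egm_id, ts in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(egm_id)
        if prev is not None and ts - prev[0] <= gap_s:
            idx = prev[1]
        else:
            sessions.append([])
            idx = len(sessions) - 1
        sessions[idx].append((egm_id, ts))
        last_seen[egm_id] = (ts, idx)
    return sessions
```

A player who pauses for more than fifteen seconds between bets would be split into two sessions under this rule, which illustrates why an event-based definition can be more precise for EGM data.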
G2S messages operate in one of two levels, the message level and the application level [17], [18]. Messages operating within the message level are responsible for acknowledging requests, and though they may be useful for EGM fault detection, they were not relevant to this research. On the other hand, messages operating in the application level are in charge of handling and processing requests [17], [18]. Within the application level, there are two types of messages, multicast messages and g2sBody messages. Multicast messages were not found to be relevant as they are used for remote configurations [17], [18]. Finally, g2sBody messages, used for communications between a single host and a single EGM, are responsible for processing EGM requests and can contain information related to game-play events such as game title, amount wagered, bet results, and bonus round activity.
In general, the context of a G2S message can be specified through the use of classes, which serve as containers of physical and/or logical devices (e.g., note acceptor device). Thus, in order to define a gambling session, we set out to identify the specific classes in charge of handling gambling requests (e.g., bets) and reporting the results of a gambling event. Identifying these classes allowed us to note the commands used to report a gambling event (e.g., a bill is redeemed). These commands make use of attributes, which provide further information on the events taking place during a gambling session (e.g., $5 redeemed). Using the information provided within these relevant messages, we were able to define parameters for identifying gambling sessions. In other words, rather than using a time-lapse-between-events approach, we aimed at specifying actions that could serve as markers for the start and end of a session.
For the purposes of this research, in order for gambling sessions and play-personae to be identified, messages must first be grouped together according to EGM. Second, sessions can only contain game-play related classes (i.e., g2sBody messages providing game-related information). Third, gambling sessions must start with a message indicating that the EGM reported no money in the bank and that some sort of currency (e.g., bills, coins, or vouchers) was entered into the EGM for the first time. This criterion was necessary as EGM players are able to enter bills throughout a gambling session; the specific type and method of currency is dependent on the EGM manufacturer's configuration. Fourth, if a session has winnings at cash-out, the session would end with a voucher being issued; in the event a session ends without any winnings (i.e., no credits remaining), the gambling session would end with a message showing the result of the last wager was a loss and no credits were left in that session. Once sessions were identified, relevant game-play information was extracted and gambling measures were aggregated. The measures collected included the EGM IDs, session duration, intensity, amount redeemed throughout a session, and amount received in voucher form.
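The start and end markers described above can be sketched as a small state machine over the time-ordered events of a single EGM. This is an illustration under stated assumptions, not the parser used in this study: the event dictionaries and their keys (`kind`, `credits`, `won`) are hypothetical and do not correspond to actual G2S class or command names.

```python
def sessions_by_markers(events):
    """Split one EGM's time-ordered events into sessions using markers.

    Each event is a dict with hypothetical keys:
      kind    -- "currency_in", "wager", or "voucher_issued"
      credits -- credit balance on the machine after the event
      won     -- for wagers only, whether the bet paid out
    """
    sessions = []
    current = None       # events of the session in progress, if any
    credits_before = 0   # balance before the event being examined
    for ev in events:
        if current is None:
            # Start marker: currency entered while the bank was empty.
            if ev["kind"] == "currency_in" and credits_before == 0:
                current = [ev]
        else:
            current.append(ev)
            # End marker 1: a voucher is issued at cash-out.
            cashed_out = ev["kind"] == "voucher_issued"
            # End marker 2: the last wager is a loss and no credits remain.
            busted = (ev["kind"] == "wager"
                      and not ev.get("won", False)
                      and ev["credits"] == 0)
            if cashed_out or busted:
                sessions.append(current)
                current = None
        credits_before = ev["credits"]
    return sessions


# Hypothetical stream: one session ending in a voucher, one ending in a loss.
example_stream = [
    {"kind": "currency_in", "credits": 20},
    {"kind": "wager", "won": False, "credits": 15},
    {"kind": "currency_in", "credits": 35},   # top-up inside the session
    {"kind": "wager", "won": True, "credits": 60},
    {"kind": "voucher_issued", "credits": 0},
    {"kind": "currency_in", "credits": 5},
    {"kind": "wager", "won": False, "credits": 0},
]
example_marker_sessions = sessions_by_markers(example_stream)
```

Note that the second currency entry does not open a new session, because the bank was not empty at that point; this is the case the third criterion is meant to handle.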
A total of 288 EGMs were identified as machines involved in actual game-play activity; these EGMs produced a total of 46,514 gambling sessions. The aggregated data within these sessions included the duration of a session measured in seconds, the average intensity of a session, the total amount of money redeemed by a player throughout a session, the amount of money obtained in voucher form per session, and the EGM's ID, which was converted to a random number in order to ensure player and EGM anonymity.
In terms of gambling activity, there were five games played in these sessions: four slot-machine-type games and one poker game. The 46,514 sessions amounted to approximately 35,095.44 hours of gambling activity. The dataset was then binned based on session duration. By binning the data, the researchers aimed to define what constitutes a valid EGM gambling session. As shown on Figure 1, sessions with a duration time within 24 hours were binned into eight (8) hour bins (e.g.,
Thus, in order for sessions to be considered valid, certain conditions must be met. First, a session cannot be considered valid if no gambling activity occurred; thus, valid gambling sessions must contain at least one bet placed. Second, the amount redeemed in a session must be greater than 0. Third, the total duration of a session cannot exceed 18.5 hours (i.e., 1,110 minutes); the specified time threshold was based on the maximum number of hours a non-casino venue would likely be open during the summer months (i.e., 7:30am to 2am). Though removing sessions with a duration time shorter than five (5) minutes was considered, the researchers noticed that these sessions had a minimum duration time of three minutes, a high gambling intensity and an amount redeemed greater than €5, with €200 being the maximum amount redeemed; as such, these short sessions were not removed from the research sample.
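The three validity conditions can be expressed as a single predicate. The sketch below is illustrative only; the function and parameter names are assumptions, while the thresholds are those stated in the text.

```python
MAX_DURATION_MIN = 18.5 * 60  # 18.5 hours, i.e., 1,110 minutes


def is_valid_session(bets, redeemed, duration_min):
    """Return True when a session meets the three validity conditions."""
    has_activity = bets >= 1                        # at least one bet placed
    has_money_in = redeemed > 0                     # amount redeemed above 0
    within_time = duration_min <= MAX_DURATION_MIN  # at most 18.5 hours
    return has_activity and has_money_in and within_time


# Hypothetical sessions as (bets, redeemed in Euros, duration in minutes).
candidates = [(0, 20.0, 12.0), (170, 30.0, 15.7), (40, 10.0, 1300.0)]
valid = [s for s in candidates if is_valid_session(*s)]
```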
The original research sample consisted of 46,514 sessions; after applying the aforementioned conditions for identifying valid EGM gambling sessions, our final research sample consisted of 46,416 sessions. A total of 98 cases were removed from the original research sample: five of these cases reported no gambling activity (i.e., no bets placed), and 93 cases were removed for having a total duration time greater than the specified total duration threshold (i.e., 18.5 hours or 1,110 minutes). As shown on Figure 2, 98.3% of valid sessions fell within the 8 hour mark (i.e., 480 minutes) and only 0.3% of sessions had a duration time between the 16 and 18.5 hour mark.
In our final dataset, session duration values were transformed from seconds to minutes to increase comparability with gambling intensity. Gambling intensity was calculated by dividing the total number of bets recorded in a session by the total duration (in minutes) of that session; the values for gambling intensity were smoothed by removing decimals. Calculating the intensity of sessions was of particular importance as it was one of the four variables used to assess problem gambling (i.e., intensity, frequency, variability, trajectory) [7], [13]. The total number of bets in a session was tallied, the cost of a session (i.e., net loss) was calculated as the difference between the out-of-pocket cost of a session (i.e., redeemed amount) and the winnings reported in a session (i.e., amount issued in voucher form at cash-out), and the ratio for losses (i.e., %loss) was calculated based on the total amount redeemed during a session.
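The derived measures can be sketched as follows. Several details are assumptions rather than facts taken from the study: decimals are removed by truncation (rounding would be an equally plausible reading), net loss is floored at zero (suggested by the minimum of 0 reported for Net Loss in Table 1), and %loss is taken as net loss relative to the amount redeemed.

```python
def session_measures(bets, duration_min, redeemed, vouchers):
    """Compute intensity, net loss, and %loss for one session.

    Assumptions (not confirmed by the source): truncation of intensity,
    net loss floored at zero, %loss = net loss / redeemed * 100.
    """
    intensity = int(bets / duration_min)       # bets per minute, decimals dropped
    net_loss = max(redeemed - vouchers, 0.0)   # out-of-pocket cost minus winnings
    pct_loss = 100.0 * net_loss / redeemed     # share of redeemed money lost
    return {"intensity": intensity, "net_loss": net_loss, "pct_loss": pct_loss}


# Hypothetical losing and winning sessions.
losing = session_measures(bets=170, duration_min=10.0, redeemed=30.0, vouchers=10.0)
winning = session_measures(bets=50, duration_min=5.0, redeemed=20.0, vouchers=100.0)
```

The function expects a valid session, that is, a positive duration and a positive redeemed amount, so no division by zero is guarded against here.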
The 46,416 sessions amounted to approximately 32,516.75 hours of gambling activity (i.e., 1,951,005.05 minutes), with a total of 17,329,709 bets placed; the maximum duration of a session was 18.3 hours (i.e., 1099.72 minutes). For the purposes of this research, Euros (€) were used as the unit of measure for the amount of money wagered within a session (i.e., Redeemed), and for the amount obtained in voucher form (i.e., Vouchers) at the end of a session. In these sessions, a total of €3,272,065.00 was redeemed (i.e., amount of money entered into an EGM), and a total of €2,341,148.58 was issued in voucher form (i.e., player winnings at the end of a session). These sessions had a total net loss of €2,417,693.39. In regards to gambling intensity (i.e., bets per minute), as shown on Table 1, the maximum intensity reported in these sessions was 68 bets per minute (bpm). Interestingly, only 8,981 sessions reported any winnings; all of these sessions had a duration time within the eight (8) hour mark. One case in particular reported a voucher amount of €40,833.05; in fact, there were sixteen (16) cases in which a significantly large voucher amount was issued, likely the result of a jackpot.
Variables | Mean | SD | Median | Mode | Max. | Min. |
Duration^a | 42.03 | 109.09 | 15.67 | 3.87 | 1099.72 | 0.17 |
Bets | 373 | 606 | 170 | 3 | 13,282 | 2 |
Intensity | 16 | 10 | 19 | 20 | 68 | 0 |
Redeemed^b | 70.49 | 150.75 | 30.00 | 20.00 | 6,425.00 | 5.00 |
Vouchers^b | 50.44 | 384.18 | 0.05 | 0 | 40,833.05 | 0 |
Net Loss^b | 52.09 | 133.39 | 19.90 | 0 | 6,424.69 | 0 |
%Loss | 75.46 | 40.97 | 99.97 | 100.00 | 100.00 | 0 |
a. Measured in minutes. b. Measured in Euros. |
While the mean values for all variables, shown on Table 1, do not seem high, the relationship between the mean, median, and standard deviation values suggested a non-normal sample distribution. As such, we explored the use of z-score and min-max normalization prior to conducting a correlation analysis. While both normalization methods preserve relationships among the data [45], z-score normalization has some limitations due to the skewness of the sample, as evidenced by the mean and standard deviation values shown on Table 1. Though no clear differences were found between the two normalization methods on our sample, the min-max normalization method was chosen, as this method has the potential to make outliers more noticeable.
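The two normalization methods compared above can be sketched in a few lines. This is a generic illustration, not the statistical package used in the study; the function names and the toy duration values are assumptions.

```python
from statistics import mean, stdev


def min_max(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_scores(values):
    """Centre values on the mean and scale by the sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical, heavily right-skewed durations in minutes.
durations = [3.0, 10.0, 16.0, 40.0, 1100.0]
scaled = min_max(durations)
standardized = z_scores(durations)
```

With skewed input such as this, min-max scaling places the extreme session at 1 and compresses the remaining sessions near 0, which is one way to read the remark that this method can make outliers more noticeable.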
Once the data was normalized, a boxplot analysis was done as part of a normality test. The results from this boxplot analysis, shown on Figure 3, illustrate the non-normality of the sample distribution. For example, Figure 3c shows numerous outliers present in terms of Intensity, with the median (i.e., 19 bpm) closer to the upper quartile of the distribution. Figures 3a, 3b, 3d, 3e, and 3f show numerous extreme outliers within the Duration, Bets, Redeemed, Voucher, and Net Loss variables, respectively. The results from a normal Q-Q plot analysis, shown on Figure 4, also illustrate the data's clear deviation from the expected normal value. Furthermore, the results of a Skewness test confirmed the non-normal distribution of the sample, as the skewness coefficients were found to be more than twice the value of their respective standard error values. The results of the boxplot analysis, Q-Q plot analysis, and Skewness test suggested that Spearman's Rank-Order Correlation analysis was suitable for exploring the relationships among the sample variables, as these results showed a monotonic relationship between the aforementioned variables. As explained in Section 4, for the purposes of this research, the results of a correlation analysis can help in identifying suitable evaluation attributes for recognizing gambling personae (i.e., clusters) based on the gambling behaviour expressed in these sessions.
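The rule of thumb used in the Skewness test (a coefficient more than twice its standard error) can be sketched as below. The formulas are the commonly used adjusted Fisher-Pearson coefficient and its standard error; whether the study's software used exactly these definitions is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def skewness(values):
    """Adjusted Fisher-Pearson sample skewness coefficient."""
    n = len(values)
    mu, sd = mean(values), stdev(values)
    cubed = sum(((v - mu) / sd) ** 3 for v in values)
    return n / ((n - 1) * (n - 2)) * cubed


def skewness_se(n):
    """Standard error of the skewness coefficient for a sample of size n."""
    return sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def is_markedly_skewed(values):
    """True when |skewness| exceeds twice its standard error."""
    return abs(skewness(values)) > 2 * skewness_se(len(values))


# Hypothetical samples: one symmetric, one with a single extreme value.
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
right_skewed = [1.0] * 30 + [100.0]
```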
In the following sections we provide an explanation of the methodology used for identifying types of gambling (i.e., gambling personae), based on the behaviour expressed in these sessions. In Section 4.1, we present our methodology for identifying the measures of EGM gambling involvement that can serve as behavioural markers of EGM sessions, which can then be used as evaluation variables for classifying sessions. In Section 4.2, we discuss the clustering techniques applied on the research sample. In particular, we discuss methods for selecting a stable and high-quality yielding solution for
To identify suitable evaluation variables for classifying EGM sessions, the researchers conducted a correlation analysis using Spearman's Rank-Order Correlation coefficient. Also known as Spearman's rho, this non-parametric statistical measure is used for exploring the strength of monotonic relationships among variables of a non-normally distributed data set. One of the benefits of this statistical measure is its lack of sensitivity towards outliers and its assumption of variable independence [9], [35], [46]. Spearman's rho assigns values between -1 and +1 (i.e.,
The results for Spearman's rho, shown on Table 2, indicated a very strong negative monotonic correlation between Vouchers and %Loss (
Spearman's Rho | ||||||||
Duration | Bets | Intensity | Redeemed | Vouchers | Net Loss | %Loss | ||
Duration | . | .515** | -.208** | .341** | -.334** | .374** | .406** | |
Sig. | . | .000 | .000 | .000 | .000 | .000 | .000 | |
Bets | .515** | . | .606** | .690** | .224** | .356** | -.150** | |
Sig. | .000 | . | .000 | .000 | .000 | .000 | .000 | |
Intensity | -.208** | .606** | . | .433** | .640** | -.003 | -.608** | |
Sig. | .000 | .000 | . | .000 | .000 | .511 | .000 | |
Redeemed | .341** | .690** | .433** | . | .177** | .584** | -.068** | |
Sig. | .000 | .000 | .000 | . | .000 | .000 | .000 | |
Vouchers | -.334** | .224** | .640** | .177** | . | -.512** | -.980** | |
Sig. | .000 | .000 | .000 | .000 | . | .000 | .000 | |
Net Loss | .374** | .356** | -.003 | .584** | -.512** | . | .592** | |
Sig. | .000 | .000 | .511 | .000 | .000 | . | .000 | |
%Loss | .406** | -.150** | -.608** | -.068** | -.980** | .592** | . | |
Sig. | .000 | .000 | .000 | .000 | .000 | .000 | . | |
** Correlation is significant at the 0.01 level (2-tailed). |
The results from this correlation analysis also showed a moderately positive correlation between intensity and amount redeemed (
In regards to identifying suitable evaluation variables for our clustering analysis, the results shown on Table 2 allowed us to quickly rule out variables with high correlations, such as vouchers and %loss. Other variables such as bets and net loss were excluded due to their redundancy, as these variables can be derived from other variables (e.g., intensity, duration, redeemed, vouchers). Since our primary research question was to identify types of gambling, based on the behaviour expressed throughout these sessions, we used the attributes for duration, intensity, and amount redeemed. The methodology for identifying suitable clustering criteria is explained in Section 4.2.
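For readers unfamiliar with the rank-order coefficient used in this section, the following sketch computes Spearman's rho as the Pearson correlation of average ranks. It is a generic illustration with hypothetical data, not the statistical package used to produce Table 2.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                      # extend the run of tied values
        avg = (i + j) / 2 + 1           # average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical monotonic but non-linear relationship.
rho_up = spearman_rho([1, 2, 3, 4], [10, 20, 25, 400])
rho_down = spearman_rho([1, 2, 3, 4], [400, 25, 20, 10])
```

Because only ranks enter the calculation, the extreme value of 400 has no more influence than any other largest value, which reflects the measure's low sensitivity to outliers.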
In general, clustering algorithms partition data into
For example, the k-means clustering algorithm is known to be vulnerable to the learning effect, particularly when dealing with large data sets [15], as this algorithm can learn to cluster cases based on their order within the dataset. To minimize the clustering algorithm's vulnerability to the learning effect, which could produce biased results, the researchers randomized the order of the cases within the data set, as the case order was originally based on EGM ID. Furthermore, the researchers did not use running means for this analysis.
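The two precautions described above (randomizing case order and recomputing centres in batch rather than as running means) can be illustrated with a minimal k-means sketch. This is not the software used in the study; the random initialization, the seed, and the toy data are assumptions.

```python
import random


def kmeans(points, k, seed=0, iterations=100):
    """Batch (Lloyd-style) k-means on tuples of numeric features.

    Cases are shuffled first so that the original ordering (e.g., by
    EGM ID) cannot influence the solution, and centres are only
    updated after every case has been assigned (no running means).
    """
    rng = random.Random(seed)
    points = list(points)
    rng.shuffle(points)
    centers = rng.sample(points, k)      # assumed initialization scheme
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centers[c]   # keep the old centre if a cluster empties
            for c, members in enumerate(clusters)
        ]
        if new_centers == centers:       # assignments stable: converged
            break
        centers = new_centers
    return centers, clusters


# Hypothetical one-feature cases forming two well-separated groups.
toy_cases = [(0.0,), (0.1,), (0.2,), (10.0,), (10.1,), (10.2,)]
toy_centers, toy_clusters = kmeans(toy_cases, k=2)
```

K-means can still converge to different local solutions depending on initialization, which is why the stability of the chosen solution needs to be checked separately.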
With regard to the lack of stability of the k-means clustering solution, the process of selecting
To identify a stable and suitable solution for
To assess the measure of agreement between the cluster memberships for these two sub-samples and the full sample, a Kappa degree of concordance test was done;
The results shown on Table 3 and Table 4 present the resulting cross tabulation tables for sample 1 with the full cohort, and sample 2 with the full cohort, respectively. These tables are useful for obtaining the sensitivity (i.e., Recall) and specificity of a measure. When calculating these values for sample 1, compared with how sessions in this sub-sample were clustered in the full cohort, the results indicated that our test performed quite well when picking sessions as not belonging to a cluster when they did in fact not belong to that cluster (i.e., specificity), as well as when identifying sessions as part of a cluster when they did in fact belong to that cluster (i.e., sensitivity).
Cluster Membership (Sample 1) v. Cluster Membership (Full Sample) Crosstabulation | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6,944 | 15 | 0 | 0 | 6,959 |
(Sample 1) | Expected Count | 2,071.5 | 2,171.1 | 123.2 | 2,593.2 | 6,959 | |
% within Cluster (sample 1) | 99.8% | 0.2% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 100% | 0.2% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0.1% | 0% | 0% | 29.8% | ||
2 | Count | 0 | 7,263 | 0 | 44 | 7,307 | |
Expected Count | 2,175.1 | 2,279.7 | 129.4 | 2,722.9 | 7,307 | ||
% within Cluster (sample 1) | 0% | 99.4% | 0% | 0.6% | 100% | ||
% within Cluster (full sample) | 0% | 99.8% | 0% | 0.5% | 31.3% | ||
% of Total | 0% | 31.1% | 0% | 0.2% | 31.3% | ||
3 | Count | 0 | 0 | 413 | 0 | 413 | |
Expected Count | 122.9 | 128.9 | 7.3 | 153.9 | 413 | ||
% within Cluster (sample 1) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 0 | 0 | 8,649 | 8,649 | |
Expected Count | 2,574.5 | 2,698.4 | 153.1 | 3,223 | 8,649 | ||
% within Cluster (sample 1) | 0% | 0% | 0% | 100% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 0% | 99.5% | 37.1% | ||
% of Total | 0% | 0% | 0% | 37.1% | 37.1% | ||
Total | Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | |
Expected Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | ||
% within Cluster (sample 1) | 29.80% | 31.20% | 1.8% | 37.3% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.80% | 31.20% | 1.8% | 37.3% | 100% |
Cluster Membership (Sample 2) v. Cluster Membership (Full Sample) Crosstabulation | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6,871 | 0 | 0 | 0 | 6,871 |
(Sample 2) | Expected Count | 2,053.1 | 2,148.1 | 122.3 | 2,547.5 | 6,871 | |
% within Cluster (sample 2) | 100% | 0% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 99.6% | 0% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0% | 0% | 0% | 29.8% | ||
2 | Count | 28 | 7,162 | 0 | 0 | 7,190 | |
Expected Count | 2,148.5 | 2,247.8 | 128 | 2,665.7 | 7,190 | ||
% within Cluster (sample 2) | 0.4% | 99.6% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 0.4% | 99.2% | 0% | 0% | 31.10% | ||
% of Total | 0.1% | 31.0% | 0% | 0% | 31.10% | ||
3 | Count | 0 | 0 | 411 | 0 | 411 | |
Expected Count | 122.8 | 128.5 | 7.3 | 152.4 | 411 | ||
% within Cluster (sample 2) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 56 | 0 | 8,560 | 8,616 | |
Expected Count | 2,574.6 | 2,693.6 | 153.4 | 3,194.4 | 8,616 | ||
% within Cluster (sample 2) | 0% | 0.6% | 0% | 99.4% | 100% | ||
% within Cluster (full sample) | 0% | 0.8% | 0% | 100% | 37.30% | ||
% of Total | 0% | 0.2% | 0% | 37.1% | 37.3% | ||
Total | Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | |
Expected Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | ||
% within Cluster (sample 2) | 29.9% | 31.3% | 1.8% | 37.1% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.9% | 31.3% | 1.8% | 37.1% | 100% |
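The agreement measures derived from these cross tabulations can be reproduced from the counts alone. The sketch below uses the counts of Table 3 (rows: sample 1 membership, columns: full sample membership) and treats the full-sample membership as the reference; the function name and output layout are assumptions, and Cohen's kappa is computed with its standard definition.

```python
# Counts from Table 3: rows = sample 1 clusters, columns = full sample clusters.
TABLE_3 = [
    [6944, 15, 0, 0],
    [0, 7263, 0, 44],
    [0, 0, 413, 0],
    [0, 0, 0, 8649],
]


def agreement_metrics(matrix):
    """Per-cluster sensitivity/specificity and Cohen's kappa."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(k)]

    per_cluster = []
    for c in range(k):
        tp = matrix[c][c]
        fn = col_totals[c] - tp   # in cluster c in the full sample only
        fp = row_totals[c] - tp   # in cluster c in the sub-sample only
        tn = n - tp - fn - fp
        per_cluster.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "true_negatives": tn,
        })

    observed = sum(matrix[c][c] for c in range(k)) / n
    expected = sum(row_totals[c] * col_totals[c] for c in range(k)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return per_cluster, kappa


clusters_t3, kappa_t3 = agreement_metrics(TABLE_3)
```

For Cluster 2, the function yields 16,006 true negatives out of 16,050 sessions outside that cluster in the full sample, which are the counts used in the specificity discussion for sample 1.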
For example, as shown on Table 3, of the 23,328 sessions in sample 1, a total of 413 were classified into Cluster 3 when being analyzed as part of the full cohort; all of these sessions were correctly classified into Cluster 3 in sample 1, representing a sensitivity and specificity value of 100% (
In regards to specificity, Cluster 2 had the lowest specificity obtained in sample 1. Of 16,050 sessions correctly not classified into Cluster 2 as part of the full sample, 16,006 sessions were correctly not classified into this cluster in sample 1; in other words, in sample 1,
When calculating the values for sensitivity and specificity for the 23,088 sessions clustered in sample 2, shown on Table 4, the results seemed to be on par with those shown on Table 3. For example, Cluster 2 had the lowest sensitivity value as
Furthermore, our macro averaged precision values, as well as our macro averaged recall values, showed that our classification solution performed well on both samples (
However, it must be noted that while there was no significant change in the sample when using min-max normalization compared to z-score normalization, the application of a time threshold on sessions (i.e., 18.5 hours) did drastically improve the results of our split test. For example, without the removal of the 98 sessions with duration times exceeding the 18.5 hour mark, the Kappa value for sub-sample 1 was
When comparing the size of the four resulting clusters, Cluster 4 was the largest with
Cases in Cluster 1, shown on Table 5, seemed to have relatively low intensity (
Cluster 1 Sessions^a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration^b | 47.14 | 56.17 | 27.65 | 412.77 | 12.33 | 59.20 |
Bets | 122 | 168 | 68 | 3,648 | 34 | 142 |
Intensity | 4.09 | 3.10 | 3.35 | 11.27 | 1.32 | 6.58 |
Redeemed^c | 30.02 | 45.04 | 20.00 | 1,215.00 | 10.00 | 40.00 |
Vouchers^c | 1.60 | 23.41 | .00 | 1,962.93 | .00 | .00 |
Net Loss^c | 29.48 | 45.13 | 20.00 | 1,215.00 | 10.00 | 40.00 |
%Loss | 97.36 | 15.70 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 13,843 b. Measured in minutes. c. Measured in Euros. |
On the other hand, cases in Cluster 2, as shown on Table 6, seemed to be characterized by a higher intensity than cases in Cluster 1 (
Cluster 2 Sessions^a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration^b | 14.34 | 17.26 | 9.07 | 326.85 | 4.57 | 17.32 |
Bets | 247 | 299 | 153 | 5,344 | 75 | 302 |
Intensity | 17.26 | 3.17 | 17.81 | 21.91 | 14.77 | 20.03 |
Redeemed^c | 54.98 | 94.35 | 30.00 | 2,450.00 | 10.00 | 50.00 |
Vouchers^c | 44.06 | 412.79 | .20 | 40,833.05 | .00 | 20.00 |
Net Loss^c | 42.74 | 90.52 | 19.75 | 2,450.00 | 4.60 | 50.00 |
%Loss | 71.77 | 43.03 | 99.25 | 100.00 | 19.05 | 100.00 |
a. n = 14,496 b. Measured in minutes. c. Measured in Euros. |
The results shown on Table 7 indicate that sessions in Cluster 3 have a much longer duration time than those in Cluster 1 and Cluster 2 (
Cluster 3 Sessions^a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration^b | 782.53 | 160.19 | 780.66 | 1,099.72 | 686.27 | 899.45 |
Bets | 246 | 524 | 86 | 6,715 | 39 | 227 |
Intensity | .32 | .66 | .12 | 7.25 | .05 | .31 |
Redeemed^c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
Vouchers^c | .00 | .00 | .00 | .00 | .00 | .00 |
Net Loss^c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
%Loss | 100.00 | .00 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 824 b. Measured in minutes. c. Measured in Euros. |
Sessions in Cluster 4, despite having a short duration time (
Cluster 4 Sessions^a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration^b | 25.84 | 29.94 | 16.07 | 394.65 | 7.57 | 32.75 |
Bets | 688 | 844 | 413 | 13,282 | 193 | 855 |
Intensity | 26.12 | 4.00 | 25.35 | 68.32 | 23.59 | 27.43 |
Redeemed^c | 116.87 | 218.11 | 50.00 | 6,425.00 | 20.00 | 120.00 |
Vouchers^c | 97.40 | 499.21 | .45 | 33,602.76 | .20 | 100.00 |
Net Loss^c | 78.06 | 193.43 | 19.90 | 6,424.69 | .00 | 70.00 |
%Loss | 58.59 | 45.74 | 97.32 | 100.00 | .00 | 99.73 |
a. n = 17,253 b. Measured in minutes. c. Measured in Euros. |
As shown on Tables 5 through 8, with the exception of amount redeemed in Cluster 3, the mean values for all evaluation variables (i.e., duration, intensity, redeemed) were slightly over their respective medians. Figure 5 illustrates the relationships between the evaluation variables. For example, Figure 5a shows the relationship between the duration of sessions and the gambling intensity expressed in these sessions, the relationship between the total amount redeemed throughout a session and the gambling intensity of sessions is displayed in Figure 5b, and the relationship between the total amount redeemed and the duration of sessions is illustrated in Figure 5c.
Interestingly, the application of a time threshold on the duration of sessions had a noticeable impact on the skewness of the clustered results, as this threshold seemed to reduce the distance between data points within their respective clusters. Nevertheless, the results shown on Tables 5 through 8 did indicate a non-normal distribution within our clusters, potentially caused by heavily involved sessions (i.e., outliers); the results of our Outlier Mining analysis are discussed in Section 6. To examine the structure and strength of the relationships between the resulting clusters and their means, and identify any differences between these clusters, a one-way analysis of variance (ANOVA) test, along with post-hoc comparisons, was carried out on our clustered results. The results of our ANOVA test are discussed in Section 5.1.
In order to examine clusters more closely, the researchers conducted a comparison of means between these clusters to identify any meaningful differences. As the resulting clusters followed an independent groups design [36], a between-groups ANOVA test was chosen instead of a repeated measures ANOVA. In this regard, the researchers were particularly interested in the relationships between duration, intensity, and amount redeemed, and cluster membership. As shown on Table 9, there was a significant difference among the mean scores of the dependent variables across all four clusters.
ANOVA | ||||||
Sum of Squares | df | Mean Square | F | Sig. | ||
Duration | Between Groups | 386.957 | 3 | 128.986 | 85580.335 | .000 |
Within Groups | 69.952 | 46412 | .002 | |||
Total | 456.909 | 46415 | ||||
Intensity | Between Groups | 847.409 | 3 | 282.470 | 110089.371 | .000 |
Within Groups | 119.085 | 46412 | .003 | |||
Total | 966.493 | 46415 | ||||
Redeemed | Between Groups | 1.542* | 3 | .514 | 991.576 | .000 |
Within Groups | 24.052 | 46412 | .001 | |||
Total | 25.593 | 46415 |
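The quantities reported in the ANOVA table are related as follows: each mean square is a sum of squares divided by its degrees of freedom, and F is the ratio of the between-groups to the within-groups mean square. The sketch below is a generic between-groups one-way ANOVA on hypothetical data, not the software used in the study.

```python
def one_way_anova(groups):
    """Between-groups one-way ANOVA on a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of cases around their own group mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "f": f_stat,
    }


# Hypothetical groups of three cases each.
toy_anova = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
```

Applying the same ratio to the rounded Duration entries above, (386.957 / 3) / (69.952 / 46412), gives a value of roughly 85,580, in line with the reported F; small differences arise from the rounding of the tabulated sums of squares.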
As previously mentioned, the relationship between the mean, median, and standard deviation values shown on Table 1, and the boxplot analysis done as part of a normality test shown on Figure 3, suggested a non-normal distribution of our sample. Furthermore, the standard deviations of the dependent variables, shown on Tables 5 through 8, indicated a violation of the homogeneity of variances assumption, which implied the need for a post-hoc analysis [36]. As illustrated in Figure 6, the results of a Tukey's Honestly Significant Difference (HSD) test revealed significant differences (
However, due to the large size of this sample (
In summary, the results from the between groups one-way ANOVA test indicated that there was a statistically significant difference (
While some of the dependent variables (i.e., duration and redeemed) were approximately normally distributed, the values for the intensity expressed in gambling sessions in Cluster 1 and Cluster 2 were not. As a result, because it does not assume normality of the data [36], the non-parametric Kruskal-Wallis test was used to confirm the differences identified through our ANOVA test. The Kruskal-Wallis test allows for continuous variables, from more than two clusters, to be compared by converting the values for each of the variables into ranks; the mean rank for each of the clusters can then be compared [36] to find differences between the groups.
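The rank conversion behind the Kruskal-Wallis test can be sketched as follows. The code pools all cases, assigns average ranks, and returns the tie-corrected H statistic together with the mean rank of each group; it is a generic illustration with hypothetical data rather than the software used in the study.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Return (H, mean_ranks) for a list of groups of numeric values.

    Assumes the pooled values are not all identical (otherwise the
    tie correction would be zero).
    """
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    total, mean_ranks, start = 0.0, [], 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        start += len(g)
        mean_ranks.append(rank_sum / len(g))
        total += rank_sum ** 2 / len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Standard correction for tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    h /= 1 - ties / (n ** 3 - n)
    return h, mean_ranks


# Hypothetical groups with no overlap between them.
toy_h, toy_mean_ranks = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```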
The results of the Kruskal-Wallis test confirmed a statistical difference in the dependent variables across clusters, and provided an insight into how each of the clusters was ranked based on each of the dependent variables. For example, as shown on Table 10, in regards to duration, Cluster 3 had the highest rank (i.e., longest mean session duration when compared to other clusters), followed by Cluster 1. In regards to intensity and amount redeemed, Cluster 4 had the highest mean rank for both intensity and redeemed amount when compared to other clusters, followed by Cluster 2.
Ranks | |||
Cluster Membership | N | Mean Rank | |
Duration | 1 | 13,843 | 28907.31 |
2 | 14,496 | 16544.68 | |
3 | 824 | 46004.50 | |
4 | 17,253 | 23146.26 | |
Total | 46,416 | ||
Intensity | 1 | 13,843 | 7706.21 |
2 | 14,496 | 21919.71 | |
3 | 824 | 1090.35 | |
4 | 17,253 | 37786.01 | |
Total | 46,416 | ||
Redeemed | 1 | 13,843 | 16455.75 |
2 | 14,496 | 22349.74 | |
3 | 824 | 17893.58 | |
4 | 17,253 | 29601.97 | |
Total | 46,416 |
The results from both tests, ANOVA and Kruskal-Wallis, agree that there are significant differences across all clusters. However, the power of the one-way ANOVA test lies in its ability to provide more detailed information into where these differences may occur. For example, while all clusters were found to be different from each other in regards to amount redeemed, the results of our ANOVA test showed no significant difference between Clusters 2 and 3. Furthermore, the high Partial Eta Squared values obtained for duration (
The non-normality of our clustered results, as shown on Tables 5 through 8 in Section 5, indicated the possible presence of outliers. Though outliers may be caused by measurement errors [45], analyzing cases that do not follow the general model of the data set has been the focus of previous research in fraud detection, customized marketing, medical analysis, and network security [43], [45]. Similarly, in this research, outliers may represent sessions with a passive gambling activity or sessions with riskier behaviour (i.e., heavily involved gambling sessions); as such, outliers were not removed from our research sample.
In general, outliers may be classified into global, contextual or collective outliers, though any one outlier may belong to more than one type [4]. Global outliers are the most common as these data points noticeably drift from the rest of the data set. Contextual outliers, on the other hand, drift from the rest of the data objects within a specific context (e.g., cluster of data points). Finally, collective outliers consist of an entire subset of data points that deviate from the rest of the data set [4]. In this case, our focus was to identify contextual outliers within our resulting clusters, and understand why these outliers were placed in these clusters. Thus the outlier identification method chosen must provide some justification of the detection [4], [45].
Detection methods can be classified into supervised, unsupervised, and semi-supervised, depending on whether data objects have been labeled as 'normal' or 'outlier' [4]. In our research, we focused on unsupervised detection methods, since cases had been clustered but not labeled. Outlier detection methods can also be classified into statistical, proximity-based, density-based, and clustering-based methods, depending on the assumptions they make [4], [45]. For the purposes of this analysis, we combined two outlier detection methods, proximity-based and clustering-based methods, since our main goal was to explore the relationship between data objects and the clusters they belonged to.
Proximity-based methods use a distance measure such as the standard deviation, median rule, or Tukey's OLM, as a way of assessing the similarity between data points. Clustering-based methods focus on exploring the relationship between data objects and their clusters to identify single outliers or a cluster of outliers [19]. Tukey's OLM [44] is one of the most commonly used outlier detection methods. It makes no assumptions of normal distribution, and looks at the bottom (i.e., 25th percentile) and top (i.e., 75th percentile) quartiles of a sample to determine the upper and lower limits (i.e., hinges) of a distribution [24], [25]; data objects beyond these limits are labeled as 'outliers'.
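A quartile-based labeling rule of this kind can be sketched as follows. The multiplier of 1.5 times the interquartile range is the conventional choice for Tukey-style limits and is an assumption here, since the text does not state the multiplier used; the quartile interpolation is whatever Python's `statistics.quantiles` applies by default, and the data are hypothetical.

```python
from statistics import quantiles


def tukey_limits(values, multiplier=1.5):
    """Lower and upper limits based on the 25th and 75th percentiles."""
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr


def label_outliers(values, multiplier=1.5):
    """Return the values lying beyond the quartile-based limits."""
    lower, upper = tukey_limits(values, multiplier)
    return [v for v in values if v < lower or v > upper]


# Hypothetical session durations in minutes with one extreme case.
toy_durations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
toy_outliers = label_outliers(toy_durations)
```

Because the limits depend only on the quartiles, a single extreme session does not move them, unlike limits built from the mean and standard deviation.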
However, Tukey's OLM is not always appropriate for asymmetric data, as the number of outliers tends to increase in skewed data [44]. On the other hand, the SD method allows researchers to examine the presence of data objects at
The scatterplots shown on Figure 5a through 5c, and the boxplots shown on Figure 7, indicated the existence of heavily involved gambling sessions within Clusters 1, 2, and 4, particularly within the session duration attribute. Thus our focus was to identify outliers, within these three clusters, with respect to amount of time spent gambling (i.e., duration). Specifically, we explored the application of the SD outlier detection method to explore the presence of data objects at three standard deviations from the mean duration (i.e.,
Initially, as shown on Table 5, the 13,843 sessions in Cluster 1 were characterized by having a passive type of gambling activity, with low intensity (
When comparing outliers to normal sessions, as shown on Table 11, Cluster 1 outliers seemed to be more heavily involved in their gambling activity than normal sessions, despite their relatively low involvement. For example, as shown on Figure 8, outliers in this cluster had higher average duration and amount redeemed than normal sessions; the mean duration time for outliers was over five (5) times the average for normal sessions. Though, interestingly, outliers experienced a higher ratio for losses than normal sessions. It seemed that outliers in this cluster were classified into Cluster 1 due to their low gambling intensity and financial involvement (i.e., amount redeemed).
Descriptive Statistics: Cluster 1 Sessions (Normal v. Outliers) | ||||||||
Normal | Outlier | |||||||
Mean | SD | Median | Count | Mean | SD | Median | Count | |
Duration | 23.89 | 16.98 | 10815 | 130.20 | 67.56 | 3028 | ||
Bets | 93 | 93 | 62 | 224 | 292 | 116 | ||
Intensity | 4.67 | 3.02 | 4.22 | 1.99 | 2.37 | .98 | ||
Redeemed | 25.74 | 33.03 | 20.00 | 45.31 | 71.28 | 20.00 | ||
Voucher | 2.02 | 26.37 | .00 | .08 | 4.37 | .00 | ||
Net Loss | 25.05 | 33.09 | 15.00 | 45.31 | 71.28 | 20.00 | ||
%Loss | 96.63 | 17.67 | 100.00 | 99.97 | 1.82 | 100.00 |
The 14,496 sessions originally classified into Cluster 2, as shown on Table 6, seemed to be characterized by a more intense type of play than sessions in Clusters 1 and 3. However, their short duration time, and low ratio for losses, implied the gambling activity in these cases may have been influenced by the degree of risk in a wager. When comparing outliers to normal sessions within this cluster, as shown on Table 12, the 298 identified outliers had a much higher gambling involvement than Cluster 1 outliers. For example, the average duration time for Cluster 2 outliers was 7.5 times greater than that of normal sessions. The average amount redeemed in these outliers was over four (4) times greater than normal sessions, and the %Loss experienced by outliers was also higher than normal sessions.
Table 12. Descriptive Statistics: Cluster 2 Sessions (Normal v. Outliers)
Variable | Normal Mean | Normal SD | Normal Median | Outlier Mean | Outlier SD | Outlier Median |
Duration | 12.64 | 11.73 | | 95.54 | 34.68 | |
Bets | 218 | 209 | 149 | 1599 | 622 | 1461 |
Intensity | 17.27 | 3.18 | 17.84 | 16.80 | 2.95 | 16.99 |
Redeemed | 51.40 | 81.72 | 25.00 | 225.45 | 292.33 | 150.00 |
Voucher | 44.46 | 415.10 | .20 | 24.98 | 281.31 | .00 |
Net Loss | 39.02 | 76.65 | 19.65 | 219.91 | 294.73 | 147.50 |
%Loss | 71.28 | 43.23 | 99.10 | 94.85 | 21.61 | 100.00 |
Count: Normal n = 14,198; Outlier n = 298.
Cluster 2 outliers may have been classified into Cluster 2 due to their higher gambling involvement with regard to intensity and amount redeemed. These results suggested that the gambling behaviour expressed by Cluster 2 outliers could be at a higher risk of problem gambling than that of normal sessions. While there were clear differences between these two types of sessions in Cluster 2, as illustrated in Figure 9, the results shown in Figure 8b and Figure 9b also suggested the importance of amount redeemed as a measure of EGM gambling involvement.
A total of 17,253 sessions were classified into Cluster 4, representing 37.2% of the total research sample. Sessions in this cluster, as shown in Table 8, were characterized by high gambling intensity, short duration, and higher redeemed amounts, which suggested that the gambling behaviour expressed in these sessions was, potentially, of a riskier type. However, the lower loss ratio (%Loss) of these sessions implied that shorter and more intense sessions may produce proportionally smaller losses. There were 346 sessions identified as outliers in this cluster.
When comparing outliers to normal cases within this cluster, as shown in Table 13, outliers were more heavily involved in certain aspects of their gambling activity; see Figure 10. For example, the average duration for outliers was just over seven (7) times that of normal sessions, and just under two (2) times that of Cluster 2 outliers. Despite little difference with regard to intensity, Cluster 4 outliers had a much higher number of total bets (
Table 13. Descriptive Statistics: Cluster 4 Sessions (Normal v. Outliers)
Variable | Normal Mean | Normal SD | Normal Median | Outlier Mean | Outlier SD | Outlier Median |
Duration | 23.04 | 21.79 | | 162.67 | 49.02 | |
Bets | 609 | 603 | 401 | 4519 | 1656 | 4071 |
Intensity (bets per minute) | 26.09 | 3.99 | 25.32 | 27.65 | 4.58 | 26.88 |
Redeemed | 106.51 | 173.11 | 50.00 | 623.16 | 805.07 | 350.00 |
Voucher | 94.80 | 496.65 | .45 | 224.20 | 598.56 | .20 |
Net Loss | 69.34 | 151.70 | 19.85 | 504.17 | 746.68 | 259.93 |
%Loss | 59.55 | 45.81 | 97.00 | 72.47 | 40.45 | 99.97 |
Count: Normal n = 16,907; Outlier n = 346.
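The "times greater" comparisons quoted for Clusters 1, 2, and 4 can be reproduced directly from the reported means. The following is a minimal sketch: the numbers are transcribed from Tables 11 to 13, while the dictionary layout and the function name are illustrative assumptions and not part of the original analysis.

```python
# Mean values transcribed from Tables 11-13 as (normal, outlier) pairs.
# The structure and names below are illustrative assumptions.
TABLE_MEANS = {
    "Cluster 1": {"Duration": (23.89, 130.20), "Redeemed": (25.74, 45.31), "%Loss": (96.63, 99.97)},
    "Cluster 2": {"Duration": (12.64, 95.54), "Redeemed": (51.40, 225.45), "%Loss": (71.28, 94.85)},
    "Cluster 4": {"Duration": (23.04, 162.67), "Redeemed": (106.51, 623.16), "%Loss": (59.55, 72.47)},
}


def outlier_to_normal_ratios(means):
    """Return {cluster: {variable: outlier_mean / normal_mean}}."""
    return {
        cluster: {var: outlier / normal for var, (normal, outlier) in variables.items()}
        for cluster, variables in means.items()
    }


ratios = outlier_to_normal_ratios(TABLE_MEANS)
for cluster, by_var in ratios.items():
    line = ", ".join(f"{var}: {value:.2f}x" for var, value in by_var.items())
    print(f"{cluster} -> {line}")

# Ratio of Cluster 4 outlier duration to Cluster 2 outlier duration.
c4_vs_c2_duration = TABLE_MEANS["Cluster 4"]["Duration"][1] / TABLE_MEANS["Cluster 2"]["Duration"][1]
print(f"Cluster 4 vs. Cluster 2 outlier duration: {c4_vs_c2_duration:.2f}x")
```

Ratios of means are only a rough descriptive comparison; with distributions this skewed (standard deviations often exceed the means), medians may tell a different story.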
The overall purpose of this research was to identify gambling personae (e.g., non-problem, low-risk, moderate-risk, and high-risk problem gambler) based on the attributes found within these gambling sessions. This allowed the researchers to explore data mining techniques not only to analyze problem gambling through EGMs, but also to explore ways to predict the incidence of this condition based on the type of gambling currently taking place. Therefore, a very important part of our research was to identify the messages and attributes related to gambling activity from which we could extract gambling-related data, and to determine what constitutes a valid EGM gambling session.
After exploring the G2S protocol and understanding the structure of G2S messages, it was clear that specific events could mark the start (e.g., player enters a bill) and end (e.g., player cashes out) of a session if particular criteria were met, as explained in Section 3. The gambling sessions used in this research were collected in situ over a period of one month. Once sessions were identified, details about their duration, intensity, amount redeemed, and final amount won (i.e., vouchers) were extracted. Additional variables, such as the total number of bets, the net loss reported within a session, and the percent loss of a session, were later added. The sessions were then clustered into four groups using k-means, based on the sessions' gambling intensity, duration, and amount redeemed.
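As a rough illustration of how the derived session variables relate to the extracted ones, the sketch below computes intensity, net loss, and percent loss for a single session. The formulas are assumptions inferred from the reported tables (intensity as bets per minute, net loss floored at zero, percent loss relative to the amount redeemed); the `Session` class, its field names, and the interpretation of each field are hypothetical and do not come from the G2S protocol or the original study.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical container for one EGM session (field names are assumptions)."""
    duration_min: float  # session length in minutes
    bets: int            # total number of bets placed
    redeemed: float      # amount redeemed during the session, in Euros
    voucher: float       # final amount cashed out as a voucher, in Euros


def derived_measures(s: Session) -> dict:
    """Compute assumed derived variables for a session."""
    # Assumption: intensity is the number of bets per minute.
    intensity = s.bets / s.duration_min if s.duration_min > 0 else 0.0
    # Assumption: net loss never goes below zero (the reported minimum is 0).
    net_loss = max(s.redeemed - s.voucher, 0.0)
    # Assumption: percent loss is net loss as a share of the amount redeemed.
    pct_loss = 100.0 * net_loss / s.redeemed if s.redeemed > 0 else 0.0
    return {"intensity": intensity, "net_loss": net_loss, "pct_loss": pct_loss}


example = Session(duration_min=20.0, bets=400, redeemed=50.0, voucher=10.0)
measures = derived_measures(example)
print(measures)
```

Under these assumptions, a session that ends without a voucher has a percent loss of 100, which is consistent with the Cluster 3 statistics, where vouchers are zero and %Loss is 100 throughout.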
A between-groups one-way ANOVA test suggested significant differences in the variables' mean scores across all clusters. However, with regard to the intensity and duration of gambling activity in clusters, this difference was not necessarily significant. The results of the cluster analysis suggested that cases in Cluster 1 (
Meanwhile, Cluster 3 (
While the use of aggregated data was useful for identifying clusters, a longitudinal analysis, in which each gambling event can be analyzed, would allow researchers to examine the type of gambling strategy used in sessions. Furthermore, this type of study would provide a better understanding of the decisions taking place throughout an EGM gambling session based on other measures of gambling involvement, such as wager variability, frequency, and trajectory. Greater data granularity could give more insight into how the outcome of a bet or bonus round may affect gambling strategies, as the ability to increase wagers upon entering bonus rounds would clearly facilitate a change in strategy. A longitudinal study could also help identify any differences based on the game being played. Breaks between gambling events (i.e., bets) could be analyzed based on the distribution of bets per minute over the length of a session. Finally, a longitudinal analysis of EGM gambling measures could help describe cluster profiles, and identify play-personae, more accurately.
[1] |
C. C. Aggarwal, Outlier Analysis, Springer, New York, 2013. MR3024573 |
[2] | American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, 4th edition, American Psychiatric Association, Washington, DC, 1994. |
[3] | G. Banks, R. Fitzgerald and L. Sylvan, Gambling: Productivity Commission Inquiry Report, Technical Report 50,2010, http://www.pc.gov.au/inquiries/completed/gambling-2009/report/gambling-report-volume1.pdf(visited on: 09/12/2012). |
[4] | M. Berry and G. Linoff, Data Mining Techniques for Marketing, Sales, and Customer Relationship Management, 2nd edition, Wiley Publishing Inc., Indianapolis, 2004. |
[5] |
Braverman J., LaBrie R.A., Shaffer H.J. (2011) A taxometric analysis of actual Internet sport gambling behavior. Psychological Assessment 23: 234-244. doi: 10.1037/a0021404
![]() |
[6] |
Braverman J., LaPlante D.A., Nelson S.E., Shaffer H.J. (2013) Using cross-game behavioral markers for early identification of high-risk Internet gamblers. Psychology of Addictive Behaviors 27: 868-877. doi: 10.1037/a0032818
![]() |
[7] |
Braverman J., Shaffer H.J. (2012) How do gamblers start gambling: Identifying behavioral markers for high-risk Internet gambling. European Journal of Public Health 22: 273-278. doi: 10.1093/eurpub/ckp232
![]() |
[8] |
S. Carpendale, Evaluating information visualizations, in Information Visualization, Lecture Notes in Computer Science, A simple univariate outlier identification procedure, 4950 (2008), 19-45. 10.1007/978-3-540-70956-5_2 |
[9] | National Research Council (1999) Pathological Gambling: A Critical Review Washington D.C.: National Academies Press. |
[10] | P. Delfabbro, A. Osborn, M. Nevile, L. Skelt and J. MacMillen, Identifying Problem Gamblers in Gambling Venues, Technical report, 2007. |
[11] |
Dixon M.J., Harrigan K.A., Jarrick M., MacLaren V., Fugelsang J.A., Sheepy E. (2011) Psychophysiological arousal signatures of near-misses in slot machine play. International Gambling Studies 11: 393-407. doi: 10.1080/14459795.2011.603134
![]() |
[12] |
Dixon L., Trigg R., Griffiths M. (2007) An empirical investigation of music and gambling behaviour. International Gambling Studies 7: 315-326. doi: 10.1080/14459790701601471
![]() |
[13] |
Dragicevic S., Tsogas G., Kudic A. (2011) Analysis of casino online gambling data in relation to behavioural risk markers for high-risk gambling and player protection. International Gambling Studies 11: 377-391. doi: 10.1080/14459795.2011.629204
![]() |
[14] |
Ellery M., Stewart S.H., Loba P. (2005) Alcohol's effects on video lottery terminal (vlt) play among probable pathological and non-pathological gamblers. Journal of Gambling Studies 21: 299-324. doi: 10.1007/s10899-005-3101-0
![]() |
[15] | J. Ferris and H. Wynne, The Canadian Problem Gambling Index: Final Report, Technical Report, 2001, http://www.ccgr.ca/en/projects/resources/CPGI-Final-Report-English.pdf(visited on: 06/28/2013). |
[16] | G. Data, Canadian Gaming Market Report, Technical report, 2011, http://www.gamblingdata.com/files/Gambling%20Data%20Canadian%20Gaming%20Market%20Report%20Final_0.pdf (visited on: 04/10/2013). |
[17] | GSA, G2S Message Protocol v1. 1 Game-to-system, Technical Report GSA-P0075. 024. 00-2011, GSA, 2011. |
[18] | GSA, G2S Message Protocol v2. 0 Game-to-system, Technical Report GSA-P0075. 0800. 00-2006, GSA, 2006. |
[19] | J. Han and M. Kamber, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, Waltham, 2012. |
[20] | Harrigan K.A., Dixon M. (2009) Par sheets, probabilities, and slot machine play: Implications of problem and non-problem gambling. Journal of Gambling Issues 23: 81-110. |
[21] | Harrigan K.A. (2007) Slot machine structural characteristics: Distorted player views of payback percentages. Journal of Gambling Issues 20: 215-234. |
[22] |
Harrigan K.A. (2009) Slot machines: Pursuing responsible gaming practices for virtual reels and near misses. International Journal of Mental Health Addiction 7: 68-83. doi: 10.1007/s11469-007-9139-8
![]() |
[23] |
Hennig C. (2007) Cluster-wise assessment of cluster stability. Computational Statistics & Data Analysis 52: 258-271. doi: 10.1016/j.csda.2006.11.025
![]() |
[24] |
Hoaglin D.C. (2003) John W. Tukey and data analysis. Statistical Science 18: 311-318. doi: 10.1214/ss/1076102418
![]() |
[25] | B. Iglewicz and S. Banerjee, A Simple Univariate Outlier Identification Procedure, Proceedings of Annual Meeting of the American Statistical Association, 2001. |
[26] |
LaBrie R.A., LaPlante D.A., Nelson S.E., Schumann A., Shaffer H.J. (2007) Assessing the playing field: A prospective longitudinal study of Internet sports gambling behavior. Journal of Gambling Studies 23: 347-362. doi: 10.1007/s10899-007-9067-3
![]() |
[27] |
LaBrie R.A., Kaplan S.A., LaPlante D.A., Nelson S.E., Shaffer H.J. (2008) Inside the virtual casino: A prospective longitudinal study of actual Internet casino gambling. European Journal of Public Health 18: 410-416. doi: 10.1093/eurpub/ckn021
![]() |
[28] |
LaPlante D. A., Nelson S. E., LaBrie R. A., Shaffer H. J. (2008) Stability and progression of disordered gambling: Lessons from longitudinal studies. Canadian Journal of Psychiatry 53: 52-60. doi: 10.1177/070674370805300108
![]() |
[29] |
LaPlante D.A., Nelson S.E., LaBrie R.A., Shaffer H.J. (2011) Disordered gambling, type of gambling and gambling involvement in the British gambling prevalence survey 2007. European Journal of Public Health 21: 532-537. doi: 10.1093/eurpub/ckp177
![]() |
[30] |
Liu H., Keselj V. (2007) Combined mining of web server logs and web contents for classifying user navigation patterns and predicting users' future requests. Data & Knowledge Engineering 61: 304-330. doi: 10.1016/j.datak.2006.06.001
![]() |
[31] | Loba P., Stewart S. H., Klein R. M., Blackburn J. R. (2001) Manipulations of the features of standard video lottery terminal (VLT) games: Effects in pathological and non-pathological gamblers. Journal of Gambling Studies 17: 94-98. |
[32] |
MacLaren V.V., Fugelsang J.A., Harrigan K., Dixon M. (2011) The personality of pathological gamblers: A meta-analysis. Clinical Psychology Review 31: 1057-1067. doi: 10.1016/j.cpr.2011.02.002
![]() |
[33] | K. Marshall, Gambling 2011, Technical Report 4,2011, http://www.statcan.gc.ca/pub/75-001-x/2011004/article/11551-eng.pdf(visited on: 04/10/2013). |
[34] |
Mishra S., Lumiére M.L., Williams R.J. (2010) Gambling as a form of risk-taking: Individual differences in personality, risk-accepting attitudes, and behavioral preferences for risk. Personality and Individual Differences 49: 616-621. doi: 10.1016/j.paid.2010.05.032
![]() |
[35] | National Research Council (1999) Pathological Gambling: A Critical Review Washington D.C.: The National Academies Press. |
[36] |
Nelson S.R., LaPlante D.A., Peller A.J., Schumann A., LaBrie R.A., Shaffer H.J. (2008) Real limits in the virtual world: Self-limiting behavior of Internet gamblers. Journal of Gambling Studies 24: 463-477. doi: 10.1007/s10899-008-9106-8
![]() |
[37] | J. Pallant, SPSS Survival Manual: A Step By Step Guide to Data Analysis Using SPSS, 4th edition, Allen & Unwin, Sydney, 2011. |
[38] |
Y. Peng, K. Gang and Y. Shi (eds. ), Knowledge-rich data mining in financial risk detection, in Computational Science - ICCS 2009 (eds. G. Allen, J. Nabrzyski, E. Seidel, G. D. van Albada, J. Dongarra and P. M. A. Sloot), Springer Berlin Heidelberg, 5545 (2009), 534-542. 10.1007/978-3-642-01973-9_60 |
[39] |
Pham D. T., Dimov S. S., Nguyen C. D. (2005) Selection of k in k-means clustering. Journal of Mechanical Engineering Science 219: 103-119. doi: 10.1243/095440605X8298
![]() |
[40] | A. Rakhlin and A. Caponnetto (eds. ), Stability of k-means clustering, in Advances in Neural Information Processing Systems 19 (eds. B. Schölkopf, J. Platt and T. Hoffman), MIT Press, (2006), 1121-1128. http://papers.nips.cc/paper/3116-stability-of-k-means-clustering (visited on: 12/10/2014) |
[41] | Responsible Gambling Council, Electronic Gaming Machines and Problem Gambling, Saskachewan Liquour and Gaming Authority, 2006, http://www.responsiblegambling.org/docs/research-reports/electronic-gaming-machines-and-problem-gambling.pdf?sfvrsn=10 (visited on: 06/28/2013). |
[42] | Responsible Gambling Council, Canadian Gambling Digest 2011-2012, Technical report, 2013, http://www.responsiblegambling.org/docs/default-document-library/20130605_canadian_gambling_digest_2011-12.pdf?sfvrsn=2 (visited on: 05/04/2015). |
[43] | G. Schwartz, The Impulse Economy, Atria Books, New York, 2011. |
[44] | S. Seo, A Review and Comparison of Methods for Detecting Outliers in Univariate Data Sets, M. S thesis, University of Pittsburg in Pensylvania, 2006. |
[45] |
Shaffer H.J., Korn D.A. (2002) Gambling and related mental disorders: A public health analysis. Annual Review of Public Health 23: 171-212. doi: 10.1146/annurev.publhealth.23.100901.140532
![]() |
[46] |
Shaffer H.J., Peller A.J., LaPlante D.A., Nelson S.E., LaBrie R.A. (2010) Toward a paradigm shift in Internet gambling research: From opinion and self-report to actual behavior. Addiction Research and Theory 18: 270-283. doi: 10.3109/16066350902777974
![]() |
[47] | Sim J., Wright C.C. (2005) Understanding interobserver agreement: The Kappa statistic. Family Medicine 37: 360-363. |
[48] |
Stewart S. H., Collins P., Blackburn J. R., Ellery M., Klein R. M. (2005) Heart rate increase to alcohol administration and video lottery terminal (VLT) play among regular VLT players. Psychology of Addictive Behaviors 19: 94-98. doi: 10.1037/0893-164X.19.1.94
![]() |
[49] |
S. Tufféry, Data Mining and Statistics for Decision Making, John Wiley & Sons, Ltd., Chichester, 2011.
10.1002/9780470979174 |
[50] | Viera A.J., Garrett J.M. (2005) The Kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Journal of the American Physical Therapy Association 85: 257-268. |
[51] | C. Wheelan, Naked Statistics: Stripping the Dread from the Data, W. W. Norton and Company, New York, 2013. |
[52] | R. J. Williams, R. A. Volberg and R. M. G. Stevens, The Population Prevalence of Problem Gambling: Methodological Influences, Standardized Rates, Jurisdictional Differences, and Worldwide Trends, Technical report, 2012, https://www.uleth.ca/dspace/bitstream/handle/10133/3068/2012-PREVALENCE-OPGRC%20(2).pdf?sequence=3 (visited on: 08/12/2013). |
[53] |
Wilson D. S., Kauffman R. A., Purdy M. S. (2002) A program for at-risk high school students
informed by evolutionary science. PLoS ONE 31: 76-77. doi: 10.1371/journal.pone.0027826
![]() |
[54] |
Witten I.H., Frank E. (2002) Data mining: Practical machine learning tools and techniques. Newsletter: ACM SIGMOD Record Homepage archive 31: 76-77. doi: 10.1145/507338.507355
![]() |
[55] |
Xuan Z., Shaffer H. (2009) How do gamblers end gambling: Longitudinal analysis of Internet gambling behaviors prior to account closure due to gambling related problems. Journal of Gambling Studies 25: 239-252. doi: 10.1007/s10899-009-9118-z
![]() |
Variables | Mean | SD | Median | Mode | Max. | Min. |
Duration a | 42.03 | 109.09 | 15.67 | 3.87 | 1099.72 | 0.17 |
Bets | 373 | 606 | 170 | 3 | 13,282 | 2 |
Intensity | 16 | 10 | 19 | 20 | 68 | 0 |
Redeemed b | 70.49 | 150.75 | 30.00 | 20.00 | 6,425.00 | 5.00 |
Vouchers b | 50.44 | 384.18 | 0.05 | 0 | 40,833.05 | 0 |
Net Loss b | 52.09 | 133.39 | 19.90 | 0 | 6,424.69 | 0 |
%Loss | 75.46 | 40.97 | 99.97 | 100.00 | 100.00 | 0 |
a. Measured in minutes. b. Measured in Euros. |
Spearman's Rho | ||||||||
Duration | Bets | Intensity | Redeemed | Vouchers | Net Loss | %Loss | ||
Duration | . | .515** | -.208** | .341** | -.334** | .374** | .406** | |
Sig. | . | .000 | .000 | .000 | .000 | .000 | .000 | |
Bets | .515** | . | .606** | .690** | .224** | .356** | -.150** | |
Sig. | .000 | . | .000 | .000 | .000 | .000 | .000 | |
Intensity | -.208** | .606** | . | .433** | .640** | -.003 | -.608** | |
Sig. | .000 | .000 | . | .000 | .000 | .511 | .000 | |
Redeemed | .341** | .690** | .433** | . | .177** | .584** | -.068** | |
Sig. | .000 | .000 | .000 | . | .000 | .000 | .000 | |
Vouchers | -.334** | .224** | .640** | .177** | . | -.512** | -.980** | |
Sig. | .000 | .000 | .000 | .000 | . | .000 | .000 | |
Net Loss | .374** | .356** | -.003 | .584** | -.512** | . | .592** | |
Sig. | .000 | .000 | .511 | .000 | .000 | . | .000 | |
%Loss | .406** | -.150** | -.608** | -.068** | -.980** | .592** | . | |
Sig. | .000 | .000 | .000 | .000 | .000 | .000 | . | |
** Correlation is significant at the 0.01 level (2-tailed). |
Cluster Membership (Sample 1) v. Cluster Membership (Full Sample) Crosstabulation | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6,944 | 15 | 0 | 0 | 6,959 |
(Sample 1) | Expected Count | 2,071.5 | 2,171.1 | 123.2 | 2,593.2 | 6,959 | |
% within Cluster (sample 1) | 99.8% | 0.2% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 100% | 0.2% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0.1% | 0% | 0% | 29.8% | ||
2 | Count | 0 | 7,263 | 0 | 44 | 7,307 | |
Expected Count | 2,175.1 | 2,279.7 | 129.4 | 2,722.9 | 7,307 | ||
% within Cluster (sample 1) | 0% | 99.4% | 0% | 0.6% | 100% | ||
% within Cluster (full sample) | 0% | 99.8% | 0% | 0.5% | 31.3% | ||
% of Total | 0% | 31.1% | 0% | 0.2% | 31.3% | ||
3 | Count | 0 | 0 | 413 | 0 | 413 | |
Expected Count | 122.9 | 128.9 | 7.3 | 153.9 | 413 | ||
% within Cluster (sample 1) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 0 | 0 | 8,649 | 8,649 | |
Expected Count | 2,574.5 | 2,698.4 | 153.1 | 3,223 | 8,649 | ||
% within Cluster (sample 1) | 0% | 0% | 0% | 100% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 0% | 99.5% | 37.1% | ||
% of Total | 0% | 0% | 0% | 37.1% | 37.1% | ||
Total | Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | |
Expected Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | ||
% within Cluster (sample 1) | 29.8% | 31.2% | 1.8% | 37.3% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.8% | 31.2% | 1.8% | 37.3% | 100% |
Cluster Membership (Sample 2) v. Cluster Membership (Full Sample) Crosstabulation | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6,871 | 0 | 0 | 0 | 6,871 |
(Sample 2) | Expected Count | 2,053.1 | 2,148.1 | 122.3 | 2,547.5 | 6,871 | |
% within Cluster (sample 2) | 100% | 0% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 99.6% | 0% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0% | 0% | 0% | 29.8% | ||
2 | Count | 28 | 7,162 | 0 | 0 | 7,190 | |
Expected Count | 2,148.5 | 2,247.8 | 128 | 2,665.7 | 7,190 | ||
% within Cluster (sample 2) | 0.4% | 99.6% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 0.4% | 99.2% | 0% | 0% | 31.1% | ||
% of Total | 0.1% | 31.0% | 0% | 0% | 31.1% | ||
3 | Count | 0 | 0 | 411 | 0 | 411 | |
Expected Count | 122.8 | 128.5 | 7.3 | 152.4 | 411 | ||
% within Cluster (sample 2) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 56 | 0 | 8,560 | 8,616 | |
Expected Count | 2,574.6 | 2,693.6 | 153.4 | 3,194.4 | 8,616 | ||
% within Cluster (sample 2) | 0% | 0.6% | 0% | 99.4% | 100% | ||
% within Cluster (full sample) | 0% | 0.8% | 0% | 100% | 37.3% | ||
% of Total | 0% | 0.2% | 0% | 37.1% | 37.3% | ||
Total | Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | |
Expected Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | ||
% within Cluster (sample 2) | 29.9% | 31.3% | 1.8% | 37.1% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.9% | 31.3% | 1.8% | 37.1% | 100% |
Table 5. Cluster 1 Sessions a
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 47.14 | 56.17 | 27.65 | 412.77 | 12.33 | 59.20 |
Bets | 122 | 168 | 68 | 3,648 | 34 | 142 |
Intensity | 4.09 | 3.10 | 3.35 | 11.27 | 1.32 | 6.58 |
Redeemed c | 30.02 | 45.04 | 20.00 | 1,215.00 | 10.00 | 40.00 |
Vouchers c | 1.60 | 23.41 | .00 | 1,962.93 | .00 | .00 |
Net Loss c | 29.48 | 45.13 | 20.00 | 1,215.00 | 10.00 | 40.00 |
%Loss | 97.36 | 15.70 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 13,843 b. Measured in minutes. c. Measured in Euros. |
Table 6. Cluster 2 Sessions a
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 14.34 | 17.26 | 9.07 | 326.85 | 4.57 | 17.32 |
Bets | 247 | 299 | 153 | 5,344 | 75 | 302 |
Intensity | 17.26 | 3.17 | 17.81 | 21.91 | 14.77 | 20.03 |
Redeemed c | 54.98 | 94.35 | 30.00 | 2,450.00 | 10.00 | 50.00 |
Vouchers c | 44.06 | 412.79 | .20 | 40,833.05 | .00 | 20.00 |
Net Loss c | 42.74 | 90.52 | 19.75 | 2,450.00 | 4.60 | 50.00 |
%Loss | 71.77 | 43.03 | 99.25 | 100.00 | 19.05 | 100.00 |
a. n = 14,496 b. Measured in minutes. c. Measured in Euros. |
Table 7. Cluster 3 Sessions a
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 782.53 | 160.19 | 780.66 | 1,099.72 | 686.27 | 899.45 |
Bets | 246 | 524 | 86 | 6,715 | 39 | 227 |
Intensity | .32 | .66 | .12 | 7.25 | .05 | .31 |
Redeemed c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
Vouchers c | .00 | .00 | .00 | .00 | .00 | .00 |
Net Loss c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
%Loss | 100.00 | .00 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 824 b. Measured in minutes. c. Measured in Euros. |
Table 8. Cluster 4 Sessions a
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 25.84 | 29.94 | 16.07 | 394.65 | 7.57 | 32.75 |
Bets | 688 | 844 | 413 | 13,282 | 193 | 855 |
Intensity | 26.12 | 4.00 | 25.35 | 68.32 | 23.59 | 27.43 |
Redeemed c | 116.87 | 218.11 | 50.00 | 6,425.00 | 20.00 | 120.00 |
Vouchers c | 97.40 | 499.21 | .45 | 33,602.76 | .20 | 100.00 |
Net Loss c | 78.06 | 193.43 | 19.90 | 6,424.69 | .00 | 70.00 |
%Loss | 58.59 | 45.74 | 97.32 | 100.00 | .00 | 99.73 |
a. n = 17,253 b. Measured in minutes. c. Measured in Euros. |
ANOVA | ||||||
Sum of Squares | df | Mean Square | F | Sig. | ||
Duration | Between Groups | 386.957 | 3 | 128.986 | 85580.335 | .000 |
Within Groups | 69.952 | 46412 | .002 | |||
Total | 456.909 | 46415 | ||||
Intensity | Between Groups | 847.409 | 3 | 282.470 | 110089.371 | .000 |
Within Groups | 119.085 | 46412 | .003 | |||
Total | 966.493 | 46415 | ||||
Redeemed | Between Groups | 1.542* | 3 | .514 | 991.576 | .000 |
Within Groups | 24.052 | 46412 | .001 | |||
Total | 25.593 | 46415 |
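The F values in the ANOVA table follow from the reported sums of squares and degrees of freedom: each mean square is a sum of squares divided by its degrees of freedom, and F is the between-groups mean square divided by the within-groups mean square. The sketch below recomputes F under that standard one-way ANOVA definition. The sums of squares are transcribed from the table; the variable and function names are illustrative assumptions, and small differences from the reported F values are expected because the table entries are rounded to three decimals.

```python
# (SS between, SS within) transcribed from the ANOVA table.
ANOVA_ROWS = {
    "Duration": (386.957, 69.952),
    "Intensity": (847.409, 119.085),
    "Redeemed": (1.542, 24.052),
}
REPORTED_F = {"Duration": 85580.335, "Intensity": 110089.371, "Redeemed": 991.576}

DF_BETWEEN = 3      # k - 1 for k = 4 clusters
DF_WITHIN = 46412   # N - k for N = 46,416 sessions


def f_statistic(ss_between, ss_within, df_between=DF_BETWEEN, df_within=DF_WITHIN):
    """One-way ANOVA F: between-groups mean square over within-groups mean square."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


f_values = {name: f_statistic(*ss) for name, ss in ANOVA_ROWS.items()}
for name, f in f_values.items():
    rel_diff = abs(f - REPORTED_F[name]) / REPORTED_F[name]
    print(f"{name}: recomputed F = {f:.1f}, reported F = {REPORTED_F[name]}, rel. diff = {rel_diff:.1e}")
```

With more than 46,000 sessions, even small differences in means yield very large F values, so these statistics say more about sample size and cluster separation than about practical effect size.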
Ranks | |||
Cluster Membership | N | Mean Rank | |
Duration | 1 | 13,843 | 28907.31 |
2 | 14,496 | 16544.68 | |
3 | 824 | 46004.50 | |
4 | 17,253 | 23146.26 | |
Total | 46,416 | ||
Intensity | 1 | 13,843 | 7706.21 |
2 | 14,496 | 21919.71 | |
3 | 824 | 1090.35 | |
4 | 17,253 | 37786.01 | |
Total | 46,416 | ||
Redeemed | 1 | 13,843 | 16455.75 |
2 | 14,496 | 22349.74 | |
3 | 824 | 17893.58 | |
4 | 17,253 | 29601.97 | |
Total | 46,416 |
Variables | Mean | SD | Median | Mode | Max. | Min. |
Durationa | 42.03 | 109.09 | 15.67 | 3.87 | 1099.72 | 0.17 |
Bets | 373 | 606 | 170 | 3 | 13,282 | 2 |
Intensity | 16 | 10 | 19 | 20 | 68 | 0 |
Redeemedb | 70.49 | 150.75 | 30.00 | 20.00 | 6,425.00 | 5.00 |
Vouchersb | 50.44 | 384.18 | 0.05 | 0 | 40,833.05 | 0 |
Net Lossb | 52.09 | 133.39 | 19.90 | 0 | 6,424.69 | 0 |
%Loss | 75.46 | 40.97 | 99.97 | 100.00 | 100.00 | 0 |
a. Measured in minutes. b. Measured in Euros. |
Spearman™s Rho | ||||||||
Duration | Bets | Intensity | Redeemed | Vouchers | Net Loss | %Loss | ||
Duration | . | .515** | -.208** | .341** | -.334** | .374** | .406** | |
Sig. | . | .000 | .000 | .000 | .000 | .000 | .000 | |
Bets | .515** | . | .606** | .690** | .224** | .356** | -.150** | |
Sig. | .000 | . | .000 | .000 | .000 | .000 | .000 | |
Intensity | -.208** | .606** | . | .433** | .640** | -.003 | -.608** | |
Sig. | .000 | .000 | .000 | .000 | .511 | .000 | 0.000 | |
Redeemed | .341** | .690** | .433** | . | .177** | .584** | -.068** | |
Sig. | .000 | .000 | .000 | . | .000 | .000 | .000 | |
Vouchers | -.334** | .224** | .640** | .177** | . | -.512** | -.980** | |
Sig. | .000 | .000 | .000 | .000 | . | .000 | .000 | |
Net Loss | .374** | .356** | -.003 | .584** | -.512** | . | .592** | |
Sig. | .000 | .000 | .511 | .000 | .000 | . | .000 | |
%Loss | .406** | -.150** | -.608** | -.068** | -.980** | .592** | . | |
Sig. | .000 | .000 | .000 | .000 | .000 | .000 | . | |
** Correlation is significant at the 0.01 level (2-tailed). |
Cluster Membership (Sample 1) v. Cluster Membership (Full Sample) Crosstabulation) | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6944 | 15 | 0 | 0 | 6,959 |
(Sample 1) | Expected Count | 2,071.5 | 2,171.1 | 123.2 | 2,593.2 | 6,959 | |
% within Cluster (sample 1) | 99.8% | 0.2% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 100% | 0.2% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0.1% | 0% | 0% | 29.8% | ||
2 | Count | 0 | 7,263 | 0 | 44 | 7,307 | |
Expected Count | 2,175.1 | 2,279.7 | 129.4 | 2,722.9 | 7,307 | ||
% within Cluster (sample 1) | 0% | 99.4% | 0% | 0.6% | 100% | ||
% within Cluster (full sample) | 0% | 99.8% | 0% | 0.5% | 31.3% | ||
% of Total | 0% | 31.1% | 0% | 0.2% | 31.3% | ||
3 | Count | 0 | 0 | 413 | 0 | 413 | |
Expected Count | 122.9 | 128.9 | 7.3 | 153.9 | 413 | ||
% within Cluster (sample 1) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 0 | 0 | 8,649 | 8,649 | |
Expected Count | 2,574.5 | 2,698.4 | 153.1 | 3,223 | 8,649 | ||
% within Cluster (sample 1) | 0% | 0% | 0% | 100% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 0% | 99.5% | 37.1% | ||
% of Total | 0% | 0% | 0% | 37.1% | 37.1% | ||
Total | Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | |
Expected Count | 6,944 | 7,278 | 413 | 8,693 | 23,328 | ||
% within Cluster (sample 1) | 29.80% | 31.20% | 1.8% | 37.3% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.80% | 31.20% | 1.8% | 37.3% | 100% |
Cluster Membership (Sample 2) v. Cluster Membership (Full Sample) Crosstabulation) | |||||||
Cluster Mship. (Full Sample) | |||||||
1 | 2 | 3 | 4 | Total | |||
Cluster Mship. | 1 | Count | 6,871 | 0 | 0 | 0 | 6,871 |
(Sample 2) | Expected Count | 2,053.1 | 2,148.1 | 122.3 | 2,547.5 | 6,871 | |
% within Cluster (sample 2) | 100% | 0% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 99.6% | 0% | 0% | 0% | 29.8% | ||
% of Total | 29.8% | 0% | 0% | 0% | 29.8% | ||
2 | Count | 28 | 7,162 | 0 | 0 | 7,190 | |
Expected Count | 2,148.5 | 2247.8 | 128 | 2,665.7 | 7,190 | ||
% within Cluster (sample 2) | 0.4% | 9, 9.6% | 0% | 0% | 100% | ||
% within Cluster (full sample) | 0.4% | 99.2% | 0% | 0% | 31.10% | ||
% of Total | 0.1% | 31.0% | 0% | 0% | 31.10% | ||
3 | Count | 0 | 0 | 411 | 0 | 411 | |
Expected Count | 122.8 | 128.5 | 7.3 | 152.4 | 411 | ||
% within Cluster (sample 2) | 0% | 0% | 100% | 0% | 100% | ||
% within Cluster (full sample) | 0% | 0% | 100% | 0% | 1.8% | ||
% of Total | 0% | 0% | 1.8% | 0% | 1.8% | ||
4 | Count | 0 | 56 | 0 | 8,560 | 8,616 | |
Expected Count | 2,574.6 | 2,693.6 | 153.4 | 3,194.4 | 8,616 | ||
% within Cluster (sample 2) | 0% | 0.6% | 0% | 99.4% | 100% | ||
% within Cluster (full sample) | 0% | 0.8% | 0% | 100% | 37.30% | ||
% of Total | 0% | 0.2% | 0% | 37.1% | 37.3% | ||
Total | Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | |
Expected Count | 6,899 | 7,218 | 411 | 8,560 | 23,088 | ||
% within Cluster (sample 2) | 29.9% | 31.3% | 1.8% | 37.1% | 100% | ||
% within Cluster (full sample) | 100% | 100% | 100% | 100% | 100% | ||
% of Total | 29.9% | 31.3% | 1.8% | 37.1% | 100% |
Cluster 1 Sessions a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration |
47.14 | 56.17 | 27.65 | 412.77 | 12.33 | 59.20 |
Bets | 122 | 168 | 68 | 3,648 | 34 | 142 |
Intensity | 4.09 | 3.10 | 3.35 | 11.27 | 1.32 | 6.58 |
Redeemed |
30.02 | 45.04 | 20.00 | 1,215.00 | 10.00 | 40.00 |
Vouchers |
1.60 | 23.41 | .00 | 1,962.93 | .00 | .00 |
Net Loss |
29.48 | 45.13 | 20.00 | 1,215.00 | 10.00 | 40.00 |
%Loss | 97.36 | 15.70 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 13,843 b. Measured in minutes. c. Measured in Euros. |
Cluster 2 Sessions a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration |
14.34 | 17.26 | 9.07 | 326.85 | 4.57 | 17.32 |
Bets | 247 | 299 | 153 | 5,344 | 75 | 302 |
Intensity | 17.26 | 3.17 | 17.81 | 21.91 | 14.77 | 20.03 |
Redeemed c | 54.98 | 94.35 | 30.00 | 2,450.00 | 10.00 | 50.00 |
Vouchers c | 44.06 | 412.79 | .20 | 40,833.05 | .00 | 20.00 |
Net Loss c | 42.74 | 90.52 | 19.75 | 2,450.00 | 4.60 | 50.00 |
%Loss | 71.77 | 43.03 | 99.25 | 100.00 | 19.05 | 100.00 |
a. n = 14,496 b. Measured in minutes. c. Measured in Euros. |
Cluster 3 Sessions a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 782.53 | 160.19 | 780.66 | 1,099.72 | 686.27 | 899.45 |
Bets | 246 | 524 | 86 | 6,715 | 39 | 227 |
Intensity | .32 | .66 | .12 | 7.25 | .05 | .31 |
Redeemed c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
Vouchers c | .00 | .00 | .00 | .00 | .00 | .00 |
Net Loss c | 52.34 | 127.88 | 20.00 | 1,530.00 | 10.00 | 50.00 |
%Loss | 100.00 | .00 | 100.00 | 100.00 | 100.00 | 100.00 |
a. n = 824 b. Measured in minutes. c. Measured in Euros. |
Cluster 4 Sessions a | ||||||
Variables | Mean | SD | Median | Max. | 25th | 75th |
Duration b | 25.84 | 29.94 | 16.07 | 394.65 | 7.57 | 32.75 |
Bets | 688 | 844 | 413 | 13,282 | 193 | 855 |
Intensity | 26.12 | 4.00 | 25.35 | 68.32 | 23.59 | 27.43 |
Redeemed c | 116.87 | 218.11 | 50.00 | 6,425.00 | 20.00 | 120.00 |
Vouchers c | 97.40 | 499.21 | .45 | 33,602.76 | .20 | 100.00 |
Net Loss c | 78.06 | 193.43 | 19.90 | 6,424.69 | .00 | 70.00 |
%Loss | 58.59 | 45.74 | 97.32 | 100.00 | .00 | 99.73 |
a. n = 17,253 b. Measured in minutes. c. Measured in Euros. |
ANOVA | ||||||
Sum of Squares | df | Mean Square | F | Sig. | ||
Duration | Between Groups | 386.957 | 3 | 128.986 | 85580.335 | .000 |
Within Groups | 69.952 | 46412 | .002 | |||
Total | 456.909 | 46415 | ||||
Intensity | Between Groups | 847.409 | 3 | 282.470 | 110089.371 | .000 |
Within Groups | 119.085 | 46412 | .003 | |||
Total | 966.493 | 46415 | ||||
Redeemed | Between Groups | 1.542* | 3 | .514 | 991.576 | .000 |
Within Groups | 24.052 | 46412 | .001 | |||
Total | 25.593 | 46415 |
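As a consistency check on the ANOVA table above, each F value can be recomputed from the reported sums of squares and degrees of freedom. The sketch below assumes a standard one-way ANOVA, where F is the between-groups mean square divided by the within-groups mean square; the helper `f_statistic` and the dictionary layout are illustrative only. Because the table rounds the sums of squares to three decimals, the recomputed values agree with the reported ones only approximately.

```python
def f_statistic(ss_between, df_between, ss_within, df_within):
    """One-way ANOVA F ratio: between-groups mean square over within-groups mean square."""
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within


# (SS between, df between, SS within, df within, reported F), copied from the table.
anova_rows = {
    "Duration": (386.957, 3, 69.952, 46412, 85580.335),
    "Intensity": (847.409, 3, 119.085, 46412, 110089.371),
    "Redeemed": (1.542, 3, 24.052, 46412, 991.576),
}

recomputed = {}
for variable, (ss_b, df_b, ss_w, df_w, reported_f) in anova_rows.items():
    recomputed[variable] = f_statistic(ss_b, df_b, ss_w, df_w)
    print(f"{variable}: recomputed F = {recomputed[variable]:.1f}, reported F = {reported_f}")
```

The within-groups mean squares are very small here, which is why the rounded "Mean Square" column (.002, .003, .001) cannot be used directly to recover F; the unrounded ratio of sum of squares to degrees of freedom is needed.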
Ranks | |||
Cluster Membership | N | Mean Rank | |
Duration | 1 | 13,843 | 28907.31 |
2 | 14,496 | 16544.68 | |
3 | 824 | 46004.50 | |
4 | 17,253 | 23146.26 | |
Total | 46,416 | ||
Intensity | 1 | 13,843 | 7706.21 |
2 | 14,496 | 21919.71 | |
3 | 824 | 1090.35 | |
4 | 17,253 | 37786.01 | |
Total | 46,416 | ||
Redeemed | 1 | 13,843 | 16455.75 |
2 | 14,496 | 22349.74 | |
3 | 824 | 17893.58 | |
4 | 17,253 | 29601.97 | |
Total | 46,416 |
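A "Ranks" table of this form usually accompanies a Kruskal-Wallis test, although this section does not name the test, so that is an assumption here. Under that assumption, the H statistic can be approximated from the group sizes and mean ranks alone, as sketched below. The formula omits the correction for tied ranks, so the result would be at most a lower bound on a tie-corrected value; the function name `kruskal_wallis_h` is hypothetical.

```python
def kruskal_wallis_h(group_sizes, mean_ranks):
    """Kruskal-Wallis H from group sizes and mean ranks (no correction for ties)."""
    n_total = sum(group_sizes)
    grand_mean_rank = (n_total + 1) / 2
    weighted_spread = sum(
        n * (rank - grand_mean_rank) ** 2
        for n, rank in zip(group_sizes, mean_ranks)
    )
    return 12 / (n_total * (n_total + 1)) * weighted_spread


# Group sizes and mean ranks copied from the Ranks table above (clusters 1-4).
cluster_sizes = [13843, 14496, 824, 17253]
mean_ranks = {
    "Duration": [28907.31, 16544.68, 46004.50, 23146.26],
    "Intensity": [7706.21, 21919.71, 1090.35, 37786.01],
    "Redeemed": [16455.75, 22349.74, 17893.58, 29601.97],
}

for variable, ranks in mean_ranks.items():
    print(variable, round(kruskal_wallis_h(cluster_sizes, ranks), 1))
```

One detail in the table is easy to verify by hand: the Duration mean rank of 46004.50 for cluster 3 is exactly the average of the top 824 ranks (45,593 through 46,416), so every cluster 3 session lasted longer than every session in the other clusters.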
Descriptive Statistics: Cluster 1 Sessions (Normal v. Outliers) | ||||||||
Normal | Outlier | |||||||
Variables | Mean | SD | Median | Count | Mean | SD | Median | Count |
Duration | 23.89 | 16.98 | | 10815 | 130.20 | 67.56 | | 3028 |
Bets | 93 | 93 | 62 | | 224 | 292 | 116 | |
Intensity | 4.67 | 3.02 | 4.22 | | 1.99 | 2.37 | .98 | |
Redeemed | 25.74 | 33.03 | 20.00 | | 45.31 | 71.28 | 20.00 | |
Voucher | 2.02 | 26.37 | .00 | | .08 | 4.37 | .00 | |
Net Loss | 25.05 | 33.09 | 15.00 | | 45.31 | 71.28 | 20.00 | |
%Loss | 96.63 | 17.67 | 100.00 | | 99.97 | 1.82 | 100.00 | |
Descriptive Statistics: Cluster 2 Sessions (Normal v. Outliers) | ||||||||
Normal | Outlier | |||||||
Variables | Mean | SD | Median | Count | Mean | SD | Median | Count |
Duration | 12.64 | 11.73 | | 14198 | 95.54 | 34.68 | | 298 |
Bets | 218 | 209 | 149 | | 1599 | 622 | 1461 | |
Intensity | 17.27 | 3.18 | 17.84 | | 16.80 | 2.95 | 16.99 | |
Redeemed | 51.40 | 81.72 | 25.00 | | 225.45 | 292.33 | 150.00 | |
Voucher | 44.46 | 415.10 | .20 | | 24.98 | 281.31 | .00 | |
Net Loss | 39.02 | 76.65 | 19.65 | | 219.91 | 294.73 | 147.50 | |
%Loss | 71.28 | 43.23 | 99.10 | | 94.85 | 21.61 | 100.00 | |
Descriptive Statistics: Cluster 4 Sessions (Normal v. Outliers) | ||||||||
Normal | Outlier | |||||||
Variables | Mean | SD | Median | Count | Mean | SD | Median | Count |
Duration | 23.04 | 21.79 | | 16907 | 162.67 | 49.02 | | 346 |
Bets | 609 | 603 | 401 | | 4519 | 1656 | 4071 | |
Bets per Minute | 26.09 | 3.99 | 25.32 | | 27.65 | 4.58 | 26.88 | |
Redeemed | 106.51 | 173.11 | 50.00 | | 623.16 | 805.07 | 350.00 | |
Voucher | 94.80 | 496.65 | .45 | | 224.20 | 598.56 | .20 | |
Net Loss | 69.34 | 151.70 | 19.85 | | 504.17 | 746.68 | 259.93 | |
%Loss | 59.55 | 45.81 | 97.00 | | 72.47 | 40.45 | 99.97 | |