A multi-length-scale computational analysis is used to carry out the design of polyurea for maximum shockwave-mitigation performance. The computational analysis involves a combined all-atom/coarse-grained molecular-level investigation of shockwave-propagation within polyurea and a finite-element analysis of direct quantification of the shockwave-mitigation capacity of this material as a function of its chemistry (or, more specifically, of its soft-segment molecular weight). The results obtained suggest that the approach employed can correctly identify the optimal chemistry of polyurea and, thus, be of great benefit in the efforts to develop new highly-efficient blastwave-protective materials, in a cost-effective manner.
1.
Introduction
Social networking is currently sustaining the exchange of information among individuals, societies, and nations. An ever-increasing number of individuals tend to share their experiences and comments partially or entirely through online media platforms such as Weibo [1] and Facebook. The privacy features of social networks can remove communication barriers between individuals, allowing them to express themselves candidly and openly. Anyone can freely express and share their feelings [2], assess others' perspectives, and acknowledge supported opinions on social media platforms. Central to the entire communication discipline in social networks stems from viral marketing [3]. This strategic approach to information diffusion has been widely adopted in various domains, including product promotion [4], personalized recommendations [5], targeted advertising [6], the selection of influential users [7,8,9,10].
Information diffusion phenomena in social networks has brought about both immense convenience and potential threats in the dissemination of groundbreaking ideas. The perceptions of individuals within a social network have the ability to influence the commenting behavior and awareness of their neighboring users, leading to intermittent changes in the network topology. These individual perceptions, known as user influence, are critical for understanding user behavior, uncovering network propagation dynamics, and examining topology evolutions. The influence maximization (IM) problem, which was formulated by Kempe et al. [11], is a challenging issue that has been proven to be NP-hard. Cecilia et al. [12] discovered that examining the citizenship competencies plays an important role in a complex system like a society, and it is crucial to study their effects given the significance of these competences in shaping social systems. Within the domain of social network analysis, IM emerges as a pivotal undertaking involving carefully selecting a seed group within a given social network to maximize its influence on a broad spectrum of individuals. This optimization problem holds profound significance for the process of refining information diffusion strategies to accomplish diverse objectives, which can range from viral marketing and opinion-shaping to social mobilization. When executed adeptly, IM engenders remarkable enhancements in the efficiency and effectiveness of such campaigns. Notably, within the context of viral marketing, IM facilitates the identification of potential customers, thereby curtailing marketing costs and bolstering profits. Furthermore, IM plays a pivotal role in molding public opinion across various domains such as politics, health, and the environment, while also galvanizing individuals for social causes encompassing protests, donations, and petitions. By harnessing IM effectively, substantial dividends can be reaped, efficiently leveraging the power of social networks to accomplish a multitude of objectives. By employing diffusion cascades, it becomes possible to optimize the reach of influence for the chosen seed set [13,14]. Identifying influential nodes within social networks offers invaluable insight into the underlying mechanisms that govern information diffusion phenomena, thereby informing effective strategies for message propagation. Moreover, the exploration of IM contributes to the development of novel algorithms and techniques that maximize the extent and impact of information dissemination in social networks.
An independent cascade (a stochastic diffusion model) is generally employed in IM to simulate the dissemination of information by seed nodes. The spread of influence is commonly measured in terms of the number of activated users. However, the majority of research in IM has focused on stochastic diffusion patterns, and few studies have explored the global-scale role approximation of social network users. In reality, each user in a social network plays a specific role, whether as an opinion leader, a structural hole user, or an ordinary user. All of them contribute to the overall diffusion of information. This study takes into account the social reality that users with similar roles in a social network exhibit comparable behaviors and attributes. For instance, structural hole users can facilitate the exchange between two communities in a social network, while ordinary users typically receive information passively. By identifying user roles in a social network, researchers can better understand the mechanisms behind information diffusion phenomena and develop more accurate models for solving IM problems.
In recent years, several user role identification studies have recommended the knowledge contribution approach proposed in [15]. This approach identifies three user roles, namely givers, takers, and matchers, based on their knowledge contribution to disseminated information. Research context in light electric vehicle applications promotes the differentiation of more roles such as the vigilant user, passive collaborator, active decision-maker, and ambassador [16]. This four-dimensional role classification, determined by participation degrees, is particularly important in the service promotion of electric vehicles. Some studies [17,18] have emphasized the importance of user role division in the information diffusion process. However, these studies have yet to consider the key factor of user roles in the existing IM research. Furthermore, network structure information has not been fully incorporated into studies on user role division.
The study delves into a novel network embedding algorithm that integrates user role information, thereby adding a fresh and invaluable perspective to the realm of efficient IM solutions. In particular, the proposed approach utilizes network embedding's benefits to incorporate similarities between users' global roles, and is hence termed as role-based network embedding (RbNE). This methodology entails the mapping of social network nodes into a fixed-dimensional space, whereby they are represented as low-dimensional vectors that capture both the structural and user role information. After calculating the embedding vectors of nodes, a novel logical propagation network can be constructed based on their similarities. Finally, a greedy heuristic algorithm is introduced to select a seed set of $ k $ nodes.
The main contributions of our work are enumerated as follows:
1. To address the existing gap in the literature where network structure information is not fully incorporated into role division analyses, we propose a novel user role division algorithm that incorporates both coarse-grained user role division (CGURD) and fine-grained user role division (FGURD). CGURD aims to divide users into different groups based on their overall contribution to the network, while FGURD focuses on identifying more precise user roles based on their relationships with neighboring nodes. By combining these two approaches, we can achieve a more comprehensive understanding of user roles in social networks, which can provide new insights into the IM problem.
2. Current research has not examined the effects of the user's global roles in efforts to solve the IM problem. A systematic understanding of how the global roles contribute to capturing the seed set with the most significant influence is still lacking. Therefore, this paper proposes a novel network embedding algorithm, entitled RbNE, to preserve the approximation between users' global roles. Subsequently, a greedy heuristic algorithm, named RbneIM, is developed to select the seed set $ \mathcal{S} $. This algorithm considers the users' global roles as an essential factor in selecting the seed set and enables the identification of users who can maximize the influence spread in a social network.
3. Our study extensively evaluates the performance of RbneIM on four real-world datasets. The experimental results indicate that our approach significantly outperforms the state-of-the-art methods, thereby demonstrating its superior performance in terms of solving the IM problem.
2.
Related work
2.1. Social networks and linear threshold models
The task of IM was initially specified in social networks. Accordingly, this paper follows the same applied background. A social network can be represented by a two-tuple $ G = (\mathcal{V}, \mathcal{E}) $, where $ \mathcal{V} = \{v_1, v_2, ..., v_n\} $ and $ \mathcal{E} \subseteq \mathcal{V} \times \mathcal{V} = \{e_{ij}\} $ represent a set of $ n $ nodes and a set of edges between the nodes, respectively. An edge $ e_{ij} = (v_i, v_j) $ indicates a potential relationship between nodes $ v_i $ and $ v_j $, which is also associated with a weight $ w_{ij} \geqslant 0 $ of their connection strength. Table 1 summarizes the notations employed in this paper.
In social networks, many spreading processes can be modeled as complex chain reactions. Kempe et al. [19] introduced two probability diffusion models, namely independent cascade (IC) and linear threshold (LT), to explain these processes. A framework based on submodular functions is proposed to analyze the performance guarantees of algorithms for influence problems. Kempe et al. show that a greedy strategy can be within 63% of optimal for several classes of models, and they present computational experiments that demonstrate the superiority of their approximation algorithms over other node-selection heuristics. Among them, LT models are commonly involved as an extremely representative framework for understanding these mechanisms. This study evaluates the effectiveness of the seed node selection using the LT model [20]. We assume that a node $ v\in\mathcal{V} $ of a social network $ G $ is influenced by each incoming neighbor weight $ w \in [0, 1] $, which contributes to the idea behind the LT. An inactive user becomes active once the number of its neighbors reaches a certain threshold $ \theta $ of active states. Specifically, each node $ v $ in a social network $ G $ has a threshold $ \theta $ in the interval $ [0, 1] $, representing the conditional value of the node $ v $ being activated. The activation of node $ v $ can be formalized as follows:
where $ Nei(v) $ refers to the direct neighbors of node $ u $. Specifically, in the LT model, each node has two attributes: threshold and weight. A node's threshold is a value between 0 and 1, indicating how many active neighbors it needs to become activated. The weight of a neighbor is also a value between 0 and 1, representing the neighbor's influence on the node. If the threshold of a node $ i $ is t and the neighbor set is $ N $, then the node $ i $ will be activated under the following condition:
where the total weight of $ j $ when activated refers to the sum of all edge weights connecting to node $ j $.
2.2. Influence maximization
In social network analysis, IM is a crucial concept that involves selecting a subset of nodes or edges in a given network to maximize the impact of a particular objective. Individual characteristics and the effects between individuals are expressed in the form of social network topology. Hence, influence has both global and local scopes. A node with stronger global influence in a network has the ability to control the spread of information and behavior in the network. Moreover, a small subset of highly influential nodes in a social network can control the propagation of most of the information. Thus, selecting the correct seed nodes is crucial to achieving maximum impact in the spread of information in a social network. IM technology is useful for identifying critical nodes to maximize the impact of information diffusion throughout the network. Compared to a random selection of nodes, IM technology can predict and quantify influence, leading to more effective resource allocation and planning strategies. A node's influence on another node is considered local influence, and the more a node influences another node, the more the latter will follow and imitate the former's behavior in the social network. The process of defining node influence through local influence and network structure can yield better results, taking into account the requirements of different applications. Studies of dynamic social network nodes' influence has mostly been on static network topologies, examining users' influence or users' influence variation on static topologies over time.
The literature on IM problems can be traced back to a study by Domingos and Richardson [21]. Kempe et al. [19] then formulated the IM problem for the first time and presented two essential conclusions. First, IM issues can be modeled as a class of discrete optimization problems. Second, IM problems are NP-hard, which limits the development of existing IM approaches. Most of the current approaches rely on simple greedy calculations, traversing each node in the social network to calculate its marginal impact benefit. Nodes with larger marginal impact benefits are then included in the seed set $ \mathcal{S} $. The formation process can be expressed as follows.
where the value of $ \sigma(\mathcal{S}) $ represents the propagation range of a node set $ \mathcal{S} $, typically measured by the number of activated nodes. To construct the final seed set, greedy methods have been commonly employed to select the nodes with the highest influence benefit continuously. However, these methods have been criticized for their low efficiency and high time complexity, making them impractical for large-scale networks. As a result, recent research has focused on improving the extensibility and efficiency of the IM problem.
Most research on IM is based on traditional propagation models such as IC and LT [20] and their variations. However, some studies have shown that the IC and LT models may not accurately approximate influence. To address this issue, Oriedi et al. [22] proposed a selective breadth-first traversal algorithm that efficiently generates an optimal seed set for IM. According to their argument, using models like the IC and LT models may result in an incorrect influence estimation. The authors have proposed an algorithm to create the best seed set for maximizing influence. They have tested their method using real data and proved that it is better than traditional IM algorithms. Oriedi et al. have effectively developed a more precise approach for modeling social network influence. Similarly, Sun et al. [23] introduced the self-Aactivation IC (SAIC) model that incorporates self-activation as an additional factor in influence propagation, where nodes can be self-activated and selected as seeds. They characterized two optimization problems arising from self-activation: preemptive IM (PIM) and boosted PIM (BPIM). Specifically, the PIM problem involves identifying nodes that can reach the most number of nodes before other self-activated nodes if self-activated. In contrast, the BPIM problem aims to select seeds that are guaranteed to reach the most number of nodes before other self-activated nodes. They proposed scalable algorithms for both PIM and BPIM to address these challenges and assessed their approximation guarantees. The results of their study indicate that the algorithms perform much better than baseline methods, particularly for the PIM problem and the BPIM problem when there are varying self-activation behaviors among nodes.
Guo and Wu [24] investigated adaptive influence maximization with multiple activations problems, which take into account that not all users are willing to become influencers in the seed set. The researchers addressed a problem wherein each user is connected with a probability of activation as a seed, allowing for multiple triggers. To mathematically model this scenario, Guo and Wu have proposed a novel concept called adaptive-dr-submodularity, defined on the domain of an integer lattice, to maximize an adaptive monotone and dr-submodular functions while satisfying the expected knapsack constraint. This problem has not been previously investigated in existing studies, necessitating a comprehensive exploration of its approximability. They have developed a strategy that combines an adaptive greedy policy with sampling techniques to tackle the challenge of estimating expected influence spread while maintaining the approximation ratio and reducing time complexity. Other related work can be found in [25,26,27,28]. Luo et al. [25] have proposed the iterative competitive opinion maximization model, which aims to maximize the total opinions in competitive scenarios by combining user opinions and rival strategies. Unlike existing IM approaches, this model effectively suppresses the propagation of negative opinions and identifies optimal responses to opponents' seed node choices. The authors employ an iterative inference algorithm based on the greedy strategy to reduce computational complexity and achieve optimal outcomes. Zhang and Zhang [26] investigated the computational complexity of IM and analyzed the approximation guarantee of the greedy algorithm within the generalized model. Their research introduces a coordination game model that offers a game-theoretic perspective on IM. This model extends existing frameworks such as the majority vote model and the LT model. Furthermore, the incorporation of strategies to improve the algorithm's performance represents a significant contribution to the existing body of literature. However, as mentioned in the introduction section, every user in a social network plays a role in disseminating information. The previous IM studies mentioned above ignore the global-scale role approximation of users in the network. Liu et al. [27] introduced CONE, an active learning framework designed to address the estimation of user opinions in multi-round campaigns involving influence propagation. Their methodological approach to modeling user preference data is notable for their ability to handle scenarios in which prior knowledge of user opinions is unavailable. This approach holds practical implications, particularly in viral marketing, and including precision advertising and reputation management domains. Banerjee et al. [28] have presented a pioneering model, termed UIC, to overcome the existing constraints in the literature. The UIC model stands out by integrating users' economic factors into their product adoption and purchase decisions, aiming to maximize social welfare and foster customer loyalty within the network. Additionally, the authors shed light on the underexplored realm of complementary items, which has received limited scrutiny in previous studies on multiple items.
Several recent studies on IM have effectively utilized deep learning techniques to identify and evaluate user influence in social networks [29,30,31,32,33,34,35,36]. These studies have shown promising results in the area of improving the performance of IM algorithms. Keikha et al. [29] have presented a novel methodology to tackle the challenge of IM on interconnected networks, employing deep learning techniques. Their proposed algorithm harnesses the power of deep learning for feature engineering, allowing for the preservation of both local and global structural information. By showcasing monotonicity and submodularity, the algorithm provides an assurance of an optimal solution. Notably, this study pioneers the utilization of network embedding to address the IM problem, marking a significant advancement in the field. Zhan et al. [30] proposed a general framework called NE-IM that leverages representation learning to address computational cost and improve stability. NE-IM contains two components: structure-based embedding and feature-based embedding. Their work incorporates heterogeneous information in IM models and applies representation learning to improve the efficiency and accuracy of IM models. Tian et al. [31] proposed two topic-aware social influence propagation models based on IC and LT models and developed a deep influence evaluation model to evaluate the user influence under different circumstances. They encoded the feature of each node by a vector, which enabled them to construct a solution efficiently without considering the complex graph structure. Their network learns a generalized heuristic framework to solve the NP-hard TIM problem using meta-learning, without requiring specialized knowledge and improving advertising injections. Li et al. [32] have presented a framework aimed at maximizing market influence in the USA domestic air passenger transportation market by adjusting flight frequencies. They used neural networks to predict market influence while considering several features such as air carrier performance features and transportation network features. They integrated neural networks to predict market influence and developed an adaptive gradient ascent method for solving the nonlinear optimization problem in flight frequency optimization. Zhang et al. [33] designed a network dynamic GCN to extract the in-depth structural information of social networks for IM. The proposed algorithm utilizes a leader fake labeling mechanism to generate node labels that are helpful for seed node selection during training. Finally, a heuristic method based on the Mahalanobis distance was developed to select influential seed nodes with learned node representations. Li et al. [34] suggested a Gaussian propagation model based on social networks and a multi-dimensional space modeling approach for propagation simulation. Their approach uses an improved CELF algorithm to accelerate the IM algorithm and evaluate the proposed technique based on theoretical proofs. Li et al. [35] then proposed a new approach to IM in social networks that takes into account multi-dimensional characteristics such as user emotions and group features. Specifically, Li et al. defined user emotion power and cluster credibility as measures of the interaction effects of individual emotions and proposed a potential influence user discovery algorithm based on an emotion aggregation mechanism to locate seed candidate sets. Li et al. [36] proposed a novel adaptive agent-based evolutionary approach to solve the IM problem in dynamic and large-scale social networks. A key component of the proposed approach is an adaptive solution optimizer that drives the evolutionary process and adapts candidate solutions dynamically. Motivated by the success of these works, our paper aims to take the next step and integrate the global role information of users in the final embedding vector using network embedding. By incorporating this additional information, our proposed method will further enhance the accuracy and effectiveness of IM in social networks.
2.3. Network representation learning
Network representation learning, also known as network embedding [], has garnered significant attention in recent years. This technology aims to transform a network's features into a low-dimensional continuous representation matrix while retaining the network structure and inherent properties. Recent advances in deep learning have enabled researchers to generate node embeddings through social network analysis techniques, such as DeepWalk [38], LINE [39], and node2vec [40]. These techniques utilize a prearranged random walk strategy to construct a corpus that displays the connections between the network components while preserving the characteristics of the network structure. The SkipGram model [41] is employed to acquire the node vector representation of a network. This model uses the context of words to identify the underlying relationships between nodes in the network. After an extensive process, the low-dimensional embedding representations of the nodes in the network are established, allowing a greater understanding of the network structure and the underlying relationships between nodes. Although these random walk methods have been proven to achieve better performance in network embedding, they ignore nodes' global structure and properties. Analyzing these global structures and node characteristics is essential to understanding the network accurately. Consequently, a few investigators have started to integrate network exploration techniques with other node properties. For example, Keikha et al. [42] devised a network embedding algorithm, community aware random walk for network embedding (CARE), that aims to conserve the local neighborhood and community information of a network while maintaining its global structure.
Drawing on prior literature, we innovatively utilized random walks to extract the embedding matrix of the target network. Diverging from previous studies, our novel approach places emphasis on incorporating the user's global role information. This integration enables a comprehensive representation of roles and their local neighbors within the network. As a result, our methodology provides an enhanced perspective of the user's network position and their potential associations with other users.
3.
Methodology
3.1. Analysis of user role division
Information dissemination is a complex process due to the dynamic influence of one user on another [43]. The structural attributes of users in a social network reflect their roles in different communities. In this context, the primary challenge is to understand how the network structure affects the dissemination of information in a role-divided scenario. Most of the existing random walk methods based on the network structure only consider the influence of the direct domain nodes in a network, such as edge propagation probabilities between two nodes, while ignoring the roles of users in the network. Users with similar roles in the network tend to have similar structural attributes, and previous research methods have not accurately captured this feature.
Figure 1 shows a classic social network scenario. Each node in the network represents a user, and the connection table between the nodes indicates the relationship between the users. In addition, the shared colors (yellow or red) in the figure imply that these nodes have similar global roles. Two communities, labeled as $ C_1 $ and $ C_2 $, are also depicted in the figure, each having its own opinion leaders (yellow nodes 1 and 4) that usually have similar attributes, such as higher node degrees. Red nodes 2 and 3 span multiple communities and typically play a critical role in the exchange of community information. Such red users are generally called structural hole nodes in a social network. This example highlights two essential aspects of following users on social networks:
1. Similar user roles usually have similar attributes;
2. Different user roles have distinct functions in the exchange of information.
The global role information of users in the network plays a vital role in information dissemination. In the traditional random walk sampling process, first-order or high-order neighbor nodes are considered, but the similarity of users with similar global roles is overlooked. In network representation learning, we hope that similar nodes in the network will eventually have similar vector representations. Therefore, users with comparable global roles should have corresponding vector representations. However, conventional or biased random walks cannot accurately approximate the user's global role. This paper addresses this limitation by incorporating the user's global role into the traditional random walk process. By sampling from the training corpus, we obtain the vector representation of each node. In the upcoming research, the aim is to investigate the role division of users in the network. The problem will be approached from two perspectives. First, the focus will be on CGURD. Second, the issue of user role division will be addressed in greater detail.
In a given social network $ G(V, E) $, it is possible to represent its attributes or structural characteristics using a matrix $ A $ of dimensions $ \lvert V \rvert \times D $, where $ D $ represents the embedded dimension in $ A_{|V| \times D} $. Users in the network with similar attributes or structural characteristics are expected to belong to the same role set. This study aims to map each user $ V_i $ to its corresponding role $ R_j $ in the set of user roles $ R = {R_1, R_2, ..., R_K} $. It is assumed that user roles in the network can be classified into $ K $ categories where $ K $ is significantly smaller than $ \lvert V \rvert $, i.e., $ k \ll \lvert V \rvert $. Our aim is to determine a mapping function $ \phi: V_i \rightarrow R_k $ that can map each user to its role $ R $ based on its attributes or structural information.
In network representation learning, the concepts of coarse-grained and fine-grained refer to two distinct levels of abstraction concerning the network graph. Coarse-grained clustering aims to consolidate nodes to maximize their similarity within groups while minimizing the similarity between groups. This method typically produces larger clusters consisting of nodes with similar properties or roles in the network. Conversely, fine-grained clustering aims to group nodes with highly specific features or roles, resulting in smaller clusters with nodes possessing more precise properties or roles within the network.
Coarse-grained clustering typically yields smaller clusters comprising nodes with more specific properties or roles, facilitating the identification of larger-scale patterns and communities in the network. This approach is especially advantageous when computational efficiency is a priority. In contrast, fine-grained clustering focuses on identifying highly specific patterns or roles within the network. As a result, this approach may generate a larger number of clusters and require more computational resources. Nevertheless, the fine-grained approach offers valuable insights into the intricate structural properties and relationships present in the network.
3.1.1. CGURD
We can intuitively express the simple mapping of vector $ A_i $ by using either its cumulative sum or average value. Specifically, this can be expressed as follows:
The vector $ x = [A_{i1}, A_{i2}, A_{i3}, ..., A_{iD}] $ is obtained from the matrix $ A $, where $ R(x) $ represents the specific user role of $ x $. This means that when a vector value $ x $ is input, its corresponding role category is output. It should be noted that if the matrix $ A $ represents the structural characteristics of users, such as the adjacency matrix of social network $ G $, then the sum of features for each user represents the degree of its node. Intuitively, the mapping relationship described above is divided based on the node degree of users.
Unfortunately, Eq (3.1) is not interpretable if the matrix $ A $ represents the attribute characteristics of the nodes. Therefore, we need an alternative approach to address this issue. In this paper, we introduce the Non-negative matrix factorization (NMF) algorithm to handle this problem. The process of obtaining user roles from the user characteristic matrix can be represented in Figure 2:
Based on the concept of NMF matrix decomposition, the dimensionality of the user characteristic matrix $ A_{\lvert V \rvert \times D} $ can be reduced using the following iterative formula:
In the NMF algorithm, we utilize the search matrix $ R_{\lvert V \rvert \times M } = [r_1\ r_2 \cdots \ r_M] \in \mathcal{R}{+}_{\lvert V \rvert \times M } $ and the coefficient matrix $ F_{M \times D} = [f_1\ f_2\cdots\ f_D] \in \mathcal{R}{+}_{M \times D} $ to reduce the dimensionality of the user characteristic matrix $ A_{\lvert V \rvert \times D} $. Among them, $ M $ is the number of basis vectors and is often much smaller than $ \lvert V \rvert $ or $ D $, i.e., $ M \ll \lvert V \rvert, M \ll D $. In this paper, the matrix $ R $ is regarded as a user role matrix, where each row represents the role feature vector to which the user belongs. On the other hand, the $ F $ matrix represents the probability of each role to which each user belongs. We aim to minimize the loss function:
The final user role matrix $ R $ is obtained after minimizing the loss function with a regularization penalty $ \mathcal{R}(R, F) $. Afterward, the matrix $ R \in \mathcal{R} ^{\lvert V \rvert \times M} $ is partitioned into $ K $ disjoint sets of nodes $ V_1, V_2, ..., V_K $ by solving the k-means objective as follows:
In the process described above, the user role division algorithm can be described as Algorithm 1.
3.1.2. FGURD
To analyze the characteristics of user roles during information dissemination in online social networks, this study classifies all users into three categories: opinion leaders, spanner holes, and ordinary users. The definition and primary properties of a node are illustrated in Figure 1. using a simple example.
Opinion leader: This refers to the minority of individuals at the core of the network who serve as a crucial source of information and influence within the community, capable of shaping the attitudes of the majority. As illustrated in the figure above, the yellow node is located at the center of the community to which it belongs, representing an opinion leader.
Structural hole users: Structural hole users are in a key position in the network but differ from opinion leader nodes in terms of high influence. They significantly impact the depth and breadth of information dissemination by acting as bridges between different communities. As demonstrated in the above figure, the red nodes represent such users.
Ordinary users: In the context of online social networks, ordinary users refer to those who do not possess the characteristics of structural holes or belong to a group of opinion leaders. Despite not having a central position in the network, ordinary users represent the majority of users and are considered edge users. They play a crucial role in information dissemination. In the presented figure, the white nodes depict ordinary users, emphasizing their significance as an integral part of the network. Although ordinary user nodes may not directly affect the global structure and evolution of the network like influential user nodes, they are also an indispensable part of the social network. Ordinary users nodes play the following roles in the benefits of social networks:
● Provide content: Ordinary users can provide rich and diverse content to social networks, attract more users to join the network and increase the value of the whole social network.
● Spreading information: Ordinary users can spread information and opinions by their own behavior, so they can help content and opinion diffusion by spreading and reposting even if they have no influence.
● Guide the diffusion of the network: Ordinary users establish their own social relationships, improve their exposure rate, and then attract more ordinary users like them to join the network, thus contributing to the prosperity and development of social networks.
Online social network users exhibit several significant traits, prompting the search for a viable approach to differentiate user roles based on these attributes.
(1) Opinion leader influence
The present study focuses on identifying opinion leaders within the network by analyzing both the local and global characteristics of users and studying each node's influence. In the information dissemination process, the local influence that node $ u $ has on node $ v $ is determined by two main factors. First, the influence of node $ u $ itself is typically evaluated by its degree within the social network. Second, the number of mutual friends of node $ u $ that have an influence on node $ v $ which can be measured by the Jaccard coefficient.
where, $ Inf_{uv} $ represents the local influence of node $ u $ on node $ v $. $ D(u) $ is an expression related to 1-hop neighbors of user $ u $, which is applied to measure the local influence of user $ u $. $ Nei(u) $ is the set of direct neighbors of $ u $. Jaccard's coefficient $ \frac{\lvert Nei(u) \cap Nei(v)\rvert}{\lvert Nei(u) \cup Nei(v) \rvert} $ is a widely-used measure to estimate the mutual friends of nodes $ u $ and $ v $, and it is adopted to calculate the local influence of nodes $ u $ and $ v $. The balance-parameters $ \alpha_1 $ and $ \alpha_2 $ satisfy $ \alpha_1 + \alpha_2 = 1 $. The formula to calculate $ D(i) $ is as follows:
To calculate the influence value for node $ i $ on its neighbors, the influence-gathering equation is used. This equation takes into account the local influence of the node $ i $ on node $ v $ and the influence of each node by other nodes. After obtaining the local influence of a node, we can calculate the influence value for node $ i $ on its neighbors using Eq (3.7):
where $ N(i) $ is the neighbor set of node $ i $. In terms of the global influence of nodes in social networks, this study focuses on the betweenness centrality of nodes.
Equation (3.8) defines the betweenness centrality of a node, which is used to determine the global influence of nodes in social networks. Here, $ P_{vk} $ represents the number of shortest paths between two nodes $ v $ and $ k $, while $ P_{vuk} $ represents the number of shortest paths between nodes $ v $ and $ k $ passing through node $ u $. $ \hat{B}(u) $ stands for the normalized global influence. Finally, we can obtain the betweenness centrality of node $ u $.
This study combines local and global structural information to obtain the influence of node $ i $ in the entire network.
where $ OLI_{u} $ represents the opinion leader influence OLI score of node $ u $.
In order to achieve a balance between global and local influences in the calculation of $ OLI_{u} $, the balance parameters $ \beta_1 $ and $ \beta_2 $ are utilized. These parameters are designed to adjust the relative weight of each factor, and they are subject to the constraint that their sum must equal 1. By combining global and local structural information, the opinion leader index of each node can be determined. The process of identifying opinion leaders in social networks is described in Algorithm 2.
As presented in Table 2, we employed diverse methods to assess the influence of nodes in the case network, which is illustrated in Figure 1.
To evaluate the effectiveness of our proposed approach, we compared it against several well-known centrality measures, including degree centrality (DC), betweenness centrality (BC), closeness centrality (CC), and eigenvector centrality (EC). The final row of Table 2 displays the sum of values for each method. It is worth noting that the sum of the respective columns for each method is different. In order to visualize the data more intuitively, we employed a stacked line chart to demonstrate the trends in the different node measurement methods within the case network, as illustrated in Figure 3.
The network structure depicted in Figure 1 reveals that node $ 1 $ and node $ 4 $ have higher centrality, which is consistent with the trends illustrated in Figure 3 for all methods. Figure 3 indicates that the proposed OLI method exhibits a similar trend as the other methods, but with a higher degree of discrimination. Therefore, compared to the other methods evaluated in Table 2, the method proposed in this paper performs better. For the case network structure of Figure 1, we set $ \alpha_1 $ and $ \alpha_2 $ to $ 0.8 $ and $ 0.2 $, respectively, and $ \beta_1 $ and $ \beta_2 $ to $ 0.5 $ each. Since the network structure is small, this study emphasizes the node's own influence when computing the local influence of the node. The local and global structural information of the nodes are integrated, and the same weight is assigned to the node's local influence and global influence to calculate the final OLI.
(2) Structural hole score
Burt's theory of structural holes [44] explains the competitive relationships in social networks. In the realm of social networks, it is a common occurrence for individuals with similar professional or personal interests to seek each other out and form tight-knit communities. The ties between these groups, however, tend to be comparatively sparse. In network parlance, nodes that serve as inter-group conduits, known as "structural holes", play a crucial role in facilitating the exchange of information across community boundaries. As shown in Figure 1, nodes $ 2 $ and $ 3 $ act as bridges for communication between two communities. The ability of a node to utilize structural holes is measured by the network constraint coefficient, as shown in Eq (3.10). A smaller network constraint coefficient indicates a greater possibility of structural holes, which can be beneficial for information dissemination.
The network constraint coefficient value for node $ u $ is denoted by $ NC(u) $:
where, $ w_{uv} $ represents the ratio of the energy invested by node $ u $ to maintain the relationship with node $ v $ to the total energy invested by node $ u $, as shown in Eq (3.11).
Equation (3.10) is applied to calculate the network constraint coefficient value for all nodes in the network. In an unweighted graph, the edge weight $ weight_{uv} $ is equal to $ 1 $ if there is a connecting edge between node $ u $ and node $ v $, and it represents the weight value of the edge from node $ u $ to node $ v $ otherwise. After calculating the network constraint coefficient values for all nodes, we can identify the first $ k $ nodes with smaller values as the target structural hole nodes.
(3) Ordinary nodes
The selection thresholds for opinion leaders and structural hole users in the network are determined as $ \gamma_1 $ and $ \gamma_2 $, respectively, based on the findings of Wu et al. [45]. According to their study, only 1% of users in a network are considered as opinion leaders or structural hole users. However, they play a crucial role in creating or participating in 50% of the links in the network. To obtain the set of ordinary user nodes in the network, Equation 3.12 is utilized.
where $ Or $ represents a collection of ordinary nodes, $ Op $ represents a collection of opinion leader nodes, $ St $ represents a collection of structural hole nodes, and $ V $ represents all nodes in the network. Finally, the process of FGURD of nodes in the network can be described by Algorithm 3.
3.2. Role-based random walk for embedding
In the previous section, the process of identifying user roles in social networks was discussed. To obtain network embedding representations of users that preserve the local structure and global role approximations, a random walk-based network embedding approach is adopted, as illustrated in Figure 4. First, the role of nodes in the network is calculated using either Algorithm 1 or Algorithm 3. Second, the first-order or second-order local approximation of the node is captured by the random walk of the topological structure, and the approximation of the global role of the node is preserved through the random walk of the node role. By combining all random walks into one corpus, node embeddings can be learned by training the SkipGram model with negative sampling [46]. The SkipGram model can predict the conditional probability of co-occurrence among words within a fixed window size, and maximizing this probability allows the model to obtain vector representations of words in the corpus. The objective of the SkipGram model is to maximize the average log probability of a sequence of training words $ w_1 $, $ w_2 $, $ w_3 $, ..., $ w_T $, as shown in the equation above.
where the parameter $ win $ represents a predefined window size, with a larger value leading to more additional training examples and potentially higher accuracy. The softmax function is employed to estimate the probability distribution of $ p(w_{t+j} \mid w_t) $, which is defined as follows:
where the word representations of "input" and "output" data are denoted as $ v_w $ and $ v^{'}_w $, respectively. $ W $ is the total number of words in the vocabulary.
In the context of network embedding, the random walk algorithm is commonly used to generate sequences of nodes. Starting from an initial node $ u\in G $, the algorithm randomly selects a neighboring node and moves to it. This process is repeated for a predefined number of steps. A method for network embedding called Role-based Random Walk Network Embedding (RbNE) has been developed, and its pseudocode is presented in Algorithm 4. RbNE takes a social network as input and outputs low-dimensional representations for each node in the network, where nodes with similar roles will have similar representations. To obtain the node representations in RbNE, each node in the network is first divided into roles using either Algorithm 1 or Algorithm 3, which correspond to CGURD and FGURD, respectively. Then, the role of the node is used to perform random walk simulation, resulting in the node's representation. The random walk network embedding methods that are divided by user roles in Algorithm 1 and Algorithm 2 are named RbNE-CG and RbNE-FG, respectively. The parameter settings for the algorithms are detailed in the corresponding sections.
The traditional random walk method (line 5, Algorithm 4 captures the local approximation of nodes, similar to Deepwalk and node2vec. In contrast, the random walk method based on node role (line 6, Algorithm 4 captures the global node approximation. As a result, the final training corpus $ walks $ contains both the local neighbor approximation relationship and the global role approximation relationship of each node.
Embedding techniques in network analysis entail the inclusion of neighboring nodes, degrees, labels, and other relevant attributes to impart specifications onto the individual nodes within the network. This approach unveils valuable associations and connections between nodes, ultimately amplifying the efficacy of both network analysis and machine learning methodologies.
3.3. IM with node embeddings
The representations of all nodes in the network $ G $ are now available and are denoted as $ \Phi \in R^{\lvert V \rvert\times d} $. To measure the relationship between any two nodes, the cosine similarity of their representation vectors can be calculated directly using the following equation:
The relationship score between two nodes $ u $ and $ v $ is defined as $ Sim(u, v) $ in our approach. A higher score indicates a higher probability of node $ u $ influencing node $ v $. Thus, a new logically structured network called a propagation probability network can be constructed, where the similarity between two nodes is determined by Eq (3.15). To simplify the network, a similarity threshold $ \theta $ is introduced. Only when their similarity score is greater than $ \theta $ is a logical connection edge added between two nodes $ u $ and $ v $. Hence, the adjacency matrix representation of the connection strength between any two nodes in the network can be described as follows:
where $ \theta $ represents the hyper parameter and $ \theta $ belongs to the interval $ (0, 1) $. By applying the weight calculation method, we can obtain the desired new logical structure network. In this structure, $ p_{u, v} $ denotes the probability of information propagation from node $ u $ to node $ v $. Assuming the independence of influence probabilities among users, the probability of node $ i $ being activated by its neighbor $ Nei(i) $ can be computed using the propagation probability of its friends:
Similarly, the total influence spread of all non-seed nodes under the influence of the seed node set $ \mathcal{S} $ can be quantified for each vertex $ u \in V $:
The objective of the IM task is to increase the number of activated nodes influenced by the seed node set $ \mathcal{S} $, which is equivalent to maximizing the value of $ Pr(V \mid \mathcal{S}) $. Therefore, our optimization goal can be formulated as follows:
As the direct optimization of the optimization goal is not feasible, a greedy heuristic algorithm is utilized in this study. Specifically, for undirected networks, a Connected components [19] type of heuristic is employed to compute the score for each node; and subsequently, the $ k $ nodes with the highest scores are chosen. Algorithm 5. presents the three-step procedure, which includes the following:
1. Calculation of the similarity between each user utilizing the network embedding matrix $ \Phi $ and construction of a new logical network structure matrix $ A $ (lines 2–3).
2. Random deletion of edges according to their weights to acquire connected components in the network (lines 11–20).
3. Assign each node a weight value based on the number of its neighbors and select the $ k $ nodes with the highest weight values as the seed node set (lines 21–28).
By implementing this heuristic algorithm, the seed node set $ \mathcal{S} $ can be effectively selected.
3.4. Algorithmic complexity discussion
The proposed RbneIM algorithm (Algorithm 5) has a time complexity of $ O(R*\lvert V \rvert*k) $, where $ R $ and $ k $ are both constants. The outer loop runs for a constant number of iterations $ R $, while the inner loop traverses the network and the current cropped subgraph, which has a constant size $ k $. In contrast, the RbNE algorithm (Algorithm 4) has greater time complexity determined by the most expensive of its three parts. First, Algorithm 3 is called to perform role division, which has a time complexity of $ O(\lvert V \rvert*n) $, where $ n $ is the largest number of node neighbors in the network $ \lvert Nei(i)\rvert $. Second, in the sampling process, the algorithm iterates according to the predefined sampling length $ len $ (constant) and randomly adds nodes to the sampling sequence according to predefined rules, which takes $ O(\lvert V \rvert*len) $ time. Finally, the SkipGram algorithm has a time complexity of $ O(\lvert V \rvert) $. Therefore, the time complexity of the RbNE algorithm is $ O(\lvert V \rvert) * (O(\lvert V \rvert * n) + O(\lvert V \rvert * len) + O(\lvert V \rvert)) $. Since $ n $, $ k $, and $ len $ are all constants, the final time complexity of the proposed RbneIM algorithm is $ O(\lvert V \rvert^2) $. Several baseline methods utilize matrix operations, but this approach can lead to memory insufficiency when dealing with large-scale graphs. In contrast, the RbneIM method employs heuristic algorithms that significantly reduce the computational complexity, resulting in strong scalability.
4.
Experimental evaluation
This section begins by presenting the social network dataset and parameter settings employed in this study. Subsequently, the baseline algorithm used is briefly introduced, followed by an analysis of the experimental findings.
4.1. Datasets
This study utilized six public real-world datasets to provide varying-sized networks. This aimed to assess the feasibility and effectiveness of the proposed IM method. Table 3 presents a comprehensive overview of the datasets. The datasets were carefully selected based on their diversity, which includes different types of social networks, ranging from online social networks to co-authorship networks. Moreover, the datasets contain a varying number of nodes and edges, ranging from small-scale networks to large-scale networks, thus providing a diverse range of networks for our analysis. By using such diverse datasets, we aim to evaluate the performance of our proposed method under different network settings, which can help to enhance the generalizability of our findings.
(i) Dolphins* [47]. The dataset used in this study is an undirected social network consisting of 62 dolphins living in a community off of Doubtful Sound, New Zealand. The dataset encompasses frequent associations between the dolphins in the form of links between them.
*http://www-personal.umich.edu/mejn/netdata/
(ii) Facebook_Caltech36† [48]. A social friendship network extracted from Facebook consisting of people as nodes, with edges representing friendship ties.
†https://networkrepository.com/socfb-Caltech36.php
(iii) NetScience‡ [49]. The NetScience dataset is a co-authorship network that involves scientists working on network theory and experiments. A visual representation of the largest component of this network can be accessed via the URL.
‡http://www-personal.umich.edu/mejn/centrality/
(iv) Cora§. The Cora dataset is a collection of machine learning papers, and it includes the citation relationships between them. These relationships are used to construct the network topology for this dataset.
§https://linqs.soe.ucsc.edu/data
(v) Ca-GrQc¶ [50]. The Ca-GrQc dataset is a collaboration network of arXiv General Relativity and Quantum Cosmology. It is derived from the e-print arXiv and includes scientific collaborations between author papers submitted to the General Relativity and Quantum Cosmology categories. The dataset covers papers submitted between January 1993 and April 2003.
¶http://snap.stanford.edu/data/ca-GrQc.html
(vi) Facebook-Government|| [51].The data collection process involved gathering information on the Facebook pages of politicians in November 2017. The resulting network is represented as nodes, which correspond to the politician pages, and edges, which indicate mutual relationships between them.
||https://networkrepository.com/fb-pages-government.php
We have plotted the frequency distribution of user node degrees to characterize the Cora and NetScience datasets.
The user node degree distributions for the Cora and NetScience datasets are presented in Figure 5, indicating a power-law distribution. This suggests that certain users are more susceptible to influence from their social connections.
4.1.1. Baseline methods
This paper introduces three typical initial ranking methods and a state-of-the-art IM method to evaluate the comparative performance of the proposed RbneIM method.
(i) Random: Nodes are initially ranked randomly.
(ii) Degree centrality [52]. Degree centrality measures the influence of a node based on the number of its neighbors, with nodes having higher degrees being considered as more influential.
(iii) Betweenness centrality [53]. The betweenness centrality measures the extent to which a node acts as a bridge along the shortest paths between other nodes. A node with higher betweenness centrality has a greater number of shortest paths passing through it. The betweenness of node $ u $ is calculated by Eq (3.8).
(iv) Pagerank centrality [54]. The PageRank centrality measures the importance of a node based on the structure of the network. It was originally created to evaluate the importance of web pages by using their link structures. Since then, it has been applied in various fields, such as social network analysis, link prediction and recommendation analysis.
(v) DeepIM [29]. The DeepIM algorithm is the first to employ deep learning techniques to solve the IM problem. It uses the CARE algorithm [42] to learn node embeddings. Cosine similarity is employed to measure similarity between nodes, $ k $ similar nodes are recorded for each node, and a set of seed nodes is selected through statistical analysis.
(vi) GCNIM [33].The research contributes to the field of social network analysis by proposing a new technique that overcomes the limitations of traditional algorithms and deep learning-based approaches while achieving high performance and efficiency in the area of seed set identification for IM tasks.
(vii) ABEM [36]. The approach utilizes agent-based modeling and genetic algorithms to effectively address the complex task of selecting key influencers in a distributed environment. By leveraging these techniques, the approach identifies users' influence capability and optimizes the influencer set's selection. This innovative solution tackles the challenge of capturing real-time user and diffusion features, enabling the accurate and efficient identification of key influencers.
It is noteworthy that this paper presents two algorithms, namely CGURD (Algorithm 1) and FGURD (Algorithm 3), for role division. The role division outcome will have an impact on the sampling outcome of the final random walk process, leading to a different embedding representation vector of the node under the two algorithms. Therefore, the use of these two node partitioning algorithms will ultimately influence the selection of seed nodes. In this paper, the RbneIM algorithm is executed using Algorithm 1. and Algorithm 3. for node division, resulting in RbneIM-CG and RbneIM-FG, respectively.
(i) RbneIM-CG: This model utilizes the CGURD algorithm to determine the global role of users in the network. The RbNE algorithm is then employed to perform the sampling of the training corpus. The final selection of seed nodes is accomplished through RbneIM.
(ii) RbneIM-FG: Compared with the RbneIM-CG model, only the user role division algorithm is different.
4.2. Analysis and comparison
This section presents an analysis of the key techniques proposed in this paper and compares them with existing approaches to demonstrate the feasibility of the proposed approach. IM is the foundation for introducing and understanding influence dissemination within social networks.
(i) Our proposed methodology presents several advantages over existing deep learning IM methods. It leverages network embedding techniques to assign attribute values to user nodes, allowing for a more comprehensive analysis of user influence. Unlike other methods that primarily focus on a single global factor or the node's own attributes, our approach also takes into account the influence factor between users, providing a more nuanced understanding of influence dynamics in social networks. Additionally, our methodology considers attributes at multiple levels of granularity, enabling a more fine-grained analysis and capturing the diverse factors that contribute to user influence.
(ii) Our study presents an innovative algorithm for user role division, integrating both CGURD and FGURD to offer a comprehensive and refined approach. The CGURD component of our algorithm focuses on classifying users into distinct groups based on their overall network contribution, allowing for a broader understanding of user roles. In contrast, the FGURD aspect concentrates on analyzing the relationships between users and their adjacent nodes to identify more specific and localized user roles. By combining CGURD and FGURD, our algorithm provides a robust and precise user role division strategy that captures the intricacies of user dynamics in the network. Furthermore, the methods evaluated in this paper employ various network embedding techniques, as outlined in Table 4.
4.3. Parameter setting
The experiments conducted in this study used default values for various parameters mentioned in the paper. Specifically, the node local influence balance parameters $ \alpha_1 $ and $ \alpha_2 $ in Eq (3.15) were set to $ 0.8 $ and $ 0.2 $, respectively. The OLI score balance parameters $ \beta_1 $ and $ \beta_2 $ in Eq (3.5) were set to $ 0.5 $ and $ 0.5 $, respectively. In Eq (3.12), the thresholds $ \gamma_1 $ and $ \gamma_2 $ for opinion leaders and structural hole nodes were set to $ 0.2 $ and $ 0.1 $, respectively. Other parameters used in the experiments were set as follows: random walk length $ len = 80 $, random walk number $ number = 10 $ and embedding matrix dimension $ d = 256 $. The SkipGram training window size was set to $ win = 5 $, and the negative sampling frequency and learning rate were both set to $ 0.025 $. The threshold $ \theta = 0.5 $ was used to establish a connection edge between any two nodes of the new logical network in Eq (3.16). In the RbneIM algorithm, the propagation probability of the IC model was set to $ 0.5 $, and the number of algorithm iterations $ R = 20 $. The non-default value parameters for each dataset are shown in Table 5.
All methods were implemented using Python 3, and the experiments were performed on a Windows OS with AMD Ryzen 5 3500U, 2.10 GHz CPU and 16 GB memory. Details of our software and hardware environments were as follows: Windows 11, Python ver. 3.6.6, NumPy ver. 1.19.2, NetworkX ver. 2.1, Gensim ver. 3.8.3, Pandas ver. 0.24.2, Matplotlib ver. 2.2.3.
5.
Results
In Eq (3.16), a threshold parameter $ \theta $ was defined to create a new logical network structure, which directly affects the performance of the RbneIM model. Therefore, the analysis of $ \theta $ parameters is performed first. Seed node sizes were selected based on the dataset scale, ranging from 2 to 20 with a stride of 2 for the Dolphins dataset, and from 5 to 50 with a stride of 5 for the remaining datasets. Experimental results are displayed in Figure 6, where $ \theta $ was chosen as a value between 0.1 to 0.9 with a step size of 0.2. The figure shows that different datasets have different optimal $ \theta $ values. The influence spread was considered for various numbers of seed nodes, and a counting method was used to evaluate the pros and cons of each $ \theta $ in the current dataset. The best performing $ \theta $ was then selected for each dataset value. Finally, the chosen values for $ \theta $ were 0.7, 0.9, 0.5, 0.5, 0.1 and 0.7 for the Dolphins, Facebook_Caltech36, Netscience, Cora, CA-GrQc and Facebook-Government datasets, respectively.
We present a comprehensive comparison of the influence spread achieved by different algorithms, namely random, DC, BC, PageRank centrality, DeepIM and RbneIM, utilizing the LT model. The corresponding results are illustrated in Figure 7 for six distinct networks. Upon examining smaller datasets, as depicted in Figure 7a for Dolphins and Figure 7b for Facebook_Caltech36, we notice that the influence spread generated by various models exhibit similar outcomes. However, our proposed RbneIM method maintained its superior effectiveness. As the dataset size increased, both DeepIM and RbneIM consistently outperformed the other approaches by a substantial margin, with RbneIM exhibiting the highest level of performance. Experiments on multiple datasets demonstrated the superior performance of the proposed RbneIM method.
Additionally, the size of the seed nodes selected for each dataset varied due to the differences in dataset size. For example, in the Dolphins network, the seed node size $ [2, 4, 6, ...] $ was selected, with a maximum of 20 nodes selected for the seed node set. For Facebook_Caltech36, a maximum of 50 nodes were selected, and for the remaining datasets, up to 200 nodes were selected. The average results of LT simulations for each of the six networks, with varying numbers of seed nodes, are reported in Table 7.
The average influence diffusion of each algorithm on six datasets with different seed node numbers, taken from 20 experiments was calculated in Table 7. The results indicate that our proposed RbneIM algorithm outperforms the baseline algorithms, particularly Random and DC. Random selects the seed node randomly from any network node, while DC centrality uses the number of neighboring nodes in one hop. BC and PageRank centrality calculate the number of shortest paths through a node and the importance of its links, respectively. DeepIM takes into account the node community structure factor in the network embedding and calculates node similarity to select seed nodes. However, these baseline algorithms do not consider users' global role similarity, which reduces their efficiency in selecting seed nodes.
As the proposed network embedding algorithm relies on a sequence of random walk sampling, an analysis was conducted to examine the impact of different random walk lengths on the RnbeIM model. To this end, the experiment involved selecting random walk lengths $ len $ ranging from 20 to 200 with a step size of 20; the analysis results are presented in Table 8. Seed node sets were determined based on the size of the datasets, with a size of 20 for the Dolphins dataset and 50 for the remaining datasets. Corresponding to the step size, the number of seed node sets was calculated, and 200 rounds of LT model propagation simulation were carried out. The final results represent the average value, with one decimal place reserved. The analysis in Table 8 highlights that different step lengths of various datasets have a notable impact on the experimental outcomes. The optimal result is identified in bold font in the table, and its corresponding random walk length was selected as the parameter value on this dataset. Table 5 presents the selected parameter values.
In addition, a comparison was made between the RbneIM-FG and RbneIM-CG models proposed in this paper on six datasets. It is noteworthy that in the previous experiments, the RbneIM model used the FGURD (Algorithm 3.) algorithm by default to divide user roles in the network, which is the RbneIM-FG model. The experimental results are presented in Figure 8, which shows that the RbneIM-FG model outperforms the RbneIM-CG model.
The results clearly demonstrate that the use of a global influence algorithm alone to select the seed set yields extremely low propagation efficiency. However, by leveraging the proposed FGURD algorithm to identify different influence roles, the efficiency of influence propagation is significantly improved. The observed discrepancy in experimental outcomes underscores the inadequacy of relying solely on global influence. The proposed model combining global and local information for network embedding achieved the best results on the six datasets, serving as a confirmation of the method's effectiveness on IM issues.
The paper sets the thresholds for opinion leaders and structural hole nodes as $ \gamma_1 $ and $ \gamma_2 $ (as described in Eq (3.12), respectively. The values of $ \gamma_1 $ and $ \gamma_2 $ are set to $ 0.2 $ and $ 0.1 $, respectively, in the parameter setting section. It should be noted that these values may vary depending on the network structure. Here, we assumed that 70% of the nodes in any social network are ordinary nodes. To explore the impact of varying these two parameters, this study included an experiment for which the result is presented in Table 6. The number of seed nodes was set to a fixed value for each dataset, with different divisions made based on the dataset size. For example, the Dolphins dataset was set to 20, while the Facebook_Caltech36 and Netscience datasets were set to 50. The remaining three datasets were set to 200. We varied the value of $ \gamma_1 $ from 0.02 to 0.2 with a step size of 0.02, corresponding to $ \gamma_2 = 0.3 - \gamma_1 $. The optimal results are displayed in bold font in Table 6, and the corresponding x-axis value was selected as the value of $ \gamma_1 $ for the dataset, with $ 0.3 - \gamma_1 $ being the value of $ \gamma_2 $. The selected parameter values are shown in Table 5.
6.
Conclusion and future work
The present paper introduces a novel network embedding algorithm, named RbNE, for social networks; it incorporates users' global roles into the embedding process. The proposed RbNE approach merges the CGURD and FGURD methods, aiming at gathering both the overall contribution of the user to the network and the relationships between the user and its neighboring nodes. This results in a more comprehensive representation of the user's global role and approximate user information. Building on this embedding method, we propose a greedy heuristic algorithm, RbneIM, to solve the IM problem by fully integrating the global role information and filtering out the seed set.
Previous studies have encountered challenges in effectively integrating both local and global information concerning users in social networks. A notable limitation has been the neglect of the IM problems' sensitivity to the global roles of users. Additionally, there is a lack of comprehensive understanding regarding the potential contribution of global roles in identifying seed sets that exhibit substantial influence. To address these gaps, this paper proposes the RbNEIM approach, which considers users' global role as a crucial criterion in selecting seed sets and identifying users with the potential to maximize their impact on social networks. We evaluated RbNEIM on six popular social network datasets and compared its performance with state-of-the-art methods and recent baselines. The results demonstrate that our proposed method outperforms existing techniques, highlighting its superior performance in terms of solving IM problems. In future work, we will explore the optimization potential of graph neural networks and attention mechanisms to further enhance the performance of RbNEIM.
Experimental analysis reveals the following: (1) RbNEIM can combine global and local information for network embedding, and it can comprehensively maintain the approximation between the user's global roles; (2) By integrating heuristic calculation and role embedding methods, RbNEIM can significantly improve the performance of the IM problem by considering the user's global role as an essential factor in selecting the seed set and identifying users who can spread the maximum influence in the social network; (3) The proposed method is robust to hyperparameter tuning. The insights gained from this study have the potential to advance the development of future social networks and IM problems.
Use of AI tools declaration
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgments
This work is supported by the Science and Technology Program of Sichuan Province (Grant nos. 2023YFS0424, 2022YFG0378) and the National Natural Science Foundation (Grant nos. 61902324, 11426179, and 61872298).
Conflict of interest
The authors declare that there is no conflict of interest.