The typical aim of user matching is to detect the same individuals across different social networks. Existing efforts in this field usually focus on users' attributes and network embedding, but these methods often ignore the closeness between users and their friends. To this end, we present a friend closeness based user matching algorithm (FCUM), a semi-supervised, end-to-end algorithm for matching users across networks. An attention mechanism is used to quantify the closeness between users and their friends. We consider both the similarity of the individuals and the similarity of their close friends by jointly optimizing them in a single objective function. Quantifying close friends improves the generalization ability of FCUM. Because labeling new matched users for training FCUM is expensive, we also design a bi-directional matching strategy. Experiments on real datasets show that FCUM outperforms other state-of-the-art methods that consider only individual similarity.
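The abstract names three ingredients: attention-based friend closeness, a combined score over individual and close-friend similarity, and bi-directional matching. The sketch below shows one way these pieces could fit together, under loudly stated assumptions: users and friends are represented by precomputed embedding vectors; closeness is a softmax over scaled dot products (a standard attention form, not necessarily FCUM's parameterization); the two similarities are mixed by a hypothetical fixed weight `alpha` instead of being learned through FCUM's joint objective; and "bi-directional matching" is read as mutual best-match agreement. All function names are hypothetical illustrations, not the authors' code.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(scores):
    """Numerically stable softmax over a non-empty list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def closeness_weights(user, friends):
    """Attention weights over a user's friends.

    Assumption: scaled dot-product scores followed by softmax; FCUM's own
    attention function may differ.
    """
    scale = math.sqrt(len(user))
    scores = [sum(a * b for a, b in zip(user, f)) / scale for f in friends]
    return softmax(scores)


def close_friend_summary(user, friends):
    """Closeness-weighted average of friend vectors (zeros if no friends)."""
    if not friends:
        return [0.0] * len(user)
    weights = closeness_weights(user, friends)
    return [sum(w * f[k] for w, f in zip(weights, friends)) for k in range(len(user))]


def match_score(u_src, friends_src, u_tgt, friends_tgt, alpha=0.5):
    """Hypothetical score: alpha * individual + (1 - alpha) * close-friend similarity."""
    individual = cosine(u_src, u_tgt)
    friend_sim = cosine(close_friend_summary(u_src, friends_src),
                        close_friend_summary(u_tgt, friends_tgt))
    return alpha * individual + (1 - alpha) * friend_sim


def bidirectional_matches(score):
    """Keep (i, j) only if j is source i's best target AND i is target j's best source.

    score[i][j] is the match score between source user i and target user j.
    Treating bi-directional matching as mutual best match is an assumption.
    """
    if not score or not score[0]:
        return []
    n_src, n_tgt = len(score), len(score[0])
    best_tgt = [max(range(n_tgt), key=row.__getitem__) for row in score]
    best_src = [max(range(n_src), key=lambda i: score[i][j]) for j in range(n_tgt)]
    return [(i, j) for i, j in enumerate(best_tgt) if best_src[j] == i]


if __name__ == "__main__":
    # Source user 0 prefers target 0, but target 0 prefers source 1,
    # so only the mutual pair (1, 0) survives.
    print(bidirectional_matches([[0.9, 0.8], [0.95, 0.1]]))  # → [(1, 0)]
```

Requiring agreement in both directions trades recall for precision, which is one plausible way to harvest new matched pairs as pseudo-labels without manual annotation; the paper's exact strategy may differ.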
Citation: Tinghuai Ma, Lei Guo, Xin Wang, Yurong Qian, Yuan Tian, Najla Al-Nabhan. Friend closeness based user matching cross social networks[J]. Mathematical Biosciences and Engineering, 2021, 18(4): 4264-4292. doi: 10.3934/mbe.2021214