Embeddings in normed spaces are a widely used tool in automatic linguistic analysis, as they help model semantic structures. They map words, phrases, or even entire sentences to vectors in a high-dimensional space, where the geometric proximity of vectors corresponds to the semantic similarity of the corresponding terms. This allows systems to perform tasks such as word analogy, similarity comparison, and clustering. However, the proximity of two points in such an embedding reflects only metric similarity, which may fail to capture the specific feature relevant to a particular comparison, such as the price when comparing two cars or the size of different dog breeds. These specific features are typically modeled as linear functionals acting on the vectors of the normed space, sometimes referred to as semantic projections. Each such functional projects the high-dimensional vectors onto a one-dimensional scale that quantifies a particular attribute, such as price, age, or brand. However, this approach is not always ideal, since the assumption of linearity imposes a significant constraint: many real-world relationships are nonlinear, and imposing linearity can overlook important nonlinear interactions between features. This limitation has motivated research into nonlinear embeddings and alternative models that better capture the complex, multifaceted nature of semantic relationships, offering a more flexible and accurate representation of meaning in natural language processing.
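To make the two notions above concrete, the following is a minimal sketch in Python/NumPy. The four-dimensional vectors and the price direction `w_price` are invented for illustration; real systems would use pretrained embeddings such as GloVe or word2vec and would estimate the attribute direction from data, for instance from antonym pairs. Cosine similarity captures metric proximity, while the semantic projection is a linear functional f(x) = ⟨x, w⟩ that places each term on a one-dimensional attribute scale.

```python
import numpy as np

# Toy 4-dimensional "embeddings" (hypothetical values, for illustration only;
# real systems use pretrained vectors with 50-300 dimensions).
emb = {
    "sedan":      np.array([0.9, 0.1, 0.4, 0.7]),
    "sports_car": np.array([0.8, 0.2, 0.9, 0.9]),
    "chihuahua":  np.array([0.1, 0.9, 0.2, 0.1]),
    "great_dane": np.array([0.2, 0.8, 0.3, 0.8]),
}

def cosine_similarity(u, v):
    """Metric similarity: geometric proximity of two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# A "semantic projection" modeled as a linear functional f(x) = <x, w>,
# where w is a unit vector along an attribute direction. Here w_price is an
# assumed "cheap -> expensive" direction; in practice it would be estimated,
# e.g., as the difference of the embeddings of a pair of pole words.
w_price = np.array([0.0, 0.0, 0.5, 0.5])
w_price /= np.linalg.norm(w_price)

def semantic_projection(x, w):
    """Place an embedding on a one-dimensional attribute scale."""
    return float(x @ w)

print(cosine_similarity(emb["sedan"], emb["sports_car"]))  # overall similarity
print(semantic_projection(emb["sedan"], w_price))          # position on price axis
print(semantic_projection(emb["sports_car"], w_price))     # scores higher on price
```

Note that f is linear by construction, so this sketch also illustrates the limitation discussed above: such a functional cannot express nonlinear dependencies between an attribute and the embedding coordinates.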
Citation: Pedro Fernández de Córdoba, Carlos A. Reyes Pérez, Enrique A. Sánchez Pérez. Mathematical features of semantic projections and word embeddings for automatic linguistic analysis[J]. AIMS Mathematics, 2025, 10(2): 3961-3982. doi: 10.3934/math.2025185