Research article Special Issues

Multi-scale Jones polynomial and persistent Jones polynomial for knot data analysis

  • Many structures in science, engineering, and art can be viewed as curves in 3-space. The entanglement of these curves plays a crucial role in determining the functionality and physical properties of materials. Many concepts in knot theory provide theoretical tools to explore the complexity and entanglement of curves in 3-space. However, classical knot theory focuses on global topological properties and lacks the consideration of local structural information, which is critical in practical applications. In this work, two localized models based on the Jones polynomial were proposed, namely, the multi-scale Jones polynomial and the persistent Jones polynomial. The stability of these models, especially the insensitivity of the multi-scale and persistent Jones polynomial models to small perturbations in curve collections, was analyzed, thus ensuring their robustness for real-world applications.

    Citation: Ruzhi Song, Fengling Li, Jie Wu, Fengchun Lei, Guo-Wei Wei. Multi-scale Jones polynomial and persistent Jones polynomial for knot data analysis[J]. AIMS Mathematics, 2025, 10(1): 1463-1487. doi: 10.3934/math.2025068




    Knot theory, a branch of mathematics that focuses on the study of mathematical knots, is primarily concerned with classifying and analyzing knots based on their essential properties under ambient isotopy [1]. This approach allows mathematicians to disregard the specific manner in which knots are embedded in 3-space, emphasizing instead the invariants that remain unchanged under continuous deformations. There are several knot invariants, such as the knot crossing number, the knot group [1], the Alexander polynomial [2], the Jones polynomial [3], the knot Floer homology [4], and the Khovanov homology [5].

    Knot theory has applications in many fields, including physics [6], chemistry [7], and biology [8,9,10]. In practical applications, however, two major challenges arise: Many structures do not form closed loops, and ambient isotopy can significantly alter local structures while preserving global knot characteristics. For example, open curves in 3-space, such as polymers [11,12], textiles [13], chemical compounds [14], and biological molecules [15,16], often exhibit local entanglement that critically affects their physical properties and functions. Topological invariants, functions that remain unchanged under ambient isotopy, are essential for the analysis of knots and links [17,18,19]. However, these invariants do not extend directly to open curves: Any open curve can be continuously deformed into a straight segment without cutting or rejoining, so topological equivalence cannot distinguish one open curve from another.

    In recent years, methods that incorporate classical concepts from knot theory and are more applicable to practical problems have been proposed. As a counterpart to topological data analysis (TDA), the concept of knot data analysis (KDA) was formally introduced in [20]. Panagiotou and Plaxco [21] demonstrated the utility of the Gauss link integral in protein entanglement, particularly to understand protein folding kinetics and improve future folding models. Based on this, Baldwin and Panagiotou [22] introduced a new measure of local topological and geometrical free energy based on writhing and torsion of protein chains, highlighting its critical role in the rate-limiting steps of protein folding. In addition, Baldwin et al. [23] extended these topological concepts to the study of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) spike protein, showing how local geometric features such as writhe and torsion influence its stability and behavior. In a related effort, Shen et al. [20] introduced the multi-scale Gauss link integral (mGLI), a novel method leveraging the Gauss link integral to quantify the entanglement and topological complexity of both open and closed curves at various scales. This versatile approach has broad applications in the analysis of curve structures in both physical and biological systems.

    The Jones polynomial [3], a fundamental invariant in knot theory, provides a polynomial measure of entanglement that distinguishes different types of knots by smoothing their crossings. Panagiotou and Kauffman [24] extended this concept to an open curve and proposed a continuous measure of entanglement that converges to the classical Jones polynomial as the end points of an open curve approach each other. Barkataki and Panagiotou [25] further refined this by introducing the Jones polynomial for collections of curves, averaged in all projection directions. Building on these topological frameworks, Panagiotou and Kauffman [26] also used Vassiliev invariants to quantify the complexity of open and closed curves in 3-space. Furthermore, Wang and Panagiotou [27] explored the correlations between protein folding rates and topological measures, specifically writhe, average crossing number (ACN), and the second Vassiliev invariant, to understand the behavior of native protein states. In addition, Herschberg, Pifer, and Panagiotou [28] developed a computational tool that quantifies topological complexity in systems such as polymers, proteins, and periodic structures.

    Using the Jones polynomial framework for the collections of disjoint open or closed curves proposed by Barkataki and Panagiotou in [25], this manuscript introduces two novel models: The multi-scale Jones polynomial, represented by a characteristic matrix, as described in Section 3.1, and the persistent Jones polynomial, represented by weighted persistent barcodes or weighted persistent diagrams, as described in Section 3.2. The weighted barcode was introduced by Cang and Wei in [29]. Both models have the ability to capture local and global entanglement properties in open or closed curve structures in the 3-space, thus, effectively representing their topological characteristics. These models provide an improved approach to solving problems in physical, biological, and chemical environments.

    The stability of the proposed models is a key feature that ensures their robust applicability to real-world scenarios. Stability in this context refers to the robustness of persistent Jones polynomial models, including the multi-scale process, against small perturbations in the data. Minor changes in the positions or configurations of the curve segments should result in correspondingly minor variations in the computed measures. This property is essential for reliable analysis in noisy environments or datasets subject to slight distortions, such as those often encountered in physical, biological, and chemical systems. The models provide a reliable framework for characterizing the topological and geometric properties of complex structures.

    The proposed models are applied to the prediction of B factors and the analysis of protein α-helix and β-sheet structures. The B factor, or Debye-Waller factor, is a critical metric in structural biology that represents the atomic displacement and flexibility within a protein structure and, thus, serves as an indicator of protein dynamics and stability. Traditional methods for predicting B factors have had limitations in capturing the topological information inherent in protein structures. To address this, we apply the multi-scale Jones polynomial model to the prediction of B factors, and achieve prediction accuracies of 0.899, 0.808, and 0.720 for small, medium, and large protein sets [30], respectively. Our results on these three datasets outperformed previous methods. In addition, the persistent Jones polynomial model is utilized to explore the structural properties of protein α-helix and β-sheet segments, with visual representations provided by barcodes that highlight the entanglement features across these secondary structures. The proposed multi-scale Jones polynomial and persistent Jones polynomial models have potential for curve data analysis (CDA).

    The article is organized as follows: In Section 2, the fundamental construction of the Jones polynomial is introduced for the collections of curves in the 3-space. In Section 3, two new models of the Jones polynomial of curves in 3-space are established, namely, the multi-scale Jones polynomial (discussed in Section 3.1) and the persistent Jones polynomial (described in Section 3.2). In Section 4, the stability of these two local models is demonstrated. Section 5 presents applications of the new models, including the prediction of B factors and the exploration of α-helix and β-sheet structures. In Section 6, a discussion is presented on the selection of segmentation, localization, stability issues, and the comparison with classical persistent homology.

    The Jones polynomial [3] is an important invariant in classical knot theory, recognized for its ability to characterize the entanglement properties of knots and links. However, it is less applicable to practical scenarios, which often involve open curves in 3-space rather than closed curves. To address this limitation, Barkataki and Panagiotou [25] extended the concept to collections of disjoint open or closed curves by defining a normalized version of the bracket polynomial, averaged over all projection directions. This adaptation increases its relevance to real-world applications. Moreover, the Jones polynomial for collections of disjoint open or closed curves converges to the classical Jones polynomial as the endpoints of these curves approach each other. When projected onto a 2-dimensional plane, a collection of curves in 3-space forms a linkoid, that is, a multi-component generalization of a knotoid, which describes an open-ended knot diagram. The theory of knotoids was first introduced by Turaev [31], with further developments on linkoids elaborated in [32,33,34,35].

    Before defining the bracket polynomial of linkoids, it is essential to introduce the concept of segment cycles associated with a state. Let L be a linkoid diagram consisting of multiple components. Let G = \{1, 2, 3, \dots, 2n\} denote the set of all endpoints (heads and legs) of L . A component of a linkoid with n components is represented as l_{2j-1, 2j} , where j \in \{1, 2, \dots, n\} . The head-leg pairing forms a product of n disjoint 2-cycles, denoted by \hat{L} = (1, 2)(3, 4) \cdots (2n-1, 2n) .

    Let S be a state corresponding to a choice of smoothing over all crossing points in L . This state induces a pairing represented by the product of n disjoint 2-cycles,

    \begin{equation*} \hat{S} = (s_{1}, s_{2})(s_{3}, s_{4}) \cdots (s_{2n-1}, s_{2n}), \end{equation*}

    where each s_{i} \in G and each pair (s_{2j-1}, s_{2j}) for j \in \{1, 2, \dots, n\} represents the endpoints of a component in the state S .

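    For any endpoint a \in G , the set

    \begin{equation*} \mathrm{Orb}_{S}(a) = \{x \in G \mid x = (\hat{L} \circ \hat{S})^{m}(a), \ m \in \mathbb{Z}\} \end{equation*}

    is defined as the orbit under the composition function \hat{L} \circ \hat{S} . The segment cycle of an endpoint a \in G is given by

    \begin{equation*} \mathrm{Seg}(a) = \mathrm{Orb}_{S}(a) \cup \mathrm{Orb}_{S}(\hat{L}(a)). \end{equation*}

    It is notable that for any point a \in G , \hat{L}(a) also belongs to the same segment cycle. Thus, a segment cycle always contains an even number of elements.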

    Lemma 2.1. [25, Proposition 3.1] The number of segment cycles in a state S , denoted by |S|_{cyc} , satisfies 1 \leqslant |S|_{cyc} \leqslant n .

    Consider a state S of L with the associated pairing \hat{S} . Let \mathrm{Seg}(a) be a segment cycle in S with |\mathrm{Seg}(a)| = 2k . This segment cycle can be represented by a circle marked with the 2k endpoints of L (see Figure 1). Let a \in G be the starting point of the circle. The remaining 2k-1 endpoints are uniquely sequenced in the circle as \hat{S}(a) , \hat{L}(\hat{S}(a)) , \hat{S}(\hat{L}(\hat{S}(a))) , etc., up to \hat{L}(a) . It is important to note that the arcs connecting adjacent points in this circular representation alternate between the functions \hat{S} and \hat{L} . The points connected by \hat{S} belong to the same component in state S , while the points connected by \hat{L} belong to the same component in L .

    Figure 1.  Representation of the segment cycle of a \in G .

    Example 1. Consider the linkoid diagram L as shown in Figure 2(a). The set G of all endpoints is \{1, 2, 3, 4\} . There are two states of L , S_{1} and S_{2} , as shown in Figures 2(b) and (c). The associated pairings \hat{S}_{1} and \hat{S}_{2} are represented by the permutations (1, 3)(2, 4) and (1, 2)(3, 4) , respectively. The segment cycles of states S_{1} and S_{2} are shown in Figure 3. Then, |S_{1}|_{cyc} = 1 and |S_{2}|_{cyc} = 2 .

    Figure 2.  Hopf linkoid and its states.
    Figure 3.  Segment cycles of two states.
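    The segment-cycle construction can be checked mechanically. The following is a minimal sketch, assuming the two pairings are given as lists of endpoint pairs; the function name `segment_cycles` and this input format are illustrative choices, not part of the original construction in [25]. It reproduces the counts of Example 1.

```python
def segment_cycles(link_pairs, state_pairs):
    """Return the segment cycles of a state as a list of sets of endpoints.

    link_pairs  -- head-leg pairing L-hat, e.g. [(1, 2), (3, 4)]
    state_pairs -- pairing S-hat induced by the state, e.g. [(1, 3), (2, 4)]
    """
    def involution(pairs):
        # A product of disjoint 2-cycles swaps the two entries of each pair.
        perm = {}
        for a, b in pairs:
            perm[a], perm[b] = b, a
        return perm

    L_hat, S_hat = involution(link_pairs), involution(state_pairs)

    def orbit(a):
        # Orbit of a under the composition "S-hat first, then L-hat".
        seen, x = set(), a
        while x not in seen:
            seen.add(x)
            x = L_hat[S_hat[x]]
        return seen

    cycles, visited = [], set()
    for a in sorted(L_hat):
        if a not in visited:
            seg = orbit(a) | orbit(L_hat[a])  # Seg(a)
            cycles.append(seg)
            visited |= seg
    return cycles


link = [(1, 2), (3, 4)]
print(len(segment_cycles(link, [(1, 3), (2, 4)])))  # state S1: prints 1
print(len(segment_cycles(link, [(1, 2), (3, 4)])))  # state S2: prints 2
```

    Because the composition is a permutation of a finite set, iterating it forward already visits the whole orbit, so negative powers need no separate treatment.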

    The bracket polynomial of linkoids in S^{2} or \mathbb{R}^{2} is defined through an extension of the bracket polynomial of links. The following initial conditions and diagrammatic relations are sufficient for the skein computation of the bracket polynomial of linkoids.

    Definition 2.1. Let L be a linkoid diagram with n components. The bracket polynomial of the linkoid is uniquely determined by the following skein relation and initial conditions:

    where |cyc| denotes the number of distinct segment cycles.

    The bracket polynomial of L can be expressed as the following state sum expression:

    \begin{equation*} \langle L \rangle = \sum\limits_{S} A^{\sigma(S)} d^{|S|_{circ} - 1} d^{|S|_{cyc}}, \end{equation*}

    where S is a state corresponding to a choice of smoothing over all crossing points in L ; \sigma(S) is the algebraic sum of the smoothing labels of S ; |S|_{circ} is the number of disjoint circles in S ; |S|_{cyc} is the number of distinct segment cycles in S ; and d = (-A^{2} - A^{-2}) .

    The normalized bracket polynomial is defined as follows:

    \begin{equation*} f_{L} = (-A^{3})^{-Wr(L)} \langle L \rangle, \end{equation*}

    where Wr(L) is the writhe of the linkoid diagram L .
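    As a numerical illustration of the state sum and its normalization, the sketch below evaluates both at a fixed real value of A . It assumes that the states have already been enumerated by hand and are supplied as triples (sigma, n_circ, n_cyc); the function names and this input format are hypothetical. The two test diagrams are a single open arc without crossings and the standard one-crossing curl of a closed curve, taken here with writhe +1.

```python
def bracket(states, A):
    """Evaluate the state sum at a numeric value of A.

    states -- iterable of (sigma, n_circ, n_cyc) triples, one per state
    """
    d = -A**2 - A**-2
    return sum(A**sigma * d**(n_circ - 1) * d**n_cyc
               for sigma, n_circ, n_cyc in states)


def normalized_bracket(states, writhe, A):
    """Evaluate f_L = (-A^3)^(-Wr(L)) * <L> at a numeric value of A."""
    return (-A**3) ** (-writhe) * bracket(states, A)


A = 1.3
trivial_arc = [(0, 0, 1)]        # no crossings: no circles, one segment cycle
curl = [(1, 2, 0), (-1, 1, 0)]   # A-smoothing: two circles; B-smoothing: one circle

print(normalized_bracket(trivial_arc, 0, A))
print(bracket(curl, A), -A**3)
print(normalized_bracket(curl, 1, A))
```

    For the curl, the state sum reduces to A d + A^{-1} = -A^{3} , so the writhe factor cancels it and the normalized value agrees with that of the trivial arc, up to floating-point rounding.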

    Now, consider curves in 3-space. A regular projection of curves fixed in 3-space can result in different linkoid diagrams depending on the projection direction chosen. Barkataki and Panagiotou in [25] define the bracket polynomial of curves in 3-space as the average of the bracket polynomial of a projection of the curve over all possible projection directions. This definition is made precise as follows:

    Definition 2.2. [25, Definition 4.1.] Let L be a collection of disjoint open or closed curves in the 3-space. Let (L)_{\xi} denote the projection of L on a plane with normal vector \xi . The normalized bracket polynomial of L is defined as follows:

    \begin{equation*} f_{L} = \frac{1}{4\pi} \int_{\xi \in S^{2}} (-A^{3})^{-Wr((L)_{\xi})} \langle (L)_{\xi} \rangle \, dS, \end{equation*}

    where each (L)_{\xi} is a linkoid diagram, and its bracket polynomial can be calculated using Definition 2.1. Note that the integral is taken over all vectors \xi \in S^{2} , excluding a set of measure zero (corresponding to the irregular projections). This gives the Jones polynomial of a collection of disjoint open or closed curves in the 3-space with substitution A = t^{-1/4} .
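    The average over projection directions can be approximated by sampling. The sketch below is a plain Monte Carlo estimate and not the evaluation scheme of [25]; the callable `diagram_value` is a hypothetical placeholder for a routine that projects the curves along \xi and returns the normalized bracket of the resulting linkoid diagram at a fixed numeric value of A .

```python
import math
import random


def random_direction(rng):
    """Uniform random unit vector on S^2, obtained by normalizing a Gaussian sample."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return tuple(c / norm for c in v)


def average_over_directions(diagram_value, n_samples=2000, seed=0):
    """Monte Carlo estimate of (1 / 4 pi) * integral over S^2 of diagram_value(xi) dS."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += diagram_value(random_direction(rng))
    return total / n_samples


# Sanity checks with integrands whose spherical averages are known.
print(average_over_directions(lambda xi: 1.0))          # average of a constant
print(average_over_directions(lambda xi: xi[2] ** 2))   # close to 1/3
```

    Irregular projections form a set of measure zero, so random sampling meets them with probability zero and they need no special handling in this sketch.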

    Proposition 2.1. [25, Proposition 4.1.]

    ● For open curves, the Jones polynomial has real coefficients and is a continuous function of the curve coordinates.

    ● As the endpoints of the open curves tend to coincide in 3-space, the Jones polynomial tends to that of the corresponding link.

    The Jones polynomial of a collection of disjoint open or closed curves in 3-space describes the entanglement of the curves within the entire collection. However, in many applications, it is desirable to extract the local structural information of the curves. Two methods for localizing the Jones polynomial are proposed to capture the entanglement of a collection of curves and to meet the needs of practical applications.

    Let L be a collection of disjoint open or closed curves in the 3-space. Given a segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} of L , where each l_{i} represents a finite curve segment of L , the segments l_{i} , 1 \leqslant i \leqslant n , can be connected sequentially to reconstruct L .

    To investigate the entanglement properties between each curve segment and its neighboring segments, multi-scale analysis is a suitable approach. Multi-scale analysis requires the definition of a distance metric between the curve segments. The distance between segments, denoted d(l_{i}, l_{j}) , can be specified by different metrics, depending on the application context. In this study, for simplicity, we define the distance d(l_{i}, l_{j}) as the least upper bound of the Euclidean distances from the points of one curve segment to the other curve segment.
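    For curve segments stored as finite lists of sampled points, the distance described above can be sketched as follows. Two assumptions are made here that the text does not state: each segment is replaced by its sample points, and the one-sided quantity is symmetrized by taking the larger of the two directions (the Hausdorff distance), so that d(l_{i}, l_{j}) = d(l_{j}, l_{i}) .

```python
import math


def directed_distance(seg_a, seg_b):
    """Largest distance from a sample point of seg_a to its nearest sample point of seg_b."""
    return max(min(math.dist(p, q) for q in seg_b) for p in seg_a)


def segment_distance(seg_a, seg_b):
    """Symmetrized distance (Hausdorff distance of the samples); an assumption of this sketch."""
    return max(directed_distance(seg_a, seg_b), directed_distance(seg_b, seg_a))


seg_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
seg_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
print(directed_distance(seg_a, seg_b))  # every point of seg_a is at distance 1 from seg_b
print(segment_distance(seg_a, seg_b))   # the point (2, 0, 1) is sqrt(2) away from seg_a
```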

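    For any curve segment l_{i} , consider the set of segments l_{j} within P_{n} whose distances from l_{i} fall within the range [r, R) , that is, r \leq d(l_{i}, l_{j}) < R . This set, which includes l_{i} itself, is denoted as

    \begin{equation*} P_{r,R}^{i} = \{l_j \in P_n \mid r \leq d(l_i, l_j) < R \} \cup \{l_i\}. \end{equation*}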

    The Jones polynomial of the set of curve segments P_{r, R}^{i} , denoted by JP_{r, R}^{i} , quantifies the entanglement of the curve segment l_{i} with other segments as r and R vary. By selecting two sets of distance parameters, \{r_1, r_2, \dots, r_m\} and \{R_1, R_2, \dots, R_m\} , with r_{j} < R_{j} for 1 \leqslant j \leqslant m , we obtain a set of characteristic polynomials for l_{i} :

    \begin{equation*} \{JP_{r_1, R_1}^{i}, JP_{r_2, R_2}^{i}, \dots, JP_{r_m, R_m}^{i}\}. \end{equation*}

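    Applying this procedure to all curve segments in P_{n} , we obtain an n \times m matrix:

    \begin{equation*} \begin{pmatrix} JP_{r_1, R_1}^{1} & JP_{r_2, R_2}^{1} & \cdots & JP_{r_m, R_m}^{1} \\ JP_{r_1, R_1}^{2} & JP_{r_2, R_2}^{2} & \cdots & JP_{r_m, R_m}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ JP_{r_1, R_1}^{n} & JP_{r_2, R_2}^{n} & \cdots & JP_{r_m, R_m}^{n} \\ \end{pmatrix}, \end{equation*}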

    capturing both local and global entanglement properties for the collection of disjoint open or closed curves L.

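    Each matrix entry is a polynomial. For practical applications, the Jones polynomial can be evaluated at a specific value of its variable, such as t = 10 , resulting in an n \times m characteristic matrix for the segmentation P_{n} of L :

    \begin{equation*} mJ(P_{n}) = \begin{pmatrix} JP_{r_1, R_1}^{1}(10) & JP_{r_2, R_2}^{1}(10) & \cdots & JP_{r_m, R_m}^{1}(10) \\ JP_{r_1, R_1}^{2}(10) & JP_{r_2, R_2}^{2}(10) & \cdots & JP_{r_m, R_m}^{2}(10) \\ \vdots & \vdots & \ddots & \vdots \\ JP_{r_1, R_1}^{n}(10) & JP_{r_2, R_2}^{n}(10) & \cdots & JP_{r_m, R_m}^{n}(10) \\ \end{pmatrix}, \end{equation*}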

    where each entry is a real number.
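    The assembly of the characteristic matrix can be sketched as follows. The sketch assumes a precomputed matrix of segment distances and a callable `weight` standing in for the Jones polynomial of a set of segments evaluated at t = 10 ; both names are hypothetical, and the demonstration uses the number of selected segments as a stand-in weight because computing the Jones polynomial itself is outside the scope of this sketch.

```python
def characteristic_matrix(dist, scales, weight):
    """Return the n x m matrix whose (i, j) entry is weight(P^i_{r_j, R_j}).

    dist   -- n x n matrix of segment distances d(l_i, l_k)
    scales -- list of m pairs (r_j, R_j) with r_j < R_j
    weight -- callable mapping a sorted list of segment indices to a real number
    """
    n = len(dist)
    matrix = []
    for i in range(n):
        row = []
        for r, R in scales:
            # Segments at distance in [r, R) from l_i, together with l_i itself.
            members = sorted({k for k in range(n) if r <= dist[i][k] < R} | {i})
            row.append(weight(members))
        matrix.append(row)
    return matrix


dist = [[0.0, 1.0, 2.0, 5.0],
        [1.0, 0.0, 1.5, 4.0],
        [2.0, 1.5, 0.0, 3.0],
        [5.0, 4.0, 3.0, 0.0]]
scales = [(0.0, 2.0), (2.0, 6.0)]
print(characteristic_matrix(dist, scales, weight=len))
# prints [[2, 3], [3, 2], [2, 3], [1, 4]]
```

    Each row describes one segment across all scales, which is the form in which the matrix can be passed to a downstream model as a per-segment feature vector.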

    Remark 3.1. For specific application contexts involving research objects that can be represented as a collection of disjoint open or closed curves L , the choice of segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} plays a critical role. The choice of an appropriate segmentation, tailored to the requirements of the application, allows for a more precise capture of the entanglement properties inherent to the research objects, and, thus, a more accurate reflection of their characteristics. Similarly, the choice of parameters \{r_1, r_2, \dots, r_m\} and \{R_1, R_2, \dots, R_m\} influences the robustness and precision of the final assessment of the entanglement features of the objects.

    To address different application scenarios and to better capture the information about entanglement of curves in the 3-space, a second localization of the Jones polynomial has been proposed. This adaptation of the Jones polynomial for a collection of disjoint open or closed curves, aimed at quantifying the complexity of entanglement, serves as an extension of the classical Jones polynomial [25].

    Let L be a collection of disjoint open or closed curves with a segmentation denoted by P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} , where l_{i} is a segment of L . The segments l_{i} , 1 \leqslant i \leqslant n , can be connected sequentially to reconstruct L . To effectively represent these multiple segments, both the Čech complex and the Vietoris-Rips complex, constructed here from the distance matrix

    \begin{equation*} d(P_{n}) = \begin{pmatrix} 0 & d(l_{1},l_{2}) & \cdots & d(l_{1},l_{n}) \\ d(l_{2},l_{1}) & 0 & \cdots & d(l_{2},l_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ d(l_{n},l_{1}) & d(l_{n},l_{2}) & \cdots & 0 \end{pmatrix} \end{equation*}

    of P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} , are suitable methods. Given the similarity between the Čech complex and the Vietoris-Rips complex, we will focus on the Vietoris-Rips complex in the following discussion. Let r denote the variable parameter of the Vietoris-Rips complex.

    Definition 3.1. A critical value of the Vietoris-Rips complex is a real number r such that, for any sufficiently small \varepsilon > 0 , the map K_{r - \varepsilon} \to K_{r + \varepsilon} is an inclusion but not an isomorphism, where K_{r} denotes the complex at the Vietoris-Rips parameter r .

    Remark 3.2. For a segmentation Pn of L into finite curve segments, the Vietoris-Rips complex has a finite number of critical values.

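    Let r_{0} < r_{1} < r_{2} < \cdots < r_{m} represent the critical values of the Vietoris-Rips complex for a segmentation P_{n} of L . This generates a sequence of complexes from P_{n} that form a filtration \mathcal{F}(P_{n}) :

    \begin{equation*} K_{r_{0}} \subseteq K_{r_{1}} \subseteq K_{r_{2}} \subseteq \cdots \subseteq K_{r_{m}}, \end{equation*}

    where the final complex K_{r_{m}} is an (n-1) -simplex. For any x < y , let V_{x}^{y}: K_{x} \to K_{y} denote the inclusion map.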

    Lemma 3.1. [36, Critical Value Lemma] If a closed interval [x, y] does not contain a critical value of the Vietoris-Rips complex, then V_{x}^{y}: K_{x} \to K_{y} is an isomorphism.

    Within the filtration \mathcal{F}(P_{n}) , consider a complex K_{r} . Each vertex v_{a} \in K_{r} corresponds to a segment l_{a} of the curve segments in P_{n} . An edge \{v_{a}, v_{b}\} \in K_{r} indicates that the distance between segments l_{a} and l_{b} is less than the Vietoris-Rips parameter r . A simplex \Delta = \{v_{a}, v_{b}, \dots, v_{t}\} \in K_{r} means that the pair-wise distances among the corresponding segments \{l_{a}, l_{b}, \dots, l_{t}\} , collectively denoted by \Delta(P_{n}) , are all less than r .
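    The membership rule for simplices can be written out directly. The sketch below enumerates all simplices of K_{r} by brute force from a distance matrix, using the strict inequality stated above; the function name is illustrative, and the exhaustive enumeration is exponential in the number of segments, so it is only meant for small examples.

```python
from itertools import combinations


def rips_simplices(dist, r):
    """All simplices of the Vietoris-Rips complex K_r, as frozensets of vertex indices."""
    n = len(dist)
    simplices = []
    for size in range(1, n + 1):
        for verts in combinations(range(n), size):
            # A vertex set spans a simplex when all pairwise distances are below r.
            if all(dist[a][b] < r for a, b in combinations(verts, 2)):
                simplices.append(frozenset(verts))
    return simplices


dist = [[0, 1, 2],
        [1, 0, 3],
        [2, 3, 0]]
print(len(rips_simplices(dist, 2.5)))  # 3 vertices and the edges {0,1}, {0,2}: prints 5
print(len(rips_simplices(dist, 3.5)))  # the full 2-simplex with all its faces: prints 7
```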

    Definition 3.2. Let K be a simplicial complex. The maximal faces of K with respect to inclusion are named the facets of K . The simplicial complex K , characterized by the facets F_{1}, \dots, F_{q} , is denoted by

    \begin{equation*} K = \langle F_{1}, \dots, F_{q} \rangle, \end{equation*}

    and the set \{F_{1}, \dots, F_{q}\} is called the set of facets of K .

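    Each complex within the filtration \mathcal{F}(P_{n}) can be described by its facets. Thus, the filtration \mathcal{F}(P_{n}) can be expressed as:

    \begin{equation*} \langle F_{1}^{r_{0}}, \dots, F_{q_{0}}^{r_{0}} \rangle \subseteq \langle F_{1}^{r_{1}}, \dots, F_{q_{1}}^{r_{1}} \rangle \subseteq \cdots \subseteq \langle F_{1}^{r_{m}}, \dots, F_{q_{m}}^{r_{m}} \rangle. \end{equation*}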

    Definition 3.3. The birth of a facet F in the filtration \mathcal{F}(P_{n}) is the smallest index r_{b} such that F appears as a facet in the complex K_{r_{b}} but not in K_{r_{b} - \varepsilon} for any sufficiently small \varepsilon > 0 .

    The death of a facet F in the filtration \mathcal{F}(P_{n}) is the largest index r_{d} such that F is a facet in K_{r_{d}} but not in K_{r_{d} + \varepsilon} for any sufficiently small \varepsilon > 0 .

    The life-span of a facet F in the filtration \mathcal{F}(P_{n}) is the interval [r_{b}, r_{d}] .

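    Therefore, \mathcal{F}(P_{n}) can be represented as a sequence of facets, each associated with a birth-and-death interval,

    \begin{equation*} \{(F_{1}, [r_{b}^{1}, r_{d}^{1}]), (F_{2}, [r_{b}^{2}, r_{d}^{2}]), \dots, (F_{t}, [r_{b}^{t}, r_{d}^{t}])\}. \end{equation*}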

    Similar to persistent barcodes in persistent homology, a barcode can represent the facets within a filtration. For any given dimension, each bar corresponds to a facet, with the start and end points of the bar indicating the birth and death of the associated facet, respectively.
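    The facet barcode can be computed for small examples as sketched below. Two conventions are assumed here and are not fixed by the text: the complex at a critical value r contains a simplex when all its pairwise distances are at most r (so that the complex changes exactly at the critical values and the last complex is the full simplex), and the birth and death of a facet are the first and last critical values at which it is a maximal simplex. The function names are illustrative.

```python
from itertools import combinations


def facets(dist, r):
    """Maximal simplices of the Vietoris-Rips complex at parameter r (convention d <= r)."""
    n = len(dist)
    simplices = [frozenset(c)
                 for size in range(1, n + 1)
                 for c in combinations(range(n), size)
                 if all(dist[a][b] <= r for a, b in combinations(c, 2))]
    # Keep only the simplices that are not proper faces of another simplex.
    return {s for s in simplices if not any(s < t for t in simplices)}


def facet_lifespans(dist):
    """Map each facet to (birth, death), both taken among the critical values."""
    n = len(dist)
    critical = sorted({0.0} | {float(dist[a][b]) for a, b in combinations(range(n), 2)})
    spans = {}
    for r in critical:
        for f in facets(dist, r):
            birth, _ = spans.get(f, (r, r))
            spans[f] = (birth, r)
    return spans


dist = [[0, 1, 2],
        [1, 0, 3],
        [2, 3, 0]]
for facet, (birth, death) in sorted(facet_lifespans(dist).items(),
                                    key=lambda item: (item[1], sorted(item[0]))):
    print(sorted(facet), birth, death)
```

    Once a simplex becomes a proper face of a larger simplex it remains one for all larger parameters, so the set of critical values at which it is a facet is always a contiguous run, and recording the first and last value loses no information.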

    Definition 3.4. The barcode B(P_{n}) for the filtration of facets \mathcal{F}(P_{n}) consists of horizontal line segments [r_{i}, r_{j}] , where r_{i} \leqslant r_{j} , representing the birth and death times of the associated facet.

    A barcode provides a visual representation of a filtration as a collection of horizontal line segments on a plane, where the horizontal axis corresponds to the parameter, and the vertical axis represents an ordering of the facets.

    Definition 3.5. For each facet F = \{v_{a}, v_{b}, \dots, v_{t}\} in the filtration \mathcal{F}(P_{n}) , there exists a corresponding subset of segments F(P_{n}) = \{l_{a}, l_{b}, \dots, l_{t}\} in the segmentation P_{n} . The Jones polynomial of F(P_{n}) , denoted by JF(P_{n}) , is defined as the weight of the facet F . Consequently, the Jones polynomial can be treated as a weighting function for the filtration \mathcal{F}(P_{n}) . The resulting weighted filtration is denoted by J\mathcal{F}(P_{n}) ,

    \begin{equation*} \{(F_{1}, [r_{b}^{1}, r_{d}^{1}], JF_{1}(P_{n})), (F_{2}, [r_{b}^{2}, r_{d}^{2}], JF_{2}(P_{n})), \dots, (F_{t}, [r_{b}^{t}, r_{d}^{t}], JF_{t}(P_{n}))\}, \end{equation*}

    referred to as the persistent Jones polynomial of the segmentation P_{n} of L .

    Since the filtration \mathcal{F}(P_{n}) can be expressed by a barcode of facets, the persistent Jones polynomial of the segmentation P_{n} of L can also be expressed by a barcode, with the Jones polynomials of the associated facets as weights. The weights in the persistent Jones polynomial of P_{n} are polynomials. To improve applicability in specific scenarios, setting the Jones polynomial variable t = 10 converts these weights to real numbers, producing a real-number weighted barcode BJ(P_{n})(10) .
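    A real-number weighted barcode can be stored as a list of bars, each carrying its interval, its facet, and its weight. The sketch below assumes that the facet life-spans are already available as a dictionary and that a callable `weight` returns the Jones polynomial of the facet's segments evaluated at t = 10 ; both are hypothetical inputs, and the demonstration substitutes the facet size for the weight.

```python
def weighted_barcode(lifespans, weight):
    """Return bars (birth, death, facet, weight), sorted by birth and then by facet size.

    lifespans -- dict mapping a facet (frozenset of segment indices) to (birth, death)
    weight    -- callable giving the real weight of a facet from its sorted index list
    """
    bars = [(birth, death, tuple(sorted(f)), weight(sorted(f)))
            for f, (birth, death) in lifespans.items()]
    return sorted(bars, key=lambda bar: (bar[0], len(bar[2]), bar[2]))


lifespans = {frozenset({0, 1}): (1.0, 2.0), frozenset({2}): (0.0, 1.0)}
for bar in weighted_barcode(lifespans, weight=len):
    print(bar)
```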

    Remark 3.3. The persistent Jones polynomial and the classical persistent homology both perform a multi-scale analysis of the data through filtration using Vietoris-Rips complexes or other types of complexes. However, they focus on different aspects of data analysis. The persistent Jones polynomial uses tools from geometric topology to capture the entanglement complexity of curves in 3-space, whereas classical persistent homology uses methods from algebraic topology to investigate the behavior of connected components, one-dimensional loops, two-dimensional cavities, and high-dimensional cavities (i.e., generators of homology groups) within point cloud data.

    The stability of a model is characterized by the property that small perturbations in the collection of disjoint open or closed curves L result in only minor variations in the localized measures of the multi-scale Jones polynomial and the persistent Jones polynomial.

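    Let f: L \to f(L) be a continuous function acting on a collection of disjoint open or closed curves L in the 3-space. The difference between f(L) and L is measured using the supremum norm \|f(L) - L\|_{\infty} , defined as:

    \begin{equation*} \|f(L) - L\|_{\infty} = \sup\limits_{x \in L} \|f(x) - x\|. \end{equation*}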

    For a given segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} of L , where each l_{i} ( 1 \leqslant i \leqslant n ) is a finite curve segment and L can be reconstructed by connecting these segments end-to-end, consider a continuous mapping f: L \to f(L) such that \|f(L) - L\|_{\infty} < \varepsilon for a sufficiently small \varepsilon > 0 . This induces a corresponding segmentation f(P_{n}) = \{f(l_{1}), f(l_{2}), \dots, f(l_{n})\} of f(L) .

    The characteristic matrices of the multi-scale Jones polynomial for P_{n} and f(P_{n}) , denoted mJ(P_{n}) and mJ(f(P_{n})) , respectively, exhibit only minor differences at the corresponding positions. Furthermore, the weighted bottleneck distance between the weighted persistence diagrams of persistent Jones polynomials for P_{n} and f(P_{n}) is also small.

    Remark 4.1. The Jones polynomial evaluated at t = 10 can be viewed as a function on collections of curves in 3-space. Let L be a collection of curves in \mathbb{R}^3 , and let f : L \to f(L) be a continuous function. By the continuity stated in Proposition 2.1, if \|f(L) - L\|_{\infty} < \varepsilon for a sufficiently small \varepsilon > 0 , then |J(L)(10) - J(f(L))(10)| < \varepsilon_J for some small \varepsilon_J > 0 .

    Let L be a collection of disjoint open or closed curves in the 3-space, and let P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} denote a segmentation of L into n segments. Consider a continuous function f: L \to f(L) , which induces a corresponding segmentation of f(L) , represented by f(P_{n}) = \{f(l_{1}), f(l_{2}), \dots, f(l_{n})\} .

    Proposition 4.1. Suppose f: L \to f(L) is a continuous function such that \|f(L) - L\|_{\infty} < \varepsilon for a sufficiently small \varepsilon > 0 . Then, the two sets of curve segments f(P_{r, R}^{i}) and f(P)_{r, R}^{i} are equal:

    \begin{equation*} f(P_{r,R}^{i}) = f(P)_{r,R}^{i}, \end{equation*}

    where:

    f(P_{r, R}^{i}) is the image of the set P_{r, R}^{i} under the function f: L \to f(L) ;

    f(P)_{r, R}^{i} represents the set of curve segments in the segmentation f(P_n) such that their distance from f(l_i) is within [r, R) , including f(l_i) itself.

    Proof. By definition, we have

    \begin{equation*} P_{r,R}^{i} = \{l_j \in P_n \mid r \leq d(l_i, l_j) < R \} \cup \{l_i\}, \end{equation*}
    \begin{equation*} f(P_{r,R}^{i}) = \{f(l_j) \mid l_j \in P_n, \ r \leq d(l_i, l_j) < R \} \cup \{f(l_i)\}, \end{equation*}

    and

    \begin{equation*} f(P)_{r,R}^{i} = \{f(l_j) \in f(P_n) \mid r \leq d(f(l_i), f(l_j)) < R \} \cup \{f(l_i)\}. \end{equation*}

    For any l_k \in P_{r, R}^{i} with k \neq i , it holds that r \leq d(l_i, l_k) < R . Since \|f(L) - L\|_{\infty} < \varepsilon , the difference between each segment of the curve and its image under f is less than \varepsilon : \| f(l_{i}) - l_{i} \|_{\infty} < \varepsilon and \| f(l_{k}) - l_{k} \|_{\infty} < \varepsilon , and, thus, the distances satisfy d(f(l_{i}), l_{i}) < \varepsilon and d(f(l_{k}), l_{k}) < \varepsilon .

    Then, by the triangle inequality, we have:

    \begin{equation*} d(l_i, l_k) - 2\varepsilon < d(f(l_i), f(l_k)) < d(l_i, l_k) + 2\varepsilon. \end{equation*}

    Therefore,

    \begin{equation*} r - 2\varepsilon < d(f(l_i), f(l_k)) < R + 2\varepsilon, \end{equation*}

    which implies f(l_k) \in f(P)_{r - 2\varepsilon, R + 2\varepsilon}^{i} . Thus, f(P_{r, R}^{i}) \subseteq f(P)_{r - 2\varepsilon, R + 2\varepsilon}^{i} .

    Similarly, we can show that:

    \begin{equation*} f(P)_{r - 2\varepsilon, R + 2\varepsilon}^{i} \subseteq f(P_{r - 4\varepsilon, R + 4\varepsilon}^{i}). \end{equation*}

    Consequently, we obtain:

    \begin{equation*} f(P_{r,R}^{i}) \subseteq f(P)_{r - 2\varepsilon, R + 2\varepsilon}^{i} \subseteq f(P_{r - 4\varepsilon, R + 4\varepsilon}^{i}). \end{equation*}

    Since \varepsilon > 0 is sufficiently small, we conclude that P_{r - 4\varepsilon, R + 4\varepsilon}^{i} = P_{r, R}^{i} , f(P)_{r - 2\varepsilon, R + 2\varepsilon}^{i} = f(P)_{r, R}^{i} , and f(P_{r, R}^{i}) = f(P)_{r, R}^{i}.
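    The 2\varepsilon bound used in the proof can be checked numerically. The sketch below is an illustration and not part of the proof: it assumes that each segment is represented by sampled points, that d is the Hausdorff distance between the samples, and that the perturbation moves every sample point by less than \varepsilon . All names are hypothetical.

```python
import math
import random


def hausdorff(A, B):
    """Hausdorff distance between two finite point sets in R^3."""
    def directed(X, Y):
        return max(min(math.dist(p, q) for q in Y) for p in X)
    return max(directed(A, B), directed(B, A))


def perturb(points, eps, rng):
    """Move every point by a vector of Euclidean length strictly less than eps."""
    moved = []
    for p in points:
        v = [rng.uniform(-1.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        scale = eps * rng.random() / norm  # rng.random() < 1, so the step is shorter than eps
        moved.append(tuple(c + scale * dc for c, dc in zip(p, v)))
    return moved


rng = random.Random(1)
l_i = [(0.1 * k, 0.0, 0.0) for k in range(11)]
l_k = [(0.1 * k, 1.0, 0.5) for k in range(11)]
eps = 0.01

before = hausdorff(l_i, l_k)
after = hausdorff(perturb(l_i, eps, rng), perturb(l_k, eps, rng))
print(abs(after - before) < 2 * eps)
```

    The check succeeds for any seed, because each perturbed set stays within Hausdorff distance \varepsilon of the original set and the triangle inequality then bounds the change in the distance between the two sets by 2\varepsilon .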

    Theorem 4.1. Suppose f: L \to f(L) is a continuous function such that \|f(L) - L\|_{\infty} < \varepsilon for a sufficiently small \varepsilon > 0 . Consider two sets of distances \{r_1, r_2, \dots, r_m\} and \{R_1, R_2, \dots, R_m\} . The characteristic matrices for the segmentation P_n of L and for the segmentation f(P_n) of f(L) are given by

    \begin{equation*} mJ(P_{n}) = \begin{pmatrix} JP_{r_1, R_1}^{1}(10) & JP_{r_2, R_2}^{1}(10) & \cdots & JP_{r_m, R_m}^{1}(10) \\ JP_{r_1, R_1}^{2}(10) & JP_{r_2, R_2}^{2}(10) & \cdots & JP_{r_m, R_m}^{2}(10) \\ \vdots & \vdots & \ddots & \vdots \\ JP_{r_1, R_1}^{n}(10) & JP_{r_2, R_2}^{n}(10) & \cdots & JP_{r_m, R_m}^{n}(10) \\ \end{pmatrix}, \end{equation*}
    \begin{equation*} mJ(f(P_{n})) = \begin{pmatrix} Jf(P)_{r_1, R_1}^{1}(10) & Jf(P)_{r_2, R_2}^{1}(10) & \cdots & Jf(P)_{r_m, R_m}^{1}(10) \\ Jf(P)_{r_1, R_1}^{2}(10) & Jf(P)_{r_2, R_2}^{2}(10) & \cdots & Jf(P)_{r_m, R_m}^{2}(10) \\ \vdots & \vdots & \ddots & \vdots \\ Jf(P)_{r_1, R_1}^{n}(10) & Jf(P)_{r_2, R_2}^{n}(10) & \cdots & Jf(P)_{r_m, R_m}^{n}(10) \\ \end{pmatrix}. \end{equation*}

    Then, the difference between the corresponding entries in these two matrices is less than \varepsilon_{J} , where \varepsilon_{J} is sufficiently small.

    Proof. We have

    \begin{equation*} \|P_{r_{j},R_{j}}^{i} - f(P_{r_j, R_j}^{i})\|_{\infty} \leq \|L - f(L)\|_{\infty} < \varepsilon. \end{equation*}

    From Remark 4.1, it follows that

    \begin{equation*} |JP_{r_j, R_j}^{i}(10) - Jf(P_{r_j, R_j}^{i})(10)| < \varepsilon_{J}, \end{equation*}

    where \varepsilon_{J} is sufficiently small. By Proposition 4.1, we know that f(P_{r, R}^{i}) = f(P)_{r, R}^{i} . Therefore,

    \begin{equation*} |JP_{r_j, R_j}^{i}(10) - Jf(P)_{r_j, R_j}^{i}(10)| < \varepsilon_{J}, \end{equation*}

    where \varepsilon_{J} is sufficiently small.

    Remark 4.2. The stability of the method used in [20] can be proved in a manner similar to that of Theorem 4.1. Suppose f: L \to f(L) is a continuous function such that \|f(L) - L\|_{\infty} < \varepsilon for some sufficiently small \varepsilon > 0 .

    For the Gauss linking integral, there exist two Gauss linking integral matrices for the segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} of a collection of disjoint open or closed curves L in 3-space, as well as for f(P_{n}) of f(L) . These matrices are given by:

    \begin{equation*} GL(P_{n}) = \begin{pmatrix} g(l_{1},l_{1}) & g(l_{1},l_{2}) & \cdots & g(l_{1},l_{n}) \\ g(l_{2},l_{1}) & g(l_{2},l_{2}) & \cdots & g(l_{2},l_{n}) \\ \vdots & \vdots & \ddots & \vdots \\ g(l_{n},l_{1}) & g(l_{n},l_{2}) & \cdots & g(l_{n},l_{n}) \end{pmatrix}, \end{equation*}
    \begin{equation*} GL(f(P_{n})) = \begin{pmatrix} g(f(l_{1}),f(l_{1})) & g(f(l_{1}),f(l_{2})) & \cdots & g(f(l_{1}),f(l_{n})) \\ g(f(l_{2}),f(l_{1})) & g(f(l_{2}),f(l_{2})) & \cdots & g(f(l_{2}),f(l_{n})) \\ \vdots & \vdots & \ddots & \vdots \\ g(f(l_{n}),f(l_{1})) & g(f(l_{n}),f(l_{2})) & \cdots & g(f(l_{n}),f(l_{n})) \end{pmatrix}, \end{equation*}

    where

    \begin{equation*} g(l_{i},l_{j}) = \begin{cases} GL(l_{i},l_{j}), & \text{if } l_{i} \cap l_{j} = \emptyset; \\ 0, & \text{otherwise}. \end{cases} \end{equation*}

    Here, GL(l_{i}, l_{j}) denotes the Gauss linking integral of the curve segments l_{i} and l_{j} .

    As stated in [24, page 3], we can treat the Gauss linking integral of a curve as a continuous function of the coordinates of the curve. Similar to Theorem 4.1, we also have

    \begin{equation*} |g(l_{i},l_{j}) - g(f(l_{i}),f(l_{j}))| < \varepsilon_{GL} \end{equation*}

    for sufficiently small \varepsilon_{GL} .
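The continuity claim in Remark 4.2 can be explored numerically. The following is a minimal sketch, not the method of [20]: it approximates the Gauss linking integral of two polylines with a midpoint rule and compares the value before and after a small displacement. The helper names `circle` and `gauss_linking`, the Hopf-link test geometry, and the discretization are all assumptions made for illustration. The sign of the result depends on the chosen orientations.

```python
import math

def circle(center, u, v, n):
    """Return n points on the unit circle through `center` spanned by orthonormal u, v."""
    pts = []
    for k in range(n):
        a = 2.0 * math.pi * k / n
        pts.append(tuple(center[d] + math.cos(a) * u[d] + math.sin(a) * v[d]
                         for d in range(3)))
    return pts

def edges(pts, closed):
    """Midpoint and direction vector of every edge of a polyline."""
    last = len(pts) if closed else len(pts) - 1
    out = []
    for k in range(last):
        p, q = pts[k], pts[(k + 1) % len(pts)]
        mid = tuple((p[d] + q[d]) / 2.0 for d in range(3))
        vec = tuple(q[d] - p[d] for d in range(3))
        out.append((mid, vec))
    return out

def gauss_linking(curve1, curve2, closed=True):
    """Midpoint-rule approximation of (1/4pi) * double integral of
    (r1 - r2) . (dr1 x dr2) / |r1 - r2|^3 over two disjoint polylines."""
    total = 0.0
    for m1, t1 in edges(curve1, closed):
        for m2, t2 in edges(curve2, closed):
            r = tuple(m1[d] - m2[d] for d in range(3))
            cross = (t1[1] * t2[2] - t1[2] * t2[1],
                     t1[2] * t2[0] - t1[0] * t2[2],
                     t1[0] * t2[1] - t1[1] * t2[0])
            dist = math.sqrt(sum(c * c for c in r))
            total += sum(r[d] * cross[d] for d in range(3)) / dist ** 3
    return total / (4.0 * math.pi)

# Two unit circles forming a Hopf link, and a slightly displaced copy of the second one.
ring_a = circle((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), 200)
ring_b = circle((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 200)
ring_b_shifted = circle((1.01, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 200)

original = gauss_linking(ring_a, ring_b)
perturbed = gauss_linking(ring_a, ring_b_shifted)
print(abs(original), abs(original - perturbed))
```

For open curve segments, as in a segmentation P_{n} , the same function can be called with `closed=False`; the value is then a real number rather than an integer.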

    In this section, we state and prove the stability of the persistent Jones polynomial, which asserts that small changes in the collection of disjoint open or closed curves L lead to only small changes in the persistent Jones polynomial. As discussed in Section 3.2, the persistent Jones polynomial is represented by a weighted Jones polynomial filtration, which can be expressed as a Jones polynomial weight barcode of facets. For a given barcode, there is a corresponding diagram where each bar in the barcode can be mapped to a point in the diagram. The x -coordinate of this point represents the birth time of the corresponding bar, while the y -coordinate represents its death time.

    Let L be a collection of disjoint open or closed curves in the 3-space, and let P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} denote a segmentation of L into n segments. Consider a continuous function f: L \to f(L) , which induces a corresponding segmentation of f(L) , represented by f(P_{n}) = \{f(l_{1}), f(l_{2}), \dots, f(l_{n})\} . Let r_1 < r_2 < \dots < r_m be the critical values of the Vietoris-Rips complex of P_n . We denote an interleaved sequence (b_i)_{i = 0, 1, \dots, m} such that b_{i-1} < r_i < b_i for all i . We set b_{-1} = r_0 = -\infty and b_{m+1} = r_{m+1} = +\infty .

    For two integers 0 \leqslant i < j \leqslant m+1 and a fixed integer k , we define the multiplicity of the pair (r_i, r_j) as

    \begin{equation*} \mu_i^j = \beta_{b_{i-1}}^{b_j} - \beta_{b_i}^{b_j} + \beta_{b_i}^{b_{j-1}} - \beta_{b_{i-1}}^{b_{j-1}}, \end{equation*}

    where \beta_x^y is the number of k -facets contained in K_x that remain in K_y for all -\infty \leqslant x \leqslant y \leqslant +\infty . To visualize this definition, consider \beta_x^y as the value of a function \beta at the point (x, y) \in \bar{\mathbb{R}}^2 , where \bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\} . Thus, \mu_i^j is the alternating sum of \beta at the corners of the box [b_{i-1}, b_i] \times [b_{j-1}, b_j] , as depicted in Figure 4.

    Figure 4.  The multiplicity of the point (r_i, r_j) is the alternating sum at the corners of the lower right square. When other multiplicities are added, cancellations between plus and minus signs occur.

    Note that if x and x' are in the open interval (r_i, r_{i+1}) , and y and y' are in (r_{j-1}, r_j) , then \beta_x^y = \beta_{x'}^{y'} . Therefore, the multiplicities \mu_i^j are well-defined and always nonnegative.
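The alternating sum can be checked on a small example. The sketch below assumes that the facet lifespans are already available as a list of (birth, death) pairs, and that \beta_x^y counts the pairs with birth at most x and death greater than y , with the value 0 whenever x or y is infinite. The function names `beta`, `interleaved`, and `multiplicity` are hypothetical and chosen only for this illustration.

```python
import math

def beta(bars, x, y):
    """Count bars born at or before x that are still alive at y.

    For x = -inf or y = +inf the count is 0 (a convention assumed here).
    """
    return sum(1 for birth, death in bars if birth <= x and death > y)

def interleaved(crit):
    """Build b_{-1}, b_0, ..., b_{m+1} with b_{i-1} < r_i < b_i for crit = [r_1..r_m]."""
    m = len(crit)
    b = [crit[0] - 1.0]
    b += [(crit[k] + crit[k + 1]) / 2.0 for k in range(m - 1)]
    b.append(crit[-1] + 1.0)
    # Index 0 holds b_{-1}, so b_i is stored at index i + 1.
    return [-math.inf] + b + [math.inf]

def multiplicity(bars, crit, i, j):
    """Return mu_i^j for 0 <= i < j <= m + 1, with crit sorted increasingly."""
    b = interleaved(crit)

    def at(k):
        return b[k + 1]

    return (beta(bars, at(i - 1), at(j)) - beta(bars, at(i), at(j))
            + beta(bars, at(i), at(j - 1)) - beta(bars, at(i - 1), at(j - 1)))

# Toy data: two facets living on [1, 3) and one on [2, 3).
bars = [(1.0, 3.0), (1.0, 3.0), (2.0, 3.0)]
crit = [1.0, 2.0, 3.0]
print(multiplicity(bars, crit, 1, 3), multiplicity(bars, crit, 2, 3))
```

In this toy example the point (r_1, r_3) receives multiplicity 2 and (r_2, r_3) receives multiplicity 1, which matches the number of bars with those endpoints.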

    Definition 4.1. The diagram D(P_n) \subset \bar{\mathbb{R}}^2 of the persistent Jones polynomial for P_n consists of points (r_i, r_j) with Jones polynomial weights, counted with multiplicity \mu_i^j for 0 \leq i < j \leq m+1 , along with all points on the diagonal, which are counted with infinite multiplicity.

    Each off-diagonal point in the diagram represents the lifespan of a k -facet in the filtration. Similar to the weighted persistent barcode, the Jones polynomial corresponding to the set of curve segments for the k -facet can be used as the weight. This approach results in a weighted persistent diagram for the persistent Jones polynomial.

    The Bottleneck distance is a classical measure used to quantify the difference between two persistent diagrams. It naturally extends to the comparison of weighted persistent diagrams, allowing for a precise demonstration of the variations between two persistent Jones polynomials.

    To better capture the differences between persistent Jones polynomials, we apply a slight modification to the traditional definition of the Bottleneck distance. Let \mathcal{C} and \mathcal{D} be two multi-sets of pairs (\langle a, b \rangle, w) , where \langle a, b \rangle denotes an interval that can be any well-defined member of the set \{[a, b], [a, b), (a, b], (a, b)\} , and w \in \mathbb{R} represents the weight of the interval \langle a, b \rangle .

    A matching between the sets \mathcal{C} and \mathcal{D} is defined as a collection of pairs \chi = \{(I, J) \in \mathcal{C} \times \mathcal{D}\} , where each element I \in \mathcal{C} and each element J \in \mathcal{D} appears in at most one pair within \chi . A matching forms a bijection between a subset of \mathcal{C} and a subset of \mathcal{D} . If a pair (I, J) \in \chi , we say that I is matched with J . Conversely, if an element I does not appear in any pair, it is considered unmatched.

    The cost c(I, J) of the matching elements I = (\langle a, b \rangle, w_1) and J = (\langle c, d \rangle, w_2) is defined as follows:

    \begin{equation*} c(I,J) = \max\big\{|c-a|, |d-b|, |w_1 - w_2|\big\}. \end{equation*}

    Similarly, the cost c(I) of leaving an unmatched element I is defined as:

    \begin{equation*} c(I) = \frac{b-a}{2}. \end{equation*}

    Finally, the cost of a matching \chi is given by:

    \begin{equation*} c(\chi) = \max\left(\sup\limits_{(I,J) \in \chi} c(I,J), \sup\limits_{\text{unmatched } I \in \mathcal{C} \cup \mathcal{D}} c(I)\right). \end{equation*}

    Definition 4.2. The weighted Bottleneck distance between \mathcal{C} and \mathcal{D} is defined as

    \begin{equation*} d_B(\mathcal{C}, \mathcal{D}) = \inf\{c(\chi) \mid \chi \text{ is a matching between } \mathcal{C} \text{ and } \mathcal{D} \}. \end{equation*}

    The modified Bottleneck distance increases the emphasis on weight factors compared to the classical Bottleneck distance.
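For very small multi-sets, the weighted Bottleneck distance can be evaluated by brute force directly from the three cost formulas above. The following sketch enumerates all partial matchings recursively; it assumes each element is stored as a tuple (a, b, w) with finite endpoints, and the function names are hypothetical. It is meant to illustrate the definition, not to serve as an efficient algorithm.

```python
def pair_cost(I, J):
    """c(I, J) = max(|c - a|, |d - b|, |w1 - w2|)."""
    (a, b, w1), (c, d, w2) = I, J
    return max(abs(c - a), abs(d - b), abs(w1 - w2))

def unmatched_cost(I):
    """c(I) = (b - a) / 2 for an element left unmatched."""
    a, b, _ = I
    return (b - a) / 2.0

def weighted_bottleneck(C, D):
    """Minimum over all partial matchings of the largest cost (exponential time)."""
    if not C:
        # Everything left in D stays unmatched.
        return max((unmatched_cost(J) for J in D), default=0.0)
    I, rest = C[0], C[1:]
    # Option 1: leave I unmatched.
    best = max(unmatched_cost(I), weighted_bottleneck(rest, D))
    # Option 2: match I with one element J of D.
    for k, J in enumerate(D):
        remaining = D[:k] + D[k + 1:]
        best = min(best, max(pair_cost(I, J), weighted_bottleneck(rest, remaining)))
    return best

# Same interval, slightly shifted birth and weight: matching is cheapest.
print(weighted_bottleneck([(0.0, 4.0, 1.0)], [(0.5, 4.0, 1.2)]))
# Same interval, very different weight: leaving both unmatched is cheaper.
print(weighted_bottleneck([(0.0, 4.0, 0.0)], [(0.0, 4.0, 10.0)]))
```

The second call shows the effect of the weight term: two bars with identical endpoints are no longer at distance zero when their weights differ.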

    Let L be a collection of disjoint open or closed curves in the 3-space, and let P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} denote a segmentation of L into n segments. Consider a continuous function f: L \to f(L) , which induces a segmentation of f(L) , denoted by f(P_{n}) = \{f(l_{1}), f(l_{2}), \dots, f(l_{n})\} .

    There are two persistent Jones polynomials based on P_{n} and f(P_{n}) , represented as J\mathcal{F}(P_{n}) and J\mathcal{F}(f(P_{n})) , respectively. The facet weights in these persistent Jones polynomials are polynomials. By setting the Jones polynomial variable t = 10 , the weight of each facet is converted to a real number. Thus, the converted persistent Jones polynomials can be denoted by J\mathcal{F}(P_{n})(10) and J\mathcal{F}(f(P_{n}))(10) . These can be expressed using weighted persistent diagrams, denoted by D(P_{n}) and D(f(P_{n})) .
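As a small illustration of how setting t = 10 turns a polynomial weight into a number, the sketch below evaluates a Laurent polynomial stored as a map from exponents to coefficients. The example uses the classical Jones polynomial of a trefoil knot, t + t^3 - t^4 for one chirality and its mirror image with negated exponents; this closed-knot polynomial is only a stand-in, since the weights in this paper are computed for collections of curve segments as defined in the earlier sections. The function name `evaluate_laurent` is hypothetical, and only integer exponents are handled.

```python
from fractions import Fraction

def evaluate_laurent(coeffs, t):
    """Evaluate sum(c * t**e) exactly for integer exponents e (possibly negative)."""
    t = Fraction(t)
    return sum(Fraction(c) * t ** e for e, c in coeffs.items())

# Trefoil Jones polynomial for one chirality and for its mirror image.
trefoil = {1: 1, 3: 1, 4: -1}
mirror_trefoil = {-1: 1, -3: 1, -4: -1}

print(evaluate_laurent(trefoil, 10))
print(float(evaluate_laurent(mirror_trefoil, 10)))
```

The two chiralities give very different values at t = 10 , which indicates how a single evaluation can still separate distinct polynomials.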

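    Suppose \|f(L) - L\|_{\infty} < \varepsilon for some sufficiently small \varepsilon > 0 . Then, for all l_{p_{i}} \in P_{n}(p) , we have \|l_{p_i} - f(l_{p_i})\|_{\infty} < \varepsilon . Let l_{p_{i}}, l_{p_{j}} be any two curve segments in P_{n}(p) . Since each of the two segments is displaced by less than \varepsilon under f , the triangle inequality shows that the distance between them changes by at most 2\varepsilon .

    Thus,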

    \begin{equation*} d(l_{p_{i}}, l_{p_{j}}) - 2\varepsilon \leq d(f(l_{p_{i}}), f(l_{p_{j}})) \leq d(l_{p_{i}}, l_{p_{j}}) + 2\varepsilon. \end{equation*}

    There is an important lemma, proved in [36].

    Lemma 4.1. [36, Box Lemma] For a < b < c < d , let R = [a, b] \times [c, d] be a box in \mathbb{R}^{2} , and let R_{2\varepsilon} = [a+2\varepsilon, b-2\varepsilon] \times [c+2\varepsilon, d-2\varepsilon] be the box obtained by shrinking R on all sides by 2\varepsilon . It follows that:

    \begin{equation*} \#(D(P_{n}) \cap R_{2\varepsilon}) \leqslant \#(D(f(P_{n})) \cap R). \end{equation*}

    Theorem 4.2. Let L be a collection of disjoint open or closed curves, and let P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} denote a segmentation of L into n segments. Suppose f: L \to f(L) is a continuous function such that \| f(L) - L \|_{\infty} < \varepsilon for some sufficiently small \varepsilon > 0 . Then, the weighted Bottleneck distance between the weighted persistent diagrams of the persistent Jones polynomials, d_{B}(D(P_{n}), D(f(P_{n}))) , is sufficiently small.

    Proof. Let L be a collection of disjoint open or closed curves in 3-space, and let P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} denote a segmentation of L into n segments. Consider a continuous function f: L \to f(L) , which induces a segmentation of f(L) , denoted by f(P_{n}) = \{f(l_{1}) , f(l_{2}) , \dots, f(l_{n})\} . There are two persistent Jones polynomial diagrams for P_{n} and f(P_{n}) , denoted by D(P_{n}) and D(f(P_{n})) .

    Consider the minimum distance between two distinct off-diagonal points or between an off-diagonal point and the diagonal:

    \begin{equation*} \delta_L = \min\{\|p - q\|_{\infty} \mid p \in D(P_{n}) \setminus \Delta,\ q \in D(P_{n}),\ p \neq q\}. \end{equation*}

    Assuming that \varepsilon > 0 is sufficiently small, we take \varepsilon < \delta_{L}/4 .

    By drawing squares of radius 2\varepsilon around the points of D(P_{n}) , we obtain a thickened diagonal along with a finite set of disjoint squares that are also disjoint from the thickened diagonal, as shown in Figure 5.

    Figure 5.  The shaded squares are centered at the black points of D(P_{n}) .

    Let \mu denote the multiplicity of a point p in D(P_{n}) \setminus \Delta , and let \square_{2\varepsilon} represent the square centered at p with radius 2\varepsilon . According to Lemma 4.1,

    \begin{equation*} \mu \leq \#(D(f(P_{n})) \cap \square_{2\varepsilon}) \leq \#(D(P_{n}) \cap \square_{4\varepsilon}). \end{equation*}

    Since 4\varepsilon < \delta_L , p is the only point in D(P_{n}) within \square_{4\varepsilon} , which implies that \#(D(f(P_{n})) \cap \square_{2\varepsilon}) = \mu .

    Now, let p = (x, y) be an off-diagonal point in the diagram D(P_{n}) with multiplicity \mu = 1 . A corresponding collection of curve segments P_{n}(p) = \{l_{p_1}, l_{p_2}, \dots, l_{p_t}\} exists in P_{n} . Applying the continuous function f to these segments yields the transformed collection f(P_{n}(p)) = \{f(l_{p_1}), f(l_{p_2}), \dots, f(l_{p_t})\} .

    According to the definition of the persistent Jones polynomial, the point p = (x, y) indicates that P_{n}(p) = \{l_{p_1}, l_{p_2}, \dots, l_{p_t}\} forms a facet with birth x and death y in the filtration \mathcal{F}(P_{n}) . This implies that any two segments in P_{n}(p) are at a distance less than x , and any segment l \in P_{n} \setminus P_{n}(p) is at least a distance y from all l_{p_i} \in P_{n}(p) , for 1 \leq i \leq t .

    Since

    \begin{equation*} d(l_{p_{i}}, l_{p_{j}}) - 2\varepsilon \leq d(f(l_{p_{i}}), f(l_{p_{j}})) \leq d(l_{p_{i}}, l_{p_{j}}) + 2\varepsilon, \end{equation*}

    then any two curve segments in f(P_{n}(p)) are within a distance of x + 2\varepsilon . Furthermore, for f(l) \in f(P_{n} \setminus P_{n}(p)) and f(l_{p_{i}}) \in f(P_{n}(p)) , the distance is at least y - 2\varepsilon .

    Thus, the segment collection f(P_{n}(p)) forms a facet within \mathcal{F}(f(P_{n})) , corresponding to a point f(p) in D(f(P_{n})) , within the region [x, x + 2\varepsilon] \times [y - 2\varepsilon, y] , indicating that \|f(p) - p\|_{\infty} < 2\varepsilon . Moreover, we have \#(D(f(P_{n})) \cap \square_{2\varepsilon}) = 1 ; hence, f(p) is the only point in D(f(P_{n})) \cap \square_{2\varepsilon} .

    For off-diagonal points p^{1} = p^{2} = \dots = p^{\mu} = (x, y) with \mu > 1 , this indicates \mu distinct facets in \mathcal{F}(P_{n}) with the same birth x and death y , but distinct curve segment collections P_{n}(p^1) , P_{n}(p^2) , \dots, P_{n}(p^\mu) . Analogous to the case when \mu = 1 , the images f(p^{1}) , f(p^{2}) , \dots, f(p^{\mu}) lie within [x, x + 2\varepsilon] \times [y - 2\varepsilon, y] , which gives \|f(p^{i}) - p^{i}\|_{\infty} < 2\varepsilon for 1 \leq i \leq \mu . Moreover, we have that \#(D(f(P_{n})) \cap \square_{2\varepsilon}) = \mu ; hence, f(p^{1}), f(p^{2}), \dots, f(p^{\mu}) are the \mu points in D(f(P_{n})) \cap \square_{2\varepsilon} .

    The weights of p and f(p) are JP_{n}(p)(10) and Jf(P_{n}(p))(10) , respectively. Since

    \begin{equation*} \|f(P_{n}(p)) - P_{n}(p)\|_{\infty} \leq \|f(L) - L\|_{\infty} < \varepsilon, \end{equation*}

    by Remark 4.1, the weight difference satisfies |Jf(P_{n}(p))(10) - JP_{n}(p)(10)| < \varepsilon_J , where \varepsilon_{J} > 0 is sufficiently small.

    After examining all the off-diagonal points in D(P_{n}) , the only points in D(f(P_{n})) that are not images f(p) for some p \in (D(P_{n})\setminus \Delta) lie beyond 2\varepsilon from D(P_{n}) \setminus \Delta .

    Let q \in D(f(P_{n})) be a point for which there is no corresponding point p \in D(P_{n}) such that f(p) = q . Suppose the distance from q to \Delta is greater than 2\varepsilon . Then, there exists a square \square^{q}_{2\varepsilon} centered at q with a radius of 2\varepsilon such that D(P_{n}) \cap \square^{q}_{2\varepsilon} = \emptyset . This contradicts Lemma 4.1, which implies that \#(D(P_{n}) \cap \square^{q}_{2\varepsilon}) \geq 1 . Therefore, the distance from q to \Delta must be less than 2\varepsilon .

    There exists a natural matching between D(P_{n}) and D(f(P_{n})) , represented as \chi = \{(p, f(p)) \mid p \in D(P_{n}) \setminus \Delta \} , with unmatched points in D(f(P_{n})) regarded as not corresponding to any point in D(P_{n}) . Therefore, by the definition of the weighted Bottleneck distance,

    \begin{equation*} d_B(D(P_{n}), D(f(P_{n}))) < \max \{2\varepsilon, \varepsilon_J\}, \end{equation*}

    where \varepsilon and \varepsilon_J are sufficiently small, which completes the proof.

    In other words, the weighted persistent diagrams of persistent Jones polynomials are stable under small-amplitude or possibly irregular perturbations.

    The utility of the proposed multi-scale and persistent Jones polynomial models is demonstrated through their application to real-world problems. These models provide robust frameworks for analyzing both local and global structural properties of curves in the 3-space. By leveraging their capacity to encode entanglement complexity features, these models provide new perspectives and tools for understanding protein flexibility, stability, and entanglement.

    In this section, we illustrate two important applications of the proposed models. The first application focuses on predicting the B-factor of protein residues, a critical indicator of protein flexibility. This application demonstrates the practical utility of the multi-scale Jones polynomial in processing curve structural data. The second application uses the persistent Jones polynomial to analyze the topological features of protein secondary structures, specifically \alpha -helices and \beta -sheets. These applications demonstrate the versatility and potential of the proposed models to advance computational structural biology.

    B-factors, also known as Debye-Waller factors, measure atomic displacements within protein structures and provide insight into molecular flexibility and stability. Analysis of B-factors enables a deeper understanding of protein dynamics and aids in predicting regions with high structural mobility, which is crucial for understanding protein function and interactions.

    To eliminate the influence of irrelevant atomic information and better capture the geometric and topological properties of the protein structure, each amino acid is represented by its C_{\alpha} atom. These C_{\alpha} atoms are sequentially connected to form a C_{\alpha} chain L . Let C = \{c_0, c_1, \dots, c_n\} denote the set of C_{\alpha} atoms arranged in the sequence of the protein. The C_{\alpha} chain of the protein is considered a disjoint open curve. Segmentation of the C_{\alpha} chain is achieved by cutting at the midpoint between each C_{\alpha} atom and its adjacent C_{\alpha} atom, denoted by P_{n} = \{l_{0}, l_{1}, \dots, l_{n}\} . The distance between two curve segments is defined as the distance between the C_{\alpha} atoms contained in the segments, that is, d(l_{i}, l_{j}) = d(c_{i}, c_{j}) .

    In this study, we select the radius r to range from 4\mathring{\mathrm{A}} to 15\mathring{\mathrm{A}} with a step size of 0.25\mathring{\mathrm{A}} . We set R = r + 1\mathring{\mathrm{A}} , so the radius R ranges from 5\mathring{\mathrm{A}} to 16\mathring{\mathrm{A}} . In total, the interception range is from 4\mathring{\mathrm{A}} to 16\mathring{\mathrm{A}} . Thus, there is a characteristic matrix for P_{n} of the protein C_{\alpha} chain L ,

    \begin{equation*} \begin{pmatrix} JP_{4\mathring{\mathrm{A}}, 5\mathring{\mathrm{A}}}^{1}(10) & JP_{4.25\mathring{\mathrm{A}}, 5.25\mathring{\mathrm{A}}}^{1}(10) & \cdots & JP_{15\mathring{\mathrm{A}}, 16\mathring{\mathrm{A}}}^{1}(10)\\ JP_{4\mathring{\mathrm{A}}, 5\mathring{\mathrm{A}}}^{2}(10) & JP_{4.25\mathring{\mathrm{A}}, 5.25\mathring{\mathrm{A}}}^{2}(10) & \cdots & JP_{15\mathring{\mathrm{A}}, 16\mathring{\mathrm{A}}}^{2}(10)\\ \vdots & \vdots & \ddots & \vdots\\ JP_{4\mathring{\mathrm{A}}, 5\mathring{\mathrm{A}}}^{n}(10) & JP_{4.25\mathring{\mathrm{A}}, 5.25\mathring{\mathrm{A}}}^{n}(10) & \cdots & JP_{15\mathring{\mathrm{A}}, 16\mathring{\mathrm{A}}}^{n}(10)\\ \end{pmatrix}. \end{equation*}

    This choice is motivated by the fact that the average distance between C_{\alpha} atoms is approximately 3.8 \mathring{\mathrm{A}} . The selected radius scheme results in a powerful feature extraction method that provides rich representations of local protein structures. To minimize the influence of overly complex machine learning models, and to emphasize the effectiveness of the multi-scale Jones polynomial and avoid overfitting, we chose to use a Lasso regression model with parameter 0.16 for B-factor prediction.
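The radius scheme described above can be enumerated explicitly. The sketch below lists the (r, R) pairs that index the columns of the characteristic matrix and, for a toy C_{\alpha} chain, selects the segments whose distance to a given segment lies between r and R . Two points are assumptions for illustration only: the inclusive boundaries in `shell`, and the idealized helix coordinates (radius 2.3 Å, rise 1.5 Å, and 100° turn per residue), which stand in for real protein data. The Jones polynomial computation itself is not reproduced here.

```python
import math

def radius_pairs(r_min=4.0, r_max=15.0, step=0.25, width=1.0):
    """List the (r, R) pairs with R = r + width for r = r_min, r_min + step, ..., r_max."""
    count = int(round((r_max - r_min) / step)) + 1
    return [(r_min + step * k, r_min + step * k + width) for k in range(count)]

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy C-alpha coordinates on an idealized alpha-helix (in angstroms)."""
    pts = []
    for k in range(n):
        a = math.radians(turn_deg * k)
        pts.append((radius * math.cos(a), radius * math.sin(a), rise * k))
    return pts

def shell(coords, i, r, R):
    """Indices k != i with r <= d(c_i, c_k) <= R (inclusive bounds assumed)."""
    return [k for k, c in enumerate(coords)
            if k != i and r <= math.dist(coords[i], c) <= R]

pairs = radius_pairs()
chain = ideal_helix(10)
print(len(pairs), pairs[0], pairs[-1])
print(shell(chain, 0, 5.0, 6.0))
```

With these settings there are 45 radius pairs, so each row of the characteristic matrix has 45 entries. On the toy helix, consecutive atoms are about 3.8 Å apart, so direct sequence neighbors fall below the smallest radius of 4 Å.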

    To validate the effectiveness of the multi-scale Jones polynomial in predicting C_{\alpha} atom B-factors across proteins of varying sizes, we compared our method with several previous approaches, including mGLI [20], evolutionary homology (EH) [37], atom-specific persistent homology (ASPH) [38], optimal flexibility-rigidity index (opFRI) [39], parameter-free flexibility-rigidity index (pfFRI) [39], Gaussian network model (GNM) [30], and normal mode analysis (NMA) [30]. The comparison was performed on three sets of proteins from [30], as shown in Figure 6.

    Figure 6.  Comparison of B-factor predictions on three protein datasets between our multi-scale Jones polynomial method and other approaches from the literature.

    The multi-scale Jones polynomial method achieved average correlation coefficients of 0.899 , 0.808 , and 0.720 for small, medium, and large protein sets, respectively. Our results on these three datasets outperformed previous methods.
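For reference, a correlation coefficient between predicted and experimental B-factors can be computed as sketched below. The sketch assumes the Pearson correlation coefficient, which is the usual choice in B-factor prediction but is not stated explicitly in the text; the numbers in the example are made up for illustration and are not data from this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical B-factors for five residues.
experimental = [12.0, 15.5, 20.1, 18.3, 14.2]
predicted = [11.5, 16.0, 19.0, 18.9, 13.8]
print(round(pearson(experimental, predicted), 3))
```

Averaging this coefficient over all proteins of a dataset gives a single summary number per dataset, under the stated assumption.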

    To further illustrate the performance of the multi-scale Jones polynomial (mJP) analysis, we present a case study involving a potential antibiotic synthesis protein (PDBID: \mathrm{1V70} ) containing 105 residues, as shown in Figure 7(a). After processing with the mJP model, as shown in Figure 7, a characteristic matrix is generated. The normalized characteristic matrix is then used as the input to the Lasso regression model, as illustrated in Figure 7(b). The Lasso regression model predicts the B-factor of each residue, and the predicted values are compared with the experimentally determined values, as shown in Figure 7(c).

    Figure 7.  The process of mJP analysis for protein B-factor prediction. (a) The 3D structure of the protein \mathrm{1V70} ; (b) The normalized characteristic matrix; (c) A comparison of the predicted B-factors with experimentally determined values.

    Compared to traditional B-factor analysis methods, which focus on individual atoms, their spatial positions in the 3-space, and the thermal motion and disorder of atoms within the protein structure, our approach effectively captures the torsional entanglement of the peptide chain at each C_{\alpha} atomic position by incorporating the Jones polynomial. This torsional entanglement of the peptide chain significantly influences the observed B-factor values.

    The torsional entanglement of protein peptide chains, captured through the multi-scale Jones polynomial, provides critical insight into the structural and functional dynamics of proteins. By analyzing torsional entanglement, this approach reveals patterns of molecular flexibility and rigidity that allow for an in-depth understanding of protein stability and function. Furthermore, this method improves our ability to model and predict regions of structural mobility, offering potential applications for protein engineering and drug discovery.

    In molecular biology, \alpha -helices and \beta -sheets are fundamental secondary structures in proteins, stabilized by hydrogen bonding patterns that contribute to the overall stability and function of the protein. Additionally, \alpha -helices are typically more rigid than \beta -sheets. To explore the local structural complexity and stability of these structures, we employed topological analysis using the barcode representation of the persistent Jones polynomial (pJP). Using protein data from the Protein Data Bank (PDB), we demonstrated this approach with examples, including the analysis of an \alpha -helix chain consisting of 19 residues of the protein with PDB ID \mathrm{1C26} . Additionally, we extracted two parallel \beta -sheets consisting of 16 residues from the protein \mathrm{2JOX} to explore their barcode representations of the persistent Jones polynomial.

    To eliminate the influence of irrelevant atomic information and better capture the geometric and topological properties of the protein structure, each amino acid is represented by its C_{\alpha} atom, as shown in Figure 8(a) and (b). These C_{\alpha} atoms are sequentially connected to form a C_{\alpha} chain, which is treated as a disjoint open curve. Segmentation of the C_{\alpha} chain is achieved by cutting at the midpoint between each C_{\alpha} atom and its adjacent C_{\alpha} atom. The distance between two curve segments is defined as the distance between the C_{\alpha} atoms contained within these segments.

    Figure 8.  The process of the pJP analysis for the \alpha -helix and \beta -sheets. (a) (b) The 3D structures of the \alpha -helix and \beta -sheets. (c) The colored barcodes visualizing the \alpha -helix. (d) The colored barcodes visualizing the \beta -sheets. The colored barcodes obtained through the pJP model can be applied to machine learning for protein structure analysis.

    The process of the pJP model is illustrated in Figure 8, using colored barcodes to represent the \alpha -helix and \beta -sheet structures.

    Figure 8(c) represents the barcodes corresponding to the \alpha -helix. In the 0-facets panel, there are 19 bars with Jones polynomial weights of 0. Each bar has a length of approximately 3.8\mathring{\mathrm{A}} , which is the average distance between two C_{\alpha} atoms. Additionally, the 1-facets panel contains 18 bars with similar birth times and life-spans, starting around 3.8\mathring{\mathrm{A}} and persisting until approximately 5.4\mathring{\mathrm{A}} , each with a Jones polynomial weight of 0. These bars correspond to facets formed by two adjacent C_{\alpha} atoms. Moreover, 16 short-lived bars represent facets formed by two nonadjacent C_{\alpha} atoms. As shown in the 2-facets panel, the bars capture the persistence of facets formed by three C_{\alpha} atoms, along with the complexity of entanglement of the corresponding polyline system.
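The bar counts reported for the \alpha -helix can be reproduced on an idealized model. The sketch below is not the authors' pipeline: it assumes the standard Vietoris-Rips convention, in which a simplex is born at its largest pairwise distance and stops being a facet as soon as one further vertex lies within the current scale of all its vertices. It also replaces the \mathrm{1C26} coordinates with an ideal helix (radius 2.3 Å, rise 1.5 Å, and 100° turn per residue) and omits the Jones polynomial weights. The names `ideal_helix` and `facet_bars` are hypothetical.

```python
import math
from itertools import combinations

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """Toy C-alpha coordinates on an idealized alpha-helix (in angstroms)."""
    pts = []
    for k in range(n):
        a = math.radians(turn_deg * k)
        pts.append((radius * math.cos(a), radius * math.sin(a), rise * k))
    return pts

def facet_bars(points, k):
    """Return (simplex, birth, death) for every k-simplex that is a facet
    over a scale interval of positive length in the Vietoris-Rips filtration."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    bars = []
    for simplex in combinations(range(n), k + 1):
        # Birth: the diameter of the simplex (0 for a single vertex).
        birth = max((d[a][b] for a, b in combinations(simplex, 2)), default=0.0)
        # Death: the first scale at which some other vertex joins all its vertices.
        others = [v for v in range(n) if v not in simplex]
        death = min((max(birth, max(d[v][a] for a in simplex)) for v in others),
                    default=math.inf)
        if death > birth:
            bars.append((simplex, birth, death))
    return bars

helix = ideal_helix(19)
bars0 = facet_bars(helix, 0)
bars1 = facet_bars(helix, 1)
adjacent = [bar for bar in bars1 if bar[0][1] - bar[0][0] == 1]
nonadjacent = [bar for bar in bars1 if bar[0][1] - bar[0][0] > 1]
print(len(bars0), len(adjacent), len(nonadjacent))
```

On this idealized helix, the 0-facet bars end at about 3.8 Å, the bars of adjacent pairs run from about 3.8 Å to about 5.4 Å, and the only other 1-facet bars are short-lived ones formed by atoms three residues apart, consistent with the description of Figure 8(c) under the stated assumptions.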

    Figure 8(d) represents the barcodes corresponding to the \beta -sheets. As in the \alpha -helix case, the same segmentation and distance definitions are used. The 0-facets panel includes 16 0-facet bars, indicating the presence of 16 C_{\alpha} atoms. In the 1-facets panel, there are 14 bars for facets formed by two adjacent C_{\alpha} atoms and 8 for facets formed by nonadjacent C_{\alpha} atoms. The bars in the 2-facets panel represent facets formed by three C_{\alpha} atoms and provide information about the complexity of entanglement. The longer lifespans of the bars in the 2-facets panel of the \beta -sheets compared to the \alpha -helix suggest that the C_{\alpha} atoms in the \beta -sheets are more spatially dispersed.

    The color of the barcodes reflects the value of the Jones polynomial weight, which indicates the difference of the torsional entanglement of the set of curve segments in the system. It is important to note that the color gradient, whether it tends to red or blue, does not imply a higher or lower degree of entanglement complexity in the represented set of curve segments. Rather, a greater color difference between two bars indicates a greater difference between the sets of curve segments they represent.

    From the color bars in Figures 8(c) and (d), it can be observed that the Jones polynomial weight for the \alpha -helix ranges from -86 to 0 , while for the \beta -sheets it ranges from -6 to 0 . This indicates a greater variability in the sets of curve segments represented by the facets within the \alpha -helix compared to the \beta -sheets, suggesting that the \alpha -helix exhibits more complex entanglement.

    The persistent Jones polynomial effectively captures the torsional entanglement of secondary structures, such as \alpha -helices and \beta -sheets, within protein peptide chains. This torsional entanglement plays a critical role in the analysis of protein structure and function as it provides insight beyond atomic positions alone. By incorporating the Jones polynomial, our approach reveals topological characteristics that are important for protein stability and functionality.

    In this study, the selection of the segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} for a collection of curves L is crucial to capture the topological and geometric characteristics of the curve structure L . The segmentation serves as the basis for defining and calculating both the multi-scale analysis of the Jones polynomial and the persistent Jones polynomial. First, the outcomes of these two models depend not only on the spatial positions of the segments but also on their relative lengths in relation to the entire curve. As the segment length approaches zero, the results of the models tend toward triviality. Similarly, when the segment extends to cover the entire curve, the models recover global information. In both of these cases, the models cannot extract meaningful local information for spatial data. Thus, the choice of segmentation depends on the specific application.

    Knot theory has traditionally focused on global invariants, but real-world applications often require local structural insights. Classical knot invariants mainly reflect global topology and fail to capture crucial local structural details in applications like molecular biology and highway crossing design. To address this gap, localized versions of invariants such as the multi-scale Jones polynomials and persistent Jones polynomials have been developed. These localized models decompose global invariants for analyzing local topology in the context of the entire structure, promoting the application of KDA or CDA in systems where both global and local structures matter.

    The stability of the model is crucial for practical applications. In real-world data, noise and minor perturbations pose challenges. Stability ensures that minor input changes do not cause disproportionate changes in the calculated invariants, which is critical in biological and physical contexts. For the multi-scale Jones polynomial and persistent Jones polynomial models, we demonstrate stability under small perturbations: minor adjustments in the collection L result in only slight modifications to the characteristic matrices and to the barcodes or diagrams. This stability makes the models reliable for analyzing structural topology and applicable in KDA or CDA for real-world data.

    The torsional entanglement of protein peptide chains, captured through both multi-scale and persistent Jones polynomials, provides crucial insights into the structure and function of proteins. By analyzing torsional entanglement, this approach reveals patterns of molecular flexibility and rigidity that allow a detailed understanding of protein structure and function. Furthermore, this method improves our capacity to model and predict regions of structural mobility and reactivity, offering valuable implications for enzyme kinetics and protein engineering.

    In Section 5.2 of this manuscript, the barcodes of the persistent Jones polynomial represent the birth, death, and lifespan of the facets in the complexes under filtration. In contrast, persistent homology barcodes capture the changes in the homology classes of the complexes during filtration, i.e., the changes in the generators of the homology groups of the complexes (the number of generators corresponds to the Betti number). Despite these differences, there are important similarities between the two concepts. Both are based on filtration and provide insight into data characteristics by examining the evolution of the complexes during filtration. In addition, the bars in the barcodes of the persistent Jones polynomial represent facets formed by subsets of curve segments from the segmentation P_{n} = \{l_{1}, l_{2}, \dots, l_{n}\} of the collection of curves L . Therefore, these bars are constructed based on the distance conditions between the curve segments in the segmentation P_{n} . Similarly, in the case of point cloud data, persistent homology barcodes are also constructed according to the distance conditions of the points in the point cloud data.

    The multi-scale Gauss linking integral model [20] and the present multi-scale Jones polynomial and persistent Jones polynomial models represent solid advances in computational geometric topology. These approaches have great potential for real-world applications when combined with machine learning and artificial intelligence.

    Ruzhi Song: Writing-original draft, Writing-review & editing; Fengling Li: Methodology, Writing-original draft, Writing-review & editing; Jie Wu: Conceptualization; Fengchun Lei: Conceptualization, Methodology; Guo-Wei Wei: Conceptualization, Writing-review & editing. All authors have read and approved the final version of the manuscript for publication.

    The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.

    This work was supported in part by grant No. 12331003 of the National Natural Science Foundation of China, the State Key Laboratory of Structural Analysis, Optimization and CAE Software for Industrial Equipment, and the Beijing Institute of Mathematical Sciences and Applications. The work of GWW was supported in part by NIH grants R01GM126189, R01AI164266, and R35GM148196, NSF grants DMS-2052983, DMS-1761320, DMS-2245903, and IIS-1900473, the MSU Research Foundation, and Bristol-Myers Squibb 65109. The work of JW was supported in part by the High-level Scientific Research Foundation of Hebei Province.

    The authors declare no competing interest.



    [1] R. H. Crowell, R. H. Fox, Introduction to knot theory, New York: Springer, 1963. https://dx.doi.org/10.1007/978-1-4612-9935-6
    [2] J. W. Alexander, Topological invariants of knots and links, Trans. Amer. Math. Soc., 30 (1928), 275–306. https://dx.doi.org/10.1090/S0002-9947-1928-1501429-1
    [3] V. F. R. Jones, A polynomial invariant for knots via von Neumann algebras, Bull. Amer. Math. Soc., 12 (1985), 103–111. https://dx.doi.org/10.1090/s0273-0979-1985-15304-2
    [4] C. Manolescu, An introduction to knot Floer homology, 2014, arXiv: 1401.7107.
    [5] M. Khovanov, A categorification of the Jones polynomial, Duke Math. J., 101 (2000), 359–426. https://dx.doi.org/10.1215/S0012-7094-00-10131-7
    [6] T. Ohtsuki, Quantum invariants: A study of knots, 3-manifolds, and their sets, Singapore: World Scientific, 2001. https://dx.doi.org/10.1142/4746
    [7] C. Z. Liang, K. Mislow, Knots in proteins, J. Am. Chem. Soc., 116 (1994), 11189–11190. https://dx.doi.org/10.1021/ja00103a057
    [8] D. W. Sumners, The role of knot theory in DNA research, Boca Raton: CRC Press, 1986.
    [9] T. Schlick, Q. Y. Zhu, A. Dey, S. Jain, S. T. Yan, A. Laederach, To knot or not to knot: multiple conformations of the SARS-CoV-2 frameshifting RNA element, J. Am. Chem. Soc., 143 (2021), 11404–11422. https://dx.doi.org/10.1021/jacs.1c03003
    [10] K. C. Millett, E. J. Rawdon, A. Stasiak, J. I. Sułkowska, Identifying knots in proteins, Biochem. Soc. Trans., 41 (2013), 533–537. https://dx.doi.org/10.1042/bst20120339
    [11] J. Qin, S. T. Milner, Counting polymer knots to find the entanglement length, Soft Matter, 7 (2011), 10676–10693. https://dx.doi.org/10.1039/c1sm05972f
    [12] Y. Z. Liu, M. O'Keeffe, M. M. J. Treacy, O. M. Yaghi, The geometry of periodic knots, polycatenanes and weaving from a chemical perspective: a library for reticular chemistry, Chem. Soc. Rev., 47 (2018), 4642–4664. https://dx.doi.org/10.1039/c7cs00695k
    [13] R. L. Ricca, Topology bounds energy of knots and links, Proc. R. Soc. A., 464 (2008), 293–300. https://dx.doi.org/10.1098/rspa.2007.0174
    [14] E. Panagiotou, K. C. Millett, P. J. Atzberger, Topological methods for polymeric materials: characterizing the relationship between polymer entanglement and viscoelasticity, Polymers, 11 (2019), 437. https://dx.doi.org/10.3390/polym11030437
    [15] J. Arsuaga, M. Vazquez, P. McGuirk, S. Trigueros, D. W. Sumners, J. Roca, DNA knots reveal a chiral organization of DNA in phage capsids, Proc. Natl. Acad. Sci. U.S.A., 102 (2005), 9165–9169. https://dx.doi.org/10.1073/pnas.0409323102
    [16] J. I. Sułkowska, E. J. Rawdon, K. C. Millett, J. N. Onuchic, A. Stasiak, Conservation of complex knotting and slipknotting patterns in proteins, Proc. Natl. Acad. Sci. U.S.A., 109 (2012), E1715–E1723. https://dx.doi.org/10.1073/pnas.1205918109
    [17] P. Freyd, D. Yetter, J. Hoste, W. B. R. Lickorish, K. Millett, A. Ocneanu, A new polynomial invariant of knots and links, Bull. Amer. Math. Soc., 12 (1985), 239–246. https://dx.doi.org/10.1090/s0273-0979-1985-15361-3
    [18] L. H. Kauffman, An invariant of regular isotopy, Trans. Amer. Math. Soc., 318 (1990), 417–471. https://dx.doi.org/10.1090/S0002-9947-1990-0958895-7
    [19] J. H. Przytycki, P. Traczyk, Conway algebras and skein equivalence of links, Proc. Amer. Math. Soc., 100 (1987), 744–748. https://dx.doi.org/10.1090/S0002-9939-1987-0894448-2
    [20] L. Shen, H. S. Feng, F. L. Li, F. C. Lei, J. Wu, G.-W. Wei, Knot data analysis using multiscale Gauss link integral, Proc. Natl. Acad. Sci. U.S.A., 121 (2024), e2408431121. https://dx.doi.org/10.1073/pnas.2408431121
    [21] E. Panagiotou, K. W. Plaxco, A topological study of protein folding kinetics, 2018, arXiv: 1812.08721.
    [22] Q. Baldwin, E. Panagiotou, The local topological free energy of proteins, J. Theor. Biol., 529 (2021), 110854. https://dx.doi.org/10.1016/j.jtbi.2021.110854
    [23] Q. Baldwin, B. Sumpter, E. Panagiotou, The local topological free energy of the SARS-CoV-2 Spike protein, Polymers, 14 (2022), 3014. https://dx.doi.org/10.3390/polym14153014
    [24] E. Panagiotou, L. H. Kauffman, Knot polynomials of open and closed curves, Proc. R. Soc. A., 476 (2020), 20200124. https://dx.doi.org/10.1098/rspa.2020.0124
    [25] K. Barkataki, E. Panagiotou, The Jones polynomial of collections of open curves in 3-space, Proc. R. Soc. A., 478 (2022), 20220302. https://dx.doi.org/10.1098/rspa.2022.0302
    [26] E. Panagiotou, L. H. Kauffman, Vassiliev measures of complexity of open and closed curves in 3-space, Proc. R. Soc. A., 477 (2021), 20210440. https://dx.doi.org/10.1098/rspa.2021.0440
    [27] J. Wang, E. Panagiotou, The protein folding rate and the geometry and topology of the native state, Sci. Rep., 12 (2022), 6384. https://dx.doi.org/10.1038/s41598-022-09924-0
    [28] T. Herschberg, K. Pifer, E. Panagiotou, A computational package for measuring Topological Entanglement in Polymers, Proteins and Periodic systems (TEPPP), Comput. Phys. Commun., 286 (2023), 108639. https://dx.doi.org/10.1016/j.cpc.2022.108639
    [29] Z. X. Cang, G.-W. Wei, Persistent cohomology for data with multicomponent heterogeneous information, SIAM J. Math. Data Sci., 2 (2020), 396–418. https://dx.doi.org/10.1137/19m1272226
    [30] J. K. Park, R. Jernigan, Z. J. Wu, Coarse grained normal mode analysis vs. refined gaussian network model for protein residue-level structural fluctuations, Bull. Math. Biol., 75 (2013), 124–160. https://dx.doi.org/10.1007/s11538-012-9797-y
    [31] V. Turaev, Knotoids, Osaka J. Math., 49 (2012), 195–223. https://dx.doi.org/10.18910/10080
    [32] N. Gügümcü, L. H. Kauffman, New invariants of knotoids, Eur. J. Combin., 65 (2017), 186–229. https://dx.doi.org/10.1016/j.ejc.2017.06.004
    [33] N. Gügümcü, S. Lambropoulou, Knotoids, braidoids and applications, Symmetry, 9 (2017), 315. https://dx.doi.org/10.3390/sym9120315
    [34] N. Gügümcü, L. Kauffman, Parity in knotoids, 2019, arXiv: 1905.04089.
    [35] M. Manouras, S. Lambropoulou, L. H. Kauffman, Finite type invariants for knotoids, Eur. J. Combin., 98 (2021), 103402. https://dx.doi.org/10.1016/j.ejc.2021.103402
    [36] D. Cohen-Steiner, H. Edelsbrunner, J. Harer, Stability of persistence diagrams, In: Proceedings of the twenty-first annual symposium on computational geometry, New York: Association for Computing Machinery, 2005, 263–271. https://dx.doi.org/10.1145/1064092.1064133
    [37] Z. X. Cang, E. Munch, G.-W. Wei, Evolutionary homology on coupled dynamical systems with applications to protein flexibility analysis, J. Appl. and Comput. Topology, 4 (2020), 481–507. https://dx.doi.org/10.1007/s41468-020-00057-9
    [38] D. Bramer, G.-W. Wei, Atom-specific persistent homology and its application to protein flexibility analysis, Comput. Math. Biophys., 8 (2020), 1–35. https://dx.doi.org/10.1515/cmb-2020-0001
    [39] K. Opron, K. L. Xia, G.-W. Wei, Fast and anisotropic flexibility-rigidity index for protein flexibility and fluctuation analysis, J. Chem. Phys., 140 (2014), 234105. https://dx.doi.org/10.1063/1.4882258
    © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).