Loading [MathJax]/jax/output/SVG/jax.js
Research article Special Issues

Improved collision detection of MD5 with additional sufficient conditions

  • One application of counter-cryptanalysis is detecting whether a message block is involved in a collision attack, such as the detection of MD5 and SHA-1. Stevens and Shumow speeded up the detection of SHA-1 by introducing unavoidable conditions in message blocks. They left a challenge: how to determine unavoidable conditions for MD5. Later, Shen et al. found that the unavoidable conditions of MD5 were the sufficient conditions located in the last round of differential paths. In this paper, we made further work. We discover sufficient conditions in the second round that can also be used as unavoidable conditions. With additional sufficient conditions, we subdivide three sets and distinguish seven more classes. As a result, compared with Shen's collision detection algorithm, our improved algorithm reduces the collision detection cost by 8.18%. Finally, we find that they do exist in the differential paths constructed by the automatic tool "HashClash".

    Citation: Linan Fang, Ting Wu, Yongxing Qi, Yanzhao Shen, Peng Zhang, Mingmin Lin, Xinfeng Dong. Improved collision detection of MD5 with additional sufficient conditions[J]. Electronic Research Archive, 2022, 30(6): 2018-2032. doi: 10.3934/era.2022102

    Related Papers:

    [1] Miao Luo, Yousong Chen, Dawei Gao, Lijun Wang . Inversion study of vehicle frontal collision and front bumper collision. Electronic Research Archive, 2023, 31(2): 776-792. doi: 10.3934/era.2023039
    [2] Shao-Yuan Huang, Wei-Hsun Lee . Sufficient conditions for exact bifurcation curves in Minkowski curvature problems and their applications. Electronic Research Archive, 2025, 33(4): 2325-2351. doi: 10.3934/era.2025103
    [3] Xuanfeng Li, Jiajia Yu . Joint attention mechanism for the design of anti-bird collision accident detection system. Electronic Research Archive, 2022, 30(12): 4401-4415. doi: 10.3934/era.2022223
    [4] Zhaoyan Meng, Shuting Lyu, Mengqing Zhang, Xining Li, Qimin Zhang . Sufficient and necessary conditions of near-optimal controls for a stochastic listeriosis model with spatial diffusion. Electronic Research Archive, 2024, 32(5): 3059-3091. doi: 10.3934/era.2024140
    [5] Milena Dimova, Natalia Kolkovska, Nikolai Kutev . Global behavior of the solutions to nonlinear Klein-Gordon equation with critical initial energy. Electronic Research Archive, 2020, 28(2): 671-689. doi: 10.3934/era.2020035
    [6] Yang Gao, Qingzhong Ji . On the inverse stability of $ z^n+c $. Electronic Research Archive, 2025, 33(3): 1414-1428. doi: 10.3934/era.2025066
    [7] Hui Xu, Xinyang Zhao, Qiyun Yin, Junting Dou, Ruopeng Liu, Wengang Wang . Isolating switch state detection system based on depth information guidance. Electronic Research Archive, 2024, 32(2): 836-856. doi: 10.3934/era.2024040
    [8] Yuri Prokhorov . Conic bundle structures on $ \mathbb{Q} $-Fano threefolds. Electronic Research Archive, 2022, 30(5): 1881-1897. doi: 10.3934/era.2022095
    [9] Jin Wang, Jun-E Feng, Hua-Lin Huang . Solvability of the matrix equation $ AX^{2} = B $ with semi-tensor product. Electronic Research Archive, 2021, 29(3): 2249-2267. doi: 10.3934/era.2020114
    [10] Xun Wang, Qunyi Bie . Energy equality for the multi-dimensional nonhomogeneous incompressible Hall-MHD equations in a bounded domain. Electronic Research Archive, 2023, 31(1): 17-36. doi: 10.3934/era.2023002
  • One application of counter-cryptanalysis is detecting whether a message block is involved in a collision attack, such as the detection of MD5 and SHA-1. Stevens and Shumow speeded up the detection of SHA-1 by introducing unavoidable conditions in message blocks. They left a challenge: how to determine unavoidable conditions for MD5. Later, Shen et al. found that the unavoidable conditions of MD5 were the sufficient conditions located in the last round of differential paths. In this paper, we made further work. We discover sufficient conditions in the second round that can also be used as unavoidable conditions. With additional sufficient conditions, we subdivide three sets and distinguish seven more classes. As a result, compared with Shen's collision detection algorithm, our improved algorithm reduces the collision detection cost by 8.18%. Finally, we find that they do exist in the differential paths constructed by the automatic tool "HashClash".



    A hash function compresses messages of arbitrary size into a fixed-length bit array. It is one of the basic components of security applications, for example, digital signatures and message authentication codes. The design of many classical hash functions such as MD5 and SHA-1 is based on the Merkle–Damgård construction [1,2]. This construction operates on the padded input message using a compression function and updating a fixed-size internal state (also called chaining value).

    A secure hash function must fulfill three properties, i.e., pre-image resistance, second pre-image resistance, and collision resistance. Collision resistance means that it is practically impossible to find two different messages with the same hash value. Much research has been carried out on the collision resistance of commonly used hash functions over the past few years. In 2005, Wang et al. proposed an identical-prefix collision attack on MD5 using modular differential [3]. This attack is based on differential cryptanalysis, which is also an essential method for analyzing block ciphers. Linear cryptanalysis, extensively used in attack block ciphers [4], is another powerful method. The key of the attack of Wang et al. is constructing differential paths that are a description of how the differences should propagate through chaining values. The limitation of the identical-prefix collision attack is that the message blocks before the colliding message need to be identical. Later, Stevens et al. overcame the shortcoming [5]. They introduced the chosen-prefix collision attack in that prefix message blocks can be chosen arbitrarily. Such an attack is more dangerous than the identical-prefix collision attack because they created two X.509 certificates with the same signature or even a rogue CA certificate [6]. Another proof of the harmfulness of the collision attack is Flame's malicious certificate. It was forged using the chosen-prefix collision attack on MD5 [7]. In addition, the adaptive chosen-prefix collision attack was used to construct a distinguishing attack on HMAC/NMAC-MD5 with the pseudo-collision [8].

    To defend against collision attacks, Stevens introduced counter-cryptanalysis, namely detecting whether a given message block is one of a carefully constructed colliding message pair [7,9]. The average complexity of Stevens' algorithm to detect the collision of MD5 and SHA-1 is about 224 times and 15 times the hashing of a message block, respectively. Later, Stevens and Shumow greatly reduced the detection complexity of SHA-1 utilizing the unavoidable bit conditions in the message generated from disturbance vectors [10]. The complexity is only 1.96 times a hashing of SHA-1. They left a public problem: how to determine unavoidable conditions for MD5. As a response, Shen et al. analyzed the properties of the Boolean function of the last round of MD5 in detail [11]. They concluded that the last round's sufficient conditions combinations (SCCs) were appropriate as the unavoidable conditions. According to SCCs, the 223 classes are divided into several sets, and 126 classes have adequate SCCs to be identified. As a result, their collision detection algorithm costs only 28.6% runtime of Stevens' algorithm. With the existing SCCs, nine sets cannot be further subdivided, and 79 classes cannot be classified. In particular, the undistinguishable 79 classes significantly affect detection complexity. Thus, a challenging problem is subdividing sets and classifying more classes.

    Our contribution. In this paper, we carefully study the properties of the Boolean function of the second round and discover that the sufficient conditions (SCs) in the second round are also suitable to be unavoidable conditions. Note that the closer to the first round of MD5, the more complex the conditions are. There may be no SCs in the opening steps of the second round that can be used for distinguishing classes. Thus, we only consider SCs between steps 31 and 28, termed the additional sufficient conditions (ASCs). After analyzing all nine sets and 79 classes, we find that three sets can be subdivided, and seven classes in the non-distinguishable set can be distinguished, as they have enough SCs when combining ASCs and existing SCCs. We propose a new collision detection algorithm implemented by integrating ASCs into Shen's algorithm. Furthermore, we use "HashClash" to confirm that the ASCs exist in the differential paths of MD5. Eventually, compared to Shen's algorithm, the complexity of our algorithm to detect collision of MD5 is reduced by 8.18%.

    The remainder of the paper is organized as follows. In Section 2, we give a short description of MD5 and summarize the related works. The properties of the Boolean function of the second round and ASCs are discussed in Section 3. In Section 4, we present the verification of ASCs and the complexity of the new detection algorithm. The paper is summarized in Section 5.

    MD5 is one of the classical hash functions. The compression function receives a 128-bit intermediate hash value (IHVin=(a,b,c,d)) and a 512-bit message block M and outputs a new 128-bit intermediate hash value (IHVout). M is divided into consecutive 32-bit words (m0,m1,,m15). The compression function consists of 64 steps which are split into four rounds that contain 16 steps each. For t=0,1,,63 each step is as follows:

    Ft=ft(Qt,Qt1,Qt2) (2.1)
    Qt+1=Qt+RL(Ft+Qt3+Ct+mt,Rt) (2.2)

    where (Q0,Q1,Q2,Q3)=(b,c,d,a) and IHVout=(a+Q61,b+Q64,c+Q63,d+Q62). In each step t, ft is the corresponding Boolean function, Ct is a fixed constant, mt is the corresponding 32-bit word, RL denotes left-rotation and Rt is the rotation constant. Qt3 can be calculated using the following formula in which RR denotes right rotation:

    Qt3=RR(Qt+1Qt,Rt)FtmtCt (2.3)

    One can find more details of MD5 in [12].

    This subsection discusses the related works and compares the related methods with our approach. The comparison results can be found in Table 1.

    Table 1.  Comparison of collision detection methods for MD5.
    Algorithm Method Complexity
    Stevens' algorithm Check 223 classes in sequence 224 times hashing a message block
    Shen's algorithm 1) Use SCCs to classify 126 classes into 22 elements
    2) Determine the element according to SCCs and check the classes in the element
    3) Check the remaining 76 classes
    72.2949 times hashing a message block (under our experimental setting)
    Our algorithm 1) Use SCCs and ASCs to classify 133 classes into 30 elements
    2) Determine the element according to SCCs and ASCs, check the classes in the element
    3) Check the remaining 69 classes
    66.3834 times hashing a message block

     | Show Table
    DownLoad: CSV

    In [7,9], Stevens introduced a collision detection algorithm. The algorithm builds on two observations. One is that there are fewer message differences that can cause collision attacks. The other is that the current differential path used in collision attacks has fixed differences in the working state of some steps. For MD5, there are 223 classes of message difference δM, step t, and working state δWSt=(Qt3,Qt2,Qt,Qt+1). One can find more details in Algorithm 1. The average detection complexity is 224 times the complexity of the compression operator of MD5.

    Algorithm 1 Stevens' collision detection algorithm
    Input: padded and split message blocks M0,...,MN1.
    Output: True if a near-collision or pseudo-collision is detected and False otherwise.
      1:  Let IHV0 be the initial chaining values.
      2:  for k=0 to N-1 do
      3:    WS0=IHVk.
      4:    Calculate all working states WS1,...,WS64 and IHVk+1=IHVk+WS64.
      5:    for each (δM,t,δWSt) do
      6:      Let Mk=Mk+δM and WSt=WSt+δWSt.
      7:      Calculate WSt1,...,WS0 backward and WSt+1,...,WS64 forward.
      8:      Let IHVk=WS0 and IHVk+1=IHVk+WS64.
      9:      if IHVk+1=IHVk+1 then
      10:        return True. //Mk and Mk is a near-collision or pseudo-collision block pair
      11:      end if
      12:    end for
      13:  end for
      14:  return False

     | Show Table
    DownLoad: CSV

    In [11], Shen et al. proposed an improved collision detection algorithm of MD5 using unavoidable conditions. They used the properties of MD5 and found that the SCs in the last round can be seen as unavoidable conditions. It is based on the observation that the corresponding SCs always remain the same once the input differences are fixed. Based on the SCCs, the 223 different classes are split into four sets, namely, distinguishable set (DS), individual checked set (ICS), non-distinguishable set (NDS), and discard set. The classes within each element in DS have identical SCCs. The same applies to the classes within each element in ICS. The SCs at the 31th bit are called main SCs (mSCs). The SCs at the 5th, 9th, 14th, or 20th bit are called auxiliary SCs (aSCs). Elements in NDS do not have enough SCs, and elements in the discard set are not appropriate for building a collision attack. The details of the algorithm are described in Algorithm 2. 126 classes in DS and ICS contribute to reducing the complexity of collision detection by 71.4%. Note that each working state difference is converted to (δQ38,δQ39,δQ40,δQ41)=(0,0,0,0) (type Ⅰ difference) or (δQ38,δQ39,δQ40,δQ41)=(231,231,231,231) (type Ⅱ difference) in [11], where δX=XX is the modular difference for 32-bit words X and X. Elements containing multiple message differences are as follows:

    1) DS1 using type Ⅱ difference: δM=0, δM=± (δm11=2b) (b{0,,31}), and δM=± (δm4=2b) (b{20,25,31}).

    2) DS2 using type Ⅱ difference: δM=(δm8=231) and δM=± (δm8=231,m11=221).

    3) DS3 using type Ⅱ difference: δM=(δm5=231) and δM=(δm5=231,δm11=231).

    4) DS4 using type Ⅱ difference: δM=(δm14=231) and δM=± (δm11=215,δm4=δm14=231).

    5) DS9 using type Ⅰ difference: δM=± (δm2=28,δm14=231), and δM=± (δm2=28,δm11=215, δm4=δm14=231).

    6) DS13 using type Ⅰ difference: δM=± (δm5=210), δM=± (δm5=210,δm11=221), and δM=± (δm5=210,δm11=231).

    7) ICS2 using type Ⅰ difference: δM=(δm14=231) and δM=± (δm11=215,δm4=δm14=231).

    8) ICS3 using type Ⅰ difference: δM=(δm5=231), δM=(δm5=231,δm11=231), and δM=(δm5=231,δm8=231).

    9) ICS8 using type Ⅱ difference: δM=± (δm5=210), δM=± (δm5=210,δm11=221), and δM=± (δm5=210,δm11=231).

    The classes in NDS are as follows:

    1) Using type Ⅰ difference: δM=± (δm11=2b) (b{0,,31}), δM=± (δm4=2b) (b{20,25,31}), δM=± (δm8=2b) (b{25,31}), and δM=± (δm8=231,δm11=221).

    2) Using type Ⅱ difference: δM=± (δm2=28,δm14=231), δM=± (δm6=28,δm9=δm15=231), and δM=± (δm2=28,δm11=215,δm4=δm14=231).

    Algorithm 2 Shen's collision detection algorithm
    Input: padded and split message blocks M0,...,MN1.
    Output: True if a near-collision or pseudo-collision is detected and False otherwise.
      1:  Let IHV0 be the initial chaining values.
      2:  for k=0 to N-1 do
      3:    (Q3,Q0,Q1,Q2)=IHVk.
      4:    Calculate Q1,...,Q64 and IHVk+1.
      5:    Obtain mSCs=Q46[31]...Q59[31] and aSCsb=Q46[b]...Q59[b] (b=5,9,14,20).
      6:    if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS then
      7:      for each (δM,δWS41) in E do
      8:        Calculate IHVk+1 and IHVk+1 using δM and δWS41.
      9:        if IHVk+1=IHVk+1 then
      10:          return True.
      11:        end if
      12:      end for
      13:    end if
      14:    for each element E in NDS do
      15:      for each (δM,δWS41) in E do
      16:        Calculate IHVk+1 and IHVk+1 using δM and δWS41.
      17:        if IHVk+1=IHVk+1 then
      18:          return True.
      19:        end if
      20:      end for
      21:    end for
      22:  end for
      23:  return False

     | Show Table
    DownLoad: CSV

    In this subsection, we describe the properties of the Boolean function of the second round. Let F(A,B,C)=(CA)(¯CB) and F(A,B,C)=(CA)(¯CB) where A, B, C, A, B and C are 1-bit Boolean variables. Let ΔK=KK (K{A,B,C}) and ΔF=F(A,B,C)F(A,B,C). We can get the following property.

    Property 1. If ΔA0, ΔB=0, and ΔC=0, then ΔF=0 if and only if C=C=0. If ΔA=0, ΔB0, and ΔC=0, then ΔF=0 if and only if C=C=1. If ΔA=0, ΔB=0, and ΔC0, then ΔF=0 if and only if A=B. If ΔA0, ΔB0, and ΔC0, then ΔF=0 if and only if ΔAΔB, ΔF=1 or 1 if and only if ΔA=ΔB.

    Proof. According to the truth table of ΔF, the property is easily derived. Note that ΔA =ΔB means that A=B and A=B; ΔAΔB means that AB and AB.

    When bit length is extended to 32-bit, the properties of the Boolean function F(Qt,Qt1,Qt2)=(Qt2Qt)(¯Qt2Qt1) can be summarized as follows based on property 1. Let ΔX=(X[i]X[i])N1i=0 be the signed bitwise difference and X[i] indicate to take the i-th bit of X.

    Property 2. If (δQt,δQt1,δQt2)=(2b,0,0) and δFt=0, then Qt2[b]=0.

    Proof. (δQt,δQt1,δQt2)=(2b,0,0) means that ΔQt[b]0, ΔQt1[b]=0 and ΔQt2[b]=0. δFt=0 means that ΔFt[i]=0 (0i31). According to property 1, Qt2[b]=0.

    Property 3. If (δQt,δQt1,δQt2)=(0,2b,0) and δFt=0, then Qt2[b]=1.

    Proof. (δQt,δQt1,δQt2)=(0,2b,0) means that ΔQt[b]=0, ΔQt1[b]0 and ΔQt2[b]=0. δFt=0 means that ΔFt[i]=0 (0i31). According to property 1, Qt2[b]=1.

    Property 4. If (δQt,δQt1,δQt2)=(0,0,2b) and δFt=0, then Qt[b]=Qt1[b].

    Proof. (δQt,δQt1,δQt2)=(0,0,2b) means that ΔQt[b]=0, ΔQt1[b]=0 and ΔQt2[b]0. δFt=0 means that ΔFt[i]=0 (0i31). According to property 1, Qt[b]=Qt1[b].

    Property 5. If (δQt,δQt1,δQt2)=(2b,2b,2b) and δFt=0, then ΔQt[b]ΔQt1[b].

    Proof. (δQt,δQt1,δQt2)=(2b,2b,2b) means that ΔQt[b]0, ΔQt1[b]0 and ΔQt2[b]0. δFt=0 means that ΔFt[i]=0 (0i31). According to property 1, ΔQt[b]ΔQt1[b].

    Property 6. If (δQt,δQt1,δQt2)=(2b,2b,2b) and δFt=± 2b, then ΔQt[b]=ΔQt1[b].

    Proof. (δQt,δQt1,δQt2)=(2b,2b,2b) means that ΔQt[b]0, ΔQt1[b]0 and ΔQt2[b]0. δFt=± 2b means that ΔFt[b]0. According to property 1, ΔQt[b]=ΔQt1[b].

    The conditions in the differential path must be satisfied for a collision attack. Thus, the success probability of a collision attack depends on the number of conditions. It is easy to fulfill almost all conditions in the first 25 steps with message modification techniques. Conditions in the remaining steps are randomly satisfied. We only choose SCs between steps 31 and 28 as ASCs. Because conditions in the second round are more complex than conditions in the last round, we choose the part with relatively few simple conditions. In this subsection, we describe how to deduce ASCs.

    Due to the fixed chaining value difference, we start from step 40 using Eq (2.3) to calculate the Qt3 reversely with linearizing step 40 to step 32, and we deduce the conditions from step 31 to step 28. In Eq (2.3), if δQt+1δQt0 then RR(δQt+1δQt,Rt) will rotate the difference to another bit position, which may increase the number of conditions. In the following, we refer to δQt+1δQt0 as the subtraction difference. Support δQt=± 2b and δQk (k{,t1}{t+1,}) is the chaining values adjacent to Qt. There are roughly two ways to deal with the subtraction difference:

    1) If there is no difference at bit position b in δQk, then a good choice is to eliminate the difference ± 2b at steps t+2, t+1, and t according to properties 2–4, respectively. If not eliminated, the number of conditions will increase owing to the introduced difference at bit position b. Note that the subtraction difference occurs twice in either case.

    2) If there are consecutive differences at bit position b in δQk, then we can eliminate the subtraction difference according to properties 5 and 6, namely, keeping the difference at bit position b. Thus, no conditions will be introduced because of the subtraction difference except for the initial and last subtraction difference.

    According to the two ways, we deduce the conditions of a high probability differential path with properties 2 to 6. Note that the high bit position conditions may be eliminated because of carrying expansion.

    Depending on ASCs, we find that DS9, ICS2, and ICS3 can be subdivided. DS1, DS2, DS3, DS4, DS13, and ICS8 cannot be subdivided. The reasons are as follows. The conditions of elements in DS1 are too complicated. After removing the repeated SCs, the number of remaining conditions of elements in DS2, DS3, and ICS8 is less than two. Besides, DS4 and DS13 have less than three SCs.

    The elements in NDS that have enough SCs to be distinguished are as follows: δM=± (δm8=225) including two classes using type Ⅰ difference, δM=(δm8=231) including one class using type Ⅰ difference, δM=±(δm6=28,δm9=δm15=231) including two classes using type Ⅱ difference, and δM=± (δm2=28,δm14=231) including two classes using type Ⅱ difference. All of them may have SCs located at the 31th, 27th, 22th, 17th, 11th, or 8th bits. The ASCs at the 31th bit are called main ASCs. The ASCs at other bit locations are called auxiliary ASCs. The other elements in NDS do not have enough SCs. The conditions of δM=± (δm11=2b) (b{0,,31}), δM=± (δm4=2b) (b{20,25,31}), and δM=± (δm8=231,δm11=221) (b{0,,31}) using type Ⅰ difference are too complex, and the number of SCs is less than four. δM=± (δm2=28,δm11=215,δm4=δm14=231) using type Ⅱ difference has just four SCs.

    We add the classes in DS9, ICS2, and ICS3 to the subdivided set, and the details are as follows.

    1) DS9 element has four classes and uses type Ⅰ difference. It is divided into two subsets:

    (a) DS9_1. For the message differences δM=± (δm2=28,δm14=231), chaining value differences are δQt=231 (31t32), δQt=0 (t{30,28}), δQ29=± 227, δQ27=± 217, and unavoidable sufficient conditions are Q30[27]=Q31[27], Q27[8]=Q28[8], Q28[27]=1. Note that at step 29, RR(δQ30δQ29,9)=± 218 and δQ27 can be eliminated or retained; δm2=± 28 is introduced. Therefore, δQ26=± 218±217± 28 or ± 218± 28. δQ25 has ± 222 because of δQ28δQ270.

    (b) DS9_2. For the message differences δM=± (δm2=28,δm11=215,δm4=δm14=231), chaining value differences are δQt=231 (27t35), δQ26=231± 28, δQ25=± 231, and unavoidable sufficient conditions are Qt[31]=Qt+1[31] (27t30), Q27[8]=Q28[8].

    2) ICS2 element has three classes and uses type Ⅰ difference. It is divided into two subsets:

    (a) ICS2_1. For the message difference δM=(δm14=231), chaining value differences are δQt=231 (31t32), δQt=0 (t{30,28}), δQ29=± 227, δQ27=± 217, and unavoidable sufficient conditions are Q30[27]=Q31[27], Q28[27]=1. Note that the calculation of δQ26 and δQ25 is similar to the calculation in DS9_1 except for no δm2=± 28.

    (b) ICS2_2. For the message difference δM=± (δm11=215,δm4=δm14=231), chaining value differences are δQt=231 (25t34), and unavoidable sufficient conditions are Qt[31]=Qt+1[31] (27t30).

    3) ICS3 element has three classes and uses type Ⅰ difference. It is divided into three subsets:

    (a) ICS3_1. For the message difference δM=(δm5=231), chaining value differences are δQt=0 (t{30,31}), δQt=231 (27t29), δQ26=231±222, δQ25=231, and unavoidable sufficient conditions are Q30[31]Q31[31], Qt[31]=Qt+1[31] (27t28), Q27[22]=Q28[22].

    (b) ICS3_2. For the message difference δM=(δm5=231,δm11=231), chaining value differences are δQt=231 (26t27, 29t31), δQ28=231±211, δQ25=231±26, and unavoidable sufficient conditions are Qt[31]=Qt+1[31] (27t30), Q29[11]=Q30[11], Q27[11]=1, Q26[11]=0.

    (c) ICS3_3. For the message difference δM=(δm5=231,δm8=231), chaining value differences are δQ30=231, δQ27=± 217, δQ26=± 222, δQt=0 (t{25,28,29,31}), and unavoidable sufficient conditions are Q29[31]=1, Q28[31]=0, Q28[17]=Q29[17], Q26[17]=1.

    We add seven classes to the new individual checked set (NICS). The SCs of each element consist of the SCCs in the last 16 steps and ASCs to ensure enough SCs. The details of the new elements are as follows:

    1) NICS1 element has two classes and uses type Ⅰ difference. Message difference is δM=± (δm8=225). Chaining value differences are δQt=231 ( 57t59), δQt=± 225 ( 29t30). Unavoidable sufficient conditions are Q57[31]Q59[31], Q56[31]=1, Q55[31]=0, Q28[11]=Q29[11], Q26[11]=1. Note that δQ27 has ± 211 owing to δQ31δQ300. Because at step 31, we get δQ28=± 225 or 0, the differences of δQt ( 25t28) at 25th bit are uncertain.

    2) NICS2 element has one class and uses type Ⅰ difference. Message difference is δM=(δm8=231). Chaining value differences are δQt=231 (25t26, 28t30), δQt=± 25 (57t59), δQ27=231±217. Unavoidable sufficient conditions are Q57[5]Q59[5], Q56[5]=1, Q55[5]=0, Qt[31]=Qt+1[31] (27t29), Q28[17]=Q29[17], Q26[17]=1.

    3) NICS3 element has two classes and uses type Ⅱ difference. Message difference is δM=± (δm6=28,δm9=δm15=231). Chaining value differences are δQ59=± 223± 29, δQ58=± 29, δQt=231 (25t43). Unavoidable sufficient conditions are Q57[9]=1, Q56[9]=0, Qt[31]=Qt+1[31] (27t30).

    4) NICS4 element has two classes and uses type Ⅱ difference. Message difference is δM=± (δm2=28,δm14=231). Chaining value differences are δQt=231 (30t47), δQ29=231±227, δQ27=± 217, δQ26=± 28, δQ25=± 222. Unavoidable sufficient conditions are Q49[31]=0, Q48[31]=1, Q30[27]=Q31[27], Q28[27]=1, Q27[8]=Q28[8]. The difference on 31th bit of δQt (25t28) are uncertain. Because at step 29, δQ27=± 217 can be eliminated or retained, the differences of δQ26 at 17th bit are undetermined.

    In this subsection, we discuss how ASCs can be integrated into Shen's algorithm. Our new algorithm is described in Algorithm 3. Details are as follows. In [11], all possible bit values are listed in a table using a hexadecimal representation; the key bits determine the rest of the bits; the bit mask indicates those positions where the SCs are. We use the same method to represent ASCs. Because there are only 6 bits, we use binary to represent them. The most significant bit is δQ26[b], and the least significant bit is δQ31[b] (b{31,27,22,17,11,8}). For example, ASCs of ICS_2 are Qt[31]=Qt+1[31] (27t30). They are respected by two binary values, 0b000000 and 0b011111. The bit mask is 0b011111. We choose Q30[31] and Q31[31] as key bits. Obviously, Q30[31]=Q31[31]=0 indicates 0b000000; Q30[31]=Q31[31]=1 indicates 0b011111. Two bits are used as they determine up to four different values.

    Algorithm 3 Algorithm integrated with ASC
    Input: padded and split message blocks M0,...,MN1.
    Output: True if a near-collision or pseudo-collision is detected and False otherwise.
      1:  Let IHV0 be the initial chaining values.
      2:  for k=0 to N-1 do
      3:    (Q3,Q0,Q1,Q2)=IHVk.
      4:     Calculate Q1,...,Q64 and IHVk+1.
      5:     Obtain mSCs=Q46[31]...Q59[31], aSCsb=Q46[b]...Q59[b] (b=5,9,14,20) and ASCsb=Q26[b]...Q31[b] (b=31,27,22,17,11,8).
      6:     if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS or mSCs, aSCsb and ASCsb is one of the SCs of a element E in NICS then
      7:       if E belongs to the Subdivided Set then
      8:         according to ASCsb, choose δM and δWS41 and calculate IHVk+1 and IHVk+1.
      9:         if IHVk+1=IHVk+1 then
      10:           return True.
      11:         end if
      12:       else
      13:         for each (δM,δWS41) in E do
      14:           Calculate IHVk+1 and IHVk+1 using δM and δWS41.
      15:           if IHVk+1=IHVk+1 then
      16:             return True.
      17:           end if
      18:         end for
      19:       end if
      20:     end if
      21:     for each element E in NDS do
      22:       for each (δM,δWS41) in E do
      23:         Calculate IHVk+1 and IHVk+1 using δM and δWS41 in E.
      24:         if IHVk+1=IHVk+1 then
      25:           return True.
      26:         end if
      27:       end for
      28:     end for
      29:   end for
      30:   return False

     | Show Table
    DownLoad: CSV

    For the elements in the subdivided set, conditions are specified using key bits according to the table of SCCs in [11]. Then, ASCs are used to determine which message difference these conditions belong to. Therefore, it is no longer necessary to calculate the IHVout of all messages difference, in turn, to detect a collision. ASCs of the subdivided set are listed in Table 2. The column where the value of SCs is located is determined by the key bits. For instance, the key bits are 11, which means column 3. Note that the condition Q27[8]=Q28[8] exists both in DS9_1 and DS9_2, so it is not used. DS9 and ICS2 conditions are identical in Table 2, so they share a 2×5 array in implementation. ICS3 conditions are saved in a 6×5 array. For the key bits, ICS3_1* and ICS3_2 are Q29[b] and Q30[b]; ICS3_1 and ICS3_3 are Q27[b] and Q28[b]; ICS3_3* are Q28[b] and Q29[b]; all others are Q30[b] and Q31[b].

    Table 2.  ASCs of the subdivided set.
    Element C[i][0] C[i][1] C[i][2] C[i][3] Bit mask
    DS9_1 0b001000 -1 -1 0b001011 0b001011
    DS9_2 0b000000 -1 -1 0b011111 0b011111
    ICS2_1 0b001000 -1 -1 0b001011 0b001011
    ICS2_2 0b000000 -1 -1 0b011111 0b011111
    ICS3_1* 0b000001 0b000010 0b011101 0b011110 0b011111
    ICS3_1 0b000000 -1 -1 0b011000 0b011000
    ICS3_2* 0b000000 -1 -1 0b011111 0b011111
    ICS3_2 0b010000 -1 -1 0b010110 0b110110
    ICS3_3* -1 0b000100 -1 -1 0b001100
    ICS3_3 0b100000 -1 -1 0b111000 0b111000
    * is the main ASCs. is the auxiliary ASCs. -1 denotes that there is no value.

     | Show Table
    DownLoad: CSV

    We add NICS to the end of ICS in the original algorithm. Namely, after checking ICS, the algorithm will process the elements in NICS. SCs in NICS are listed in Table 3. Note that the values of NICS1 and NICS1 are identical to NICS2 and NICS2, respectively. They are merged in the implementation. For the key bits, NICS1, NICS2*, NICS2 and NICS3* are Q28[b] and Q29[b]; NICS4* are Q29[b] and Q30[b]; NICS4 are Q27[b] and Q28[b]; NICS1 and NICS2 are Q57[b] and Q59[b].

    Table 3.  SCCs and ASCs of the new individual checked set.
    Element C[i][0] C[i][1] C[i][2] C[i][3] Bit mask
    NICS1 -1 0x00009 0x0000c -1 0x0001d
    NICS1 0b100000 -1 -1 0b101100 0b101100
    NICS2 -1 0x00009 0x0000c -1 0x0001d
    NICS2* 0b000000 -1 -1 0b011110 0b011110
    NICS2 0b100000 -1 -1 0b101100 0b101100
    NICS3 0x00004 -1 -1 -1 0x00004
    NICS3* 0b000000 -1 -1 0b011111 0b011111
    NICS4 0x20000 -1 -1 -1 0x20000
    NICS4* 0b001000 -1 -1 0b001011 0b001011
    NICS4 0b000000 -1 -1 0b011000 0b011000
    is the SCCs of the last 16 steps.

     | Show Table
    DownLoad: CSV

    We use "HashClash" developed by Stevens to verify whether ASCs exist in the differential path. "HashClash" using the MIT License is a tool to construct differential paths for MD5 and SHA-1, also including the chosen-prefix collision attack on MD5. It consists of three parts, i.e., forward extension, backward extension, and connection. One can find the source code of "HashClash" in [13].

    To achieve the maximum complexity of conditions as much as possible, we set the parameter as follows:

    1) Autobalance=1,000,000. The parameter limits the maximum number of paths that can be saved per step;

    2) Estimate=4. The program will evaluate the maximum number of conditions under 4×1,000,000 available paths.

    3) Maxweight=32 and Maxsdrs=2000. Maxweight controls the carrying extension length; Maxsdrs limits the total number of ΔQ. Such a setting means no limitation to carrying extension of δQ.

    Other parameters are the default values. The backward program of MD5 is used to construct differential paths starting from step 40 to step 16. After observing the paths with the minimum and maximum conditions, we found that all ASCs existed. This result suggests that the conditions found are sufficient. One can find the results and our new algorithm code in https://github.com/Finsenty54/Improved-Collision-Detection-Of-MD5.

    The algorithm of Shen et al. takes 78.44 times the runtime of hashing a message block to detect collisions in theory. In this paper, as seven classes can be distinguished which make a major contribution to improving the detection efficiency, the total complexity of the new collision algorithm is about 71.44 times of computing the MD5 hash of a message block. Compared with the original algorithm, the complexity is reduced by 8.92%.

    The C++ code of the collision detection algorithm with ASCs was compiled by g++ 10.2.1 under -O3 optimization and was run on AMD Ryzen 5 2600 at 3.40 GHz under Parrot OS 4.11. We conducted three experiments on the original code of Shen et al. and our code, respectively. The total number of messages generated is 224. The results are shown in Table 4. The average checks number of the original algorithm is 72.2949 in contrast to 66.3834 of our new algorithm. The experiments show that it reduces the runtime by 8.18% with ASCs.

    Table 4.  Comparison of the original and our new algorithm.
    Experiments Generate random message [× 106/ms] Standard MD5 hash [× 106/ms] Collision detection [× 108/ms] Average check number [multiples of one hashing operation]
    Original test1 3.27323 5.11324 1.35930 72.0957
    Original test2 3.27849 5.10581 1.35466 72.3395
    Original test3 3.26173 5.09832 1.36322 72.4496
    New test1 3.22541 5.07214 1.26397 66.6971
    New test2 3.25677 5.11430 1.27354 66.4313
    New test3 3.21775 5.07999 1.26165 66.0214

     | Show Table
    DownLoad: CSV

    In this paper, we derive the sufficient conditions between steps 31 and 28 of MD5 based on the properties of the Boolean function of the second round and describe the reverse deduction ways. Combining the additional sufficient conditions, we improve Shen's detection algorithm by subdividing three sets and distinguishing seven more classes. After adding the subdivided set and NICS to their algorithm, we get our new detection algorithm. The cost of collision detection of MD5 is reduced by 8.18%. This study further supports the idea of using unavoidable conditions to accelerate collision detection in previous research. The experimental results of Shen's algorithm differ from those claimed in [11]. This inconsistency may be due to the various experimental setting.

    This work was supported by the Zhejiang Provincial Natural Science Foundation of China [Grant No. LQ20F020019] and the Technology on Communication Security Laboratory [Grant No. 6142103190105].

    All authors declare no conflicts of interest in this paper.



    [1] I. B. Damgård, A design principle for hash functions, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 416–427. https://doi.org/10.1007/0-387-34805-0_39
    [2] R. C. Merkle, One way hash functions and DES, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 428–446. https://doi.org/10.1007/0-387-34805-0_40
    [3] X. Wang, H. Yu, How to break MD5 and other hash functions, in Advances in Cryptology – EUROCRYPT 2005 (ed. R. Cramer), Springer, Berlin, Heidelberg, (2005), 19–35. https://doi.org/10.1007/11426639_2
    [4] A. D. Dwivedi, S. Dhar, G. Srivastava, R. Singh, Cryptanalysis of round-reduced fantomas, robin and iSCREAM, Cryptography, 3 (2019), 4. https://doi.org/10.3390/cryptography3010004 doi: 10.3390/cryptography3010004
    [5] M. Stevens, A. Lenstra, B. de Weger, Chosen-prefix collisions for MD5 and colliding X.509 certificates for different identities, in Advances in Cryptology - EUROCRYPT 2007 (ed. M. Naor), Springer, Berlin, Heidelberg, (2007), 1–22. https://doi.org/10.1007/978-3-540-72540-4_1
    [6] M. Stevens, A. Sotirov, J. Appelbaum, A. Lenstra, D. Molnar, D. A. Osvik, et al., Short chosen-prefix collisions for MD5 and the creation of a rogue CA certificate, in Advances in Cryptology - CRYPTO 2009 (ed. S. Halevi), Springer, Berlin, Heidelberg, (2009), 55–69. https://doi.org/10.1007/978-3-642-03356-8_4
    [7] M. Stevens, Counter-cryptanalysis, in Advances in Cryptology – CRYPTO 2013 (eds. R. Canetti and J. A. Garay), Springer, Berlin, Heidelberg, (2013), 129–146. https://doi.org/10.1007/978-3-642-40041-4_8
    [8] X. Wang, H. Yu, W. Wang, H. Zhang, T. Zhan, Cryptanalysis on HMAC/NMAC-MD5 and MD5-MAC, in Advances in Cryptology - EUROCRYPT 2009 (ed. A. Joux), Springer, Berlin, Heidelberg, (2009), 121–133. https://doi.org/10.1007/978-3-642-01001-9_7
    [9] M. Stevens, Attacks on hash functions and applications, Ph.D thesis, Leiden University in Leiden, 2012. https://doi.org/10.6100/ir749233
    [10] M. Stevens, D. Shumow, Speeding up detection of {SHA-1} collision attacks using unavoidable attack conditions, in 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, (2017), 881–897. https://doi.org/10.5555/3241189.3241259
    [11] Y. Shen, T. Wu, G. Wang, X. Dong, H. Qian, Improved collision detection of MD5 using sufficient condition combination, Comput. J., (2021). https://doi.org/10.1093/comjnl/bxab109 doi: 10.1093/comjnl/bxab109
    [12] R. L. Rivest, The MD5 message-digest algorithm, RFC, 1321 (1992), 1–21. https://doi.org/10.17487/RFC1321 doi: 10.17487/RFC1321
    [13] M. Stevens, Project HashClash - MD5 & SHA-1 cryptanalytic toolbox [Internet]. Available from: https: //github.com/cr-marcstevens/hashclash.
  • This article has been cited by:

    1. D. Paul Joseph, P. Viswanathan, SDOT: Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-Based Three-Tier Forensic Classification Framework, 2023, 11, 2169-3536, 3291, 10.1109/ACCESS.2023.3234434
    2. Donagani Ramakrishna, Mohammed Ali Shaik, A Comprehensive Analysis of Cryptographic Algorithms: Evaluating Security, Efficiency, and Future Challenges, 2025, 13, 2169-3536, 11576, 10.1109/ACCESS.2024.3518533
  • Reader Comments
  • © 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
通讯作者: 陈斌, bchen63@163.com
  • 1. 

    沈阳化工大学材料科学与工程学院 沈阳 110142

  1. 本站搜索
  2. 百度学术搜索
  3. 万方数据库搜索
  4. CNKI搜索

Metrics

Article views(2357) PDF downloads(133) Cited by(2)

Figures and Tables

Tables(4)

/

DownLoad:  Full-Size Img  PowerPoint
Return
Return

Catalog