One application of counter-cryptanalysis is detecting whether a message block is involved in a collision attack, such as the detection of MD5 and SHA-1. Stevens and Shumow speeded up the detection of SHA-1 by introducing unavoidable conditions in message blocks. They left a challenge: how to determine unavoidable conditions for MD5. Later, Shen et al. found that the unavoidable conditions of MD5 were the sufficient conditions located in the last round of differential paths. In this paper, we made further work. We discover sufficient conditions in the second round that can also be used as unavoidable conditions. With additional sufficient conditions, we subdivide three sets and distinguish seven more classes. As a result, compared with Shen's collision detection algorithm, our improved algorithm reduces the collision detection cost by 8.18%. Finally, we find that they do exist in the differential paths constructed by the automatic tool "HashClash".
Miao Luo, Yousong Chen, Dawei Gao, Lijun Wang .
Inversion study of vehicle frontal collision and front bumper collision. Electronic Research Archive, 2023, 31(2): 776-792.
doi: 10.3934/era.2023039
[2]
Shao-Yuan Huang, Wei-Hsun Lee .
Sufficient conditions for exact bifurcation curves in Minkowski curvature problems and their applications. Electronic Research Archive, 2025, 33(4): 2325-2351.
doi: 10.3934/era.2025103
[3]
Xuanfeng Li, Jiajia Yu .
Joint attention mechanism for the design of anti-bird collision accident detection system. Electronic Research Archive, 2022, 30(12): 4401-4415.
doi: 10.3934/era.2022223
[4]
Zhaoyan Meng, Shuting Lyu, Mengqing Zhang, Xining Li, Qimin Zhang .
Sufficient and necessary conditions of near-optimal controls for a stochastic listeriosis model with spatial diffusion. Electronic Research Archive, 2024, 32(5): 3059-3091.
doi: 10.3934/era.2024140
[5]
Milena Dimova, Natalia Kolkovska, Nikolai Kutev .
Global behavior of the solutions to nonlinear Klein-Gordon equation with critical initial energy. Electronic Research Archive, 2020, 28(2): 671-689.
doi: 10.3934/era.2020035
[6]
Yang Gao, Qingzhong Ji .
On the inverse stability of $ z^n+c $. Electronic Research Archive, 2025, 33(3): 1414-1428.
doi: 10.3934/era.2025066
[7]
Hui Xu, Xinyang Zhao, Qiyun Yin, Junting Dou, Ruopeng Liu, Wengang Wang .
Isolating switch state detection system based on depth information guidance. Electronic Research Archive, 2024, 32(2): 836-856.
doi: 10.3934/era.2024040
[8]
Yuri Prokhorov .
Conic bundle structures on $ \mathbb{Q} $-Fano threefolds. Electronic Research Archive, 2022, 30(5): 1881-1897.
doi: 10.3934/era.2022095
[9]
Jin Wang, Jun-E Feng, Hua-Lin Huang .
Solvability of the matrix equation $ AX^{2} = B $ with semi-tensor product. Electronic Research Archive, 2021, 29(3): 2249-2267.
doi: 10.3934/era.2020114
[10]
Xun Wang, Qunyi Bie .
Energy equality for the multi-dimensional nonhomogeneous incompressible Hall-MHD equations in a bounded domain. Electronic Research Archive, 2023, 31(1): 17-36.
doi: 10.3934/era.2023002
Abstract
One application of counter-cryptanalysis is detecting whether a message block is involved in a collision attack, such as the detection of MD5 and SHA-1. Stevens and Shumow speeded up the detection of SHA-1 by introducing unavoidable conditions in message blocks. They left a challenge: how to determine unavoidable conditions for MD5. Later, Shen et al. found that the unavoidable conditions of MD5 were the sufficient conditions located in the last round of differential paths. In this paper, we made further work. We discover sufficient conditions in the second round that can also be used as unavoidable conditions. With additional sufficient conditions, we subdivide three sets and distinguish seven more classes. As a result, compared with Shen's collision detection algorithm, our improved algorithm reduces the collision detection cost by 8.18%. Finally, we find that they do exist in the differential paths constructed by the automatic tool "HashClash".
1.
Introduction
A hash function compresses messages of arbitrary size into a fixed-length bit array. It is one of the basic components of security applications, for example, digital signatures and message authentication codes. The design of many classical hash functions such as MD5 and SHA-1 is based on the Merkle–Damgård construction [1,2]. This construction operates on the padded input message using a compression function and updating a fixed-size internal state (also called chaining value).
A secure hash function must fulfill three properties, i.e., pre-image resistance, second pre-image resistance, and collision resistance. Collision resistance means that it is practically impossible to find two different messages with the same hash value. Much research has been carried out on the collision resistance of commonly used hash functions over the past few years. In 2005, Wang et al. proposed an identical-prefix collision attack on MD5 using modular differential [3]. This attack is based on differential cryptanalysis, which is also an essential method for analyzing block ciphers. Linear cryptanalysis, extensively used in attack block ciphers [4], is another powerful method. The key of the attack of Wang et al. is constructing differential paths that are a description of how the differences should propagate through chaining values. The limitation of the identical-prefix collision attack is that the message blocks before the colliding message need to be identical. Later, Stevens et al. overcame the shortcoming [5]. They introduced the chosen-prefix collision attack in that prefix message blocks can be chosen arbitrarily. Such an attack is more dangerous than the identical-prefix collision attack because they created two X.509 certificates with the same signature or even a rogue CA certificate [6]. Another proof of the harmfulness of the collision attack is Flame's malicious certificate. It was forged using the chosen-prefix collision attack on MD5 [7]. In addition, the adaptive chosen-prefix collision attack was used to construct a distinguishing attack on HMAC/NMAC-MD5 with the pseudo-collision [8].
To defend against collision attacks, Stevens introduced counter-cryptanalysis, namely detecting whether a given message block is one of a carefully constructed colliding message pair [7,9]. The average complexity of Stevens' algorithm to detect the collision of MD5 and SHA-1 is about 224 times and 15 times the hashing of a message block, respectively. Later, Stevens and Shumow greatly reduced the detection complexity of SHA-1 utilizing the unavoidable bit conditions in the message generated from disturbance vectors [10]. The complexity is only 1.96 times a hashing of SHA-1. They left a public problem: how to determine unavoidable conditions for MD5. As a response, Shen et al. analyzed the properties of the Boolean function of the last round of MD5 in detail [11]. They concluded that the last round's sufficient conditions combinations (SCCs) were appropriate as the unavoidable conditions. According to SCCs, the 223 classes are divided into several sets, and 126 classes have adequate SCCs to be identified. As a result, their collision detection algorithm costs only 28.6% runtime of Stevens' algorithm. With the existing SCCs, nine sets cannot be further subdivided, and 79 classes cannot be classified. In particular, the undistinguishable 79 classes significantly affect detection complexity. Thus, a challenging problem is subdividing sets and classifying more classes.
Our contribution. In this paper, we carefully study the properties of the Boolean function of the second round and discover that the sufficient conditions (SCs) in the second round are also suitable to be unavoidable conditions. Note that the closer to the first round of MD5, the more complex the conditions are. There may be no SCs in the opening steps of the second round that can be used for distinguishing classes. Thus, we only consider SCs between steps 31 and 28, termed the additional sufficient conditions (ASCs). After analyzing all nine sets and 79 classes, we find that three sets can be subdivided, and seven classes in the non-distinguishable set can be distinguished, as they have enough SCs when combining ASCs and existing SCCs. We propose a new collision detection algorithm implemented by integrating ASCs into Shen's algorithm. Furthermore, we use "HashClash" to confirm that the ASCs exist in the differential paths of MD5. Eventually, compared to Shen's algorithm, the complexity of our algorithm to detect collision of MD5 is reduced by 8.18%.
The remainder of the paper is organized as follows. In Section 2, we give a short description of MD5 and summarize the related works. The properties of the Boolean function of the second round and ASCs are discussed in Section 3. In Section 4, we present the verification of ASCs and the complexity of the new detection algorithm. The paper is summarized in Section 5.
2.
Preliminaries
2.1. Description of MD5
MD5 is one of the classical hash functions. The compression function receives a 128-bit intermediate hash value (IHVin=(a,b,c,d)) and a 512-bit message block M and outputs a new 128-bit intermediate hash value (IHVout). M is divided into consecutive 32-bit words (m0,m1,…,m15). The compression function consists of 64 steps which are split into four rounds that contain 16 steps each. For t=0,1,…,63 each step is as follows:
Ft=ft(Qt,Qt−1,Qt−2)
(2.1)
Qt+1=Qt+RL(Ft+Qt−3+Ct+mt,Rt)
(2.2)
where (Q0,Q−1,Q−2,Q−3)=(b,c,d,a) and IHVout=(a+Q61,b+Q64,c+Q63,d+Q62). In each step t, ft is the corresponding Boolean function, Ct is a fixed constant, mt is the corresponding 32-bit word, RL denotes left-rotation and Rt is the rotation constant. Qt−3 can be calculated using the following formula in which RR denotes right rotation:
This subsection discusses the related works and compares the related methods with our approach. The comparison results can be found in Table 1.
Table 1.
Comparison of collision detection methods for MD5.
Algorithm
Method
Complexity
Stevens' algorithm
Check 223 classes in sequence
224 times hashing a message block
Shen's algorithm
1) Use SCCs to classify 126 classes into 22 elements 2) Determine the element according to SCCs and check the classes in the element 3) Check the remaining 76 classes
72.2949 times hashing a message block (under our experimental setting)
Our algorithm
1) Use SCCs and ASCs to classify 133 classes into 30 elements 2) Determine the element according to SCCs and ASCs, check the classes in the element 3) Check the remaining 69 classes
In [7,9], Stevens introduced a collision detection algorithm. The algorithm builds on two observations. One is that there are fewer message differences that can cause collision attacks. The other is that the current differential path used in collision attacks has fixed differences in the working state of some steps. For MD5, there are 223 classes of message difference δM, step t, and working state δWSt=(Qt−3,Qt−2,Qt,Qt+1). One can find more details in Algorithm 1. The average detection complexity is 224 times the complexity of the compression operator of MD5.
In [11], Shen et al. proposed an improved collision detection algorithm of MD5 using unavoidable conditions. They used the properties of MD5 and found that the SCs in the last round can be seen as unavoidable conditions. It is based on the observation that the corresponding SCs always remain the same once the input differences are fixed. Based on the SCCs, the 223 different classes are split into four sets, namely, distinguishable set (DS), individual checked set (ICS), non-distinguishable set (NDS), and discard set. The classes within each element in DS have identical SCCs. The same applies to the classes within each element in ICS. The SCs at the 31th bit are called main SCs (mSCs). The SCs at the 5th, 9th, 14th, or 20th bit are called auxiliary SCs (aSCs). Elements in NDS do not have enough SCs, and elements in the discard set are not appropriate for building a collision attack. The details of the algorithm are described in Algorithm 2. 126 classes in DS and ICS contribute to reducing the complexity of collision detection by 71.4%. Note that each working state difference is converted to (δQ38,δQ39,δQ40,δQ41)=(0,0,0,0) (type Ⅰ difference) or (δQ38,δQ39,δQ40,δQ41)=(231,231,231,231) (type Ⅱ difference) in [11], where δX=X′−X is the modular difference for 32-bit words X and X′. Elements containing multiple message differences are as follows:
1) DS1 using type Ⅱ difference: δM=0, δM=±(δm11=2b)(b∈{0,…,31}), and δM=±(δm4=2b)(b∈{20,25,31}).
2) DS2 using type Ⅱ difference: δM=(δm8=231) and δM=±(δm8=231,m11=221).
3) DS3 using type Ⅱ difference: δM=(δm5=231) and δM=(δm5=231,δm11=231).
4) DS4 using type Ⅱ difference: δM=(δm14=231) and δM=±(δm11=215,δm4=δm14=231).
5) DS9 using type Ⅰ difference: δM=±(δm2=28,δm14=231), and δM=±(δm2=28,δm11=215, δm4=δm14=231).
6) DS13 using type Ⅰ difference: δM=±(δm5=210), δM=±(δm5=210,δm11=221), and δM=±(δm5=210,δm11=231).
7) ICS2 using type Ⅰ difference: δM=(δm14=231) and δM=±(δm11=215,δm4=δm14=231).
8) ICS3 using type Ⅰ difference: δM=(δm5=231), δM=(δm5=231,δm11=231), and δM=(δm5=231,δm8=231).
9) ICS8 using type Ⅱ difference: δM=±(δm5=210), δM=±(δm5=210,δm11=221), and δM=±(δm5=210,δm11=231).
The classes in NDS are as follows:
1) Using type Ⅰ difference: δM=±(δm11=2b)(b∈{0,…,31}), δM=±(δm4=2b)(b∈{20,25,31}), δM=±(δm8=2b)(b∈{25,31}), and δM=±(δm8=231,δm11=221).
2) Using type Ⅱ difference: δM=±(δm2=28,δm14=231), δM=±(δm6=28,δm9=δm15=231), and δM=±(δm2=28,δm11=215,δm4=δm14=231).
Algorithm 2 Shen's collision detection algorithm
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: (Q−3,Q0,Q−1,Q−2)=IHVk.
4: Calculate Q1,...,Q64 and IHVk+1.
5: Obtain mSCs=Q46[31]‖...‖Q59[31] and aSCsb=Q46[b]‖...‖Q59[b](b=5,9,14,20).
6: if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS then
7: for each (δM,δWS41) in Edo
8: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
9: ifIHV′k+1=IHVk+1then
10: return True.
11: end if
12: end for
13: end if
14: for each element E in NDS do
15: for each (δM,δWS41) in Edo
16: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
In this subsection, we describe the properties of the Boolean function of the second round. Let F(A,B,C)=(C∧A)⊕(¯C∧B) and F(A′,B′,C′)=(C′∧A′)⊕(¯C′∧B′) where A, B, C, A′, B′ and C′ are 1-bit Boolean variables. Let ΔK=K′−K(K∈{A,B,C}) and ΔF=F(A′,B′,C′)−F(A,B,C). We can get the following property.
Property 1. If ΔA≠0, ΔB=0, and ΔC=0, then ΔF=0 if and only if C=C′=0. If ΔA=0, ΔB≠0, and ΔC=0, then ΔF=0 if and only if C=C′=1. If ΔA=0, ΔB=0, and ΔC≠0, then ΔF=0 if and only if A=B. If ΔA≠0, ΔB≠0, and ΔC≠0, then ΔF=0 if and only if ΔA≠ΔB, ΔF=1 or −1 if and only if ΔA=ΔB.
Proof. According to the truth table of ΔF, the property is easily derived. Note that ΔA=ΔB means that A=B and A′=B′; ΔA≠ΔB means that A≠B and A′≠B′.
When bit length is extended to 32-bit, the properties of the Boolean function F(Qt,Qt−1,Qt−2)=(Qt−2∧Qt)⊕(¯Qt−2∧Qt−1) can be summarized as follows based on property 1. Let ΔX=(X′[i]−X[i])N−1i=0 be the signed bitwise difference and X[i] indicate to take the i-th bit of X.
Property 2. If (δQt,δQt−1,δQt−2)=(2b,0,0) and δFt=0, then Qt−2[b]=0.
Proof.(δQt,δQt−1,δQt−2)=(2b,0,0) means that ΔQt[b]≠0, ΔQt−1[b]=0 and ΔQt−2[b]=0. δFt=0 means that ΔFt[i]=0(0≤i≤31). According to property 1, Qt−2[b]=0.
Property 3. If (δQt,δQt−1,δQt−2)=(0,2b,0) and δFt=0, then Qt−2[b]=1.
Proof.(δQt,δQt−1,δQt−2)=(0,2b,0) means that ΔQt[b]=0, ΔQt−1[b]≠0 and ΔQt−2[b]=0. δFt=0 means that ΔFt[i]=0(0≤i≤31). According to property 1, Qt−2[b]=1.
Property 4. If (δQt,δQt−1,δQt−2)=(0,0,2b) and δFt=0, then Qt[b]=Qt−1[b].
Proof.(δQt,δQt−1,δQt−2)=(0,0,2b) means that ΔQt[b]=0, ΔQt−1[b]=0 and ΔQt−2[b]≠0. δFt=0 means that ΔFt[i]=0(0≤i≤31). According to property 1, Qt[b]=Qt−1[b].
Property 5. If (δQt,δQt−1,δQt−2)=(2b,2b,2b) and δFt=0, then ΔQt[b]≠ΔQt−1[b].
Proof.(δQt,δQt−1,δQt−2)=(2b,2b,2b) means that ΔQt[b]≠0, ΔQt−1[b]≠0 and ΔQt−2[b]≠0. δFt=0 means that ΔFt[i]=0(0≤i≤31). According to property 1, ΔQt[b]≠ΔQt−1[b].
Property 6. If (δQt,δQt−1,δQt−2)=(2b,2b,2b) and δFt=±2b, then ΔQt[b]=ΔQt−1[b].
Proof.(δQt,δQt−1,δQt−2)=(2b,2b,2b) means that ΔQt[b]≠0, ΔQt−1[b]≠0 and ΔQt−2[b]≠0. δFt=±2b means that ΔFt[b]≠0. According to property 1, ΔQt[b]=ΔQt−1[b].
3.2. Additional sufficient conditions
The conditions in the differential path must be satisfied for a collision attack. Thus, the success probability of a collision attack depends on the number of conditions. It is easy to fulfill almost all conditions in the first 25 steps with message modification techniques. Conditions in the remaining steps are randomly satisfied. We only choose SCs between steps 31 and 28 as ASCs. Because conditions in the second round are more complex than conditions in the last round, we choose the part with relatively few simple conditions. In this subsection, we describe how to deduce ASCs.
Due to the fixed chaining value difference, we start from step 40 using Eq (2.3) to calculate the Qt−3 reversely with linearizing step 40 to step 32, and we deduce the conditions from step 31 to step 28. In Eq (2.3), if δQt+1−δQt≠0 then RR(δQt+1−δQt,Rt) will rotate the difference to another bit position, which may increase the number of conditions. In the following, we refer to δQt+1−δQt≠0 as the subtraction difference. Support δQt=±2b and δQk(k∈{…,t−1}∪{t+1,…}) is the chaining values adjacent to Qt. There are roughly two ways to deal with the subtraction difference:
1) If there is no difference at bit position b in δQk, then a good choice is to eliminate the difference ±2b at steps t+2, t+1, and t according to properties 2–4, respectively. If not eliminated, the number of conditions will increase owing to the introduced difference at bit position b. Note that the subtraction difference occurs twice in either case.
2) If there are consecutive differences at bit position b in δQk, then we can eliminate the subtraction difference according to properties 5 and 6, namely, keeping the difference at bit position b. Thus, no conditions will be introduced because of the subtraction difference except for the initial and last subtraction difference.
According to the two ways, we deduce the conditions of a high probability differential path with properties 2 to 6. Note that the high bit position conditions may be eliminated because of carrying expansion.
Depending on ASCs, we find that DS9, ICS2, and ICS3 can be subdivided. DS1, DS2, DS3, DS4, DS13, and ICS8 cannot be subdivided. The reasons are as follows. The conditions of elements in DS1 are too complicated. After removing the repeated SCs, the number of remaining conditions of elements in DS2, DS3, and ICS8 is less than two. Besides, DS4 and DS13 have less than three SCs.
The elements in NDS that have enough SCs to be distinguished are as follows: δM=±(δm8=225) including two classes using type Ⅰ difference, δM=(δm8=231) including one class using type Ⅰ difference, δM=±(δm6=28,δm9=δm15=231) including two classes using type Ⅱ difference, and δM=±(δm2=28,δm14=231) including two classes using type Ⅱ difference. All of them may have SCs located at the 31th, 27th, 22th, 17th, 11th, or 8th bits. The ASCs at the 31th bit are called main ASCs. The ASCs at other bit locations are called auxiliary ASCs. The other elements in NDS do not have enough SCs. The conditions of δM=±(δm11=2b)(b∈{0,…,31}), δM=±(δm4=2b)(b∈{20,25,31}), and δM=±(δm8=231,δm11=221)(b∈{0,…,31}) using type Ⅰ difference are too complex, and the number of SCs is less than four. δM=±(δm2=28,δm11=215,δm4=δm14=231) using type Ⅱ difference has just four SCs.
3.2.1. The subdivided set
We add the classes in DS9, ICS2, and ICS3 to the subdivided set, and the details are as follows.
1) DS9 element has four classes and uses type Ⅰ difference. It is divided into two subsets:
(a) DS9_1. For the message differences δM=±(δm2=28,δm14=231), chaining value differences are δQt=231(31≤t≤32), δQt=0(t∈{30,28}), δQ29=±227, δQ27=±217, and unavoidable sufficient conditions are Q30[27]=Q31[27], Q27[8]=Q28[8], Q28[27]=1. Note that at step 29, RR(δQ30−δQ29,9)=±218 and δQ27 can be eliminated or retained; δm2=±28 is introduced. Therefore, δQ26=±218±217±28 or ±218±28. δQ25 has ±222 because of δQ28−δQ27≠0.
(b) DS9_2. For the message differences δM=±(δm2=28,δm11=215,δm4=δm14=231), chaining value differences are δQt=231(27≤t≤35), δQ26=231±28, δQ25=±231, and unavoidable sufficient conditions are Qt[31]=Qt+1[31](27≤t≤30), Q27[8]=Q28[8].
2) ICS2 element has three classes and uses type Ⅰ difference. It is divided into two subsets:
(a) ICS2_1. For the message difference δM=(δm14=231), chaining value differences are δQt=231(31≤t≤32), δQt=0(t∈{30,28}), δQ29=±227, δQ27=±217, and unavoidable sufficient conditions are Q30[27]=Q31[27], Q28[27]=1. Note that the calculation of δQ26 and δQ25 is similar to the calculation in DS9_1 except for no δm2=±28.
(b) ICS2_2. For the message difference δM=±(δm11=215,δm4=δm14=231), chaining value differences are δQt=231(25≤t≤34), and unavoidable sufficient conditions are Qt[31]=Qt+1[31](27≤t≤30).
3) ICS3 element has three classes and uses type Ⅰ difference. It is divided into three subsets:
(a) ICS3_1. For the message difference δM=(δm5=231), chaining value differences are δQt=0(t∈{30,31}), δQt=231(27≤t≤29), δQ26=231±222, δQ25=231, and unavoidable sufficient conditions are Q30[31]≠Q31[31], Qt[31]=Qt+1[31](27≤t≤28), Q27[22]=Q28[22].
(b) ICS3_2. For the message difference δM=(δm5=231,δm11=231), chaining value differences are δQt=231(26≤t≤27,29≤t≤31), δQ28=231±211, δQ25=231±26, and unavoidable sufficient conditions are Qt[31]=Qt+1[31](27≤t≤30), Q29[11]=Q30[11], Q27[11]=1, Q26[11]=0.
(c) ICS3_3. For the message difference δM=(δm5=231,δm8=231), chaining value differences are δQ30=231, δQ27=±217, δQ26=±222, δQt=0(t∈{25,28,29,31}), and unavoidable sufficient conditions are Q29[31]=1, Q28[31]=0, Q28[17]=Q29[17], Q26[17]=1.
3.2.2. The new individual checked set
We add seven classes to the new individual checked set (NICS). The SCs of each element consist of the SCCs in the last 16 steps and ASCs to ensure enough SCs. The details of the new elements are as follows:
1) NICS1 element has two classes and uses type Ⅰ difference. Message difference is δM=±(δm8=225). Chaining value differences are δQt=231(57≤t≤59), δQt=±225(29≤t≤30). Unavoidable sufficient conditions are Q57[31]≠Q59[31], Q56[31]=1, Q55[31]=0, Q28[11]=Q29[11], Q26[11]=1. Note that δQ27 has ±211 owing to δQ31−δQ30≠0. Because at step 31, we get δQ28=±225 or 0, the differences of δQt(25≤t≤28) at 25th bit are uncertain.
2) NICS2 element has one class and uses type Ⅰ difference. Message difference is δM=(δm8=231). Chaining value differences are δQt=231(25≤t≤26,28≤t≤30), δQt=±25(57≤t≤59), δQ27=231±217. Unavoidable sufficient conditions are Q57[5]≠Q59[5], Q56[5]=1, Q55[5]=0, Qt[31]=Qt+1[31](27≤t≤29), Q28[17]=Q29[17], Q26[17]=1.
3) NICS3 element has two classes and uses type Ⅱ difference. Message difference is δM=±(δm6=28,δm9=δm15=231). Chaining value differences are δQ59=±223±29, δQ58=±29, δQt=231(25≤t≤43). Unavoidable sufficient conditions are Q57[9]=1, Q56[9]=0, Qt[31]=Qt+1[31](27≤t≤30).
4) NICS4 element has two classes and uses type Ⅱ difference. Message difference is δM=±(δm2=28,δm14=231). Chaining value differences are δQt=231(30≤t≤47), δQ29=231±227, δQ27=±217, δQ26=±28, δQ25=±222. Unavoidable sufficient conditions are Q49[31]=0, Q48[31]=1, Q30[27]=Q31[27], Q28[27]=1, Q27[8]=Q28[8]. The difference on 31th bit of δQt(25≤t≤28) are uncertain. Because at step 29, δQ27=±217 can be eliminated or retained, the differences of δQ26 at 17th bit are undetermined.
3.3. Our new algorithm and implementation
In this subsection, we discuss how ASCs can be integrated into Shen's algorithm. Our new algorithm is described in Algorithm 3. Details are as follows. In [11], all possible bit values are listed in a table using a hexadecimal representation; the key bits determine the rest of the bits; the bit mask indicates those positions where the SCs are. We use the same method to represent ASCs. Because there are only 6 bits, we use binary to represent them. The most significant bit is δQ26[b], and the least significant bit is δQ31[b](b∈{31,27,22,17,11,8}). For example, ASCs of ICS_2 are Qt[31]=Qt+1[31](27≤t≤30). They are respected by two binary values, 0b000000 and 0b011111. The bit mask is 0b011111. We choose Q30[31] and Q31[31] as key bits. Obviously, Q30[31]=Q31[31]=0 indicates 0b000000; Q30[31]=Q31[31]=1 indicates 0b011111. Two bits are used as they determine up to four different values.
Algorithm 3 Algorithm integrated with ASC
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: (Q−3,Q0,Q−1,Q−2)=IHVk.
4: Calculate Q1,...,Q64 and IHVk+1.
5: Obtain mSCs=Q46[31]‖...‖Q59[31], aSCsb=Q46[b]‖...‖Q59[b](b=5,9,14,20) and ASCsb=Q26[b]‖...‖Q31[b](b=31,27,22,17,11,8).
6: if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS or mSCs, aSCsb and ASCsb is one of the SCs of a element E in NICS then
7: ifE belongs to the Subdivided Set then
8: according to ASCsb, choose δM and δWS41 and calculate IHV′k+1 and IHVk+1.
9: ifIHV′k+1=IHVk+1then
10: return True.
11: end if
12: else
13: for each (δM,δWS41) in Edo
14: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
15: ifIHV′k+1=IHVk+1then
16: return True.
17: end if
18: end for
19: end if
20: end if
21: for each element E in NDS do
22: for each (δM,δWS41) in Edo
23: Calculate IHV′k+1 and IHVk+1 using δM and δWS41 in E.
For the elements in the subdivided set, conditions are specified using key bits according to the table of SCCs in [11]. Then, ASCs are used to determine which message difference these conditions belong to. Therefore, it is no longer necessary to calculate the IHVout of all messages difference, in turn, to detect a collision. ASCs of the subdivided set are listed in Table 2. The column where the value of SCs is located is determined by the key bits. For instance, the key bits are 11, which means column 3. Note that the condition Q27[8]=Q28[8] exists both in DS9_1 and DS9_2, so it is not used. DS9 and ICS2 conditions are identical in Table 2, so they share a 2×5 array in implementation. ICS3 conditions are saved in a 6×5 array. For the key bits, ICS3_1* and ICS3_2† are Q29[b] and Q30[b]; ICS3_1† and ICS3_3† are Q27[b] and Q28[b]; ICS3_3* are Q28[b] and Q29[b]; all others are Q30[b] and Q31[b].
Table 2.
ASCs of the subdivided set.
Element
C[i][0]
C[i][1]
C[i][2]
C[i][3]
Bit mask
DS9_1
0b001000
-1
-1
0b001011
0b001011
DS9_2
0b000000
-1
-1
0b011111
0b011111
ICS2_1
0b001000
-1
-1
0b001011
0b001011
ICS2_2
0b000000
-1
-1
0b011111
0b011111
ICS3_1*
0b000001
0b000010
0b011101
0b011110
0b011111
ICS3_1†
0b000000
-1
-1
0b011000
0b011000
ICS3_2*
0b000000
-1
-1
0b011111
0b011111
ICS3_2†
0b010000
-1
-1
0b010110
0b110110
ICS3_3*
-1
0b000100
-1
-1
0b001100
ICS3_3†
0b100000
-1
-1
0b111000
0b111000
* is the main ASCs. † is the auxiliary ASCs. -1 denotes that there is no value.
We add NICS to the end of ICS in the original algorithm. Namely, after checking ICS, the algorithm will process the elements in NICS. SCs in NICS are listed in Table 3. Note that the values of NICS1‡ and NICS1† are identical to NICS2‡ and NICS2†, respectively. They are merged in the implementation. For the key bits, NICS1†, NICS2*, NICS2† and NICS3* are Q28[b] and Q29[b]; NICS4* are Q29[b] and Q30[b]; NICS4† are Q27[b] and Q28[b]; NICS1‡ and NICS2‡ are Q57[b] and Q59[b].
Table 3.
SCCs and ASCs of the new individual checked set.
We use "HashClash" developed by Stevens to verify whether ASCs exist in the differential path. "HashClash" using the MIT License is a tool to construct differential paths for MD5 and SHA-1, also including the chosen-prefix collision attack on MD5. It consists of three parts, i.e., forward extension, backward extension, and connection. One can find the source code of "HashClash" in [13].
To achieve the maximum complexity of conditions as much as possible, we set the parameter as follows:
1)Autobalance=1,000,000. The parameter limits the maximum number of paths that can be saved per step;
2)Estimate=4. The program will evaluate the maximum number of conditions under 4×1,000,000 available paths.
3)Maxweight=32 and Maxsdrs=2000. Maxweight controls the carrying extension length; Maxsdrs limits the total number of ΔQ. Such a setting means no limitation to carrying extension of δQ.
Other parameters are the default values. The backward program of MD5 is used to construct differential paths starting from step 40 to step 16. After observing the paths with the minimum and maximum conditions, we found that all ASCs existed. This result suggests that the conditions found are sufficient. One can find the results and our new algorithm code in https://github.com/Finsenty54/Improved-Collision-Detection-Of-MD5.
4.2. Complexity
The algorithm of Shen et al. takes 78.44 times the runtime of hashing a message block to detect collisions in theory. In this paper, as seven classes can be distinguished which make a major contribution to improving the detection efficiency, the total complexity of the new collision algorithm is about 71.44 times of computing the MD5 hash of a message block. Compared with the original algorithm, the complexity is reduced by 8.92%.
The C++ code of the collision detection algorithm with ASCs was compiled by g++ 10.2.1 under -O3 optimization and was run on AMD Ryzen 5 2600 at 3.40 GHz under Parrot OS 4.11. We conducted three experiments on the original code of Shen et al. and our code, respectively. The total number of messages generated is 224. The results are shown in Table 4. The average checks number of the original algorithm is 72.2949 in contrast to 66.3834 of our new algorithm. The experiments show that it reduces the runtime by 8.18% with ASCs.
Table 4.
Comparison of the original and our new algorithm.
Experiments
Generate random message [×106/ms]
Standard MD5 hash [×106/ms]
Collision detection [×108/ms]
Average check number [multiples of one hashing operation]
In this paper, we derive the sufficient conditions between steps 31 and 28 of MD5 based on the properties of the Boolean function of the second round and describe the reverse deduction ways. Combining the additional sufficient conditions, we improve Shen's detection algorithm by subdividing three sets and distinguishing seven more classes. After adding the subdivided set and NICS to their algorithm, we get our new detection algorithm. The cost of collision detection of MD5 is reduced by 8.18%. This study further supports the idea of using unavoidable conditions to accelerate collision detection in previous research. The experimental results of Shen's algorithm differ from those claimed in [11]. This inconsistency may be due to the various experimental setting.
Acknowledgments
This work was supported by the Zhejiang Provincial Natural Science Foundation of China [Grant No. LQ20F020019] and the Technology on Communication Security Laboratory [Grant No. 6142103190105].
Conflict of interest
All authors declare no conflicts of interest in this paper.
References
[1]
I. B. Damgård, A design principle for hash functions, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 416–427. https://doi.org/10.1007/0-387-34805-0_39
[2]
R. C. Merkle, One way hash functions and DES, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 428–446. https://doi.org/10.1007/0-387-34805-0_40
[3]
X. Wang, H. Yu, How to break MD5 and other hash functions, in Advances in Cryptology – EUROCRYPT 2005 (ed. R. Cramer), Springer, Berlin, Heidelberg, (2005), 19–35. https://doi.org/10.1007/11426639_2
[4]
A. D. Dwivedi, S. Dhar, G. Srivastava, R. Singh, Cryptanalysis of round-reduced fantomas, robin and iSCREAM, Cryptography, 3 (2019), 4. https://doi.org/10.3390/cryptography3010004 doi: 10.3390/cryptography3010004
[5]
M. Stevens, A. Lenstra, B. de Weger, Chosen-prefix collisions for MD5 and colliding X.509 certificates for different identities, in Advances in Cryptology - EUROCRYPT 2007 (ed. M. Naor), Springer, Berlin, Heidelberg, (2007), 1–22. https://doi.org/10.1007/978-3-540-72540-4_1
[6]
M. Stevens, A. Sotirov, J. Appelbaum, A. Lenstra, D. Molnar, D. A. Osvik, et al., Short chosen-prefix collisions for MD5 and the creation of a rogue CA certificate, in Advances in Cryptology - CRYPTO 2009 (ed. S. Halevi), Springer, Berlin, Heidelberg, (2009), 55–69. https://doi.org/10.1007/978-3-642-03356-8_4
[7]
M. Stevens, Counter-cryptanalysis, in Advances in Cryptology – CRYPTO 2013 (eds. R. Canetti and J. A. Garay), Springer, Berlin, Heidelberg, (2013), 129–146. https://doi.org/10.1007/978-3-642-40041-4_8
[8]
X. Wang, H. Yu, W. Wang, H. Zhang, T. Zhan, Cryptanalysis on HMAC/NMAC-MD5 and MD5-MAC, in Advances in Cryptology - EUROCRYPT 2009 (ed. A. Joux), Springer, Berlin, Heidelberg, (2009), 121–133. https://doi.org/10.1007/978-3-642-01001-9_7
[9]
M. Stevens, Attacks on hash functions and applications, Ph.D thesis, Leiden University in Leiden, 2012. https://doi.org/10.6100/ir749233
[10]
M. Stevens, D. Shumow, Speeding up detection of {SHA-1} collision attacks using unavoidable attack conditions, in 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, (2017), 881–897. https://doi.org/10.5555/3241189.3241259
[11]
Y. Shen, T. Wu, G. Wang, X. Dong, H. Qian, Improved collision detection of MD5 using sufficient condition combination, Comput. J., (2021). https://doi.org/10.1093/comjnl/bxab109 doi: 10.1093/comjnl/bxab109
[12]
R. L. Rivest, The MD5 message-digest algorithm, RFC, 1321 (1992), 1–21. https://doi.org/10.17487/RFC1321 doi: 10.17487/RFC1321
[13]
M. Stevens, Project HashClash - MD5 & SHA-1 cryptanalytic toolbox [Internet]. Available from: https: //github.com/cr-marcstevens/hashclash.
This article has been cited by:
1.
D. Paul Joseph, P. Viswanathan,
SDOT: Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-Based Three-Tier Forensic Classification Framework,
2023,
11,
2169-3536,
3291,
10.1109/ACCESS.2023.3234434
2.
Donagani Ramakrishna, Mohammed Ali Shaik,
A Comprehensive Analysis of Cryptographic Algorithms: Evaluating Security, Efficiency, and Future Challenges,
2025,
13,
2169-3536,
11576,
10.1109/ACCESS.2024.3518533
Table 1.
Comparison of collision detection methods for MD5.
Algorithm
Method
Complexity
Stevens' algorithm
Check 223 classes in sequence
224 times hashing a message block
Shen's algorithm
1) Use SCCs to classify 126 classes into 22 elements 2) Determine the element according to SCCs and check the classes in the element 3) Check the remaining 76 classes
72.2949 times hashing a message block (under our experimental setting)
Our algorithm
1) Use SCCs and ASCs to classify 133 classes into 30 elements 2) Determine the element according to SCCs and ASCs, check the classes in the element 3) Check the remaining 69 classes
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: (Q−3,Q0,Q−1,Q−2)=IHVk.
4: Calculate Q1,...,Q64 and IHVk+1.
5: Obtain mSCs=Q46[31]‖...‖Q59[31], aSCsb=Q46[b]‖...‖Q59[b](b=5,9,14,20) and ASCsb=Q26[b]‖...‖Q31[b](b=31,27,22,17,11,8).
6: if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS or mSCs, aSCsb and ASCsb is one of the SCs of a element E in NICS then
7: ifE belongs to the Subdivided Set then
8: according to ASCsb, choose δM and δWS41 and calculate IHV′k+1 and IHVk+1.
9: ifIHV′k+1=IHVk+1then
10: return True.
11: end if
12: else
13: for each (δM,δWS41) in Edo
14: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
15: ifIHV′k+1=IHVk+1then
16: return True.
17: end if
18: end for
19: end if
20: end if
21: for each element E in NDS do
22: for each (δM,δWS41) in Edo
23: Calculate IHV′k+1 and IHVk+1 using δM and δWS41 in E.
1) Use SCCs to classify 126 classes into 22 elements 2) Determine the element according to SCCs and check the classes in the element 3) Check the remaining 76 classes
72.2949 times hashing a message block (under our experimental setting)
Our algorithm
1) Use SCCs and ASCs to classify 133 classes into 30 elements 2) Determine the element according to SCCs and ASCs, check the classes in the element 3) Check the remaining 69 classes
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: WS0=IHVk.
4: Calculate all working states WS1,...,WS64 and IHVk+1=IHVk+WS64.
5: for each (δM,t,δWSt)do
6: Let M′k=Mk+δM and WS′t=WSt+δWSt.
7: Calculate WS′t−1,...,WS′0 backward and WS′t+1,...,WS′64 forward.
8: Let IHV′k=WS′0 and IHV′k+1=IHV′k+WS′64.
9: ifIHV′k+1=IHVk+1then
10: return True. //Mk and M′k is a near-collision or pseudo-collision block pair
11: end if
12: end for
13: end for
14: return False
Algorithm 2 Shen's collision detection algorithm
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: (Q−3,Q0,Q−1,Q−2)=IHVk.
4: Calculate Q1,...,Q64 and IHVk+1.
5: Obtain mSCs=Q46[31]‖...‖Q59[31] and aSCsb=Q46[b]‖...‖Q59[b](b=5,9,14,20).
6: if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS then
7: for each (δM,δWS41) in Edo
8: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
9: ifIHV′k+1=IHVk+1then
10: return True.
11: end if
12: end for
13: end if
14: for each element E in NDS do
15: for each (δM,δWS41) in Edo
16: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
17: ifIHV′k+1=IHVk+1then
18: return True.
19: end if
20: end for
21: end for
22: end for
23: return False
Algorithm 3 Algorithm integrated with ASC
Input: padded and split message blocks M0,...,MN−1.
Output:True if a near-collision or pseudo-collision is detected and False otherwise.
1: Let IHV0 be the initial chaining values.
2: for k=0 to N-1 do
3: (Q−3,Q0,Q−1,Q−2)=IHVk.
4: Calculate Q1,...,Q64 and IHVk+1.
5: Obtain mSCs=Q46[31]‖...‖Q59[31], aSCsb=Q46[b]‖...‖Q59[b](b=5,9,14,20) and ASCsb=Q26[b]‖...‖Q31[b](b=31,27,22,17,11,8).
6: if mSCs is one of the SCs of a element E in DS or mSCs and aSCsb are one of the SCs of a element E in ICS or mSCs, aSCsb and ASCsb is one of the SCs of a element E in NICS then
7: ifE belongs to the Subdivided Set then
8: according to ASCsb, choose δM and δWS41 and calculate IHV′k+1 and IHVk+1.
9: ifIHV′k+1=IHVk+1then
10: return True.
11: end if
12: else
13: for each (δM,δWS41) in Edo
14: Calculate IHV′k+1 and IHVk+1 using δM and δWS41.
15: ifIHV′k+1=IHVk+1then
16: return True.
17: end if
18: end for
19: end if
20: end if
21: for each element E in NDS do
22: for each (δM,δWS41) in Edo
23: Calculate IHV′k+1 and IHVk+1 using δM and δWS41 in E.
24: ifIHV′k+1=IHVk+1then
25: return True.
26: end if
27: end for
28: end for
29: end for
30: return False
Element
C[i][0]
C[i][1]
C[i][2]
C[i][3]
Bit mask
DS9_1
0b001000
-1
-1
0b001011
0b001011
DS9_2
0b000000
-1
-1
0b011111
0b011111
ICS2_1
0b001000
-1
-1
0b001011
0b001011
ICS2_2
0b000000
-1
-1
0b011111
0b011111
ICS3_1*
0b000001
0b000010
0b011101
0b011110
0b011111
ICS3_1†
0b000000
-1
-1
0b011000
0b011000
ICS3_2*
0b000000
-1
-1
0b011111
0b011111
ICS3_2†
0b010000
-1
-1
0b010110
0b110110
ICS3_3*
-1
0b000100
-1
-1
0b001100
ICS3_3†
0b100000
-1
-1
0b111000
0b111000
* is the main ASCs. † is the auxiliary ASCs. -1 denotes that there is no value.
Element
C[i][0]
C[i][1]
C[i][2]
C[i][3]
Bit mask
NICS1‡
-1
0x00009
0x0000c
-1
0x0001d
NICS1†
0b100000
-1
-1
0b101100
0b101100
NICS2‡
-1
0x00009
0x0000c
-1
0x0001d
NICS2*
0b000000
-1
-1
0b011110
0b011110
NICS2†
0b100000
-1
-1
0b101100
0b101100
NICS3‡
0x00004
-1
-1
-1
0x00004
NICS3*
0b000000
-1
-1
0b011111
0b011111
NICS4‡
0x20000
-1
-1
-1
0x20000
NICS4*
0b001000
-1
-1
0b001011
0b001011
NICS4†
0b000000
-1
-1
0b011000
0b011000
‡ is the SCCs of the last 16 steps.
Experiments
Generate random message [×106/ms]
Standard MD5 hash [×106/ms]
Collision detection [×108/ms]
Average check number [multiples of one hashing operation]