Improved collision detection of MD5 with additional sufficient conditions

Linan Fang; Ting Wu; Yongxing Qi; Yanzhao Shen; Peng Zhang; Mingmin Lin; Xinfeng Dong; Linan Fang; Ting Wu; Yongxing Qi; Yanzhao Shen; Peng Zhang; Mingmin Lin; Xinfeng Dong

doi:10.3934/era.2022102

Electronic Research Archive

2022, Volume 30, Issue 6: 2018-2032. doi: 10.3934/era.2022102

Previous Article Next Article

Research article Special Issues

Improved collision detection of MD5 with additional sufficient conditions

1.
School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China
2.
Hangzhou Innovation Institute, Beihang University, Hangzhou 310020, China
3.
Shandong Institute of Blockchain, Jinan 250102, China
4.
Hangzhou Hikvision Digital Technology Company Limited, Hangzhou 310051, China
5.
Science and Technology on Communication Security Laboratory, Chengdu 610041, China

Received: 04 March 2022 Revised: 25 March 2022 Accepted: 05 April 2022 Published: 13 April 2022

One application of counter-cryptanalysis is detecting whether a message block is involved in a collision attack, such as the detection of MD5 and SHA-1. Stevens and Shumow speeded up the detection of SHA-1 by introducing unavoidable conditions in message blocks. They left a challenge: how to determine unavoidable conditions for MD5. Later, Shen et al. found that the unavoidable conditions of MD5 were the sufficient conditions located in the last round of differential paths. In this paper, we made further work. We discover sufficient conditions in the second round that can also be used as unavoidable conditions. With additional sufficient conditions, we subdivide three sets and distinguish seven more classes. As a result, compared with Shen's collision detection algorithm, our improved algorithm reduces the collision detection cost by 8.18%. Finally, we find that they do exist in the differential paths constructed by the automatic tool "HashClash".

Keywords:

Citation: Linan Fang, Ting Wu, Yongxing Qi, Yanzhao Shen, Peng Zhang, Mingmin Lin, Xinfeng Dong. Improved collision detection of MD5 with additional sufficient conditions[J]. Electronic Research Archive, 2022, 30(6): 2018-2032. doi: 10.3934/era.2022102

Related Papers:

[1]	Miao Luo, Yousong Chen, Dawei Gao, Lijun Wang . Inversion study of vehicle frontal collision and front bumper collision. Electronic Research Archive, 2023, 31(2): 776-792. doi: 10.3934/era.2023039
[2]	Shao-Yuan Huang, Wei-Hsun Lee . Sufficient conditions for exact bifurcation curves in Minkowski curvature problems and their applications. Electronic Research Archive, 2025, 33(4): 2325-2351. doi: 10.3934/era.2025103
[3]	Xuanfeng Li, Jiajia Yu . Joint attention mechanism for the design of anti-bird collision accident detection system. Electronic Research Archive, 2022, 30(12): 4401-4415. doi: 10.3934/era.2022223
[4]	Zhaoyan Meng, Shuting Lyu, Mengqing Zhang, Xining Li, Qimin Zhang . Sufficient and necessary conditions of near-optimal controls for a stochastic listeriosis model with spatial diffusion. Electronic Research Archive, 2024, 32(5): 3059-3091. doi: 10.3934/era.2024140
[5]	Milena Dimova, Natalia Kolkovska, Nikolai Kutev . Global behavior of the solutions to nonlinear Klein-Gordon equation with critical initial energy. Electronic Research Archive, 2020, 28(2): 671-689. doi: 10.3934/era.2020035
[6]	Yang Gao, Qingzhong Ji . On the inverse stability of $ z^n+c $. Electronic Research Archive, 2025, 33(3): 1414-1428. doi: 10.3934/era.2025066
[7]	Hui Xu, Xinyang Zhao, Qiyun Yin, Junting Dou, Ruopeng Liu, Wengang Wang . Isolating switch state detection system based on depth information guidance. Electronic Research Archive, 2024, 32(2): 836-856. doi: 10.3934/era.2024040
[8]	Yuri Prokhorov . Conic bundle structures on $ \mathbb{Q} $-Fano threefolds. Electronic Research Archive, 2022, 30(5): 1881-1897. doi: 10.3934/era.2022095
[9]	Jin Wang, Jun-E Feng, Hua-Lin Huang . Solvability of the matrix equation $ AX^{2} = B $ with semi-tensor product. Electronic Research Archive, 2021, 29(3): 2249-2267. doi: 10.3934/era.2020114
[10]	Xun Wang, Qunyi Bie . Energy equality for the multi-dimensional nonhomogeneous incompressible Hall-MHD equations in a bounded domain. Electronic Research Archive, 2023, 31(1): 17-36. doi: 10.3934/era.2023002

Abstract

1. Introduction

A hash function compresses messages of arbitrary size into a fixed-length bit array. It is one of the basic components of security applications, for example, digital signatures and message authentication codes. The design of many classical hash functions such as MD5 and SHA-1 is based on the Merkle–Damgård construction ^[1,2]. This construction operates on the padded input message using a compression function and updating a fixed-size internal state (also called chaining value).

A secure hash function must fulfill three properties, i.e., pre-image resistance, second pre-image resistance, and collision resistance. Collision resistance means that it is practically impossible to find two different messages with the same hash value. Much research has been carried out on the collision resistance of commonly used hash functions over the past few years. In 2005, Wang et al. proposed an identical-prefix collision attack on MD5 using modular differential ^[3]. This attack is based on differential cryptanalysis, which is also an essential method for analyzing block ciphers. Linear cryptanalysis, extensively used in attack block ciphers ^[4], is another powerful method. The key of the attack of Wang et al. is constructing differential paths that are a description of how the differences should propagate through chaining values. The limitation of the identical-prefix collision attack is that the message blocks before the colliding message need to be identical. Later, Stevens et al. overcame the shortcoming ^[5]. They introduced the chosen-prefix collision attack in that prefix message blocks can be chosen arbitrarily. Such an attack is more dangerous than the identical-prefix collision attack because they created two X.509 certificates with the same signature or even a rogue CA certificate ^[6]. Another proof of the harmfulness of the collision attack is Flame's malicious certificate. It was forged using the chosen-prefix collision attack on MD5 ^[7]. In addition, the adaptive chosen-prefix collision attack was used to construct a distinguishing attack on HMAC/NMAC-MD5 with the pseudo-collision ^[8].

To defend against collision attacks, Stevens introduced counter-cryptanalysis, namely detecting whether a given message block is one of a carefully constructed colliding message pair ^[7,9]. The average complexity of Stevens' algorithm to detect the collision of MD5 and SHA-1 is about 224 times and 15 times the hashing of a message block, respectively. Later, Stevens and Shumow greatly reduced the detection complexity of SHA-1 utilizing the unavoidable bit conditions in the message generated from disturbance vectors ^[10]. The complexity is only 1.96 times a hashing of SHA-1. They left a public problem: how to determine unavoidable conditions for MD5. As a response, Shen et al. analyzed the properties of the Boolean function of the last round of MD5 in detail ^[11]. They concluded that the last round's sufficient conditions combinations (SCCs) were appropriate as the unavoidable conditions. According to SCCs, the 223 classes are divided into several sets, and 126 classes have adequate SCCs to be identified. As a result, their collision detection algorithm costs only 28.6% runtime of Stevens' algorithm. With the existing SCCs, nine sets cannot be further subdivided, and 79 classes cannot be classified. In particular, the undistinguishable 79 classes significantly affect detection complexity. Thus, a challenging problem is subdividing sets and classifying more classes.

Our contribution. In this paper, we carefully study the properties of the Boolean function of the second round and discover that the sufficient conditions (SCs) in the second round are also suitable to be unavoidable conditions. Note that the closer to the first round of MD5, the more complex the conditions are. There may be no SCs in the opening steps of the second round that can be used for distinguishing classes. Thus, we only consider SCs between steps 31 and 28, termed the additional sufficient conditions (ASCs). After analyzing all nine sets and 79 classes, we find that three sets can be subdivided, and seven classes in the non-distinguishable set can be distinguished, as they have enough SCs when combining ASCs and existing SCCs. We propose a new collision detection algorithm implemented by integrating ASCs into Shen's algorithm. Furthermore, we use "HashClash" to confirm that the ASCs exist in the differential paths of MD5. Eventually, compared to Shen's algorithm, the complexity of our algorithm to detect collision of MD5 is reduced by 8.18%.

The remainder of the paper is organized as follows. In Section 2, we give a short description of MD5 and summarize the related works. The properties of the Boolean function of the second round and ASCs are discussed in Section 3. In Section 4, we present the verification of ASCs and the complexity of the new detection algorithm. The paper is summarized in Section 5.

2. Preliminaries

2.1. Description of MD5

MD5 is one of the classical hash functions. The compression function receives a 128-bit intermediate hash value ( $IHV_{in} = (a, b, c, d)$ ) and a 512-bit message block $M$ and outputs a new 128-bit intermediate hash value ( $IHV_{out}$ ). $M$ is divided into consecutive 32-bit words $(m_0, m_1, \dots, m_{15})$ . The compression function consists of 64 steps which are split into four rounds that contain 16 steps each. For $t = 0, 1, \dots, 63$ each step is as follows:

$\begin{equation} F_t = f_{t}(Q_t, Q_{t-1}, Q_{t-2}) \end{equation}$

(2.1)

$\begin{equation} Q_{t+1} = Q_t+RL(F_t+Q_{t-3}+C_t+m_t,R_t) \end{equation}$

(2.2)

where $(Q_0, Q_{-1}, Q_{-2}, Q_{-3}) = (b, c, d, a)$ and $IHV_{out} = (a+Q_{61}, b+Q_{64}, c+Q_{63}, d+Q_{62})$ . In each step $t$ , $f_t$ is the corresponding Boolean function, $C_t$ is a fixed constant, $m_t$ is the corresponding 32-bit word, $RL$ denotes left-rotation and $R_t$ is the rotation constant. $Q_{t-3}$ can be calculated using the following formula in which $RR$ denotes right rotation:

$\begin{equation} Q_{t-3} = RR(Q_{t+1}-Q_t,R_t)-F_t-m_t-C_t \end{equation}$

(2.3)

One can find more details of MD5 in ^[12].

2.2. Related works

This subsection discusses the related works and compares the related methods with our approach. The comparison results can be found in Table 1.

Table 1. Comparison of collision detection methods for MD5.

Algorithm	Method	Complexity
Stevens' algorithm	Check 223 classes in sequence	224 times hashing a message block
Shen's algorithm	$1)$ Use SCCs to classify 126 classes into 22 elements $2)$ Determine the element according to SCCs and check the classes in the element $3)$ Check the remaining 76 classes	72.2949 times hashing a message block (under our experimental setting)
Our algorithm	$1)$ Use SCCs and ASCs to classify 133 classes into 30 elements $2)$ Determine the element according to SCCs and ASCs, check the classes in the element $3)$ Check the remaining 69 classes	66.3834 times hashing a message block

| Show Table

DownLoad: CSV

2.2.1. Stevens' collision detection algorithm

In ^[7,9], Stevens introduced a collision detection algorithm. The algorithm builds on two observations. One is that there are fewer message differences that can cause collision attacks. The other is that the current differential path used in collision attacks has fixed differences in the working state of some steps. For MD5, there are 223 classes of message difference $\delta M$ , step $t$ , and working state $\delta WS_t = (Q_{t-3}, Q_{t-2}, Q_t, Q_{t+1})$ . One can find more details in Algorithm 1. The average detection complexity is 224 times the complexity of the compression operator of MD5.

Algorithm 1 Stevens' collision detection algorithm
Input: padded and split message blocks $M_0, ..., M_{N-1}$ .
Output: ${\texttt{True}}$ if a near-collision or pseudo-collision is detected and ${\texttt{False}}$ otherwise.
1: Let $IHV_0$ be the initial chaining values.
2: for k=0 to N-1 do
3: $WS_0=IHV_k$ .
4: Calculate all working states $WS_1, ..., WS_{64}$ and $IHV_{k+1}=IHV_k+WS_{64}$ .
5: for each $(\delta M, t, \delta WS_t)$ do
6: Let $M_k^{\prime}=M_k+\delta M$ and $WS_t^{\prime}=WS_t+\delta WS_t$ .
7: Calculate $WS_{t-1}^{\prime}, ..., WS_{0}^{\prime}$ backward and $WS_{t+1}^{\prime}, ..., WS_{64}^{\prime}$ forward.
8: Let $IHV_k^{\prime}=WS_0^{\prime}$ and $IHV_{k+1}^{\prime}=IHV_k^{\prime}+WS_{64}^{\prime}$ .
9: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
10: return ${\texttt{True}}$ . // $M_k$ and $M_k^{\prime}$ is a near-collision or pseudo-collision block pair
11: end if
12: end for
13: end for
14: return ${\texttt{False}}$

| Show Table

DownLoad: CSV

2.2.2. Shen's collision detection algorithm

In ^[11], Shen et al. proposed an improved collision detection algorithm of MD5 using unavoidable conditions. They used the properties of MD5 and found that the SCs in the last round can be seen as unavoidable conditions. It is based on the observation that the corresponding SCs always remain the same once the input differences are fixed. Based on the SCCs, the 223 different classes are split into four sets, namely, distinguishable set (DS), individual checked set (ICS), non-distinguishable set (NDS), and discard set. The classes within each element in DS have identical SCCs. The same applies to the classes within each element in ICS. The SCs at the 31th bit are called main SCs (mSCs). The SCs at the 5th, 9th, 14th, or 20th bit are called auxiliary SCs (aSCs). Elements in NDS do not have enough SCs, and elements in the discard set are not appropriate for building a collision attack. The details of the algorithm are described in . 126 classes in DS and ICS contribute to reducing the complexity of collision detection by 71.4%. Note that each working state difference is converted to $(\delta Q_{38}, \delta Q_{39}, \delta Q_{40}, \delta Q_{41}) = (0, 0, 0, 0)$ (type Ⅰ difference) or $(\delta Q_{38}, \delta Q_{39}, \delta Q_{40}, \delta Q_{41}) = (2^{31}, 2^{31}, 2^{31}, 2^{31})$ (type Ⅱ difference) in ^[11], where $\delta X = X^{\prime}-X$ is the modular difference for 32-bit words $X$ and $X^{\prime}$ . Elements containing multiple message differences are as follows:

$1)$ DS1 using type Ⅱ difference: $\delta M = 0$ , $\delta M = \pm$ $(\delta m_{11} = 2^b)\ (b\in\{0, \ldots, 31\})$ , and $\delta M = \pm$ $(\delta m_4 = 2^b)\ (b\in\{20, 25, 31\})$ .

$2)$ DS2 using type Ⅱ difference: $\delta M = (\delta m_8 = 2^{31})$ and $\delta M = \pm$ $(\delta m_8 = 2^{31}, m_{11} = 2^{21})$ .

$3)$ DS3 using type Ⅱ difference: $\delta M = (\delta m_5 = 2^{31})$ and $\delta M = (\delta m_5 = 2^{31}, \delta m_{11} = 2^{31})$ .

$4)$ DS4 using type Ⅱ difference: $\delta M = (\delta m_{14} = 2^{31})$ and $\delta M = \pm$ $(\delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ .

$5)$ DS9 using type Ⅰ difference: $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{14} = 2^{31})$ , and $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{11} = 2^{15}$ , $\delta m_4 = \delta m_{14} = 2^{31})$ .

$6)$ DS13 using type Ⅰ difference: $\delta M = \pm$ $(\delta m_5 = 2^{10})$ , $\delta M = \pm$ $(\delta m_5 = 2^{10}, \delta m_{11} = 2^{21})$ , and $\delta M = \pm$ $(\delta m_5 = 2^{10}, \delta m_{11} = 2^{31})$ .

$7)$ ICS2 using type Ⅰ difference: $\delta M = (\delta m_{14} = 2^{31})$ and $\delta M = \pm$ $(\delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ .

$8)$ ICS3 using type Ⅰ difference: $\delta M = (\delta m_5 = 2^{31})$ , $\delta M = (\delta m_5 = 2^{31}, \delta m_{11} = 2^{31})$ , and $\delta M = (\delta m_5 = 2^{31}, \delta m_8 = 2^{31})$ .

$9)$ ICS8 using type Ⅱ difference: $\delta M = \pm$ $(\delta m_5 = 2^{10})$ , $\delta M = \pm$ $(\delta m_5 = 2^{10}, \delta m_{11} = 2^{21})$ , and $\delta M = \pm$ $(\delta m_5 = 2^{10}, \delta m_{11} = 2^{31})$ .

The classes in NDS are as follows:

$1)$ Using type Ⅰ difference: $\delta M = \pm$ $(\delta m_{11} = 2^b)\ (b\in\{0, \ldots, 31\})$ , $\delta M = \pm$ $(\delta m_4 = 2^b)\ (b\in\{20, 25, 31\})$ , $\delta M = \pm$ $(\delta m_8 = 2^b)\ (b\in\{25, 31\})$ , and $\delta M = \pm$ $(\delta m_8 = 2^{31}, \delta m_{11} = 2^{21})$ .

$2)$ Using type Ⅱ difference: $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{14} = 2^{31})$ , $\delta M = \pm$ $(\delta m_6 = 2^8, \delta m_9 = \delta m_{15} = 2^{31})$ , and $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ .

Algorithm 2 Shen's collision detection algorithm
Input: padded and split message blocks $M_0, ..., M_{N-1}$ .
Output: ${\texttt{True}}$ if a near-collision or pseudo-collision is detected and ${\texttt{False}}$ otherwise.
1: Let $IHV_0$ be the initial chaining values.
2: for k=0 to N-1 do
3: $(Q_{-3}, Q_{0}, Q_{-1}, Q_{-2})=IHV_k$ .
4: Calculate $Q_{1}, ..., Q_{64}$ and $IHV_{k+1}$ .
5: Obtain $mSCs=Q_{46}[31]\\|...\\| Q_{59}[31]$ and $aSCs_{b}=Q_{46}[b]\\|...\\| Q_{59}[b]\ (b=5, 9, 14, 20)$ .
6: if mSCs is one of the SCs of a element $E$ in DS or mSCs and $aSCs_b$ are one of the SCs of a element $E$ in ICS then
7: for each $(\delta M, \delta WS_{41})$ in $E$ do
8: Calculate $IHV_{k+1}^{\prime}$ and $IHV_{k+1}$ using $\delta M$ and $\delta WS_{41}$ .
9: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
10: return ${\texttt{True}}$ .
11: end if
12: end for
13: end if
14: for each element $E$ in NDS do
15: for each $(\delta M, \delta WS_{41})$ in $E$ do
16: Calculate $IHV_{k+1}^{\prime}$ and $IHV_{k+1}$ using $\delta M$ and $\delta WS_{41}$ .
17: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
18: return ${\texttt{True}}$ .
19: end if
20: end for
21: end for
22: end for
23: return ${\texttt{False}}$

| Show Table

DownLoad: CSV

3. Improved classification

3.1. Properties of MD5

In this subsection, we describe the properties of the Boolean function of the second round. Let $F(A, B, C) = (C\wedge A)\oplus(\overline{C}\wedge B)$ and $F(A^\prime, B^\prime, C^\prime) = (C^\prime\wedge A^\prime)\oplus(\overline{C^\prime}\wedge B^\prime)$ where $A$ , $B$ , $C$ , $A^\prime$ , $B^\prime$ and $C^\prime$ are 1-bit Boolean variables. Let $\mathrm{\Delta K} = K^\prime-K\ (K\in\{A, B, C\})$ and $\mathrm{\Delta F} = F(A^\prime, B^\prime, C^\prime)-F(A, B, C)$ . We can get the following property.

Property 1. If $\mathrm{\Delta A} \neq 0$ , $\mathrm{\Delta B} = 0$ , and $\mathrm{\Delta C} = 0$ , then $\mathrm{\Delta F} = 0$ if and only if $C = C^\prime = 0$ . If $\mathrm{\Delta A} = 0$ , $\mathrm{\Delta B}\neq 0$ , and $\mathrm{\Delta C} = 0$ , then $\mathrm{\Delta F} = 0$ if and only if $C = C^\prime = 1$ . If $\mathrm{\Delta A} = 0$ , $\mathrm{\Delta B} = 0$ , and $\mathrm{\Delta C}\neq 0$ , then $\mathrm{\Delta F} = 0$ if and only if $A = B$ . If $\mathrm{\Delta A}\neq 0$ , $\mathrm{\Delta B}\neq 0$ , and $\mathrm{\Delta C}\neq 0$ , then $\mathrm{\Delta F} = 0$ if and only if $\mathrm{\Delta A}\neq \mathrm{\Delta B}$ , $\mathrm{\Delta F} = 1$ or $-1$ if and only if $\mathrm{\Delta A} = \mathrm{\Delta B}$ .

Proof. According to the truth table of $\mathrm{\Delta F}$ , the property is easily derived. Note that $\mathrm{\Delta A}\ = \mathrm{\Delta B}$ means that $A = B$ and $A^\prime = B^\prime$ ; $\mathrm{\Delta A}\neq \mathrm{\Delta B}$ means that $A\neq B$ and $A^\prime\neq B^\prime$ .

When bit length is extended to 32-bit, the properties of the Boolean function $F(Q_t, Q_{t-1}, Q_{t-2}) = (Q_{t-2}\wedge Q_t)\oplus(\overline{Q_{t-2}}\wedge Q_{t-1})$ can be summarized as follows based on property 1. Let $\mathrm{\Delta X} = (X^\prime[i]-X[i])_{i = 0}^{N-1}$ be the signed bitwise difference and $X[i]$ indicate to take the i-th bit of $X$ .

Property 2. If $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 0, 0)$ and $\delta F_t = 0$ , then $Q_{t-2}[b] = 0$ .

Proof. $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 0, 0)$ means that $\mathrm{\Delta}Q_t[b]\neq0$ , $\mathrm{\Delta}Q_{t-1}[b] = 0$ and $\mathrm{\Delta}Q_{t-2}[b] = 0$ . $\delta F_t = 0$ means that $\mathrm{\Delta}F_t[i] = 0\ (0\le i\le31)$ . According to property 1, $Q_{t-2}[b] = 0$ .

Property 3. If $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (0, 2^b, 0)$ and $\delta F_t = 0$ , then $Q_{t-2}[b] = 1$ .

Proof. $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (0, 2^b, 0)$ means that $\mathrm{\Delta}Q_t[b] = 0$ , $\mathrm{\Delta}Q_{t-1}[b]\neq 0$ and $\mathrm{\Delta}Q_{t-2}[b] = 0$ . $\delta F_t = 0$ means that $\mathrm{\Delta}F_t[i] = 0\ (0\le i\le31)$ . According to property 1, $Q_{t-2}[b] = 1$ .

Property 4. If $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (0, 0, 2^b)$ and $\delta F_t = 0$ , then $Q_t[b] = Q_{t-1}[b]$ .

Proof. $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (0, 0, 2^b)$ means that $\mathrm{\Delta}Q_t[b] = 0$ , $\mathrm{\Delta}Q_{t-1}[b] = 0$ and $\mathrm{\Delta}Q_{t-2}[b]\neq0$ . $\delta F_t = 0$ means that $\mathrm{\Delta}F_t[i] = 0\ (0\le i\le31)$ . According to property 1, $Q_t[b] = Q_{t-1}[b]$ .

Property 5. If $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 2^b, 2^b)$ and $\delta F_t = 0$ , then $\mathrm{\Delta}Q_t[b]\neq\mathrm{\Delta}Q_{t-1}[b]$ .

Proof. $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 2^b, 2^b)$ means that $\mathrm{\Delta}Q_t[b]\neq0$ , $\mathrm{\Delta}Q_{t-1}[b]\neq0$ and $\mathrm{\Delta}Q_{t-2}[b]\neq0$ . $\delta F_t = 0$ means that $\mathrm{\Delta}F_t[i] = 0\ (0\le i\le31)$ . According to property 1, $\mathrm{\Delta}Q_t[b]\neq\mathrm{\Delta}Q_{t-1}[b]$ .

Property 6. If $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 2^b, 2^b)$ and $\delta F_t = \pm$ $2^b$ , then $\mathrm{\Delta}Q_t[b] = \mathrm{\Delta}Q_{t-1}[b]$ .

Proof. $(\delta Q_t, \delta Q_{t-1}, \delta Q_{t-2}) = (2^b, 2^b, 2^b)$ means that $\mathrm{\Delta}Q_t[b]\neq0$ , $\mathrm{\Delta}Q_{t-1}[b]\neq0$ and $\mathrm{\Delta}Q_{t-2}[b]\neq0$ . $\delta F_t = \pm$ $2^b$ means that $\mathrm{\Delta}F_t[b]\neq0$ . According to property 1, $\mathrm{\Delta}Q_t[b] = \mathrm{\Delta}Q_{t-1}[b]$ .

3.2. Additional sufficient conditions

The conditions in the differential path must be satisfied for a collision attack. Thus, the success probability of a collision attack depends on the number of conditions. It is easy to fulfill almost all conditions in the first 25 steps with message modification techniques. Conditions in the remaining steps are randomly satisfied. We only choose SCs between steps 31 and 28 as ASCs. Because conditions in the second round are more complex than conditions in the last round, we choose the part with relatively few simple conditions. In this subsection, we describe how to deduce ASCs.

Due to the fixed chaining value difference, we start from step 40 using Eq (2.3) to calculate the $Q_{t-3}$ reversely with linearizing step 40 to step 32, and we deduce the conditions from step 31 to step 28. In Eq (2.3), if $\delta Q_{t+1}-\delta Q_t\neq0$ then $RR(\delta Q_{t+1}-\delta Q_t, R_t)$ will rotate the difference to another bit position, which may increase the number of conditions. In the following, we refer to $\delta Q_{t+1}-\delta Q_t\neq0$ as the subtraction difference. Support $\delta Q_t = \pm$ $2^b$ and ${\delta Q}_k\ (k\in\{\ldots, t-1\}\cup\{t+1, \ldots\})$ is the chaining values adjacent to $Q_t$ . There are roughly two ways to deal with the subtraction difference:

$1)$ If there is no difference at bit position $b$ in $\delta Q_k$ , then a good choice is to eliminate the difference $\pm$ $2^b$ at steps $t+2$ , $t+1$ , and $t$ according to properties 2–4, respectively. If not eliminated, the number of conditions will increase owing to the introduced difference at bit position $b$ . Note that the subtraction difference occurs twice in either case.

$2)$ If there are consecutive differences at bit position $b$ in ${\delta Q}_k$ , then we can eliminate the subtraction difference according to properties 5 and 6, namely, keeping the difference at bit position $b$ . Thus, no conditions will be introduced because of the subtraction difference except for the initial and last subtraction difference.

According to the two ways, we deduce the conditions of a high probability differential path with properties 2 to 6. Note that the high bit position conditions may be eliminated because of carrying expansion.

Depending on ASCs, we find that DS9, ICS2, and ICS3 can be subdivided. DS1, DS2, DS3, DS4, DS13, and ICS8 cannot be subdivided. The reasons are as follows. The conditions of elements in DS1 are too complicated. After removing the repeated SCs, the number of remaining conditions of elements in DS2, DS3, and ICS8 is less than two. Besides, DS4 and DS13 have less than three SCs.

The elements in NDS that have enough SCs to be distinguished are as follows: $\delta M = \pm$ $(\delta m_8 = 2^{25})$ including two classes using type Ⅰ difference, $\delta M = (\delta m_8 = 2^{31})$ including one class using type Ⅰ difference, $\delta M = \pm(\delta m_6 = 2^8, \delta m_9 = \delta m_{15} = 2^{31})$ including two classes using type Ⅱ difference, and $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{14} = 2^{31})$ including two classes using type Ⅱ difference. All of them may have SCs located at the 31th, 27th, 22th, 17th, 11th, or 8th bits. The ASCs at the 31th bit are called main ASCs. The ASCs at other bit locations are called auxiliary ASCs. The other elements in NDS do not have enough SCs. The conditions of $\delta M = \pm$ $(\delta m_{11} = 2^b)\ (b\in\{0, \ldots, 31\})$ , $\delta M = \pm$ $(\delta m_4 = 2^b)\ (b\in\{20, 25, 31\})$ , and $\delta M = \pm$ $(\delta m_8 = 2^{31}, \delta m_{11} = 2^{21})\ (b\in\{0, \ldots, 31\})$ using type Ⅰ difference are too complex, and the number of SCs is less than four. $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ using type Ⅱ difference has just four SCs.

3.2.1. The subdivided set

We add the classes in DS9, ICS2, and ICS3 to the subdivided set, and the details are as follows.

$1)$ DS9 element has four classes and uses type Ⅰ difference. It is divided into two subsets:

(a) DS9_1. For the message differences $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m14 = 2^{31})$ , chaining value differences are $\delta Q_t = 2^{31}\ (31\le t\le 32)$ , $\delta Q_t = 0\ (t\in\{30, 28\})$ , $\delta Q_{29} = \pm$ $2^{27}$ , $\delta Q_{27} = \pm$ $2^{17}$ , and unavoidable sufficient conditions are $Q_{30}[27] = Q_{31}[27]$ , $Q_{27}[8] = Q_{28}[8]$ , $Q_{28}[27] = 1$ . Note that at step 29, $RR(\delta Q_{30}-\delta Q_{29}, 9) = \pm$ $2^{18}$ and $\delta Q_{27}$ can be eliminated or retained; $\delta m_2 = {\pm}$ $2^8$ is introduced. Therefore, $\delta Q_{26} = \pm$ $2^{18}\pm2^{17}\pm$ $2^8$ or $\pm$ $2^{18}\pm$ $2^8$ . $\delta Q_{25}$ has $\pm$ $2^{22}$ because of $\delta Q_{28}-\delta Q_{27}\neq0$ .

(b) DS9_2. For the message differences $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ , chaining value differences are $\delta Q_t = 2^{31}\ (27\le t\le 35)$ , $\delta Q_{26} = 2^{31}\pm$ $2^8$ , $\delta Q_{25} = \pm$ $2^{31}$ , and unavoidable sufficient conditions are $Q_t[31] = Q_{t+1}[31]\ (27\le t\le30)$ , $Q_{27}[8] = Q_{28}[8]$ .

2) ICS2 element has three classes and uses type Ⅰ difference. It is divided into two subsets:

(a) ICS2_1. For the message difference $\delta M = (\delta m_{14} = 2^{31})$ , chaining value differences are $\delta Q_t = 2^{31}\ (31\le t\le 32)$ , $\delta Q_t = 0\ (t\in\{30, 28\})$ , $\delta Q_{29} = \pm$ $2^{27}$ , $\delta Q_{27} = \pm$ $2^{17}$ , and unavoidable sufficient conditions are $Q_{30}[27] = Q_{31}[27]$ , $Q_{28}[27] = 1$ . Note that the calculation of $\delta Q_{26}$ and $\delta Q_{25}$ is similar to the calculation in DS9_1 except for no $\delta m_2 = \pm$ $2^8$ .

(b) ICS2_2. For the message difference $\delta M = \pm$ $(\delta m_{11} = 2^{15}, \delta m_4 = \delta m_{14} = 2^{31})$ , chaining value differences are $\delta Q_t = 2^{31}\ (25\le t\le 34)$ , and unavoidable sufficient conditions are $Q_t[31] = Q_{t+1}[31]\ (27\le t\le30)$ .

3) ICS3 element has three classes and uses type Ⅰ difference. It is divided into three subsets:

(a) ICS3_1. For the message difference $\delta M = (\delta m_5 = 2^{31})$ , chaining value differences are $\delta Q_t = 0\ (t\in\{30, 31\})$ , $\delta Q_t = 2^{31}\ (27\le t\le29)$ , $\delta Q_{26} = 2^{31}\pm2^{22}$ , $\delta Q_{25} = 2^{31}$ , and unavoidable sufficient conditions are $Q_{30}[31]\neq Q_{31}[31]$ , $Q_t[31] = Q_{t+1}[31]\ (27\le t\le28)$ , $Q_{27}[22] = Q_{28}[22]$ .

(b) ICS3_2. For the message difference $\delta M = (\delta m_5 = 2^{31}, \delta m_{11} = 2^{31})$ , chaining value differences are $\delta Q_t = 2^{31}\ (26\le t\le27, \ 29\le t\le31)$ , $\delta Q_{28} = 2^{31}\pm2^{11}$ , $\delta Q_{25} = 2^{31}\pm2^6$ , and unavoidable sufficient conditions are $Q_t[31] = Q_{t+1}[31]\ (27\le t\le30)$ , $Q_{29}[11] = Q_{30}[11]$ , $Q_{27}[11] = 1$ , $Q_{26}[11] = 0$ .

(c) ICS3_3. For the message difference $\delta M = (\delta m_5 = 2^{31}, \delta m_8 = 2^{31})$ , chaining value differences are $\delta Q_{30} = 2^{31}$ , $\delta Q_{27} = \pm$ $2^{17}$ , $\delta Q_{26} = \pm$ $2^{22}$ , $\delta Q_t = 0\ (t\in\{25, 28, 29, 31\})$ , and unavoidable sufficient conditions are $Q_{29}[31] = 1$ , $Q_{28}[31] = 0$ , $Q_{28}[17] = Q_{29}[17]$ , $Q_{26}[17] = 1$ .

3.2.2. The new individual checked set

We add seven classes to the new individual checked set (NICS). The SCs of each element consist of the SCCs in the last 16 steps and ASCs to ensure enough SCs. The details of the new elements are as follows:

$1)$ NICS1 element has two classes and uses type Ⅰ difference. Message difference is $\delta M = \pm$ $(\delta m_8 = 2^{25})$ . Chaining value differences are $\delta Q_t = 2^{31}\ (\ 57\le t\le59)$ , $\delta Q_t = \pm$ $2^{25}\ (\ 29\le t\le30)$ . Unavoidable sufficient conditions are $Q_{57}[31] \neq Q_{59}[31]$ , $Q_{56}[31] = 1$ , $Q_{55}[31] = 0$ , $Q_{28}[11] = Q_{29}[11]$ , $Q_{26}[11] = 1$ . Note that $\delta Q_{27}$ has $\pm$ $2^{11}$ owing to $\delta Q_{31}-\delta Q_{30}\neq0$ . Because at step 31, we get $\delta Q_{28} = \pm$ $2^{25}$ or 0, the differences of $\delta Q_t\ (\ 25\le t\le28)$ at 25th bit are uncertain.

$2)$ NICS2 element has one class and uses type Ⅰ difference. Message difference is $\delta M = (\delta m_8 = 2^{31})$ . Chaining value differences are $\delta Q_t = 2^{31}\ (25\le t\le26, \ 28\le t\le30)$ , $\delta Q_t = {\pm}$ $2^5\ (57\le t\le59)$ , $\delta Q_{27} = 2^{31}\pm2^{17}$ . Unavoidable sufficient conditions are $Q_{57}[5] \neq Q_{59}[5]$ , $Q_{56}[5] = 1$ , $Q_{55}[5] = 0$ , $Q_t[31] = Q_{t+1}[31]\ (27\le t\le29)$ , $Q_{28}[17] = Q_{29}[17]$ , $Q_{26}[17] = 1$ .

$3)$ NICS3 element has two classes and uses type Ⅱ difference. Message difference is $\delta M = \pm$ $(\delta m_6 = 2^8, \delta m_9 = \delta m_{15} = 2^{31})$ . Chaining value differences are $\delta Q_{59} = \pm$ $2^{23}\pm$ $2^9$ , $\delta Q_{58} = \pm$ $2^9$ , $\delta Q_t = 2^{31}\ (25\le t\le43)$ . Unavoidable sufficient conditions are $Q_{57}[9] = 1$ , $Q_{56}[9] = 0$ , $Q_t[31] = Q_{t+1}[31]\ (27\le t\le30)$ .

$4)$ NICS4 element has two classes and uses type Ⅱ difference. Message difference is $\delta M = \pm$ $(\delta m_2 = 2^8, \delta m_{14} = 2^{31})$ . Chaining value differences are $\delta Q_t = 2^{31}\ (30\le t\le47)$ , $\delta Q_{29} = 2^{31}\pm2^{27}$ , $\delta Q_{27} = \pm$ $2^{17}$ , $\delta Q_{26} = \pm$ $2^8$ , $\delta Q_{25} = \pm$ $2^{22}$ . Unavoidable sufficient conditions are $Q_{49}[31] = 0$ , $Q_{48}[31] = 1$ , $Q_{30}[27] = Q_{31}[27]$ , $Q_{28}[27] = 1$ , $Q_{27}[8] = Q_{28}[8]$ . The difference on 31th bit of $\delta Q_t\ (25\le t\le28)$ are uncertain. Because at step 29, $\delta Q_{27} = \pm$ $2^{17}$ can be eliminated or retained, the differences of $\delta Q_{26}$ at 17th bit are undetermined.

3.3. Our new algorithm and implementation

In this subsection, we discuss how ASCs can be integrated into Shen's algorithm. Our new algorithm is described in . Details are as follows. In ^[11], all possible bit values are listed in a table using a hexadecimal representation; the key bits determine the rest of the bits; the bit mask indicates those positions where the SCs are. We use the same method to represent ASCs. Because there are only 6 bits, we use binary to represent them. The most significant bit is $\delta Q_{26}[b]$ , and the least significant bit is $\delta Q_{31}[b]\ (b \in \{31, 27, 22, 17, 11, 8\})$ . For example, ASCs of ICS_2 are $Q_t[31] = Q_{t+1}[31]\ (27\le t\le30)$ . They are respected by two binary values, 0b000000 and 0b011111. The bit mask is 0b011111. We choose $Q_{30}[31]$ and $Q_{31}[31]$ as key bits. Obviously, $Q_{30}[31] = Q_{31}[31] = 0$ indicates 0b000000; $Q_{30}[31] = Q_{31}[31] = 1$ indicates 0b011111. Two bits are used as they determine up to four different values.

Algorithm 3 Algorithm integrated with ASC
Input: padded and split message blocks $M_0, ..., M_{N-1}$ .
Output: ${\texttt{True}}$ if a near-collision or pseudo-collision is detected and ${\texttt{False}}$ otherwise.
1: Let $IHV_0$ be the initial chaining values.
2: for k=0 to N-1 do
3: $(Q_{-3}, Q_{0}, Q_{-1}, Q_{-2})=IHV_k$ .
4: Calculate $Q_{1}, ..., Q_{64}$ and $IHV_{k+1}$ .
5: Obtain $mSCs=Q_{46}[31]\\|...\\| Q_{59}[31]$ , $aSCs_{b}=Q_{46}[b]\\|...\\| Q_{59}[b]\ (b=5, 9, 14, 20)$ and $ASCs_{b}=Q_{26}[b]\\|...\\| Q_{31}[b]\ (b=31, 27, 22, 17, 11, 8)$ .
6: if mSCs is one of the SCs of a element $E$ in DS or mSCs and $aSCs_b$ are one of the SCs of a element $E$ in ICS or $mSCs$ , $aSCs_{b}$ and $ASCs_b$ is one of the SCs of a element $E$ in NICS then
7: if $E$ belongs to the Subdivided Set then
8: according to $ASCs_b$ , choose $\delta M$ and $\delta WS_{41}$ and calculate $IHV_{k+1}^{\prime}$ and $IHV_{k+1}$ .
9: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
10: return ${\texttt{True}}$ .
11: end if
12: else
13: for each $(\delta M, \delta WS_{41})$ in $E$ do
14: Calculate $IHV_{k+1}^{\prime}$ and $IHV_{k+1}$ using $\delta M$ and $\delta WS_{41}$ .
15: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
16: return ${\texttt{True}}$ .
17: end if
18: end for
19: end if
20: end if
21: for each element $E$ in NDS do
22: for each $(\delta M, \delta WS_{41})$ in $E$ do
23: Calculate $IHV_{k+1}^{\prime}$ and $IHV_{k+1}$ using $\delta M$ and $\delta WS_{41}$ in $E$ .
24: if $IHV_{k+1}^{\prime}=IHV_{k+1}$ then
25: return ${\texttt{True}}$ .
26: end if
27: end for
28: end for
29: end for
30: return ${\texttt{False}}$

| Show Table

DownLoad: CSV

For the elements in the subdivided set, conditions are specified using key bits according to the table of SCCs in ^[11]. Then, ASCs are used to determine which message difference these conditions belong to. Therefore, it is no longer necessary to calculate the ${IHV}_{out}$ of all messages difference, in turn, to detect a collision. ASCs of the subdivided set are listed in . The column where the value of SCs is located is determined by the key bits. For instance, the key bits are 11, which means column 3. Note that the condition $Q_{27}[8] = Q_{28}[8]$ exists both in DS9_1 and DS9_2, so it is not used. DS9 and ICS2 conditions are identical in , so they share a $2\times5$ array in implementation. ICS3 conditions are saved in a $6\times5$ array. For the key bits, ICS3_1^* and ICS3_2^† are $Q_{29}[b]$ and $Q_{30}[b]$ ; ICS3_1^† and ICS3_3^† are $Q_{27}[b]$ and $Q_{28}[b]$ ; ICS3_3^* are $Q_{28}[b]$ and $Q_{29}[b]$ ; all others are $Q_{30}[b]$ and $Q_{31}[b]$ .

Table 2. ASCs of the subdivided set.

Element	C[i][0]	C[i][1]	C[i][2]	C[i][3]	Bit mask
DS9_1	0b001000	-1	-1	0b001011	0b001011
DS9_2	0b000000	-1	-1	0b011111	0b011111
ICS2_1	0b001000	-1	-1	0b001011	0b001011
ICS2_2	0b000000	-1	-1	0b011111	0b011111
ICS3_1^*	0b000001	0b000010	0b011101	0b011110	0b011111
ICS3_1^†	0b000000	-1	-1	0b011000	0b011000
ICS3_2^*	0b000000	-1	-1	0b011111	0b011111
ICS3_2^†	0b010000	-1	-1	0b010110	0b110110
ICS3_3^*	-1	0b000100	-1	-1	0b001100
ICS3_3^†	0b100000	-1	-1	0b111000	0b111000
^* is the main ASCs. ^† is the auxiliary ASCs. -1 denotes that there is no value.

| Show Table

DownLoad: CSV

We add NICS to the end of ICS in the original algorithm. Namely, after checking ICS, the algorithm will process the elements in NICS. SCs in NICS are listed in Table 3. Note that the values of NICS1^‡ and NICS1^† are identical to NICS2^‡ and NICS2^†, respectively. They are merged in the implementation. For the key bits, NICS1^†, NICS2^*, NICS2^† and NICS3^* are $Q_{28}[b]$ and $Q_{29}[b]$ ; NICS4^* are $Q_{29}[b]$ and $Q_{30}[b]$ ; NICS4^† are $Q_{27}[b]$ and $Q_{28}[b]$ ; NICS1^‡ and NICS2^‡ are $Q_{57}[b]$ and $Q_{59}[b]$ .

Table 3. SCCs and ASCs of the new individual checked set.

Element	C[i][0]	C[i][1]	C[i][2]	C[i][3]	Bit mask
NICS1^‡	-1	0x00009	0x0000c	-1	0x0001d
NICS1^†	0b100000	-1	-1	0b101100	0b101100
NICS2^‡	-1	0x00009	0x0000c	-1	0x0001d
NICS2^*	0b000000	-1	-1	0b011110	0b011110
NICS2^†	0b100000	-1	-1	0b101100	0b101100
NICS3^‡	0x00004	-1	-1	-1	0x00004
NICS3^*	0b000000	-1	-1	0b011111	0b011111
NICS4^‡	0x20000	-1	-1	-1	0x20000
NICS4^*	0b001000	-1	-1	0b001011	0b001011
NICS4^†	0b000000	-1	-1	0b011000	0b011000
^‡ is the SCCs of the last 16 steps.

| Show Table

DownLoad: CSV

4. Experiments

4.1. Verification of ASCs

We use "HashClash" developed by Stevens to verify whether ASCs exist in the differential path. "HashClash" using the MIT License is a tool to construct differential paths for MD5 and SHA-1, also including the chosen-prefix collision attack on MD5. It consists of three parts, i.e., forward extension, backward extension, and connection. One can find the source code of "HashClash" in ^[13].

To achieve the maximum complexity of conditions as much as possible, we set the parameter as follows:

$1)$ $Autobalance = 1,000,000$ . The parameter limits the maximum number of paths that can be saved per step;

$2)$ $Estimate = 4$ . The program will evaluate the maximum number of conditions under $4\times1,000,000$ available paths.

$3)$ $Maxweight = 32$ and $Maxsdrs = 2000$ . $Maxweight$ controls the carrying extension length; $Maxsdrs$ limits the total number of $\Delta Q$ . Such a setting means no limitation to carrying extension of $\delta Q$ .

Other parameters are the default values. The backward program of MD5 is used to construct differential paths starting from step 40 to step 16. After observing the paths with the minimum and maximum conditions, we found that all ASCs existed. This result suggests that the conditions found are sufficient. One can find the results and our new algorithm code in https://github.com/Finsenty54/Improved-Collision-Detection-Of-MD5.

4.2. Complexity

The algorithm of Shen et al. takes 78.44 times the runtime of hashing a message block to detect collisions in theory. In this paper, as seven classes can be distinguished which make a major contribution to improving the detection efficiency, the total complexity of the new collision algorithm is about 71.44 times of computing the MD5 hash of a message block. Compared with the original algorithm, the complexity is reduced by 8.92%.

The C++ code of the collision detection algorithm with ASCs was compiled by g++ 10.2.1 under -O3 optimization and was run on AMD Ryzen 5 2600 at 3.40 GHz under Parrot OS 4.11. We conducted three experiments on the original code of Shen et al. and our code, respectively. The total number of messages generated is $2^{24}$ . The results are shown in Table 4. The average checks number of the original algorithm is 72.2949 in contrast to 66.3834 of our new algorithm. The experiments show that it reduces the runtime by 8.18% with ASCs.

Table 4. Comparison of the original and our new algorithm.

Experiments	Generate random message [ $\times$ $10^6/ms$ ]	Standard MD5 hash [ $\times$ $10^6/ms$ ]	Collision detection [ $\times$ $10^8/ms$ ]	Average check number [multiples of one hashing operation]
Original test1	3.27323	5.11324	1.35930	72.0957
Original test2	3.27849	5.10581	1.35466	72.3395
Original test3	3.26173	5.09832	1.36322	72.4496
New test1	3.22541	5.07214	1.26397	66.6971
New test2	3.25677	5.11430	1.27354	66.4313
New test3	3.21775	5.07999	1.26165	66.0214

| Show Table

DownLoad: CSV

5. Conclusions

In this paper, we derive the sufficient conditions between steps 31 and 28 of MD5 based on the properties of the Boolean function of the second round and describe the reverse deduction ways. Combining the additional sufficient conditions, we improve Shen's detection algorithm by subdividing three sets and distinguishing seven more classes. After adding the subdivided set and NICS to their algorithm, we get our new detection algorithm. The cost of collision detection of MD5 is reduced by 8.18%. This study further supports the idea of using unavoidable conditions to accelerate collision detection in previous research. The experimental results of Shen's algorithm differ from those claimed in ^[11]. This inconsistency may be due to the various experimental setting.

Acknowledgments

This work was supported by the Zhejiang Provincial Natural Science Foundation of China [Grant No. LQ20F020019] and the Technology on Communication Security Laboratory [Grant No. 6142103190105].

Conflict of interest

All authors declare no conflicts of interest in this paper.

References

[1]	I. B. Damgård, A design principle for hash functions, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 416–427. https://doi.org/10.1007/0-387-34805-0_39
[2]	R. C. Merkle, One way hash functions and DES, in Advances in Cryptology –- CRYPTO' 89 Proceedings (ed. G. Brassard), Springer, New York, NY, (1990), 428–446. https://doi.org/10.1007/0-387-34805-0_40
[3]	X. Wang, H. Yu, How to break MD5 and other hash functions, in Advances in Cryptology – EUROCRYPT 2005 (ed. R. Cramer), Springer, Berlin, Heidelberg, (2005), 19–35. https://doi.org/10.1007/11426639_2
[4]	A. D. Dwivedi, S. Dhar, G. Srivastava, R. Singh, Cryptanalysis of round-reduced fantomas, robin and iSCREAM, Cryptography, 3 (2019), 4. https://doi.org/10.3390/cryptography3010004 doi: 10.3390/cryptography3010004
[5]	M. Stevens, A. Lenstra, B. de Weger, Chosen-prefix collisions for MD5 and colliding X.509 certificates for different identities, in Advances in Cryptology - EUROCRYPT 2007 (ed. M. Naor), Springer, Berlin, Heidelberg, (2007), 1–22. https://doi.org/10.1007/978-3-540-72540-4_1
[6]	M. Stevens, A. Sotirov, J. Appelbaum, A. Lenstra, D. Molnar, D. A. Osvik, et al., Short chosen-prefix collisions for MD5 and the creation of a rogue CA certificate, in Advances in Cryptology - CRYPTO 2009 (ed. S. Halevi), Springer, Berlin, Heidelberg, (2009), 55–69. https://doi.org/10.1007/978-3-642-03356-8_4
[7]	M. Stevens, Counter-cryptanalysis, in Advances in Cryptology – CRYPTO 2013 (eds. R. Canetti and J. A. Garay), Springer, Berlin, Heidelberg, (2013), 129–146. https://doi.org/10.1007/978-3-642-40041-4_8
[8]	X. Wang, H. Yu, W. Wang, H. Zhang, T. Zhan, Cryptanalysis on HMAC/NMAC-MD5 and MD5-MAC, in Advances in Cryptology - EUROCRYPT 2009 (ed. A. Joux), Springer, Berlin, Heidelberg, (2009), 121–133. https://doi.org/10.1007/978-3-642-01001-9_7
[9]	M. Stevens, Attacks on hash functions and applications, Ph.D thesis, Leiden University in Leiden, 2012. https://doi.org/10.6100/ir749233
[10]	M. Stevens, D. Shumow, Speeding up detection of {SHA-1} collision attacks using unavoidable attack conditions, in 26th USENIX Security Symposium (USENIX Security 17), USENIX Association, Vancouver, BC, (2017), 881–897. https://doi.org/10.5555/3241189.3241259
[11]	Y. Shen, T. Wu, G. Wang, X. Dong, H. Qian, Improved collision detection of MD5 using sufficient condition combination, Comput. J., (2021). https://doi.org/10.1093/comjnl/bxab109 doi: 10.1093/comjnl/bxab109
[12]	R. L. Rivest, The MD5 message-digest algorithm, RFC, 1321 (1992), 1–21. https://doi.org/10.17487/RFC1321 doi: 10.17487/RFC1321
[13]	M. Stevens, Project HashClash - MD5 & SHA-1 cryptanalytic toolbox [Internet]. Available from: https: //github.com/cr-marcstevens/hashclash.

This article has been cited by:

1.	D. Paul Joseph, P. Viswanathan, SDOT: Secure Hash, Semantic Keyword Extraction, and Dynamic Operator Pattern-Based Three-Tier Forensic Classification Framework, 2023, 11, 2169-3536, 3291, 10.1109/ACCESS.2023.3234434
2.	Donagani Ramakrishna, Mohammed Ali Shaik, A Comprehensive Analysis of Cryptographic Algorithms: Evaluating Security, Efficiency, and Future Challenges, 2025, 13, 2169-3536, 11576, 10.1109/ACCESS.2024.3518533

Reader Comments

Your name:*

Email:*
© 2022 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

通讯作者: 陈斌, bchen63@163.com

1.
沈阳化工大学材料科学与工程学院沈阳 110142

Electronic Research Archive

1 1.3

Metrics

Article views(2357) PDF downloads(133) Cited by(2)

Preview PDF

Download XML

Export Citation

Article outline

Show full outline

Figures and Tables

Tables(4)

Electronic Research Archive

Improved collision detection of MD5 with additional sufficient conditions

Related Papers:

Abstract

1. Introduction

2. Preliminaries

2.1. Description of MD5

2.2. Related works

2.2.1. Stevens' collision detection algorithm

2.2.2. Shen's collision detection algorithm

3. Improved classification

3.1. Properties of MD5

3.2. Additional sufficient conditions

3.2.1. The subdivided set

3.2.2. The new individual checked set

3.3. Our new algorithm and implementation

4. Experiments

4.1. Verification of ASCs

4.2. Complexity

5. Conclusions

Acknowledgments

Conflict of interest

References

This article has been cited by:

Reader Comments

通讯作者: 陈斌, bchen63@163.com

Metrics

Figures and Tables

Other Articles By Authors

Catalog

Electronic Research Archive

Improved collision detection of MD5 with additional sufficient conditions

Related Papers:

Abstract

1. Introduction

2. Preliminaries

2.1. Description of MD5

2.2. Related works

2.2.1. Stevens' collision detection algorithm

2.2.2. Shen's collision detection algorithm

3. Improved classification

3.1. Properties of MD5

3.2. Additional sufficient conditions

3.2.1. The subdivided set

3.2.2. The new individual checked set

3.3. Our new algorithm and implementation

4. Experiments

4.1. Verification of ASCs

4.2. Complexity

5. Conclusions

Acknowledgments

Conflict of interest

References

This article has been cited by:

Reader Comments

通讯作者: 陈斌, bchen63@163.com

Metrics

Figures and Tables

Other Articles By Authors

Related pages

Tools

Export File

Citation

Format

Content

Catalog