Boosting microscopic object detection via feature activation map guided poisson blending

Haixu Yang; Yunqi Zhu; Jiahui Yu; Luhong Jin; Zengxi Guo; Cheng Zheng; Junfen Fu; Yingke Xu

doi:10.3934/mbe.2023813

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 10: 18301-18317. doi: 10.3934/mbe.2023813

Research article

1. Department of Biomedical Engineering, MOE Key Laboratory of Biomedical Engineering, State Key Laboratory of Extreme Photonics and Instrumentation, Zhejiang Provincial Key Laboratory of Cardio-Cerebral Vascular Detection Technology and Medicinal Effectiveness Appraisal, Zhejiang Provincial Key Laboratory of Traditional Chinese Medicine for Clinical Evaluation and Translational Research, Zhejiang University, Hangzhou, 310027, China
2. Binjiang Institute of Zhejiang University, Hangzhou, 310053, China
3. Zhejiang Institute for Food and Drug Control, NMPA Key Laboratory of Quality Evaluation of Traditional Chinese Medicine (Traditional Chinese Patent Medicine), Hangzhou, 310052, China
4. Department of Endocrinology, Children's Hospital of Zhejiang University School of Medicine, National Clinical Research Center for Children's Health, Hangzhou, 310051, China
†These authors contributed to the work equally and should be regarded as co-first authors

Received: 07 July 2023 Revised: 05 September 2023 Accepted: 17 September 2023 Published: 22 September 2023

Microscopic examination of visible components based on micrographs is the gold standard for testing in biomedical research and clinical diagnosis. Applying object detection technology to bioimages not only improves the efficiency of the analyst but also provides decision support that helps ensure the objectivity and consistency of diagnosis. However, the lack of large annotated datasets is a significant impediment to rapidly deploying object detection models for the detection of microscopic formed elements. Standard augmentation methods used in object detection are not appropriate because they are prone to destroying the original micro-morphological information and producing counterintuitive micrographs, which is not conducive to building analysts' trust in the intelligent system. Here, we propose a feature activation map-guided boosting mechanism dedicated to microscopic object detection to improve data efficiency. Our results show that the boosting mechanism provides solid gains for object detection models deployed for the detection of microscopic formed elements. After image augmentation, the mean Average Precision (mAP) of the baseline and the strong baseline on the Chinese herbal medicine micrograph dataset increased by 16.3% and 5.8%, respectively. Similarly, on the urine sediment dataset, the boosting mechanism improved the mAP of the baseline and the strong baseline by 8.0% and 2.6%, respectively. Moreover, the method shows strong generalizability and can be easily integrated into any mainstream object detection model. The performance enhancement is interpretable, making it more suitable for microscopic biomedical applications.

Keywords:

Citation: Haixu Yang, Yunqi Zhu, Jiahui Yu, Luhong Jin, Zengxi Guo, Cheng Zheng, Junfen Fu, Yingke Xu. Boosting microscopic object detection via feature activation map guided poisson blending[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 18301-18317. doi: 10.3934/mbe.2023813

1. Introduction

In recent years, printing presses have developed toward green, high-speed, automated and intelligent operation in modern industry. Rolling bearings are among the most important components in rotating machinery: they reduce the friction coefficient during motion and ensure the rotary accuracy of the printing press ^[1,2,3,4]. Their health status therefore has an important influence on the overall performance, stability and service life of the printing press ^[5]. Bearings may fail because of corrosion by chemical substances such as fountain solution and ink, or because of insufficient lubrication, intrusion of water or foreign matter, improper assembly, or long periods of machine operation. Such failures readily cause abnormal vibration of the printing press bearings and, in turn, press faults such as the paper gripper being gripped, inaccurate overprinting, ink bars and poor overprinting ^[6]. Moreover, printing presses usually work in harsh environments, so faults are hard to detect. The strong background noise lowers fault diagnosis accuracy, which can in turn cause safety problems. Monitoring the operating status of printing press bearings therefore plays an increasingly important role in actual production: introducing fault diagnosis technology makes it possible to find and resolve potential machine problems in time, greatly reducing the failure rate of the equipment.

Traditional fault diagnosis extracts bearing fault features from the original signal through signal processing techniques and mathematical transformations, and then identifies the information that reflects the fault ^[7]. However, with increasing productivity and the expanding scale of equipment, the collected bearing vibration signals are massive, multi-source and high-dimensional, which makes traditional fault diagnosis methods inefficient and inaccurate ^[8,9]. Intelligent fault diagnosis, by contrast, is data-driven and can handle large amounts of complex data quickly and accurately, so it has become one of the current research hotspots for condition monitoring and fault diagnosis of mechanical equipment.

The mechanical structure of the printing press is complex, and the press generates strong noise, so the vibration signals collected from its bearings are heavily contaminated with noise. It is hard to extract fault features accurately from such signals, and training traditional machine learning models, such as Support Vector Machine (SVM) and Logistic Regression, on them easily leads to vanishing gradients and overfitting. As a newer network architecture, ResNet can effectively solve the vanishing or exploding gradients caused by the data or the network depth. In addition, ResNet has a strong ability to extract features, which can alleviate the overfitting seen when training other network models and reduces the reliance on expert fault diagnosis experience and signal processing techniques ^[10,11,12]. Therefore, ResNet is selected as the base network of the proposed method. Lin ^[13] proposed an improved one-dimensional convolutional ResNet for fault diagnosis, in which convolution and pooling first extract and compress the fault features of the data, and an improved ResNet is then added to avoid network degradation and uneven data distribution in the training model. Zagoruyko and Komodakis ^[14] studied wide residual blocks and demonstrated their effectiveness experimentally at a reasonable depth. Konovalenko ^[15] studied a deep ResNet-based model for recognizing and classifying surface defects of rolled steel, and constructed a classifier that detects three types of defects on flat metal surfaces. Yan ^[16] proposed the deep order-wavelet convolutional variational autoencoder (DOWCVAE), which improves on the feature learning capability of the ordinary convolutional variational autoencoder (CVAE). Peng ^[17] combined ResNet and data fusion to achieve higher recognition accuracy. 
Yan ^[18] proposed a deep regularized variational autoencoder (DRVAE) intelligent fault diagnosis model for rotor-bearing systems, which solves the overfitting problem of the original variational autoencoder (VAE) and enhances the feature learning capability of the network model. Wang ^[19] proposed a depthwise separable ResNet model, which reduces the number of parameters through depthwise separable convolution and uses residual connections to transfer features, and which can effectively predict the remaining life of rolling bearings. Liang ^[20] proposed a frequency domain analysis method based on wavelet transform together with an improved ResNet with adaptive global singular value decomposition (SVD). Zhang ^[21] proposed a blockchain-based distributed joint transfer learning method to improve the accuracy of mechanical fault diagnosis, and applied it to collaborative mechanical fault diagnosis. Yan ^[22] proposed a multiscale cascading deep belief network (MCDBN) for rotating machinery fault diagnosis, which improves fault identification accuracy by learning a wider range of feature representations. Lin ^[23] proposed a photovoltaic array fault diagnosis scheme using a multi-scale SE-ResNet, and designed a multi-scale receptive field fusion module to improve the diagnostic performance of the model. Wan ^[24] used an improved deep ResNet as a feature extractor to extract metastable features from the original vibration signal; the classifier then used the extracted domain-invariant features to complete cross-domain fault identification. Yan ^[25] proposed a wind speed prediction model based on long short-term memory, a deep belief network and the grasshopper optimization algorithm (GOA) to improve the accuracy and efficiency of wind speed prediction. Yang ^[26] classified real-time data collected from dissolved oxygen sensors to diagnose faults online, and showed experimentally that ResNet performs well. 
Hao ^[27] proposed a new network structure that replaces the fully connected layers of the traditional ResNet with global average pooling, which effectively solves the problem of the traditional ResNet model having too many parameters. Zhang ^[28] proposed a fault diagnosis method based on the adaptive loss-weighted meta-ResNet (ALWM-ResNet), which uses a weighted network and a meta-network cloned from the original ResNet to establish a mapping of weighting functions and adaptively learns weights from data containing clean labels. However, the convolution kernel in the ResNet structure has a fixed shape, adapts poorly to changes in unknown images, and generalizes poorly ^[29,30]. The proposed method therefore introduces a deformable convolution layer into the ResNet model, so that the convolution layer can adaptively change the shape of its kernel according to the input sample. The kernel can then precisely locate and track small fault features and adaptively identify fault feature points, thereby learning more detailed features and improving the accuracy of feature extraction.

However, the signals collected under actual working conditions often contain complex noise, and accuracy declines if they are fed directly into the model. The proposed method therefore applies signal preprocessing to extract fault features before input, which improves the accuracy of the model. The Short Time Fourier Transform (STFT), Wigner-Ville Distribution (WVD), Empirical Mode Decomposition (EMD), and Wavelet Analysis are widely used preprocessing methods in the field of fault diagnosis. Hartono ^[31] proposed a joint time-frequency analysis method for gear fault diagnosis that combines autoregressive model-based filtering with the reassigned smoothed pseudo-Wigner-Ville distribution (RSPWVD). Nezamivand ^[32] applied empirical mode decomposition and wavelet packet decomposition to vibration signals, and achieved good results for fault state identification of rolling bearings with an SVM. Surti ^[33] proposed a new technique for early bearing fault detection and diagnosis based on the discrete wavelet transform (DWT) and K-nearest neighbor (KNN), which showed good accuracy and the ability to distinguish between healthy and unhealthy bearing conditions. Li ^[34] proposed a deep learning-based remaining useful life (RUL) prediction method to solve the sensor failure problem, and introduced adversarial learning to extract invariant features in generalized sensors. Dubey ^[35] proposed a ball bearing fault analysis method based on Hilbert transform footprint analysis and a neural network, which achieved high fault classification accuracy. Tian ^[36] proposed a method to detect bearing faults and monitor bearing degradation in electric motors that uses Programmable Counter Array and semi-supervised KNN distance to combine features into health indicators for detection. 
Amar ^[37] proposed a novel vibration spectrum imaging (VSI) feature enhancement system for low signal-to-noise ratio (SNR) conditions, which enhances vibration spectral features and represents them visually in the form of images. Zhang ^[38] constructed a preliminary wavelet-overlapping group sparse (WOGS) optimization model based on the overlapping features of Morlet wavelet transform coefficients, and constructed the weight coefficients of the model by analyzing the pulse features of the signal. Mao ^[39] used a classification method combining multiscale alignment entropy and a support vector machine to classify bearing fault types. Zhao ^[40] proposed a new convolutional neural network scheme based on attention-enhanced convolution blocks (AECB) to achieve higher training accuracy on control moment gyroscope (CMG) fault diagnosis data sets with different sliding window parameters.

The above methods used data from standard data sets and fed it into neural networks after simple preprocessing to achieve bearing fault diagnosis. However, traditional preprocessing methods such as STFT, Wavelet Transform and WVD require an appropriate window function or wavelet basis function to be selected, and they can lead to mode mixing and endpoint effects, resulting in low fault diagnosis accuracy. This is because faulty bearing signals under actual working conditions are collected amid strong noise vibrations. The frequency slice wavelet transform (FSWT) combines the advantages of STFT and Wavelet Transform. It not only reduces the dependence of wavelets and wavelet packets on wavelet basis functions when reconstructing signals, but also allows signals to be reconstructed in arbitrary frequency bands and local features to be described accurately, so that signals can be filtered and segmented flexibly ^[41]. Furthermore, the TFD obtained after FSWT processing contains feature information of the vibration signal in both the time domain and the frequency domain, which benefits feature extraction by neural networks and effectively improves the efficiency of model identification.

In summary, this paper proposes a diagnosis method that integrates FSWT, a deformable convolution layer, and ResNet to improve diagnosis efficiency under strong background noise while ensuring diagnosis accuracy. The experimental results show that the proposed method has higher recognition accuracy than other diagnosis methods, and that the improvement in accuracy comes together with higher model training efficiency. The main contributions of the proposed method are summarized as follows.

(1) To reduce the influence of noise on diagnosis accuracy, bearing vibration signals are preprocessed using FSWT. Compared with other preprocessing methods, this significantly improves the efficiency and accuracy of the subsequent intelligent fault diagnosis.

(2) ResNet is selected to address the vanishing gradients and overfitting that arise when training traditional network models. At the same time, to enhance the model's ability to extract subtle features from the TFD, a deformable convolution layer is introduced into the ResNet. This makes the shape of the convolution layer more adaptive, so that the model can effectively capture subtle features drowned in noise.

(3) The proposed method is tested on the Case Western Reserve University (CWRU) data set, where it achieves an accuracy of 99.77%. Under actual working conditions, the model also outperforms other methods, with a diagnostic accuracy of 93.90%.

The rest of the paper is organized as follows: Section 2 presents the frequency slice wavelet transform. Section 3 introduces the ResNet model used in the proposed method. Section 4 introduces the structure of the deformable convolution layer and of the deformable convolution ResNet (DC-ResNet). Section 5 introduces the fault diagnosis process, the parameters of the DC-ResNet and the model hyperparameters. Section 6 presents the experimental validation of the method and the analysis of the experimental results. Finally, Section 7 gives the conclusions.

2. Data preprocessing

Rolling bearing fault diagnosis is commonly based on vibration signals. In traditional bearing fault diagnosis, the one-dimensional time-domain vibration signal is usually spliced into a two-dimensional signal, but splicing adds two-dimensional information to the vibration signal that is not present in the original time-domain signal. When fed to the ResNet, this artificially created two-dimensional information affects the network's feature extraction and thus the accuracy of fault diagnosis. The FSWT has advantages over STFT and wavelet transform: it can filter and segment signals flexibly ^[42] and converts the original time-domain signal into a time-frequency signal. Because the rolling bearing fault signals collected from the printing press contain strong noise, their time-frequency features can be better observed with FSWT. Therefore, the proposed method uses FSWT to preprocess the bearing vibration signals of the printing press.

2.1. Frequency slice wavelet transform

Let the signal $f(t) \in L^2(R)$ have the Fourier transform $\widehat{f}\left(u\right)$ . If $\widehat{p}\left(\omega \right)$ is the Fourier transform of $p\left(t\right)$ , then the FSWT of $f(t)$ is defined in the frequency domain as ^[43]:

${W}_{f}\left(t, \omega , \lambda , \sigma \right) = \frac{1}{2\pi }\lambda {\int }_{-\infty }^{+\infty }\widehat{f}\left(u\right){\widehat{p}}^{*}\left(\frac{u-\omega }{\sigma }\right){e}^{iut}du ,$

(2.1)

where $\lambda$ is the energy factor $\left(\lambda \ne 0\right)$ , the scale $\sigma$ is either a constant or a function of $\omega$ , $t$ and $u$ , "*" denotes complex conjugation, $\omega$ and $t$ are the observation frequency and observation time, $u$ is the evaluation frequency and $\widehat{p}\left(\omega \right)$ is the frequency slice function (FSF). By the Parseval equation, the above equation can be transformed to the time domain as follows:

${W}_{f}\left(t, \omega , \lambda , \sigma \right) = \sigma \lambda {e}^{i\omega t}{\int }_{-\infty }^{+\infty }f\left(\tau \right){e}^{-i\omega \tau }{p}^{*}\left(\sigma \left(\tau -t\right)\right)d\tau$

(2.2)

2.2. Determination of scale factor

In general, taking $\lambda = 1$ and letting $\sigma = \omega /\kappa$ with $\kappa > 0$ , Eq (2.1) can be rewritten as:

$W\left(t, \omega , \kappa \right) = \frac{1}{2\pi }{\int }_{-\infty }^{+\infty }\widehat{f}\left(u\right){\widehat{p}}^{*}\left(\kappa \frac{u-\omega }{\omega }\right){e}^{iut}du$

(2.3)

Here $\kappa$ is independent of $\omega$ and $u$ . It is mainly used to adjust the sensitivity of the FSWT to frequency or time, and is called the time-frequency resolution factor. By the Heisenberg uncertainty principle, high resolution cannot be obtained in both the frequency and time domains at once, so $\sigma$ and $\omega$ are used to estimate the time-frequency resolution factor, and two evaluation coefficients are introduced: the frequency resolution ratio $\eta$ and the amplitude expectation response ratio $v\left(0 < v\le 1\right)$ , where $v$ is usually taken as $\sqrt{2}/2$ , 0.5, 0.25, etc.
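Eq (2.3) maps directly onto an FFT-based computation: for each observation frequency, the spectrum is multiplied by a slice window centred on that frequency and transformed back to the time domain. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function name `fswt`, the Gaussian slice function $\widehat{p}(x) = e^{-x^2/2}$, the value `kappa=20` and the 50 Hz test tone are all chosen here for illustration only.

```python
import numpy as np

def fswt(signal, fs, freqs, kappa=20.0):
    """Discretized Eq (2.3) with an assumed Gaussian frequency slice function.

    signal : 1-D real array sampled at fs (Hz)
    freqs  : observation frequencies in Hz (must be nonzero)
    Returns a complex array of shape (len(freqs), len(signal)).
    """
    n = len(signal)
    spectrum = np.fft.fft(signal)            # f_hat(u) on the FFT grid
    u = np.fft.fftfreq(n, d=1.0 / fs)        # evaluation frequencies u in Hz
    out = np.empty((len(freqs), n), dtype=complex)
    for row, omega in enumerate(freqs):
        # Assumed slice function p_hat(x) = exp(-x^2 / 2); it is real,
        # so the complex conjugate in Eq (2.3) leaves it unchanged.
        window = np.exp(-0.5 * (kappa * (u - omega) / omega) ** 2)
        # The inverse FFT plays the role of (1 / 2pi) * integral(... e^{iut} du).
        out[row] = np.fft.ifft(spectrum * window)
    return out

# Illustrative input: one second of a 50 Hz tone sampled at 1 kHz.
fs = 1000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 50 * t)
freqs = np.array([20.0, 50.0, 120.0])

tfd = fswt(tone, fs, freqs)
energy = np.abs(tfd).mean(axis=1)            # mean magnitude of each slice
print(freqs[np.argmax(energy)])              # the 50 Hz slice dominates
```

A larger `kappa` narrows the window in frequency (finer frequency resolution, coarser time resolution), which is the trade-off the resolution factor controls.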

2.3. Frequency slicing wavelet inversion

The original signal can be reconstructed from the frequency slice wavelet transform in several ways. A commonly used form of the inverse transform is:

$f\left(t\right) = \frac{1}{2\pi \lambda }{\int }_{-\infty }^{+\infty }{\int }_{-\infty }^{+\infty }W\left(\tau , \omega , \lambda , \sigma \right){e}^{i\omega \left(t-\tau \right)}d\tau d\omega$

(2.4)

Eq (2.4) shows that the inverse transform is independent of $p\left(t\right)$ , $\widehat{p}\left(\omega \right)$ and $\sigma$ , so the reconstructed signal can be obtained directly with the fast Fourier transform algorithm. The FSWT thus provides a time-frequency analysis of the signal in which signal components in any frequency band can be filtered and segmented.
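The inversion can be checked in two steps from the frequency-domain definition of the FSWT. The sketch below assumes the normalization $\widehat{p}(0) = 1$ for the slice function and a scale $\sigma$ that does not depend on the observation time; neither assumption is stated explicitly in the text above. For fixed $\omega$, the transform is the inverse Fourier transform in $u$ of $\lambda \widehat{f}(u)\widehat{p}^{*}((u-\omega)/\sigma)$, so integrating it against $e^{-i\omega\tau}$ picks out the value at $u = \omega$, and the remaining integral over $\omega$ is the ordinary inverse Fourier transform.

```latex
\begin{align*}
\int_{-\infty}^{+\infty} W(\tau,\omega,\lambda,\sigma)\, e^{-i\omega\tau}\, d\tau
  &= \lambda\, \widehat{f}(\omega)\, \widehat{p}^{*}(0)
   = \lambda\, \widehat{f}(\omega), \\
\frac{1}{2\pi\lambda} \int_{-\infty}^{+\infty} \lambda\, \widehat{f}(\omega)\, e^{i\omega t}\, d\omega
  &= \frac{1}{2\pi} \int_{-\infty}^{+\infty} \widehat{f}(\omega)\, e^{i\omega t}\, d\omega
   = f(t).
\end{align*}
```

Because the slice function is only ever evaluated at zero in the first step, its shape and the scale $\sigma$ drop out of the reconstruction.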

3. Residual neural network

ResNet was proposed by He et al. of Microsoft Research in 2015 ^[44]. ResNet solves the training difficulties caused by increasing the depth of the network, and its performance is far better than that of traditional network models.

Suppose a regular convolutional neural network has $L$ layers, where the input of layer $i$ ( $i\in \left\{1, 2, ..., L\right\}$ ) is ${x}^{i}$ , its corresponding parameter is ${w}^{i}$ , and the output of this layer is ${y}^{i} = {x}^{i+1}$ . For simplicity of presentation, ignoring the layer index and the bias, the relationship between them can be expressed as Eq (2.5):

$y = F\left(x, {w}_{f}\right) ,$

(2.5)

where $F$ is the nonlinear mapping of the layer (convolution followed by the activation function) and ${w}_{f}$ denotes the convolution parameters. In ResNet, the mapping is instead expressed by Eq (2.6):

$y = F\left(x, {w}_{f}\right)+x$

(2.6)

A simple rearrangement of Eq (2.6) yields Eq (2.7):

$F\left(x, {w}_{f}\right) = y-x$

(2.7)

The function $F$ that the network needs to learn is actually the residual term $y-x$ at the right end of the formula, called the residual function.

As shown in Figure 1, the residual learning module has two branches: the residual function F(x) and the identity mapping of the input x. The two branches are merged by element-wise addition followed by the nonlinear ReLU activation function to form the residual learning module, and the structure formed by stacking multiple residual modules is called "ResNet". Adding residual connections makes the network easier to optimize than the original mapping without them: when the stacked nonlinear layers need to fit an identity mapping, the residual only has to converge to zero. When the network is deep enough, optimizing the residual function $F\left(x, {w}_{f}\right) = y-x$ is easier than optimizing the complex nonlinear mapping $y = F\left(x, {w}_{f}\right)$ of Eq (2.5) directly.

Figure 1. Residual learning modules.
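The two-branch structure of the residual module can be written in a few lines. This is a simplified NumPy sketch, not the network used in the paper: the residual branch $F$ is stood in for by two dense layers rather than convolutions, and the names `residual_block`, `w1` and `w2` are hypothetical.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU activation."""
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Compute y = ReLU(F(x, w_f) + x) for a toy residual branch.

    F is two dense layers with a ReLU in between, standing in for the
    convolution layers of a real residual module (an assumption made
    here to keep the example short).
    """
    fx = relu(x @ w1) @ w2      # residual branch F(x, w_f)
    return relu(fx + x)         # add the identity branch, then apply ReLU

x = np.array([[1.0, -2.0, 3.0]])
zero = np.zeros((3, 3))

# With all weights at zero the residual branch outputs 0, so the block
# reduces to ReLU(x): the identity mapping is obtained "for free".
y = residual_block(x, zero, zero)
print(y)
```

The zero-weight case illustrates why the residual form is easier to optimize: to realize an identity mapping, the branch only has to drive its output to zero.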

DC-ResNet Structure	Output Dimension
Input Layer	(190, 150, 3)
Convolution Layer 1	(184, 144, 64)
Max Pooling	(92, 72, 64)
Residual Block 1	(92, 72, 64)
Residual Block 2	(92, 72, 64)
Residual Block 3	(92, 72, 128)
Residual Block 4	(92, 72, 128)
Residual Block 5	(92, 72, 256)
Residual Block 6	(92, 72, 256)
Residual Block 7	(92, 72, 512)
Residual Block 8	(92, 72, 512)
Global Average Pooling	512
Output Layer	10
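The first rows of the table above can be checked with the standard output-size formula for convolution and pooling layers. The kernel size, stride and padding are not stated in the table; the values below (a 7×7 convolution with stride 1 and no padding, and 2×2 max pooling with stride 2) are assumptions chosen because they reproduce the listed dimensions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output length of a pooling layer along one spatial axis."""
    return (size - window) // stride + 1

height, width = 190, 150                      # input layer (190, 150, 3)

# Assumed 7x7 convolution, stride 1, no padding, 64 filters.
height, width = conv_out(height, 7), conv_out(width, 7)
conv_shape = (height, width, 64)

# Assumed 2x2 max pooling with stride 2; the channel count is unchanged.
height, width = pool_out(height), pool_out(width)
pool_shape = (height, width, 64)

print(conv_shape, pool_shape)
```

The residual blocks in the table keep the 92 × 72 spatial size and only change the channel count, and global average pooling then collapses each of the 512 channels to a single value.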

Preprocessing Methods	Number of Training Samples/Test Samples	Average Accuracy (%)
FSWT	3000/1000	99.77
Morlet Wavelet Transform	3000/1000	88.72
Grayscale	3000/1000	74.92

Learning Rate	Batch Size	Epoch	Average Accuracy (%)
0.1	16	30	25.26
0.1	32	50	28.25
0.1	64	70	24.59
0.01	16	30	94.99
0.01	32	50	96.25
0.01	64	70	92.68
0.001	16	30	93.56
0.001	32	50	99.83
0.001	64	70	99.72
0.0001	16	30	85.33
0.0001	32	50	91.69
0.0001	64	70	87.16

Experimental Methods	Input Samples	Average Accuracy (%)	Computing Time (s)
DC-ResNet	TFD of FSWT	99.83	961.2
ResNet18	TFD of FSWT	99.77	837.6
DC-ResNet50	TFD of FSWT	99.83	9279.6
ResNet50	TFD of FSWT	99.82	1300.2
DC-ResNet101	TFD of FSWT	99.76	18199.2
ResNet101	TFD of FSWT	99.70	1832.0

Bearing type	Inner diameter /mm	Outer diameter /mm	Width /mm	Weight /g
JYB6004	20	42	12	69

Method	Preprocessing Method	Average Accuracy (%)
DC-ResNet	FSWT	93.90
DC-ResNet	Grayscale	77.52
ResNet18	FSWT	89.98
ResNet18	Grayscale	67.14
AlexNet	FSWT	83.27
AlexNet	Grayscale	68.20
LSTM	FSWT	83.36
LSTM	Grayscale	67.15

[1]	J. Hipp, T. Flotte, J. Monaco, J. Cheng, A. Madabhushi, Y. Yagi, et al., Computer aided diagnostic tools aim to empower rather than replace pathologists: Lessons learned from computational chess, J. Pathol. Inform., 2 (2011), 25. https://doi.org/10.4103/2153-3539.82050 doi: 10.4103/2153-3539.82050
[2]	Z. Q. Zhao, P. Zheng, S. T. Xu, X. Wu, Object detection with deep learning: A review, IEEE Transact. Neural Networks Learn. Syst., 30 (2019), 3212–3232. https://doi.org/10.1109/icABCD49160.2020.9183866 doi: 10.1109/icABCD49160.2020.9183866
[3]	Z. Liu, L. Jin, J. Chen, Q. Fang, S. Ablameyko, Z. Yin, et al., A survey on applications of deep learning in microscopy image analysis, Comput. Biol. Med., 134 (2021), 104523. https://doi.org/10.1109/TNNLS.2017.2766168 doi: 10.1109/TNNLS.2017.2766168
[4]	C. Matek, S. Schwarz, K. Spiekermann, C. Marr, Human-level recognition of blast cells in acute myeloid leukaemia with convolutional neural networks, Nat. Machine Intell., 1 (2019), 538–544. https://doi.org/10.1038/s42256-019-0101-9 doi: 10.1038/s42256-019-0101-9
[5]	B. Midtvedt, J. Pineda, F. Skärberg, E. Olsén, H. Bachimanchi, E. Wesén, et al., Single-shot self-supervised object detection in microscopy, Nat. Commun., 13 (2022), 7492. https://doi.org/10.1038/s41467-022-35004-y doi: 10.1038/s41467-022-35004-y
[6]	C. Li, L. Li, H. Jiang, K. Weng, Y. Geng, L. Li, et al., YOLOv6: A single-stage object detection framework for industrial applications, arXiv: 2209.02976, 2022. https://doi.org/10.48550/arXiv.2209.02976
[7]	C.-Y. Wang, A. Bochkovskiy, H.-Y. M. Liao, YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors, in CVF Conference on Computer Vision and Pattern Recognition, 2023, 7464–7475. https://doi.org/10.1109/CVPR52729.2023.00721
[8]	Z. Liu, H. Zhang, L. Jin, J. Chen, A. Nedzved, S. Ablameyko, et al., U-Net-based deep learning for tracking and quantitative analysis of intracellular vesicles in time-lapse microscopy images, J. Innov. Opt. Health Sci., 15 (2022), 2250031. https://doi.org/10.1142/S1793545822500316 doi: 10.1142/S1793545822500316
[9]	C. Sun, A. Shrivastava, S. Singh, A. Gupta, Revisiting unreasonable effectiveness of data in deep learning era, in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 843–852. https://doi.org/10.1109/ICCV.2017.97
[10]	V. Cheplygina, M. de Bruijne, J. P. W. Pluim, Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis, Med. Image Anal., 54 (2019), 280–296. https://doi.org/10.1016/j.media.2019.03.009 doi: 10.1016/j.media.2019.03.009
[11]	A. Bilodeau, C. V. L. Delmas, M. Parent, P. De Koninck, A. Durand, F. Lavoie-Cardinal, Microscopy analysis neural network to solve detection, enumeration and segmentation from image-level annotations, Nat. Mach. Intell., 4 (2022), 455–466. https://doi.org/10.1038/s42256-022-00472-w doi: 10.1038/s42256-022-00472-w
[12]	A. Halevy, P. Norvig, F. Pereira, The unreasonable effectiveness of data, IEEE Intell. Syst., 24 (2009), 8–12. https://doi.org/10.1109/MIS.2009.36 doi: 10.1109/MIS.2009.36
[13]	H. Zhang, M. Cisse, Y. N. Dauphin, D. Lopez-Paz, mixup: Beyond empirical risk minimization, in International Conference on Learning Representations (ICLR), 2018.
[14]	S. Yun, D. Han, S. Chun, S. J. Oh, Y. Yoo, J. Choe, CutMix: Regularization strategy to train strong classifiers with localizable features, in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 6022–6031. https://doi.org/10.1109/ICCV.2019.00612
[15]	T. Devries, G. W. Taylor, Improved regularization of convolutional neural networks with cutout, arXiv: 1708.04552, 2017. https://doi.org/10.48550/arXiv.1708.04552
[16]	Z. Zhong, L. Zheng, G. Kang, S. Li, Y. Yang, Random erasing data augmentation, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020. https://doi.org/10.1609/aaai.v34i07.7000
[17]	S. Liu, L. Qi, H. Qin, J. Shi, J. Jia, Path aggregation network for instance segmentation, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759–8768. https://doi.org/10.1109/CVPR.2018.00913
[18]	R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587. https://doi.org/10.1109/CVPR.2014.81
[19]	K. Grauman, T. Darrell, The pyramid match kernel: Discriminative classification with sets of image features, in Tenth IEEE International Conference on Computer Vision (ICCV'05), 1 (2005), pp.1458–1465. https://doi.org/10.1109/ICCV.2005.239
[20]	T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 936–944. https://doi.org/10.1109/CVPR.2017.106
[21]	R. Girshick, Fast R-CNN, in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448. https://doi.org/10.1109/ICCV.2015.169
[22]	J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object detection, in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788. https://doi.org/10.1109/CVPR.2016.91
[23]	M. Tan, R. Pang, Q. V. Le, EfficientDet: Scalable and efficient object detection, arXiv: 1911.09070, 2019. https://doi.org/10.1109/CVPR42600.2020.01079
[24]	A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, Commun. ACM, 60 (2012), 84–90. https://doi.org/10.1145/3065386
[25]	G. Jocher, A. Stoken, A. Chaurasia, J. Borovec, Y. Kwon, K. Michael, et al., ultralytics/yolov5: v6.0 - YOLOv5n 'Nano' models, Roboflow integration, TensorFlow export, OpenCV DNN support, Zenodo Tech. Rep., (2021).
[26]	W. Ouyang, C. F. Winsnes, M. Hjelmare, A. J. Cesnik, L. Åkesson, H. Xu, et al., Analysis of the Human Protein Atlas Image Classification competition, Nat. Methods, 16 (2019), 1254–1261. https://doi.org/10.1038/s41592-019-0658-6
[27]	R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: Visual explanations from deep networks via gradient-based localization, in 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 618–626. https://doi.org/10.1109/ICCV.2017.74
[28]	N. Dvornik, J. Mairal, C. Schmid, Modeling visual context is key to augmenting object detection datasets, in European Conference on Computer Vision (ECCV) 2018, Springer International Publishing, Cham, 2018, pp. 375–391. https://doi.org/10.1007/978-3-030-01258-8_23
[29]	P. Pérez, M. Gangnet, A. Blake, Poisson image editing, ACM Trans. Graph., 22 (2003), 313–318. https://doi.org/10.1145/1201775.882269
[30]	C.C. Pharmacopoeia, Pharmacopoeia of the People's Republic of China, 2010.
[31]	J. Redmon, A. Farhadi, YOLOv3: An incremental improvement, arXiv: 1804.02767, 2018. https://doi.org/10.48550/arXiv.1804.02767
[32]	S. Qiao, L. C. Chen, A. Yuille, DetectoRS: Detecting objects with recursive feature pyramid and switchable atrous convolution, in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 10208–10219. https://doi.org/10.1109/CVPR46437.2021.01008
[33]	X. Zhu, W. Su, L. Lu, B. Li, X. Wang, J. Dai, Deformable DETR: Deformable transformers for end-to-end object detection, in International Conference on Learning Representations, 2021.
[34]	S. Zhang, C. Chi, Y. Yao, Z. Lei, S. Z. Li, Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. https://doi.org/10.1109/cvpr42600.2020.00978
[35]	H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, et al., DINO: DETR with improved denoising anchor boxes for end-to-end object detection, arXiv: 2203.03605, 2022. https://doi.org/10.48550/arXiv.2203.03605
[36]	Z. Chen, C. Yang, J. Chang, F. Zhao, Z. J. Zha, F. Wu, DDOD: Dive deeper into the disentanglement of object detector, IEEE Transact. Mult., (2023), 1–15. https://doi.org/10.1109/TMM.2023.3264008
[37]	B. Zhu, J. Wang, Z. Jiang, F. Zong, S. Liu, Z. Li, et al., AutoAssign: Differentiable label assignment for dense object detection, arXiv: 2007.03496, 2020. https://doi.org/10.48550/arXiv.2007.03496
[38]	X. Zhu, H. Hu, S. Lin, J. Dai, Deformable ConvNets V2: More deformable, better results, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9300–9308. https://doi.org/10.1109/CVPR.2019.00953
[39]	K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, et al., MMDetection: Open MMLab detection toolbox and benchmark, arXiv: 1906.07155, 2019. https://doi.org/10.48550/arXiv.1906.07155
