
Content-based image retrieval (CBIR) has been widely studied in the past decade [1]. Due to computational and memory constraints, early methods are unable to deal with large-scale data. In recent years, the large-scale and ever-growing nature of online image data has made approximate nearest neighbor (ANN) search popular in image semantic retrieval tasks [2,3,4,5]. For ANN search, most research efforts have been devoted to two promising binarization solutions: learning to hash (L2H) [6,7,8,9,10,11,12,13] and learning to quantize (L2Q) [4,5,14,15,16,17,18]. By encoding real-valued images into binary codes, hashing-based and quantization-based methods achieve efficient storage and retrieval of image data in large-scale databases.
L2H-based methods aim to map high-dimensional data into a low-dimensional Hamming space while preserving data similarities or semantic information. L2Q-based methods aim either to approximate the feature representation with a single quantizer (i.e., the sign function) [4,5,11,14] or to approximate the high-dimensional data with a set of learned quantizers (i.e., different codebooks) [15,16,17,18]. Recent studies [5,16,17,18] indicate that L2Q-based methods generally perform better than L2H methods on image semantic retrieval tasks. The reason may be that L2Q methods can control the quantization error until it is statistically minimized, and can therefore generate higher-quality binary codes than L2H methods. Generally speaking, the encoding and retrieval of quantization methods are slightly more costly than those of hashing methods [16].
It should be noted that existing ANN search approaches rest on the hypothesis that the concepts of both database samples and query samples are seen at the training stage. However, this hypothesis can be violated with the explosive growth of web data, because images with new semantic concepts spring up on the web at a fast-growing rate. For these new concepts, it is almost impossible to annotate sufficient training data in a timely manner, and unrealistic to retrain the model over and over again. Existing ANN search approaches yield poor retrieval performance because they tend to recognize images of unseen categories as one of the seen categories. Therefore, the generalization ability of the model is essential for solving the retrieval problem for unseen concepts.
To alleviate the problem mentioned above, zero-shot learning (ZSL) techniques [19,20,21] assume both seen classes and unseen classes share a common semantic space where all the classes reside. The shared semantic space can be characterized by attributes [22], word2vec [23] or WordNet [24]. In the zero-shot classification task, the image classes in the training set and the test set are referred to as seen classes and unseen classes respectively. During the test phase, the image from the unseen class is assigned to the nearest class embedding vector in the shared space by a simple nearest neighbor search strategy. Although ZSL techniques have achieved progress in zero-shot image classification, zero-shot image retrieval has not yet been well explored.
Recently, zero-shot learning techniques have been introduced into learning to hash to improve the generalization ability of the hashing model [25]. SitNet [25] incorporates a semantic embedding loss and a regularized center loss into a multi-task architecture to capture the semantic structure in the semantic space. To facilitate knowledge transfer and reduce the quantization error in the training process, some quantization-based methods [26,27] propose to simultaneously transfer the semantic information to binary codes and control the quantization error between the low-dimensional feature representations and the learned binary codes. However, a significant disadvantage of these methods is that the minimization of the quantization error in the training process is still unsatisfactory. Moreover, the inconsistency of the visual space and the semantic space has not been considered sufficiently, which can increase the risk of overfitting the seen classes and reduce the expansibility of the training model to the unseen classes [28]. Last but not least, the works in [26,27] utilize the semantic space as the embedding space, i.e., they project the visual feature vectors or hash codes into the semantic space. This shrinks the variance of the projected data points and thus aggravates hubness (i.e., the projected data points become closer to each other on average) [20]. In turn, the hubness problem in the semantic space decreases the semantic transfer ability of the visual feature vectors or hash codes in the zero-shot image retrieval task.
In this paper, we propose a novel deep quantization network with visual-semantic alignment (VSAQ) for efficient zero-shot image retrieval. Specifically, we design a deep quantization architecture with the following components: 1) an image feature network that generates discriminative and polymeric image representations, which facilitates the visual-semantic alignment and guides the semantic embedding; 2) a semantic embedding network that maximizes the compatibility score between the image and semantic vectors for knowledge transfer; 3) a quantization loss layer that controls the quantization error of the image representations and generates high-quality binary codes for visual-semantic alignment, which also alleviates the hubness problem. We compare the proposed method with several state-of-the-art methods on several benchmark datasets, and the experimental results validate its superiority.
The remainder of this paper is organized as follows: related work is reviewed in Section 2 and we illustrate the proposed method in Section 3. Evaluation on three commonly used benchmark datasets is described in Section 4, followed by conclusions in Section 5.
Due to the ever-growing amount of image data on the internet, hashing has become a popular technique for image retrieval. Generally, existing hashing approaches can be divided into two categories: data-independent and data-dependent methods. Data-independent hashing methods map the data points from the original feature space into a binary code space by using random projections as hash functions; a representative example is Locality Sensitive Hashing (LSH) [3]. These methods provide theoretical guarantees for mapping nearby data points into the same hash codes with high probability, but they need long binary codes to achieve high precision. Data-dependent hashing methods learn hash functions and compact binary codes from training data. Typical data-dependent hashing methods include spectral hashing (SH) [6], anchor graph hashing (AGH) [7], supervised hashing with kernels (KSH) [8], supervised discrete hashing (SDH) [9] and column sampling based discrete supervised hashing (COSDISH) [10]. Recently, benefiting from the power of deep convolutional networks, deep hashing methods that integrate feature learning and hash-code learning into the same end-to-end framework have been proposed to further improve semantic retrieval performance. Typical deep hashing methods include convolutional neural network hashing, deep pairwise supervised hashing (DPSH) [5], deep supervised discrete hashing (DSDH) [29], deep supervised hashing (DSH) [13], and deep hashing network (DHN) [12]. Despite this success in semantic image retrieval, most existing hashing methods fail on zero-shot image retrieval due to the low generalization ability of learned hashing models for unseen concepts.
Quantization-based methods attempt to control the quantization error of the feature representations using a single quantizer (i.e., the sign function) [4,5,11,14,30] or to approximate the high-dimensional data with a set of learned quantizers (i.e., different codebooks) [15,16,17,18]. For example, [4,5,14] and [30] try to minimize the Euclidean distance and the cosine distance between continuous representations and their signed binary codes respectively. Alternatively, [11] utilizes a sequence of smoothing activation functions to gradually approach the sign function. Although the quantization error can be controlled using a single quantizer, it is not statistically minimized for generating high-quality binary codes. To further reduce the quantization error, [15,16,17,18] utilize the vector quantization (VQ) technique [31] to improve the accuracy and efficiency of the quantization process. Benefiting from the power of VQ, retrieval performance has improved significantly. However, these methods focus on traditional image retrieval (i.e., the concepts of all samples are seen in the training set), and how to adapt them to zero-shot image retrieval remains an open problem.
Zero-shot learning recognizes unseen or novel classes that did not appear in the training stage [19,20,21]. The zero-shot learning framework learns a compatible visual-semantic embedding space and utilizes it as an intermediate to accomplish the zero-shot image classification task. The method in [20] utilizes a latent space as the visual-semantic embedding space and introduces a least-squares loss between the embedded visual features and the embedded semantic vectors to cope with the hubness problem. The method in [21] utilizes the semantic space as the visual-semantic embedding space and introduces an image feature structure constraint and a semantic embedding structure constraint to learn structure-preserving image features and to improve the generalization ability of the learned embedding space respectively. Recently, some works [25,26,27] attempt to utilize zero-shot learning for solving the zero-shot image retrieval problem. The method in [26] projects the binary codes to the semantic space with the ridge regression formulation, which can exacerbate the hubness problem. Moreover, the quantization error is not statistically minimized and the inconsistency of the visual space and the semantic space has not been considered sufficiently.
We follow the definition of zero-shot image retrieval in [25,26]. The training set is defined as $S \equiv \{x_i^s, y_i^s, a_i^s\}_{i=1}^{n_s}$, where each image $x_i^s \in X^S$ is associated with a class label $y_i^s \in Y^S$. Similarly, the test set is defined as $U \equiv \{x_j^u, y_j^u, a_j^u\}_{j=1}^{n_u}$, where each image $x_j^u \in X^U$ is associated with a class label $y_j^u \in Y^U$. The side information matrix $A \in \mathbb{R}^{r \times (|Y^S| + |Y^U|)}$ is obtained from user-defined attributes or word2vec to transfer knowledge across concepts. The side information of image $x_i^s$ can be denoted as $a_i^s = A y_i^s$, which corresponds to the $y_i^s$-th column of $A$. Under the zero-shot learning setting, $Y^S \cap Y^U = \emptyset$, i.e., the seen classes are disjoint from the unseen classes. The goal of zero-shot hashing is to predict the binary codes of images from both seen and unseen classes.
As illustrated in Figure 2, the proposed architecture mainly consists of three components: 1) the image feature network (FNet) for learning discriminative and polymeric image representations; 2) the embedding network (ENet) for learning an embedding space that associates the visual information with the semantic information; and 3) the quantization loss layer for controlling the coding quality, aligning the visual and semantic information and alleviating the hubness problem.
The image feature network (FNet) aims to learn the semantic image representations with discrimination and polymerization. We adopt AlexNet [32] as the base network using the layers from conv1 to fc7 and replace fc8 with a q-dimensional fully-connected layer (4096-128). In addition, the tanh(⋅) activation function and an L2 Normalization Layer are added to enhance the nonlinear representation ability and constrain the range of the output features. Inspired by [33], a variant of the softmax loss is utilized to increase the discrimination of inter-class features and the compactness of intra-class features as follows:
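The head described above (a new q-dimensional fully-connected layer, tanh, and L2 normalization on top of the fc7 features) can be sketched in NumPy. The weight matrix below is a random stand-in, not trained parameters, and the convolutional backbone is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def fnet_head(features, W_fc, b_fc):
    """Sketch of the FNet head: a fully-connected layer (4096 -> 128)
    followed by tanh and row-wise L2 normalization. W_fc and b_fc are
    hypothetical, untrained parameters."""
    z = np.tanh(features @ W_fc + b_fc)                   # bounded activations
    return z / np.linalg.norm(z, axis=1, keepdims=True)   # unit-norm rows

# toy batch: 4 images with 4096-dimensional fc7 features mapped to q = 128
x = rng.standard_normal((4, 4096))
W_fc = rng.standard_normal((4096, 128)) * 0.01
phi = fnet_head(x, W_fc, np.zeros(128))
print(phi.shape)  # (4, 128)
```

The L2 normalization makes every output representation lie on the unit sphere, so the inner products used by the losses below stay in a bounded range.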
$$\mathcal{L}_f = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{\exp\big(\gamma_1 \langle \phi_f(x_i^s), \hat{c}_k \rangle\big)}{\sum_{j=1}^{|Y^S|} \exp\big(\gamma_1 \langle \phi_f(x_i^s), \hat{c}_j \rangle\big)} \quad (3.1)$$
where $\hat{c}_j$ denotes the centroid of the features associated with the $j$-th class, $\hat{c}_k$ is the centroid of the class of $x_i^s$, and $\gamma_1$ is set to 10 in all experiments. $\phi_f(x)$ refers to the output of the FNet. Under the guidance of the label information $Y^S$ of the seen classes, the FNet can learn semantic-preserving image representations, which in turn help the embedding network learn the visual-semantic embedding space and facilitate the visual-semantic alignment.
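The centroid-based softmax loss of Eq (3.1) can be illustrated with a minimal NumPy sketch. The shapes are toy values and the centroid L2 normalization is an assumption suggested by the hat notation, not stated explicitly in the text:

```python
import numpy as np

def centroid_softmax_loss(phi, labels, centroids, gamma=10.0):
    """Sketch of Eq (3.1): a softmax over scaled inner products between
    each representation and the per-class centroids; gamma plays gamma_1."""
    c_hat = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    logits = gamma * phi @ c_hat.T                    # <phi_f(x), c_j> terms
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(0)
phi = rng.standard_normal((6, 8))
phi /= np.linalg.norm(phi, axis=1, keepdims=True)     # unit-norm features
centroids = rng.standard_normal((3, 8))
labels = np.array([0, 1, 2, 0, 1, 2])
loss = centroid_softmax_loss(phi, labels, centroids)
print(loss)
```

Minimizing this pulls each feature toward its own class centroid and away from the others, which is the intra-class compactness and inter-class discrimination the text describes.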
The embedding network (ENet) aims to learn an embedding space that associates the visual information with the semantic information. Following most previous ZSL methods, we utilize the semantic space of $A$ as the visual-semantic embedding space, i.e., we project the outputs of the FNet into the semantic space. Therefore, the ENet is constructed as an $r$-dimensional fully-connected layer (128-d) followed by the $\tanh(\cdot)$ activation function and an L2 normalization layer, where $r$ denotes the length of the semantic vectors. We use the inner product to define the compatibility score between the visual embedding $\phi_e(x)$ and the semantic vector $a_y$. As in traditional image classification tasks, we replace the classification score with the compatibility score in the following softmax loss:
$$\mathcal{L}_e = -\frac{1}{n} \sum_{i=1}^{n} \log \frac{\exp\big(\gamma_2 \langle \phi_e(x_i^s), \hat{a}_k^s \rangle\big)}{\sum_{j=1}^{|A^S|} \exp\big(\gamma_2 \langle \phi_e(x_i^s), \hat{a}_j^s \rangle\big)} \quad (3.2)$$
where $\phi_e(x_i^s)$ denotes the output of the ENet, $\hat{a}_j^s$ denotes the L2-normalized side information (attribute or word2vec vector) associated with the $j$-th class, and $\gamma_2$ is set to 10 in all experiments.
The acquisition of the semantic information is independent of the visual samples. Therefore, the class structures of the visual space and the semantic space are usually inconsistent. For example, the concepts 'cat' and 'dog' lie quite close to each other in the semantic space, while the appearance features of 'cat' and 'dog' are far apart in the visual space. If we only use the semantic space as the visual-semantic embedding space, the mapped visual embeddings can collapse into hubs [34], i.e., nearest neighbours of many other projected visual feature vectors. To alleviate the hubness problem, we map the semantic information to the visual space and align the projected semantic vectors with the visual features in the visual space using a collective quantization framework.
Specifically, we use a matrix $W \in \mathbb{R}^{r \times q}$ to map the L2-normalized semantic vectors to the visual space. The semantic image representations $\phi_f(x_i)$ and the corresponding mapped semantic vectors $W^T \hat{a}_j$ are quantized using two codebooks $C = [C_1, \cdots, C_M]$ and $D = [D_1, \cdots, D_M]$ respectively. Each sub-codebook $C_m$ (or $D_m$) consists of $K$ codewords $C_m = [C_{m1}, \cdots, C_{mK}]$, where the $k$-th codeword $C_{mk}$ is a $q$-dimensional vector. The basic idea of the visual-semantic alignment is to learn the two codebooks so that the visual features and the corresponding mapped semantic vectors are quantized into binary codes, and to enforce the two binary codes to be identical. The loss function can be written as:
$$\mathcal{L}_q = \frac{1}{n} \sum_{i=1}^{n} \Big\| \phi_f(x_i^s) - \sum_{m=1}^{M} C_m b_{mi} \Big\|^2 + \frac{1}{n} \sum_{i=1}^{n} \Big( \big\| W^T \hat{A} y_i^s - \sum_{m=1}^{M} D_m b_{mi} \big\|^2 + \lambda \|W\|^2 \Big), \quad \text{s.t.}\ \|b_{mi}\|_0 = 1,\ b_{mi} \in \{0,1\}^K, \quad (3.3)$$
where $\lambda > 0$ is a balancing parameter, and $\hat{A} y_i^s$ is the L2-normalized semantic vector of the $i$-th image. $\|\cdot\|_0$ refers to the $\ell_0$-norm, which returns the number of non-zero entries of a vector. The constraint indicates that $\{b_{mi}\}_{m=1}^{M}$ are one-of-$K$ encodings, i.e., only one codeword per sub-codebook in $C$ and $D$ can be activated to approximate the semantic image representation $\phi_f(x)$ and the corresponding mapped semantic vector $W^T \hat{a}_j$. Each one-of-$K$ encoding $b_{mi}$ can be compressed into $\log_2 K$ bits, so compact binary codes of $B = M \log_2 K$ bits are obtained by concatenating all $M$ compressed encodings. The one-of-$K$ encodings $\{b_{mi}\}_{m=1}^{M}$ play the key role in aligning the visual and semantic spaces, so that the consistency of the class structures in the two spaces is guaranteed.
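The bit-packing step described above can be made concrete. The values of M and K below are toy assumptions chosen so that M·log2(K) = 32 bits; they are not the paper's settings:

```python
import numpy as np

M, K = 4, 256            # 4 sub-codebooks of 256 codewords -> 4 * 8 = 32 bits

def compress(one_of_k):
    """Pack M one-of-K indicator rows into M * log2(K) bits by storing
    only the index of the active codeword in each sub-codebook."""
    idx = one_of_k.argmax(axis=1).astype(np.uint8)   # log2(256) = 8 bits each
    return idx.tobytes()                              # 4 bytes = 32 bits

def decompress(code):
    """Recover the one-of-K indicator rows from the packed indices."""
    idx = np.frombuffer(code, dtype=np.uint8)
    b = np.zeros((M, K), dtype=np.uint8)
    b[np.arange(M), idx] = 1
    return b

rng = np.random.default_rng(0)
b = np.zeros((M, K), dtype=np.uint8)
b[np.arange(M), rng.integers(0, K, M)] = 1            # random one-of-K rows
code = compress(b)
print(len(code) * 8)  # 32
```

Storing indices rather than indicator vectors is what makes the code length $B = M \log_2 K$ rather than $MK$.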
The final objective function for training the whole network is constructed by aggregating all the loss functions as follows:
$$\mathcal{L} = \mathcal{L}_f + \alpha \mathcal{L}_e + \beta \mathcal{L}_q \quad (3.4)$$
where α and β are two hyperparameters to balance the influence of different terms.
Approximate nearest neighbor search with the inner-product distance is a powerful tool for quantization techniques. Given an unseen image query $x_q^u$ and the binary codes of the database points $\{b_n = [b_{1n}; \cdots; b_{Mn}]\}_{n=1}^{N}$, we first use the trained image feature network to obtain the image representation of the query. Following the asymmetric search method in [16,17,18], we adopt the asymmetric quantizer distance (AQD) to compute the inner-product similarity between the unseen query $x_q^u$ and a database point $x_n$ as follows:
$$\mathrm{AQD}(x_q^u, x_n) = \phi_f(x_q^u)^T \Big( \sum_{m=1}^{M} C_m b_{mn} \Big) \quad (3.5)$$
where $\sum_{m=1}^{M} C_m b_{mn}$ approximates the image representation of the database point $x_n$. Given an unseen query $x_q^u$, the inner products between $\phi_f(x_q^u)$ and all $M$ codebooks $\{C_m\}_{m=1}^{M}$, for all $K$ possible values of $b_{mn}$, can be pre-computed and stored in an $M \times K$ lookup table. Therefore, the computation of the AQD between the unseen query and all database points can be sped up. In terms of computational complexity, it is only slightly more costly than the Hamming distance, since $M$ table lookups and additions are involved.
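The lookup-table trick can be sketched as follows; all sizes are toy assumptions, and the check at the end confirms that the table-based AQD equals the direct reconstruction of Eq (3.5):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, q, N = 4, 16, 8, 100                 # toy sizes, not the paper's
codebooks = rng.standard_normal((M, q, K)) # each C_m is a q x K sub-codebook
b_idx = rng.integers(0, K, (N, M))         # database codes as codeword indices
query = rng.standard_normal(q)             # stands in for phi_f of the query

# Precompute the M x K lookup table of inner products <query, C_m[:, k]>
lut = np.einsum('d,mdk->mk', query, codebooks)

# AQD for all N database points: M table lookups + additions per point
aqd = lut[np.arange(M)[None, :], b_idx].sum(axis=1)

# Sanity check against the direct computation query^T (sum_m C_m b_mn)
recon = np.stack([codebooks[m][:, b_idx[:, m]].T for m in range(M)]).sum(0)
direct = recon @ query
print(np.allclose(aqd, direct))  # True
```

This is why the asymmetric distance costs only $M$ additions per database point once the $M \times K$ table is filled.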
The optimization problem contains several sets of variables: the network parameters $\Theta$, the feature centroids $\hat{C} = \{\hat{c}_1, \cdots, \hat{c}_{|Y^S|}\}$, the projection matrix $W$, the codebooks $C$ and $D$, and the binary codes $B = [b_1, \cdots, b_n]$. In the following optimization process, we adopt an alternating strategy that updates one set of variables while holding all the others fixed.
Updating $\Theta$. We adopt the standard back-propagation algorithm with the automatic differentiation techniques in PyTorch [35] to update the network parameters $\Theta$.
Updating $\hat{C}$. We can update $\{\hat{c}_j\}_{j=1}^{|Y^S|}$ as follows:
$$\hat{c}_j = \frac{1}{|\{i : y_i^s = j\}|} \sum_{y_i^s = j} \phi_f(x_i^s) \quad (4.1)$$
where $\{i : y_i^s = j\}$ denotes the set of training samples from class $j$.
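The centroid update of Eq (4.1) is a per-class mean of the FNet outputs; a toy sketch with assumed shapes:

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.standard_normal((10, 4))              # phi_f(x_i) for 10 toy images
y = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0])   # toy class labels

# Eq (4.1): centroid of class j = mean of features labeled j
centroids = np.stack([phi[y == j].mean(axis=0) for j in range(3)])
print(centroids.shape)  # (3, 4)
```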
Updating W. We can update the projection matrix W by optimizing the following subproblem
$$\min_{W} \sum_{i=1}^{n} \Big\| W^T \hat{A} y_i^s - \sum_{m=1}^{M} D_m b_{mi} \Big\|^2 + \lambda \|W\|^2. \quad (4.2)$$
We can obtain an analytic solution for this unconstrained quadratic problem as follows:
$$W = \big( \hat{A} Y^S (Y^S)^T \hat{A}^T + \lambda I \big)^{-1} \hat{A} Y^S B^T D^T \quad (4.3)$$
where $Y^S = [y_1^s, \cdots, y_n^s] \in \{0,1\}^{(|Y^S| + |Y^U|) \times n}$ is the label matrix of the training images, with each column a one-hot vector, and $I$ is the identity matrix.
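The closed form in Eq (4.3) is the standard ridge-regression solution and can be checked numerically; X and T below are random stand-ins for $\hat{A} Y^S$ and $DB$, and `lam` plays the role of $\lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
r, q, n, lam = 5, 4, 20, 0.01
X = rng.standard_normal((r, n))     # stands in for A_hat Y^S (per-sample semantics)
T = rng.standard_normal((q, n))     # stands in for the quantized targets D B

# Closed-form ridge solution of Eq (4.3): W = (X X^T + lam I)^{-1} X T^T
W = np.linalg.solve(X @ X.T + lam * np.eye(r), X @ T.T)

def objective(W):
    """The subproblem of Eq (4.2) in matrix form."""
    return np.linalg.norm(W.T @ X - T) ** 2 + lam * np.linalg.norm(W) ** 2

print(W.shape)  # (5, 4)
```

Because the subproblem is a strictly convex quadratic, the analytic W attains a strictly lower objective than any perturbed matrix.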
Updating $C$. We rewrite the optimization problem w.r.t. the codebook $C$ in matrix form as follows:
$$\min_{C} \|\Phi_f - CB\|^2 \quad (4.4)$$
where $\Phi_f = [\phi_f(x_1^s), \cdots, \phi_f(x_n^s)]$. We can update $C$ with the analytic solution
$$C = \Phi_f B^T (BB^T)^{-1}. \quad (4.5)$$
Updating $D$. Similar to the update of $C$, we can update $D$ with the analytic solution
$$D = W^T \hat{A} Y^S B^T (BB^T)^{-1}. \quad (4.6)$$
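The codebook updates of Eqs (4.5) and (4.6) are least-squares fits with B fixed. A toy sketch follows; note that the sketch uses the pseudo-inverse rather than a plain inverse, because stacked one-of-K blocks make $BB^T$ rank-deficient when $M > 1$ (the rows of each block sum to the same all-ones vector):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, q, n = 2, 4, 6, 30                    # toy sizes, not the paper's
idx = rng.integers(0, K, (n, M))
B = np.zeros((M * K, n))
for m in range(M):
    B[m * K + idx[:, m], np.arange(n)] = 1  # one-of-K block per sub-codebook
Phi = rng.standard_normal((q, n))           # stands in for Phi_f

# Least-squares codebook fit as in Eq (4.5), with pinv for rank deficiency
C = Phi @ B.T @ np.linalg.pinv(B @ B.T)
residual = np.linalg.norm(Phi - C @ B)
```

The fitted C satisfies the normal equations $C\,BB^T = \Phi_f B^T$, so it is a minimizer of Eq (4.4); the same computation with $W^T \hat{A} Y^S$ in place of $\Phi_f$ gives the update for D.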
Updating $B$. We can decompose the optimization problem for $B$ into $n$ subproblems, since $\{b_i\}_{i=1}^{n}$ are independent of each other. For $b_i$, the subproblem can be written as
$$\min_{b_i} \Big\| \phi_f(x_i^s) - \sum_{m=1}^{M} C_m b_{mi} \Big\|^2 + \Big\| W^T \hat{A} y_i^s - \sum_{m=1}^{M} D_m b_{mi} \Big\|^2 \quad (4.7)$$
which can be further simplified as
$$\min_{b_i} \Big\| \begin{bmatrix} \phi_f(x_i^s) \\ W^T \hat{A} y_i^s \end{bmatrix} - \sum_{m=1}^{M} \begin{bmatrix} C_m \\ D_m \end{bmatrix} b_{mi} \Big\|^2. \quad (4.8)$$
Generally, the above optimization problem is NP-hard. We adopt the iterated conditional modes (ICM) algorithm [36] to solve for the $M$ indicators $\{b_{mi}\}_{m=1}^{M}$ alternately. Specifically, fixing $\{b_{m'i}\}_{m' \neq m}$, we exhaustively check all the codewords in $[C_m; D_m]$ and find the one that minimizes the objective function; the corresponding entry of $b_{mi}$ is then set to 1 and the rest to 0. Each sweep of ICM cannot increase the objective, and the procedure is run until convergence or a maximum number of iterations is reached. The whole procedure is summarized in Algorithm 1.
Algorithm 1. VSAQ algorithm

Input: training set $S \equiv \{x_i^s, y_i^s, a_i^s\}_{i=1}^{n_s}$.
Output: parameters $\Theta$ of the deep neural network.
Initialization: network parameters $\Theta$, mini-batch size, number of epochs $T$.
1: for epoch $= 1, 2, \ldots, T$ do
2:  Update $W$ according to Eq (4.2);
3:  Update $C$ according to Eq (4.5);
4:  Update $D$ according to Eq (4.6);
5:  Update $B$ according to Eq (4.8);
6:  Update $\Theta$ by back-propagation;
7: end for
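The ICM update for a single $b_i$ (step 5 of Algorithm 1) can be sketched as follows. All dimensions are toy assumptions; `E` stacks the hypothetical $[C_m; D_m]$ sub-codebooks and `target` stacks the visual and mapped semantic vectors of Eq (4.8):

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, q = 3, 8, 5                          # toy sizes, not the paper's
E = rng.standard_normal((M, 2 * q, K))     # stacked [C_m; D_m] sub-codebooks
target = rng.standard_normal(2 * q)        # stacked [phi_f(x); W^T a_hat]

def residual(idx):
    """Objective of Eq (4.8) for a given choice of codeword indices."""
    approx = sum(E[m][:, idx[m]] for m in range(M))
    return float(np.linalg.norm(target - approx) ** 2)

idx = rng.integers(0, K, M)                # initial one-of-K choices
r0 = residual(idx)
for _ in range(10):                        # ICM sweeps over the M indicators
    for m in range(M):
        # residual left for sub-codebook m with the others held fixed
        rest = target - sum(E[j][:, idx[j]] for j in range(M) if j != m)
        # exhaustively test all K codewords of sub-codebook m
        idx[m] = np.argmin(((E[m] - rest[:, None]) ** 2).sum(axis=0))
r1 = residual(idx)
```

Each coordinate update picks the best codeword given the others, so the objective is non-increasing across sweeps, which is the convergence behaviour ICM guarantees.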
We evaluate and compare the proposed method with state-of-the-art baselines on several benchmark datasets. The proposed method is implemented with the open-source deep learning toolbox PyTorch [35]. All experiments are carried out on a server with an Intel(R) Xeon(R) E5-2620 v4@2.10GHz CPU, 128GB RAM and two GeForce TITAN X GPUs with 24GB memory.
Three widely used datasets, Animals with Attributes [37], CIFAR-10 [32] and ImageNet [38], are adopted to evaluate the proposed method and the baselines.
Animals with Attributes: contains 30,475 images from 50 animal categories. Each class is provided with 85 semantic attributes.
CIFAR-10: consists of 60,000 color images. The image size is 32×32 pixels. Each image is associated with one of the ten classes with each class containing 6000 images.
ImageNet: consists of 1.2 million images labeled with 1000 categories/synsets for the Large Scale Visual Recognition Challenge 2012 (ILSVRC2012).
Following the settings in [25,26], we construct the zero-shot scenario by splitting the benchmark datasets into seen and unseen classes. Specifically, for the Animals with Attributes (AwA) dataset, we randomly split the 50 animal categories into five groups of ten categories each. In turn, we use one group as the unseen classes and the remaining groups as the seen classes, which yields 5 different seen-unseen splits; we utilize the 85-dimensional attribute vectors as the semantic vectors. For the CIFAR-10 dataset, we use one category as the unseen class and the remaining categories as the seen classes, yielding 10 different seen-unseen splits; the 300-dimensional semantic vectors are extracted from the class names with the word2vec tool. For the ImageNet dataset, we randomly select a subset of ImageNet with 100 categories, which gives us about 130,000 images for evaluation; all 100 selected categories have word2vec semantic vectors. We use 10 categories as seen classes and the remaining 90 categories as unseen classes, which yields 10 different seen-unseen splits; as for CIFAR-10, we extract 300-dimensional semantic vectors from the class names with word2vec. For all three datasets, we randomly take 1000 images from the unseen categories as the query set. The remaining images of the unseen categories, together with all images of the seen categories, form the retrieval database. For training, we randomly select 10,000 images from the seen categories as the training set.
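The AwA split protocol above (five disjoint groups of ten classes, each serving once as the unseen set) can be sketched with the standard library; the class indices are placeholders for the 50 AwA categories:

```python
import random

random.seed(0)
classes = list(range(50))                  # the 50 AwA categories (as indices)
random.shuffle(classes)
groups = [classes[i * 10:(i + 1) * 10] for i in range(5)]

# each group serves once as the unseen classes; the other forty are seen
splits = [(sorted(set(classes) - set(g)), sorted(g)) for g in groups]
seen, unseen = splits[0]
print(len(seen), len(unseen))  # 40 10
```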
We use the widely used mean Average Precision (mAP) based on Hamming ranking as the evaluation metric. The final experimental results are averaged over the different seen-unseen splits for all datasets.
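For reference, average precision for a single query under the ranking-based protocol above can be computed as follows; mAP is the mean of this quantity over all queries and splits:

```python
import numpy as np

def average_precision(relevant_ranked):
    """AP for one query: relevant_ranked is a 0/1 list in ranked order,
    1 marking a relevant database item. Averages precision at each hit."""
    rel = np.asarray(relevant_ranked, dtype=float)
    if rel.sum() == 0:
        return 0.0
    cum = np.cumsum(rel)                               # hits so far
    prec_at_hit = cum[rel == 1] / (np.flatnonzero(rel) + 1)
    return float(prec_at_hit.mean())

# toy check: relevant items at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
ap = average_precision([1, 0, 1, 0])
print(round(ap, 4))  # 0.8333
```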
We compare the proposed method with the following state-of-the-art hashing methods, which fall into two categories: 1) hashing methods for traditional image retrieval: iterative quantization (ITQ) [4], supervised discrete hashing (SDH) [9], deep pairwise supervised hashing (DPSH) [5] and deep supervised discrete hashing (DSDH) [29]; 2) zero-shot hashing methods: zero-shot hashing via transferring supervised knowledge (TSK) [26] and zero-shot hashing with discrete similarity transfer network (SitNet) [25]. We implement SitNet with PyTorch ourselves. For the other compared methods, we adopt the public code and suggested parameters from their papers. For the non-CNN hashing methods, we extract 4096-dimensional CNN features with the pre-trained AlexNet model as image representations for a fair comparison.
We implement the VSAQ model with PyTorch. For the Animals with Attributes and CIFAR-10 datasets, the initial learning rate is set to 0.001; for the ImageNet dataset, it is set to 0.01. As the last fully-connected layers in FNet and ENet are trained from scratch, their learning rates are set to 10 times that of the other layers. We set the batch size to 128 and train the model for 10 epochs. The dimension of the image representations $q$ is set to 128 following [17]. The hyperparameters are set to $\alpha = 1$, $\beta = 10$, $\lambda = 0.01$ in all the following experiments.
The zero-shot image retrieval performance on AwA in terms of mAP with respect to different code lengths (i.e., {8, 16, 32, 48}) is shown in Table 1. Our VSAQ method outperforms all baseline methods by a large margin, especially from 8 to 32 bits. In addition, the unsupervised hashing method ITQ achieves results comparable to the supervised hashing method SDH, which shows that the generalization ability of existing supervised hashing is limited for unseen concepts. The state-of-the-art deep hashing methods DPSH and DSDH perform poorly on the zero-shot retrieval task over the AwA dataset. The main reason may be that a CNN trained to be compatible with the label information runs the risk of overfitting the seen classes, which reduces the expansibility of the trained model to the unseen classes. We also find that TSK performs poorly, especially at lower bits (e.g., 8 and 16 bits). The main reason is that projecting the binary codes to the semantic space with the ridge regression formulation exacerbates the hubness problem [20], which in turn decreases the semantic transfer ability of the hash codes. To alleviate this problem, the proposed VSAQ model utilizes the visual space as the embedding space for learning compact binary codes; in addition, we adopt a collective quantization technique for visual-semantic alignment, which improves the generalization ability of the proposed model.
Table 1. mAP on the AwA dataset with different code lengths.

| Method | 8 bits | 16 bits | 32 bits | 48 bits |
|--------|--------|---------|---------|---------|
| ITQ    | 0.0886 | 0.1359  | 0.1723  | 0.2024  |
| SDH    | 0.0966 | 0.1370  | 0.1835  | 0.2122  |
| DPSH   | 0.0726 | 0.1080  | 0.1435  | 0.1525  |
| DSDH   | 0.0808 | 0.1081  | 0.1320  | 0.1469  |
| TSK    | 0.0349 | 0.0591  | 0.1320  | 0.1617  |
| SitNet | 0.1036 | 0.1651  | 0.1870  | 0.2121  |
| VSAQ   | 0.1948 | 0.2099  | 0.2187  | 0.2218  |
The performance of the proposed VSAQ and the other baselines on CIFAR-10 with different code lengths is illustrated in Table 2. VSAQ consistently outperforms the other baselines at all bits by a large margin; for example, VSAQ surpasses SitNet, the second-best performer, by 3 to 4 percent. Even with short codes, VSAQ achieves retrieval performance superior to baselines using longer codes, which can be attributed to the lower quantization error achieved by the quantization technique. The deep hashing methods DPSH and DSDH perform better than the non-deep hashing methods ITQ and SDH, which demonstrates that CNNs can exploit proper supervision to discover the complicated semantic similarity structure. VSAQ utilizes the label information to learn semantic image representations with a discriminative and polymeric structure, which facilitates the visual-semantic alignment. The unsupervised hashing method ITQ achieves performance comparable to TSK, which indicates that the generalization ability of TSK degenerates due to the hubness problem.
Table 2. mAP on the CIFAR-10 dataset with different code lengths.

| Method | 8 bits | 16 bits | 32 bits | 48 bits |
|--------|--------|---------|---------|---------|
| ITQ    | 0.1507 | 0.1736  | 0.1871  | 0.1972  |
| SDH    | 0.1226 | 0.1331  | 0.1553  | 0.2068  |
| DPSH   | 0.2176 | 0.2205  | 0.2280  | 0.2261  |
| DSDH   | -      | -       | -       | -       |
| TSK    | 0.1507 | 0.1759  | 0.1740  | 0.2132  |
| SitNet | 0.2208 | 0.2303  | 0.2351  | 0.2471  |
| VSAQ   | 0.2615 | 0.2682  | 0.2670  | 0.2867  |
The performance of the proposed VSAQ and the other baselines on ImageNet with different code lengths is reported in Table 3. As we can see, the proposed VSAQ model outperforms the baseline approaches by significant margins; for example, VSAQ surpasses the second-best method by 2 to 9 percent. This clearly demonstrates that the VSAQ model generalizes better to unseen concepts than the other state-of-the-art methods, which validates the effectiveness of the proposed method for zero-shot image retrieval.
Table 3. mAP on the ImageNet subset with different code lengths.

| Method | 8 bits | 16 bits | 32 bits | 48 bits |
|--------|--------|---------|---------|---------|
| ITQ    | 0.0507 | 0.0732  | 0.1123  | 0.1357  |
| SDH    | 0.0400 | 0.0727  | 0.1107  | 0.1312  |
| DPSH   | 0.0409 | 0.0524  | 0.0712  | 0.0881  |
| DSDH   | -      | -       | -       | -       |
| TSK    | 0.0162 | 0.0206  | 0.0247  | 0.0609  |
| SitNet | -      | -       | -       | -       |
| VSAQ   | 0.1472 | 0.1516  | 0.1579  | 0.1614  |
The proposed VSAQ model consists of three components: an image feature loss $\mathcal{L}_f$ for learning discriminative and polymeric image representations, a semantic embedding loss $\mathcal{L}_e$ for maximizing the compatibility score between the image and semantic vectors for knowledge transfer, and a quantization loss $\mathcal{L}_q$ for visual-semantic alignment; the quantization loss $\mathcal{L}_q$ is essential for generating binary codes. To study the contribution of each component to the zero-shot image retrieval performance, we compare the proposed method with the following submodels: 1) $\mathcal{L}_f + \mathcal{L}_q$ (VSAQ-1); 2) $\mathcal{L}_e + \mathcal{L}_q$ (VSAQ-2); 3) $\mathcal{L}_f + \mathcal{L}_e + \mathcal{L}_q^1$ (VSAQ-3), where $\mathcal{L}_q^1$ refers to the first term in Eq (3.3), i.e., only the visual features are considered in the quantization. Table 4 reports the experimental results of the different submodels. The full combination of the image feature loss, the semantic embedding loss and the quantization loss achieves the best performance, which demonstrates that the proposed framework indeed improves zero-shot image retrieval. Comparing VSAQ-3 with VSAQ, we find that the visual-semantic alignment helps the knowledge transfer from the seen concepts to the unseen concepts. Comparing VSAQ-2 with VSAQ, we find that the discriminative and polymeric image representations improve the performance substantially, meaning that they facilitate the visual-semantic alignment and the semantic embedding. Comparing VSAQ-1 with VSAQ, we find that the knowledge transfer ability is significantly improved by the semantic embedding.
Table 4. Results of the submodels on AwA, CIFAR-10 and ImageNet.

AwA:

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
|--------|---------|---------|---------|---------|
| VSAQ   | 0.1948  | 0.2099  | 0.2187  | 0.2218  |
| VSAQ-1 | 0.1830  | 0.1849  | 0.1923  | 0.2012  |
| VSAQ-2 | 0.1911  | 0.1956  | 0.2026  | 0.2089  |
| VSAQ-3 | 0.1816  | 0.1736  | 0.1825  | 0.1998  |

CIFAR-10:

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
|--------|---------|---------|---------|---------|
| VSAQ   | 0.2615  | 0.2682  | 0.2670  | 0.2867  |
| VSAQ-1 | 0.2360  | 0.2487  | 0.2538  | 0.2539  |
| VSAQ-2 | 0.2412  | 0.2533  | 0.2613  | 0.2665  |
| VSAQ-3 | 0.2278  | 0.2324  | 0.2405  | 0.2487  |

ImageNet:

| Method | 12 bits | 24 bits | 32 bits | 48 bits |
|--------|---------|---------|---------|---------|
| VSAQ   | 0.1472  | 0.1516  | 0.1579  | 0.1614  |
| VSAQ-1 | 0.1288  | 0.1319  | 0.1386  | 0.1497  |
| VSAQ-2 | 0.1296  | 0.1365  | 0.1463  | 0.1495  |
| VSAQ-3 | 0.1053  | 0.1150  | 0.1194  | 0.1256  |
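The submodel definitions above amount to dropping terms from the full objective. A minimal sketch of that composition, assuming Lq splits into a visual quantization term and a semantic-alignment term (an assumption based on the description of L1q as the first term of Eq (3.3)):

```python
def total_loss(lf, le, lq_visual, lq_semantic, variant="VSAQ"):
    """Combine the three losses; each ablation submodel drops one ingredient.

    lf: image feature loss, le: semantic embedding loss;
    lq_visual / lq_semantic: the assumed two terms of the quantization loss,
    where lq_visual corresponds to L1q (visual-only quantization).
    """
    if variant == "VSAQ":      # full model: Lf + Le + Lq
        return lf + le + lq_visual + lq_semantic
    if variant == "VSAQ-1":    # Lf + Lq: no semantic embedding loss
        return lf + lq_visual + lq_semantic
    if variant == "VSAQ-2":    # Le + Lq: no image feature loss
        return le + lq_visual + lq_semantic
    if variant == "VSAQ-3":    # Lf + Le + L1q: visual-only quantization
        return lf + le + lq_visual
    raise ValueError(f"unknown variant: {variant}")
```

For example, `total_loss(1.0, 2.0, 3.0, 4.0, "VSAQ-3")` drops the semantic-alignment term and returns 6.0 instead of the full 10.0.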
In this paper, we propose a novel deep quantization network with visual-semantic alignment for efficient zero-shot image retrieval. In the proposed deep architecture, the label information and the semantic vectors are used to supervise the image feature extraction and to improve the compatibility between the image representations and the semantic vectors, respectively. The semantic vectors are mapped into the visual space and aligned with the corresponding image representations via a collective quantization framework, which alleviates the hubness problem. Experimental results on three datasets show that the proposed model outperforms state-of-the-art methods on zero-shot image retrieval tasks. In future work, we will investigate the zero-shot multi-label image retrieval task, where an image is assigned multiple categories.
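The collective quantization step summarized above can be sketched as jointly quantizing an image feature and its mapped semantic vector against a shared codebook, so that both are represented by the same code; the equal weighting of the two distance terms here is an assumption, not the paper's exact formulation:

```python
import numpy as np

def assign_code(f, s, codebook):
    """Pick the codeword jointly closest to the image feature f and the
    semantic vector s (already mapped into the visual space), so that both
    modalities share one quantized representation.
    """
    # cost of codeword c: ||f - c||^2 + ||s - c||^2
    costs = ((codebook - f) ** 2).sum(axis=1) + ((codebook - s) ** 2).sum(axis=1)
    return int(np.argmin(costs))

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]])
f = np.array([0.9, 0.8])   # hypothetical image feature
s = np.array([1.1, 1.0])   # hypothetical mapped semantic vector
print(assign_code(f, s, codebook))  # → 1
```

Because the assignment minimizes the sum of both distances, semantic vectors are pulled toward regions of the visual space that actually contain images, which is how this kind of alignment mitigates hubness.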
We would like to thank all anonymous reviewers for their constructive comments at each stage of the review process.
All authors declare that they have no conflicts of interest.
Results on AwA with different code lengths.

| Method | 8 bits | 16 bits | 32 bits | 48 bits |
|--------|--------|---------|---------|---------|
| ITQ    | 0.0886 | 0.1359  | 0.1723  | 0.2024  |
| SDH    | 0.0966 | 0.1370  | 0.1835  | 0.2122  |
| DPSH   | 0.0726 | 0.1080  | 0.1435  | 0.1525  |
| DSDH   | 0.0808 | 0.1081  | 0.1320  | 0.1469  |
| TSK    | 0.0349 | 0.0591  | 0.1320  | 0.1617  |
| SitNet | 0.1036 | 0.1651  | 0.1870  | 0.2121  |
| VSAQ   | 0.1948 | 0.2099  | 0.2187  | 0.2218  |
Results on CIFAR-10 with different code lengths.

| Method | 8 bits | 16 bits | 32 bits | 48 bits |
|--------|--------|---------|---------|---------|
| ITQ    | 0.1507 | 0.1736  | 0.1871  | 0.1972  |
| SDH    | 0.1226 | 0.1331  | 0.1553  | 0.2068  |
| DPSH   | 0.2176 | 0.2205  | 0.2280  | 0.2261  |
| DSDH   | -      | -       | -       | -       |
| TSK    | 0.1507 | 0.1759  | 0.1740  | 0.2132  |
| SitNet | 0.2208 | 0.2303  | 0.2351  | 0.2471  |
| VSAQ   | 0.2615 | 0.2682  | 0.2670  | 0.2867  |