


    In the field of computer vision, deep learning methods have achieved great success in both applied computing and machine intelligence. Remarkably, deep learning has attained unprecedented success in image classification: exploiting powerful deep neural networks, machines can perform at a level close to or even beyond that of humans in many applications, provided sufficient labelled samples are available [29,78,89]. However, conventional deep neural network models depend on several conditions to achieve excellent performance. Typically, they require a huge number of labelled training samples, while collecting and labelling samples at such a scale may be difficult, time-consuming, or even impossible in many cases.

    In practice, however, many common scenarios fail to meet deep neural networks' heavy demand for data:

    Large target size. Human beings can distinguish around 3,000 basic-level classes [6], and each basic class can be expanded into subordinate ones, such as dogs of different breeds [115]. Such a huge number of categories makes it infeasible to construct a task in which every category has a sufficient number of labelled samples.

    Rare target classes. Some tasks involve rare classes for which samples are difficult to obtain, such as fine-grained classification of flowers and birds [13,46] or medical images of specific conditions [11].

    Growing target size. For some tasks the target set changes rapidly, with candidate classes increasing over time, such as detecting new events in newly collected media data [10], recognizing the brand of a product [61], or learning new writing styles [35].

    In these scenarios, re-training a deep neural network model over the target classes is rarely feasible, and fine-tuning a trained model is tractable only if some labelled target samples can be obtained. To overcome such restrictions, zero-shot learning, earlier called zero-data learning, was set up to simulate the learning capacity of human beings [45]. Suppose a child already knows the shape of a horse, the concept of stripes, and the colours black and white; once told that a zebra looks like a horse covered in black and white stripes, the child has a good chance of recognizing a zebra even when seeing one for the first time [19]. Figure 1 illustrates this efficient learning process schematically; the situation in zero-shot learning is similar. Given auxiliary information describing each category, together with samples of some categories, a model can be trained to capture the correlation between samples and auxiliary information, and classification can then be extended to unseen categories via this correlation and their auxiliary information.

    Figure 1.  Examples for learning processes.

    In this article, we present an overview of image classification in zero-shot learning, including the relevant definitions, learning scenarios, and various methodologies. While we structure each part carefully and summarize each family of methods with illustrations, visualizations, and tables, one main focus of this work is sorting out implementation details, such as commonly used benchmarks and diverse experiment settings, so as to offer practical guidance to researchers in the area. Finally, comparison results of various representative methods are collected on a number of benchmarks, aiming to provide a fair and objective reference for evaluating different methods.

    Compared to recently presented surveys [73,99], our paper differs in three major ways. First, it introduces the most recently published important methods, as seminal works and even breakthroughs have emerged recently, thus providing a more timely and comprehensive review. Second, based on model components, training strategies, and learning objectives, we provide a more detailed hierarchical classification of zero-shot image classification methods. Third, one main focus of our survey is comparing different methods from the perspective of implementation, thus offering practical guidelines for applying zero-shot learning in real scenarios.

    To describe the zero-shot classification task precisely, we first review and explain some commonly used terms and notations in this section, then focus in the next two sections on zero-shot image classification methods that employ semantic descriptions as auxiliary information. Based on the design of the information extractor, we classify current methods into two main categories, embedding methods and generative methods, and propose a taxonomy for them as shown in Figure 2. For simplicity, all subsequent references to zero-shot learning refer to the image classification task in this domain.

    Figure 2.  The taxonomy structural diagram for Zero-Shot image classification methods.

    In zero-shot learning, the target classes without corresponding training samples are termed unseen classes, whilst the classes with labelled samples during training are called seen classes. Owing to the absence of training samples for unseen classes, auxiliary information is essential for constructing the cognitive concepts of unseen categories. The space of such auxiliary information should contain enough information to distinguish all classes: for each class, the corresponding auxiliary information should be unique and sufficiently representative to guarantee that an effective correlation between the auxiliary information and the samples can be learned for classification. Since zero-shot learning is inspired by the efficient human learning process, semantic information has become the dominant form of auxiliary information [46,44,95]. Analogous to the feature space in image processing, zero-shot learning also has a corresponding semantic space holding numeric values. To obtain such a semantic space, two kinds of semantic sources are mainly leveraged: attributes and textual descriptions.

    Attribute. The attribute is the earliest and most commonly used source of semantic space in zero-shot learning [43,45,99]. As human-annotated information, attributes contain precise classification knowledge, though their collection can be time-consuming. Treating an attribute as a word or phrase describing a property, one can build a list of attributes; by combining these attributes, all seen and unseen classes can be described, and the combined descriptions should differ for each class. The resulting vectors, holding binary values 0 and 1 and with size equal to the number of attributes, form a semantic space in which each value denotes whether the described class possesses the corresponding attribute. In other words, the attribute vectors for all classes share the same size, and each dimension denotes a specific property in a fixed order. For example, in animal recognition one attribute could be stripes; a value of 1 in the stripes dimension means the described animal has stripes [43]. Suppose there are only three attributes, black, white, and stripes; then the attribute vectors describing the classes panda, polar bear, and zebra would be [1,1,0], [0,1,0], and [1,1,1], respectively. However, since an attribute vector is designed to describe an entire class, binary values alone might be imprecise: the diversity of individuals within each class may lead to mismatches between samples and attributes. Taking animal recognition again as an example, horses may be purely black or purely white. If the attribute values of both black and white equal 1 for the class horse, then black horse samples contradict the attribute white, and white horses contradict black. It therefore makes more sense to use continuous values indicating the degree of, or confidence in, an attribute. It is shown in [2] that adopting the average of voting results, or the proportion of samples exhibiting an attribute, leads to better classification performance. Additionally, relative attributes, which measure the degree of an attribute across classes, have also been suggested [71].
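The attribute encoding described above can be sketched in a few lines. The classes, attributes, and vote fractions below are illustrative toy values (matching the panda/polar bear/zebra example), not taken from any real dataset.

```python
import numpy as np

# Toy attribute space: each dimension is a property, in a fixed order.
attributes = ["black", "white", "stripes"]

# Binary class-level attribute vectors (1 = the class has the property).
class_vectors = {
    "panda":      np.array([1, 1, 0]),
    "polar bear": np.array([0, 1, 0]),
    "zebra":      np.array([1, 1, 1]),
}

# Continuous variant: per-class confidence values, e.g. the fraction of
# annotator votes (or of samples) for which the property is present.
# The numbers here are hypothetical.
votes = {"horse": {"black": 0.3, "white": 0.4, "stripes": 0.0}}
horse_vec = np.array([votes["horse"][a] for a in attributes])
```

With continuous values, a mostly-black breed and a mostly-white breed no longer force contradictory binary labels onto the class vector.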

    Text. Instead of human-annotated attributes, descriptions of a class, such as its name or definition, can also serve as the source for constructing a semantic space. However, transforming unstructured textual information into representative real values is not straightforward. When the class name is used as the semantic source without external knowledge, the contained information may be far from sufficient for good image classification. In this case, pre-trained word embedding models borrowed from natural language processing can embed the class names into representative word vectors and form a meaningful semantic space. Specifically, the semantic similarity of two words can be approximately measured by the distance between their embedded vectors, so the similarity knowledge contained in the text corpora used to train the embedding models can be exploited for classification. In existing methods, Word2Vec [3,69,96,103] and GloVe [3,103,58], pre-trained on English Wikipedia [85], are the two most commonly used embedding models for class-name sources. Such a semantic similarity space can also be constructed from ontological knowledge; an example is the hierarchical embedding derived from WordNet, a large-scale hierarchical database [3]. Keywords are another optional semantic source: descriptions of classes are collected from databases or search engines, keywords are extracted, and semantic vectors are then constructed from binary occurrence indicators [74] or frequencies [3] in a Bag-of-Words model, or from term frequency-inverse document frequency features [13,46,14]. Paragraph-level descriptions can also serve as a semantic source; for example, visual descriptions in the form of ten single sentences per image are collected in [76], after which a text encoder returns the required semantic vectors. This kind of semantic source contains more information but also more noise.
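As a minimal illustration of the keyword route, the sketch below builds binary-occurrence and term-frequency Bag-of-Words vectors over a shared vocabulary. The two class descriptions are invented for illustration; real systems would use collected descriptions and typically TF-IDF weighting.

```python
from collections import Counter

# Hypothetical class descriptions standing in for collected text.
descriptions = {
    "zebra": "horse like animal with black and white stripes",
    "panda": "black and white bear",
}

# A fixed, sorted vocabulary gives every class a vector of the same size.
vocab = sorted({w for d in descriptions.values() for w in d.split()})

def binary_bow(text):
    """Binary occurrence indicator over the shared vocabulary."""
    words = set(text.split())
    return [1 if w in words else 0 for w in vocab]

def tf_bow(text):
    """Term-frequency Bag-of-Words vector (sums to 1)."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return [counts[w] / total for w in vocab]

zebra_bin = binary_bow(descriptions["zebra"])
```

Because both vectors live in the same vocabulary-indexed space, they can be compared across classes just like attribute vectors.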

    Other auxiliary information. Beyond semantic sources, other types of supporting information also exist and are often employed alongside semantic information to help the model extract more effective classification knowledge. For instance, hierarchical labels from a taxonomy provide additional classification supervision [79,107]; human-defined correlations between attributes have been exploited [32]; and the gaze points recorded for each sample have been adopted as attention supervision, helping the attention module produce more representative feature maps [58]. Some of this information may not provide enough knowledge to accomplish the entire classification task on its own, but it can serve as a supplement to semantic information and thereby better construct cognitive concepts of unknown categories.

    In conventional image classification tasks, differences between the training and test instance distributions mean a trained model performs worse at test time than on the training set. This phenomenon is also present in zero-shot learning, and is even more severe owing to the disjointness of seen and unseen classes. Such differences in distribution between seen and unseen classes are called domain shift [18], and the resulting poor model performance is termed class-level over-fitting [120].

    To address this challenge, researchers have proposed various methods that introduce classification knowledge from samples and auxiliary information at different stages (training and testing), so the implementation scenarios have become diverse. Both the sample space and the auxiliary information space can be specified in zero-shot learning, and the scenarios can be divided accordingly. In general, from the perspective of the training stage, the task falls into three scenarios, namely inductive, semantic transductive, and transductive, defined as follows:

    Inductive zero-shot learning. Only labelled training samples and auxiliary information of seen classes are available during training.

    Semantic transductive zero-shot learning. Labelled training samples and auxiliary information of all classes are available during training.

    Transductive zero-shot learning. Labelled training samples, unlabelled test samples, and auxiliary information of all classes are available during training.

    By these definitions, inductive zero-shot learning is the most severe learning scenario, because both the target classes and their instances are unknown; models trained in this scenario are more likely to suffer from class-level over-fitting. In comparison, models trained in the two transductive scenarios have a clearer learning objective, since the classification knowledge is guided by information about the unseen classes. However, these models will not generalize to new unseen classes as well as models trained in the inductive scenario [99].

    When the zero-shot problem was first proposed, researchers focused only on achieving good classification of unseen classes, which is known as conventional Zero-Shot Learning. Later, it was found that classification of the unseen classes degrades drastically once the seen categories are also included as candidates: the early models could not distinguish well between seen and unseen categories and thus failed to construct the cognitive concepts of new classes. Consequently, a more challenging task called Generalized Zero-Shot Learning, which requires classifying both seen and unseen classes, has attracted much attention [9]. The original intention of zero-shot learning is to simulate the human process of constructing the cognitive concept of a class from learned knowledge and supporting information in the absence of samples. Since the constructed cognitive concepts can be evaluated accurately only if unseen and seen classes can be correctly distinguished, the focus of current work has shifted to the generalized task. Figure 3 shows the schematics of the different scenarios in training and test; combinations of these scenarios form six common settings.

    Figure 3.  Schematic diagrams of utilizing data for different scenarios in training and test.

    In zero-shot learning, each sample is originally an image containing certain specific objects, represented as a tensor holding a value for each pixel. For more convenient implementation, visual features extracted by a pre-trained deep neural network are commonly used as the samples instead of the raw images; for a rigorous presentation, here we take the entire image as the input sample. Assuming there are in total $N$ samples from $K$ classes, we denote $X = X_S \cup X_U$ as the set of all image samples from both seen and unseen classes, and $F(\cdot)$ as a feature extractor producing the feature $F(x_i)$ of image $x_i$. Similarly, the label set is denoted $Y = Y_S \cup Y_U$, and $y_i = k$ indicates that sample $x_i$ belongs to the $k$-th class. The set of auxiliary information is denoted $A = A_S \cup A_U$, containing $K$ vectors, where each vector $a_k$ stands for the auxiliary information of the $k$-th class. Let $K_S$ and $K_U$ indicate the number of seen and unseen classes respectively; for convenience, the first $K_S$ classes represented in $A$ are assumed to be the seen ones. Note that the seen and unseen classes are disjoint, i.e., $X_S \cap X_U = Y_S \cap Y_U = A_S \cap A_U = \emptyset$. As part of the seen-class samples serve as test instances and must not participate in training, the seen sample and label sets are consistently divided into training and test sets as $X_S = X_S^{tr} \cup X_S^{te}$ and $Y_S = Y_S^{tr} \cup Y_S^{te}$, where both the training and test seen sets should cover all $K_S$ seen classes. Since there are three scenarios for the training process, the training set $D^{tr} = \{X^{tr}, Y^{tr}, A^{tr}\}$ can be defined for the inductive, semantic transductive, and transductive scenarios respectively as $D^{tr}_I = \{X_S^{tr}, Y_S^{tr}, A_S\}$, $D^{tr}_{ST} = \{X_S^{tr}, Y_S^{tr}, A\}$, and $D^{tr}_T = \{X_S^{tr} \cup X_U, Y_S^{tr}, A\}$. The test set $D^{te} = \{X^{te}, Y^{te}, A^{te}\}$ likewise takes two forms: $D^{te}_C = \{X_U, Y_U, A_U\}$ for the conventional task and $D^{te}_G = \{X_U \cup X_S^{te}, Y_U \cup Y_S^{te}, A\}$ for the generalized task. 
    With these definitions, the goal of zero-shot learning can be stated as training an information extractor $M$ (containing the feature extractor $F(\cdot)$) together with a fixed or learnable classifier $C$ on the training set $D^{tr}$ to achieve classification on $X^{te}$.
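The three training-set compositions and two test-set compositions can be made concrete with a small sketch. The arrays below are random placeholders standing in for real features and attributes (2 seen classes, 1 unseen class); only the set arithmetic matters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder data: 2 seen classes, 1 unseen class, 4-dim "features",
# 5-dim "attributes". All values are synthetic.
X_S_tr = rng.normal(size=(6, 4)); Y_S_tr = np.array([0, 0, 0, 1, 1, 1])
X_S_te = rng.normal(size=(2, 4)); Y_S_te = np.array([0, 1])
X_U    = rng.normal(size=(3, 4)); Y_U    = np.array([2, 2, 2])
A_S, A_U = rng.normal(size=(2, 5)), rng.normal(size=(1, 5))
A = np.vstack([A_S, A_U])          # seen classes come first in A

# Training sets for the three scenarios:
D_inductive = (X_S_tr, Y_S_tr, A_S)                    # seen data only
D_sem_trans = (X_S_tr, Y_S_tr, A)                      # + unseen attributes
D_trans     = (np.vstack([X_S_tr, X_U]), Y_S_tr, A)    # + unlabelled unseen samples

# Test sets for the two tasks:
D_te_conv = (X_U, Y_U, A_U)                            # conventional ZSL
D_te_gen  = (np.vstack([X_U, X_S_te]),
             np.concatenate([Y_U, Y_S_te]), A)         # generalized ZSL
```

Note that only the transductive training set contains unseen-class samples, and only as unlabelled inputs.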

    In the embedding methods, the information extractor $M = \{\theta(\cdot), \phi(\cdot)\}$ is designed as a union of embedding functions $\theta(\cdot)$ and $\phi(\cdot)$. The aim of these extractors is to find proper embedding spaces for both visual samples and auxiliary information so that the trainable or fixed classifier $C$ can achieve class recognition in the target space. From the perspective of the learning objective, we further classify existing embedding methods as (1) feature-vector-based, (2) image-based, and (3) mechanism-improved methods.

    Considering the limited sample size and the latent distribution differences between samples of unseen and seen classes, the most natural and appropriate visual feature space is one learned on large-scale conventional image classification tasks. Fair data splits and extracted features for several benchmarks are discussed and evaluated in [104]; the feature space learned by the deep residual network ResNet101 [26] on the benchmark dataset ImageNet [12] is commonly selected in implementations. Based on this fixed feature extractor $F = F_f$, the feature vectors $F_f(X)$ are regarded as the visual samples, and the insight of feature-vector-based methods is to design embedding functions or classifiers that improve performance, where the classifier $C(x_i, A, M)$ is commonly a function taking the embedded features and attributes and returning predicted confidence scores for all classes represented in $A$. We review this family of methods according to the frameworks they mainly rely on.

    Space alignment framework. These encoding-based methods usually have a specific embedding target space, which can be a commonly used visual feature space, a manually defined semantic description space, or an unknown hidden space for capturing certain correlations. This idea is the first, as well as one of the most common, solutions to zero-shot learning.

    The classifier can be designed with a fixed distance metric $d(\cdot,\cdot)$ such as the Euclidean or cosine distance. The predicted label for each visual feature $F_f(x_i)$ is then obtained as

    $$\hat{y}_i = \arg\min_k \, d\big(\theta(F_f(x_i)),\, \phi(a_k)\big), \quad \text{s.t. } a_k \in A^{te}. \tag{3.1}$$
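This distance-based classifier can be sketched in a few lines of numpy. Here $\theta$ and $\phi$ are taken as identity maps, $d$ is the Euclidean distance, and both features and attributes are random placeholders rather than real embeddings.

```python
import numpy as np

def predict(visual_feats, class_attrs):
    """Eq. (3.1): assign each feature to the class whose (embedded)
    attribute vector is nearest; theta and phi are identities here."""
    # Pairwise Euclidean distances, shape (n_samples, n_classes).
    dists = np.linalg.norm(
        visual_feats[:, None, :] - class_attrs[None, :, :], axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
attrs = rng.normal(size=(4, 8))                          # 4 candidate classes
feats = attrs[[2, 0]] + 0.01 * rng.normal(size=(2, 8))   # near classes 2 and 0
print(predict(feats, attrs))                             # prints [2 0]
```

At test time, `class_attrs` would contain exactly the attribute vectors of the candidate set $A^{te}$, which differs between the conventional and generalized tasks.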

    In the following, we briefly review representative work in the space alignment framework. In [113,121], semantic-to-visual mappings are learned to align semantic and visual features from the same class. Specifically, [121] utilizes a multi-layer neural network as the embedding function, implying that the visual feature space is the more appropriate target space for avoiding aggravation of the hubness problem, while in [113] a self-focus ratio, determined by the position of the embedded attributes, is learned during optimization as an attention weight for each dimension of the visual feature space. More studies adopt a reconstruction or bi-directional mapping (a relaxed form of reconstruction) to align information from different spaces. Linear embedding functions are applied for both visual-to-semantic and semantic-to-visual projections in [41], with a rank minimization technique additionally adopted to optimize the linear transformation matrices. In [30], the encoding processes of the reconstruction are designed in both the visual and semantic spaces, and joint embedding is achieved by minimizing the maximum mean discrepancy in the hidden layer. In a stricter variant, the embeddings of the visual feature and the semantic attributes from the same class are enforced to be equal in [118], and a two-alternate-step algorithm is proposed in [53] to solve the transformation matrices of the joint embedding under reconstruction supervision. In [4], similar classes for each class are selected by thresholding cosine similarity, and a semantic-to-visual-to-semantic reconstruction process is proposed in which inter-class distances are pushed apart and intra-class distances are reduced in the visual space. A projection codebook is learned in [48], with an additional center loss [24] and a reconstruction loss [41], to embed visual features and semantic attributes into a hidden orthogonal semantic space. 
    The label space is selected as the embedding target space in [56], where the unseen semantic attributes are embedded into the label space by learning projection functions from both the semantic and visual spaces to the label space. Such an embedding is equivalent to linearly representing the labels of unseen classes by those of seen classes, thus improving the generalization of the model in the label space.

    The classifier can also be made learnable, for instance a bilinear function $W$ that predicts the confidence scores as

    $$C(x_i, A, M, W) = \theta(F_f(x_i))^{T}\, W\, \phi(A). \tag{3.2}$$
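The bilinear classifier of Eq. (3.2) amounts to a single matrix product chain. This sketch scores one placeholder embedded feature against all class attributes; dimensions and values are arbitrary.

```python
import numpy as np

def bilinear_scores(theta_x, W, phi_A):
    """Eq. (3.2): confidence scores theta(x)^T W phi(a_k) for every class.
    theta_x: (d_v,) embedded feature; phi_A: (K, d_s) embedded attributes."""
    return theta_x @ W @ phi_A.T        # shape (K,)

rng = np.random.default_rng(1)
d_v, d_s, K = 6, 5, 3                   # toy dimensions
W = rng.normal(size=(d_v, d_s))         # learnable compatibility matrix
phi_A = rng.normal(size=(K, d_s))
x = rng.normal(size=(d_v,))
scores = bilinear_scores(x, W, phi_A)
pred = int(scores.argmax())             # predicted class index
```

Training then reduces to fitting $W$ (and possibly $\theta$, $\phi$) so that the correct class receives the highest score, e.g. with the ranking losses discussed next.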

    The semantic attributes of both seen and unseen classes are represented purely by those of the seen classes in [122] to train the bilinear function, which thus associates unseen classes with seen classes. The norms of the embedded semantic attributes and embedded visual features are constrained in [77], respectively for fair comparison across classes and for bounding the variation in the semantic space. In [120], the bilinear function is decomposed into two transformation matrices, and it is proved that minimizing the mean squared error between the similarity matrices and the predicted scores for all samples is equivalent to restricting those transformation matrices to be orthogonal. A pairwise ranking loss function similar to the one in [102] is proposed in [17] as

    $$\sum_{j \neq y_i}^{K_S} \Big[\, \mathbb{I}(j \neq y_i) + C(x_i, a_j, M, W) - C(x_i, a_{y_i}, M, W) \,\Big]_{+}. \tag{3.3}$$

    Instead of summing all the pairwise terms, the ranking loss can be modified to focus on the pair attaining the maximum value [1], or turned into a weighted approximation [3] inspired by the unregularized ranking support vector machine [37]. It can also be redesigned with a triplet mining strategy to construct a triplet loss from the most negative samples and the most negative attributes, as proposed in [34].
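The summed loss of Eq. (3.3) and its max-pair variant can be sketched as follows, using a unit margin for every wrong class. The score vector is a placeholder; in practice the scores would come from the bilinear classifier.

```python
import numpy as np

def ranking_loss(scores, y, reduce="sum"):
    """Hinge ranking loss over wrong classes: [1 + s_j - s_y]_+ .
    'sum' follows Eq. (3.3); 'max' keeps only the hardest pair."""
    margins = np.maximum(0.0, 1.0 + scores - scores[y])
    margins[y] = 0.0                    # exclude the correct class itself
    return margins.max() if reduce == "max" else margins.sum()

scores = np.array([0.2, 1.5, 0.9])      # toy scores; class 1 is correct
loss_sum = ranking_loss(scores, 1)          # sums all violating pairs
loss_max = ranking_loss(scores, 1, "max")   # keeps only the hardest one
```

With these toy scores only class 2 violates the margin ($1 + 0.9 - 1.5 = 0.4$), so both reductions give 0.4; on harder examples the two variants diverge.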

    Moreover, the classifier can take other forms. The instances of each class are assumed to follow an exponential family distribution in [93], with parameters learned from the semantic attributes. The method in [103] extends the ranking loss to a non-linear classifier by learning multiple bilinear classifiers, each time optimizing the one with the highest confidence score. In [36], the attributes of unseen classes are reconstructed from those of seen classes via sparse coding, and the solved coefficients are regarded as inter-class similarities; a neural network is then designed to learn the similarity between embedded attributes and visual features under the supervision of the labels and these similarities.

    Graph-based framework. A graph encoding correlations between classes can additionally be constructed to enhance the generalization of the trained model. In [60], two relation graphs over features in the hidden space are constructed from the k-nearest neighbors among samples and from the class labels, which helps reduce the distances between highly relevant features. This design is improved in [101], where two separate latent spaces are learned for embedding the visual samples and semantic attributes, and the k-nearest neighbors are replaced by cosine similarity to express the relations among samples. Based on these two embedding spaces and a weighted sum of the relations among samples and class labels, an asymmetric graph structure with orthogonal projection is introduced to improve the learned latent space. In [47], by fixing the number of super-classes at different class layers, clusters obtained by running a clustering algorithm on the attributes represent the super-classes, so that a hierarchical graph over classes can be constructed to bridge the domain gap between seen and unseen classes. In [110], the relations among classes are captured by augmenting the original label matrix in a dependency propagation process supported by a low-rank constraint.

    The graph convolutional network (GCN) is a neural network that directly approximates localized spectral filters on graphs to learn hidden-layer representations more relevant to the target task [40]. A GCN is applied to the word embeddings of all classes in [100] to learn the classifier parameters of each class. A dense graph propagation module is then proposed in [38], in which connections from nodes to their ancestors and descendants are considered. In addition to the graph over word embeddings, [98] also employs a graph constructed via k-nearest neighbors in the attribute space to learn the classifier parameters; the outputs of the GCN on the two graphs are summed with weights to produce the final parameters.
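A single GCN layer of the kind used in these works is just normalized neighborhood averaging followed by a linear map and a nonlinearity. The sketch below implements the propagation rule $H' = \mathrm{ReLU}(\tilde{D}^{-1/2}(\mathrm{Adj}+I)\tilde{D}^{-1/2} H W)$ from [40] on a toy three-class graph; the adjacency, embeddings, and weights are all synthetic.

```python
import numpy as np

def gcn_layer(adj, H, W):
    """One GCN layer with symmetric normalization and self-loops [40]."""
    A_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)

# Toy graph over 3 classes (e.g. WordNet neighbors), 4-dim word embeddings.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))                     # per-class word embeddings
W = rng.normal(size=(4, 2))                     # learnable layer weights
out = gcn_layer(adj, H, W)                      # per-class output features
```

Stacking such layers and regressing the final per-class outputs onto known seen-class classifier weights is, roughly, the recipe of [100].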

    Meta learning framework. The meta learning process proposed in few-shot learning aims to train models with high knowledge transfer ability [75]. In zero-shot learning, models trained on seen class data tend to overfit and perform poorly on unseen classes. Therefore, methods with similar meta learning strategies have been developed to train more generalized models.

    The relation network (RN) [88] is designed to learn a similarity measure based on a neural network architecture. The visual feature and the embedded semantic attributes are concatenated and used as the input to the measure model, which returns the similarity. The whole model is trained under a meta learning process where each time the loss function is defined on a meta learning task sampled from the training set. Specifically, each time a small group of samples is selected to construct a meta classification task in which the number of included classes is not fixed. By training over many such meta tasks, the model becomes more adaptive across tasks, and therefore more generalized.
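The episode construction described above can be sketched as follows; the class-count range, shot number, and data layout are illustrative assumptions, not the exact protocol of RN [88].

```python
import random

def sample_meta_task(class_to_samples, n_way_range=(2, 5), k_shot=3):
    # One episode: a random subset of classes (the class count per episode
    # is itself random, mirroring RN's unfixed task size) with k_shot
    # samples drawn from each selected class.
    n_way = random.randint(*n_way_range)
    classes = random.sample(sorted(class_to_samples), n_way)
    return {c: random.sample(class_to_samples[c], k_shot) for c in classes}

random.seed(0)
# Hypothetical training set: 6 seen classes with 8 samples each.
data = {c: list(range(c * 10, c * 10 + 8)) for c in range(6)}
task = sample_meta_task(data)
```

Each call yields a different small classification task, and the model is optimized across many such tasks rather than on the full training-set objective.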

    As an improvement of RN, CRnet [119] follows the same training process with meta tasks. Additionally, an unsupervised K-means clustering algorithm is implemented to find similar class groups and the corresponding group centers. Instead of training one embedding function on the semantic attributes, multiple attribute embedding functions are trained based on the group centers, where the inputs are the differences between these centers and the semantic attributes. The sum of these embedded attributes is then utilized for learning the similarity in the same way as RN.

    A similar process is adopted in the correction network [27]. Based on the sampled meta tasks, an additional correction module is trained to modify the predicted value of the original model to be more precise. The learned correction module generalizes well since it is adapted to different meta tasks; as such, the correction contributes to better performance.

    In the image-based methods, it is the original images X, instead of the extracted feature vectors F_f(X), that are regarded as samples. Moreover, a well-designed backbone architecture with parameters pre-trained on the image classification task is partially or entirely borrowed as a learnable extractor F = F_l. The insight underlying these methods is to optimize the feature extractor F_l simultaneously with the specifically designed embedding function and classifier. Sometimes an additional module accompanying the backbone is designed to obtain a more adaptable feature space, thus improving the performance.

    Supervision based methods. By providing additional constraints or regularizations in the training loss function, the feature extractor can be pushed to capture more relevant information, which results in a more representative feature space. Rather than training an embedding model with a bilinear classifier purely on information from the seen classes, unlabelled data are also employed in quasi-fully supervised learning [87]. Without supervised information, the predicted scores of the unseen classes for those unlabelled data are constrained to be large by taking the sum of their negative log values as a regularization term during optimization. Training the whole model under this quasi-fully supervised setting with the designed loss also improves the features extracted by the backbone, which alleviates the bias towards seen classes.

    A discriminative feature learning process is introduced in [51]. A zoomed coordinate is learned based on the feature maps to reconstruct a zoomed image sample of the same size as the original one, and visual features are extracted from both the zoomed and the original image samples. Since the semantic attributes are not discriminative enough, only a partial list of the learned embedded features is adopted for learning the bilinear classifier with the attributes. Additionally, a triplet loss based on the squared Euclidean distance is constructed among the remaining embedded features to improve the learned feature space.

    Domain-aware visual bias eliminating [65] adopts a margin second-order embedding based on bilinear pooling [52] and a temperature-scaled softmax loss function during training. As a result, the learned feature space is constrained to be more discriminative, leading to low entropy for instances from seen classes. Instances from unseen classes at test time can then be distinguished by their relatively high entropy.
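The entropy-based separation exploited above can be illustrated with a toy temperature-scaled softmax; the score vectors below are made up for illustration and are not taken from [65].

```python
import math

def tempered_softmax(scores, tau):
    # Dividing logits by a small temperature tau sharpens the distribution.
    exps = [math.exp(s / tau) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy of a discrete distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

tau = 0.5
seen_like = [5.0, 1.0, 0.5]    # one class clearly dominates -> low entropy
unseen_like = [2.0, 1.9, 2.1]  # no class stands out -> high entropy
h_seen = entropy(tempered_softmax(seen_like, tau))
h_unseen = entropy(tempered_softmax(unseen_like, tau))
```

A simple threshold on the prediction entropy can then route a test instance to the seen or the unseen classifier branch.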

    Attention based methods. As the attention mechanism has achieved impressive performance in image classification tasks [97], several attention-relevant modules have also been designed in zero-shot learning for capturing more representative features corresponding to the semantic information. In most of these methods, the attention module is utilized to obtain local features corresponding to specific semantic properties. To produce more adequate supervision on the attention based feature space, a second-order operation [52] is applied on the learned features and semantics [108]. In the region graph embedding network [109], a transformation matrix is solved to represent the similarity between the attributes of the seen and the unseen classes. According to these similarities, a cross-entropy loss is designed to ensure that the classifier also outputs a higher score for similar unseen classes when classifying samples from seen classes. As a result, the feature extractor is pushed to learn a feature space capturing more correlation information between seen and unseen classes. In [125], a triplet loss is designed to enlarge the inter-class distances and reduce the intra-class distances between features corresponding to both local regions and entire images, making the learned feature space more conducive to the classification task.

    Instead of purely training the attention module through a loss function defined on the feature space, additional explicit human-annotated attention labels can also be provided to supplement the training. For example, in [58], captured gaze points are employed to generate the ground truth of the attention maps for constructing a binary cross-entropy loss across all pixels. In addition to capturing local features, the attention learned from several feature maps is combined to guide the learning of the bilinear classifier [57].

    The insight of the mechanism-improved methods is to propose a generalized mechanism without changing, or only slightly changing, the structure of the original method. The proposed mechanism can be an improvement of the training process, an optimization of a specific loss function, or a redesigned prediction process. Commonly, this family of methods is designed for zero-shot models sharing certain commonalities.

    Training process focused. A theoretical explanation of normalization on attributes is presented in [83], and a more efficient normalization scheme that standardizes the embedded attributes is proposed to alleviate the irregular loss surface.

    During the feature extracting process, a fine-tuned backbone is proposed in the attribute prototype network (APN) [112]. In this work, assume the number of attributes is D_a. A prototype for each attribute, P = {p_{d_a} ∈ R^C, d_a = 1, …, D_a}, is learned to generate a similarity map M_{d_a} = {m_{i,j}^{d_a}} of height h and width w through multiplication of these prototypes with the corresponding feature maps. During fine-tuning, the commonly used linear embedding classification loss is optimized with several regularization terms. An attribute decorrelation term is defined as the sum of the l2-norms of each dimension of the prototypes within the same disjoint attribute groups, which helps decorrelate unrelated attributes by enforcing prototypes in the same group to share values. Another term, the similarity map compactness, enforces each similarity map to concentrate on its peak region [123]; it is given as

    $$\mathcal{L}_{CPT}=\sum_{d_a=1}^{D_a}\sum_{i=1}^{h}\sum_{j=1}^{w} m_{i,j}^{d_a}\left[(i-\tilde{i})^{2}+(j-\tilde{j})^{2}\right], \qquad (3.4)$$

    where (ĩ, j̃) is the coordinate of the maximum value in M_{d_a}. This element-wise multiplication between the similarity map and the squared distances among coordinates constrains the similarity map to focus on a small number of local features. Thereby, each similarity map M_{d_a} can be regarded as the attention map corresponding to the d_a-th attribute. The comparison in this work shows that the fine-tuned backbone in APN outperforms those of some other methods [106,124], even when fine-tuning is also implemented in them. In this sense, it can be regarded as a generally improved feature extractor.
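A minimal numeric sketch of the compactness term in Eq. (3.4) for a single similarity map; the map values are toy numbers, and only the one-map case is shown (the full loss sums over all D_a maps).

```python
def compactness_loss(sim_map):
    # Compactness term for one similarity map: each entry is weighted by its
    # squared distance to the coordinate of the map's maximum value.
    h, w = len(sim_map), len(sim_map[0])
    ti, tj = max(((i, j) for i in range(h) for j in range(w)),
                 key=lambda ij: sim_map[ij[0]][ij[1]])
    return sum(sim_map[i][j] * ((i - ti) ** 2 + (j - tj) ** 2)
               for i in range(h) for j in range(w))

peaked = [[0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0]]   # all mass at the peak -> zero loss
spread = [[0.2, 0.2, 0.2],
          [0.2, 0.2, 0.2],
          [0.2, 0.2, 0.2]]   # diffuse mass -> penalized
```

A map whose mass is concentrated at its peak incurs no penalty, while a diffuse map is pushed towards a localized, attention-like response.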

    The isometric propagation network (IPN) [54] is proposed to preserve the relations between classes in a propagation process based on a specific similarity measure. Defining the average of the samples from the same class as the initialized visual class prototype, each propagation step re-represents a prototype by the weighted sum of the prototypes of similar classes. The similar classes are detected through a threshold on the similarity measure, which is a temperature-scaled softmax over the cosine similarities between prototypes; the similarity is also utilized as the weight in the re-representation. Such a propagation process can also be applied to semantic prototypes learned with a trained semantic embedding module from other methods, such as that used in [119]. During the test, the unseen prototypes can be obtained as the weighted sum of the propagated prototypes of seen classes according to the similarity measure, which contributes to significant performance improvement with the commonly used linear classification model.
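One propagation step of this prototype re-representation can be sketched as follows. For simplicity the toy version propagates over all classes rather than the thresholded set of similar classes, so it differs from the full IPN; the prototypes are made-up 2-D vectors.

```python
import math

def cosine(u, v):
    # Cosine similarity between two prototypes.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u))
                  * math.sqrt(sum(b * b for b in v)))

def propagate(prototypes, tau=0.1):
    # One step: each class prototype is re-represented as the
    # softmax(cosine / tau)-weighted sum of the class prototypes.
    n, d = len(prototypes), len(prototypes[0])
    new_protos = []
    for i in range(n):
        logits = [cosine(prototypes[i], p) / tau for p in prototypes]
        peak = max(logits)                      # stabilize the softmax
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        new_protos.append([sum(weights[j] * prototypes[j][k] for j in range(n))
                           for k in range(d)])
    return new_protos

protos = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
smoothed = propagate(protos)
```

Similar prototypes are pulled towards each other, while dissimilar ones receive a near-zero weight under the small temperature.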

    The image is divided into different regions for extracting more precise features with the attention module in [31,33,82]. Moreover, an additional seen-unseen trade-off loss can be adopted to balance the predicted scores for seen and unseen classes. For example, a self-calibration loss term, a biased cross-entropy loss on the predicted unseen scores for samples from seen classes, is designed in [31], and a soft cross-entropy loss based on the similarity between seen and unseen classes is utilized in [82]. Training the models with these additional constraints increases the prediction scores for unseen classes, thereby improving the sensitivity of unseen class recognition.

    A meta learning process with constructed meta training tasks is adopted in [75,94] for few-shot learning. Instead of employing a loss function associated with the original classification task over the whole training set, several sub-tasks of the original task, namely meta tasks, are constructed with meta training data sampled from the original training set. Adopting this meta learning process in zero-shot learning improves generalization and restrains over-fitting [54,88,114,119]. Figure 4 demonstrates an example of the meta zero-shot task in [88].

    Figure 4.  One illustrative example of meta tasks in meta learning process adopted in RN [88].

    Test process focused. Since most methods suffer from class-level over-fitting in generalized zero-shot tasks, a mechanism named calibrated stacking is proposed in [9] to adjust the predicted confidence score for each class. With a trained classifier C and the corresponding information extractor M, the predicted confidence score in the regular test process can be obtained as C(x_i, A^{te}, M). The prediction based on calibrated stacking is then defined as

    $$\hat{y}_i=\underset{k}{\arg\max}\; C(x_i,A^{te},M)-\gamma\,\mathbb{I}(k\in K^{S}),\quad \text{s.t.}\; a_k\in A^{te}, \qquad (3.5)$$

    where I(·) is the indicator function judging whether the k-th class belongs to the seen classes and γ is a hyper-parameter controlling the scale of the adjustment. This calibrated stacking mechanism simply subtracts a certain value from all the predicted seen confidence scores. Specifically, assume all the confidence scores are scaled to the range (0, 1). Setting γ=1 forces all predicted labels to belong to unseen classes, and conversely γ=-1 forces all predicted labels to be seen classes. In other words, setting γ=-1 and γ=1 lead to zero accuracies for the unseen classes and the seen classes, respectively. By adjusting γ from -1 to 1 with a tiny step size, one can obtain the adjusted accuracies for both the seen and unseen classes, and a seen versus unseen accuracy curve can be plotted. The area under the seen-unseen accuracy curve (AUSUC) is then proposed as a criterion measuring the overall performance of models in generalized zero-shot learning tasks. A schematic is shown in Figure 5.

    Figure 5.  Schematic of the seen-unseen accuracy curve as defined in [9]. The black point with γ=0 denotes the original performance of the model; the red point with γ=-1 and the green point with γ=1 represent the adjusted results where the predicted scores are fully biased towards the seen and unseen classes, respectively.
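Calibrated stacking itself is a one-line score adjustment; the sketch below (toy scores, not the evaluation code of [9]) shows how the predicted label shifts as γ moves between -1 and 1.

```python
def calibrated_predict(scores, seen_mask, gamma):
    # Subtract gamma from every seen-class score, then take the argmax.
    adjusted = [s - gamma if seen else s
                for s, seen in zip(scores, seen_mask)]
    return max(range(len(adjusted)), key=lambda k: adjusted[k])

# Toy setup: classes 0 and 1 are seen, class 2 is unseen; scores in (0, 1).
seen_mask = [True, True, False]
scores = [0.6, 0.3, 0.5]                                  # biased to seen
pred_plain = calibrated_predict(scores, seen_mask, 0.0)   # original argmax
pred_all_unseen = calibrated_predict(scores, seen_mask, 1.0)
pred_all_seen = calibrated_predict(scores, seen_mask, -1.0)
```

Sweeping γ from -1 to 1 and recording the seen and unseen accuracies at each step traces out the curve whose area defines AUSUC.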

    Entire process focused. Instead of directly adjusting the confidence scores, a gradient based instance perturbation is introduced in [114]. A regularization term from [63], the sum of the l2-norms of the input samples, is adopted to achieve robust learning [21]. This training process can be regarded as adversarial defense, which makes the learned classifier sufficiently robust to small perturbations in the sample space. During the test, the perturbed instance most inclined towards the unseen classes is obtained in the neighborhood of the original sample by computing an adversarial perturbation based on a designed classification loss. Since the classifier is robust among the training classes, the predictions for unseen class instances will tend to be unseen, while those for seen class instances will remain consistent.

    In [7], a self-learning process is proposed where hard unseen classes are repeatedly selected based on the frequencies of the predictions during the test. An expanded training set with additional sampled instances from those hard unseen classes is then constructed to re-train the model. The modified training set enhances the model's sensitivity to those hard classes and thus boosts the performance under the transductive scenario.

    The core component of generative methods is the generator, which takes semantic information as input and outputs corresponding pseudo samples. Such a generator can be constructed based on the variational autoencoder (VAE) [39] or the generative adversarial network (GAN) [20] architecture and is trained on the labelled samples together with their corresponding semantics. Then, by employing the unseen semantics, pseudo samples of unseen classes can be generated, so that the zero-shot learning task is converted into a common classification problem. In this case, the information extractor M denotes a training process whose output is a trained generator G, which takes A (sometimes combined with X^{tr}) as input and outputs synthesized samples for the corresponding classes. With the synthesized samples of unseen classes to support the training, the classifier can be designed as a common image classifier C(·), which takes samples as input and outputs a confidence score for each class. Here we review representative generative methods in different frameworks.

    The variational autoencoder is designed to derive a recognition model q_ϕ(z|x) that approximates the intractable true posterior p_θ(z|x), with the objective function:

    $$\mathcal{L}(\theta,\phi;x_i)=-D_{KL}\big(q_\phi(z|x_i)\,\big\|\,p_\theta(z)\big)+\mathbb{E}_{q_\phi(z|x_i)}\big[\log p_\theta(x_i|z)\big], \qquad (4.1)$$

    where D_KL denotes the Kullback-Leibler divergence, q_ϕ(z|x) is regarded as a probabilistic encoder, and p_θ(x|z) is regarded as a probabilistic decoder. As the most straightforward form of VAE, the conditional VAE [86] is applied to zero-shot learning in [66], as shown in Figure 6: the sample is concatenated with the corresponding attributes to learn the distribution parameters, and the random variables sampled from the learned distribution are again concatenated with the corresponding attributes to reconstruct the sample. The objective function can be redesigned as

    $$\mathcal{L}(\theta,\phi;x_i,a_{y_i})=-D_{KL}\big(q_\phi(z|x_i,a_{y_i})\,\big\|\,p_\theta(z|a_{y_i})\big)+\mathbb{E}_{q_\phi(z|x_i,a_{y_i})}\big[\log p_\theta(x_i|z,a_{y_i})\big]. \qquad (4.2)$$
    Figure 6.  Schematic diagram of the conditional VAE used in [66], where ⊕ denotes concatenation, E denotes the encoder, and D denotes the decoder.
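The KL term of the objective above has a closed form for diagonal Gaussian encoders. The sketch below assumes, for simplicity, a standard normal prior rather than the learned conditional prior p_θ(z|a_y), so it illustrates the plain-VAE special case only.

```python
import math

def kl_to_standard_normal(mu, log_var):
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ),
    # summed over latent dimensions:
    # 0.5 * sum( exp(log_var) + mu^2 - 1 - log_var )
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))

kl_matched = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])  # encoder = prior
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])  # mean shifted
```

The term vanishes exactly when the encoder posterior matches the prior and grows as the posterior mean or variance drifts away, which is what regularizes the latent space during training.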

    In [90], the Kullback-Leibler divergence relevant to the synthesized samples and the regression error of the semantic attributes from the corresponding synthesized samples are proposed as two additional regularization terms. A dual VAE architecture is designed in [81] where two VAE frameworks are trained respectively on the visual features and the semantic attributes. The correlation between these two frameworks is constructed by minimizing the cross reconstruction errors and the Wasserstein distances between the latent Gaussian distributions for the sample-attribute pairs coming from the same class. The dual VAE is improved in [64], where a deep embedding network achieving the regression task from the semantic attributes to the visual features is additionally designed. The hidden layer of this network is then utilized as the input of the semantic VAE framework. The designed regression forces the hidden layer to become representative for both visual features and semantic attributes, thus benefiting the entire VAE framework. A disentangled dual VAE is designed in [50]. Different from the original dual VAE, each VAE framework learns two distributions, thereby sampling two random variables z^p_m and z^t_m, where m denotes the modality, which could be s or v, representing the semantic and the visual space respectively. For a group of training pairs, {z^p_{m,i}} is shuffled into {z̃^p_{m,i}} and then added to {z^t_{m,i}}, and the sums are used to construct an additional classification loss. Optimizing the model with this additional classification loss disentangles category-distilling factors and category-dispersing factors from both the visual and semantic features. The multimodal VAE proposed in [5] builds one VAE framework for the concatenation of the visual feature and the embedded semantic attributes from the same class to capture the correlations between modalities.
    In the identifiable VAE designed in [22], three VAE frameworks sharing the decoder for sample reconstruction are built, taking the sample, the attribute, and both of them as inputs respectively. With an additional regularization term [42] encouraging disentanglement during inference, the learned latent space captures more significant information for generating discriminative samples.

    In generative adversarial networks, a generator G and a discriminator D are designed to be trained against each other iteratively with the loss function:

    $$\min_G\max_D \mathcal{L}(D,G)=\mathbb{E}_{x\sim X^{tr}}\big[\log D(x)\big]+\mathbb{E}_{z\sim p_z(z)}\big[\log\big(1-D(G(z))\big)\big]. \qquad (4.3)$$

    Here, p_z(z) denotes a prior on the input noise variables z. The discriminator is trained to distinguish the generated pseudo samples from the samples of the original dataset, while the target of the generator is to synthesize pseudo samples so similar to the real ones that the learned discriminator cannot tell them apart. Following the WGAN proposed in [23], where the Wasserstein distance is leveraged, the loss of the conditional WGAN in zero-shot learning can be written as

    $$\min_G\max_D \mathcal{L}_{fWGAN}(D,G)=\mathbb{E}_{x\sim X^{tr}}\big[D(x,a_y)\big]-\mathbb{E}_{z\sim p_z(z)}\big[D(\tilde{x},a_y)\big]-\lambda\,\mathbb{E}_{z\sim p_z(z)}\Big[\big(\|\nabla_{\tilde{x}}D(\tilde{x},a_y)\|_{2}-1\big)^{2}\Big],\quad \tilde{x}=G(z,a_y). \qquad (4.4)$$

    In [105], a classifier over the seen classes is pre-trained on the training set and then adopted to supply classification supervision for the samples generated by a WGAN framework. Guided by this additional supervision, the generator learns to synthesize more discriminative samples, which benefits the training of the final classifier. Inspired by the prototypical networks in few-shot learning [84], multiple prototypes of each seen class are calculated in [49]. Samples of each class are grouped into several clusters, and the average of the samples in each cluster is regarded as one prototype of the corresponding class; prototypes of the synthesized samples are obtained from clusters in the same way. By minimizing the distances from the synthesized samples to their closest corresponding prototypes and the distances from the synthesized prototypes to their closest real prototypes, the synthesized samples are constrained to be highly related to the attributes and real samples. Instead of adopting classification supervision, gradient guidance from a pre-trained classifier is proposed in [80]. In this model, classifier parameters from different points during training are employed to calculate optimization gradients based on the real and synthesized samples respectively. The expected cosine distance between the gradients computed from real and synthesized samples is then utilized as an additional loss term, promoting synthesized samples that are as representative as real ones. In [25], a conditional GAN is adopted with the designed instance-level and class-level contrastive embedding, where two classification problems are constructed in the embedded feature space to encourage the features to capture strong discriminative information. By employing additional taxonomy knowledge, hierarchical labels are obtained to calculate multiple prototypes for each class in [107].
    Constraining the synthesized samples to stay close to all their corresponding prototypes encourages them to capture the hierarchical correlations. Inspired by space-aligned embedding, the semantic rectifying GAN is proposed in [117], in which a semantic rectifying loss is designed to enhance the discriminativeness of semantics under the guidance of visual relationships, together with pre- and post-reconstructions used to keep the consistency between synthesized visual and semantic features. Considering that the original semantics might not be discriminative enough, the disentangling class representation generative adversarial network [116] is proposed to automatically search for discriminative representations via a multi-modal triplet loss that utilizes multi-modal information.

    Since GAN based methods tend to over-fit and VAE based methods tend to under-fit, some works adopt both frameworks in their methods. A CVAE is trained with a regressor against a discriminator in [28]. The framework proposed in [106] shares the decoder of a conditional VAE as the generator of a conditional WGAN; it is also applicable to the transductive scenario by training another discriminator for unseen samples. In this model, a classifier pre-trained on the training set is adopted as classification supervision, contributing to more discriminative synthesized samples. The dual VAE is trained with two additional discriminators in [62] based on the sum of the dual VAE loss and the conditional WGAN loss to avoid blurry synthesized samples.

    Model-Agnostic Meta-Learning, a meta learning process proposed in [16], is also adopted in zero-shot learning to train generative models. First, each meta task contains a meta training and a meta validation set, both sampled from the training set. The model optimized over each meta task becomes more generalized due to the diversity of the meta tasks. Moreover, the optimization of the parameters is also conducted in a meta way: rather than learning parameters performing best over the training tasks, the target is to learn the most adaptive ones for all the meta tasks. In other words, the learned parameters may not achieve the best performance on the current training meta task, but can attain significant performance on different tasks after few-step training on them.

    A conditional WGAN with a pre-trained classifier is optimized under this meta learning strategy in [91]. In [92], Model-Agnostic Meta-Learning is applied to a more complex framework where the conditional VAE shares its decoder as the generator of a conditional WGAN. The parameters of the encoder, decoder (generator), and discriminator are optimized under this strategy to generate high-fidelity samples relying on only a small number of training examples from seen classes. Pseudo labels for the different meta task distributions are utilized by a task discriminator in [59]. During training, once the task discriminator is defeated, the encoder is able to align multiple diverse tasks into a unified distribution. With the aligned embedded features, a conditional GAN, which generates pseudo embedded features from Gaussian noise and attributes with a learnable classifier, can be trained under the meta learning strategy.

    Benchmarks. To avoid overlap between the unseen classes and the training classes used for the pre-trained feature extractor, specific data splits for five commonly used benchmarks are proposed, with extracted features, in [104]. This work has greatly facilitated the evaluation of models in subsequent studies. Here, we focus on four of them to summarize comparisons among the most representative methods.

    Animals with Attributes 2 (AwA2) [104] contains 30,475 images from public web sources for 50 highly descriptive animal classes, with at least 92 labelled examples per class; example attributes include stripes, brown, and eats fish. Caltech-UCSD-Birds-200-2011 (CUB) [95] is a fine-grained dataset with a large number of classes and attributes, containing 11,788 images of 200 different types of birds annotated with 312 attributes. SUN Attribute (SUN) [72] is a fine-grained dataset, medium-scale in class number, containing 14,340 scene images annotated with 102 attributes, e.g. sailing/boating, glass, and ocean. Attribute Pascal and Yahoo (aPY) [15] is a small-scale dataset with 64 attributes and 32 object classes, including animals, vehicles, and buildings.

    We recommend the splitting strategy used in [104] for these datasets, since most current methods are evaluated under this protocol. More details can be found in Table 1. Note that Animals with Attributes (AwA1) [44] is not introduced here since it is not publicly available due to copyright issues. It is worth mentioning that some other datasets are also adopted in zero-shot learning, e.g. the large-scale dataset ImageNet-1K [12], the small-scale fine-grained dataset Oxford Flower-102 (FLO) [68], and fMRI (functional Magnetic Resonance Imaging) data [67]. Since they are not as commonly used as the previous four benchmarks and some experimental settings on them are inconsistent across studies, we do not go into detail about them; evaluation protocols for them can be found in [8,13,17,46,70].

    Table 1.  Statistics for AwA1, AwA2, aPY, CUB and SUN in terms of granularity, class size, sample size and sample divergence.

    | Dataset | Size | Granularity | Semantic type | Size of semantics | Classes: train (seen) | Classes: unseen | Samples: train | Samples: test (seen) | Samples: test (unseen) |
    |---|---|---|---|---|---|---|---|---|---|
    | AwA1 | medium | coarse | Attributes | 85 | 40 | 10 | 19832 | 4958 | 5685 |
    | AwA2 | medium | coarse | Attributes | 85 | 40 | 10 | 23527 | 5882 | 7913 |
    | CUB | medium | fine | Attributes | 312 | 150 | 50 | 7057 | 1764 | 2967 |
    | aPY | small | coarse | Attributes/text | 64 | 20 | 12 | 5932 | 1483 | 7924 |
    | SUN | medium | fine | Attributes | 102 | 645 | 72 | 10320 | 2580 | 1440 |


    Evaluation criteria. Compared with the conventional zero-shot learning task, the generalized one better evaluates the capability of constructing recognition concepts for unseen classes, and is thus selected for demonstrating the performance of the methods in this article. Since the model needs to discriminate between seen and unseen classes while simultaneously ensuring correct classification, the performance on both seen and unseen classes needs to be measured. Following the most commonly used generalized task criteria defined in [104], we define ACC_S and ACC_U as the average per-class top-1 accuracies on seen and unseen classes:

    $$ACC_S=\frac{1}{K^{S}}\sum_{k=1}^{K^{S}}\frac{TP_k}{N_k}, \qquad (5.1)$$
    $$ACC_U=\frac{1}{K^{U}}\sum_{k=1}^{K^{U}}\frac{TP_k}{N_k}, \qquad (5.2)$$

    where TP_k denotes the number of correctly predicted (true positive) samples in the k-th class and N_k denotes the number of instances in the k-th class. In other words, the top-1 prediction accuracy of each class is weighted equally, independent of the sample size of that class. Note that the candidate labels in this classification comprise all classes, not only the seen or the unseen ones. The comprehensive performance in the generalized zero-shot learning task can then be evaluated by the harmonic mean of these accuracies:

    $$H=\frac{2\times ACC_S\times ACC_U}{ACC_S+ACC_U}. \qquad (5.3)$$
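The per-class accuracies and the harmonic mean are straightforward to compute; below is a minimal sketch with toy predictions (the labels and predictions are made up for illustration).

```python
def per_class_top1(preds, labels, classes):
    # Average per-class top-1 accuracy: every class contributes equally,
    # independent of its sample size (Eqs. 5.1 / 5.2).
    accs = []
    for c in classes:
        idx = [i for i, y in enumerate(labels) if y == c]
        accs.append(sum(preds[i] == c for i in idx) / len(idx))
    return sum(accs) / len(accs)

def harmonic_mean(acc_s, acc_u):
    # Eq. (5.3): low when either accuracy is low, so a model cannot
    # score well by ignoring the unseen classes.
    return 2 * acc_s * acc_u / (acc_s + acc_u)

labels = [0, 0, 0, 0, 1, 1]   # toy: class 0 seen, class 1 unseen
preds = [0, 0, 0, 1, 1, 0]
acc_s = per_class_top1(preds, labels, classes=[0])
acc_u = per_class_top1(preds, labels, classes=[1])
h = harmonic_mean(acc_s, acc_u)
```

Unlike the arithmetic mean, the harmonic mean collapses towards zero when one of the two accuracies is near zero, which is why it is preferred for generalized zero-shot evaluation.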

    In this section, we summarize the reported performance of the representative methods together with their implementation details. Tables 2 and 3 present comparisons of the methods on the AwA2, CUB, aPY, and SUN benchmarks for embedding methods and generative methods, respectively. The results are taken from the corresponding published papers or from the comparisons provided in [104], and all the H values are displayed in boldface. The listed methods are roughly sorted according to publication period and performance for different scenarios. Here we regard the ResNet101 pre-trained on ImageNet-1K, which outputs 2,048-dimensional features, as the default backbone for extracting visual features. In Table 2, the first part of the table, divided by double solid lines, presents the methods where the backbone is unchanged, and the rest summarizes the methods adjusting the backbone. The column Extra contains several indicators of implementation details that could boost the performance of the model, which are listed as follows.

    Table 2.  Comparisons of embedding methods on AwA2, CUB, aPY and SUN. Average ranking denotes the mean of the ranks of H values among the four datasets, "–" denotes the results were not reported, I, ST and T represent the inductive, semantic transductive, and transductive training scenarios respectively. Superscript with number denotes the same methods corresponding to different implementation setups.
    Method Scenario Extra AwA2 CUB aPY SUN Average ranking
    ACCU ACCS H ACCU ACCS H ACCU ACCS H ACCU ACCS H
    DeViSE (2013) [17] I 17.1 74.7 27.8 23.8 53.0 32.8 4.9 76.9 9.2 16.9 27.4 20.9 13.8
    SSE (2015) [122] I 8.1 82.5 14.8 8.5 46.9 14.4 0.2 78.9 0.4 2.1 36.4 4.0 17.8
    ESZSL (2015) [77] I 5.9 77.8 11.0 12.6 63.8 21.0 2.4 70.1 4.6 11.0 27.9 15.8 17.0
    SJE (2015) [3] I 8.0 73.9 14.4 23.5 59.2 33.6 3.7 55.7 6.9 14.7 30.5 19.8 14.5
    LatEm (2016) [103] I 11.5 77.3 20.0 15.2 57.3 24.0 0.1 73.0 0.2 14.7 28.8 19.5 16.5
    SAE (2017) [41] I 1.1 82.2 2.2 7.8 54.0 13.6 0.4 80.9 0.9 8.8 18.0 11.8 18.3
    DEM (2017) [121] I 30.5 86.4 45.1 19.6 57.9 29.2 11.1 75.1 19.4 20.5 34.3 25.6 12.8
    PSR (2018) [4] I 20.7 73.8 32.3 24.6 54.3 33.9 13.5 51.4 21.4 20.8 37.2 26.7 11.5
    LESAE (2018) [55] I 21.8 70.6 33.3 24.3 53.0 33.3 12.7 56.1 20.1 21.9 34.7 26.9 11.8
    RN (2018) [88] I 30.0 93.4 45.3 38.1 61.1 47.0 9.5
    SFDEM (2019) [113] I 39.0 84.5 53.4 21.9 47.5 30.0 26.2 78.5 39.3 10.3
    TVN (2019) [120] I 26.5 62.3 37.2 16.1 66.9 25.9 22.2 38.3 28.1 9.7
    PQZSL (2019) [48] I 31.7 70.9 43.8 43.2 51.4 46.9 27.9 64.1 38.8 35.1 35.3 35.2 8.0
    CRnet (2019) [119] I 52.6 78.8 63.1 45.5 56.8 50.5 32.4 68.4 44.0 36.5 34.1 35.3 3.8
    DTNet (2020) [34] I 44.9 53.5 48.9 25.5 59.9 35.5 8.0
    LAF (2020) [56] I 50.4 58.5 54.2 43.7 52.0 47.5 33.8 49.0 40.0 36.0 36.6 36.3 5.5
    advRN (2020) [114] I 49.3 84.0 62.2 44.3 62.6 51.9 28.0 66.0 39.3 5.3
    DVBE (2020)1 [65] I 63.6 70.8 67.0 53.2 60.2 56.5 32.6 58.3 41.8 45.0 37.2 40.7 2.5
    LRSG-ZSL (2021) [110] I 60.4 84.9 70.6 48.5 49.3 48.9 30.3 76.2 43.4 51.2 22.4 31.2 4.3
    IPN (2021) [54] I 67.5 79.2 72.9 60.2 73.8 66.3 37.2 66.0 47.6 1.0
    TCN (2019) [36] ST 61.2 65.8 63.4 52.6 52.0 52.3 24.1 64.0 35.1 31.2 37.3 34.0 5.5
    LFGAA1 (2019) [57] I B F 27.0 93.4 41.9 36.2 80.9 50.0 18.5 40.0 25.3 11.0
    AREN (2019) [108] I B F 54.7 79.1 64.7 63.2 69.0 66.0 30.0 47.9 36.9 40.3 32.3 35.9 7.0
    APN (2020) [112] I B F K 56.5 78.0 65.5 65.3 69.3 67.2 41.9 34.0 37.6 6.3
    DVBE (2020)2 [65] I F 62.7 77.5 69.4 64.4 73.2 68.5 37.9 55.9 45.2 44.1 41.6 42.8 3.5
    GEM-ZSL (2021) [58] I B F K 64.8 77.5 70.6 64.8 77.1 70.4 38.1 35.7 36.9 4.7
    DAZLE (2020) [31] ST B   60.3 75.7 67.1 56.7 59.6 58.1 52.3 24.3 33.2 8.3
    RGEN (2020) [109] ST B F 67.1 76.5 71.5 60.0 73.5 66.1 30.4 48.1 37.2 44.0 31.7 36.8 4.8
    AGAN (2022) [82] ST B   64.1 80.3 71.3 67.9 71.5 69.7 40.9 42.9 41.8 3.7
    LFGAA2 (2019) [57] T B F 50.0 90.3 64.4 43.4 79.6 56.2 20.8 34.9 26.1 10.0
    QFSL (2018) [87] T F 66.2 93.1 77.4 71.5 74.9 73.2 51.3 31.2 38.8 2.3
    STHS-S2V (2021) [7] T 91.4 92.3 91.8 71.2 74.5 72.8 70.7 44.8 54.8 1.3

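For reference, the H column in both tables is the harmonic mean of the unseen-class accuracy ACCU and seen-class accuracy ACCS, the standard trade-off metric in generalized zero-shot learning; this is consistent with the reported values, as a quick check against two Table 2 rows shows:

```python
# H is the harmonic mean of the unseen- and seen-class accuracies:
# H = 2 * ACC_U * ACC_S / (ACC_U + ACC_S).
def harmonic_mean(acc_u, acc_s):
    return 2 * acc_u * acc_s / (acc_u + acc_s)

# Reproducing two rows of Table 2:
print(round(harmonic_mean(63.6, 70.8), 1))  # DVBE (2020)^1 on AwA2 -> 67.0
print(round(harmonic_mean(67.5, 79.2), 1))  # IPN (2021) on AwA2 -> 72.9
```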

Backbone modification. Indicator B denotes that the architecture of the feature extractor is modified to improve the obtained visual feature space. Such modifications include designing attention modules alongside the backbone, repeatedly applying the feature extractor to divided image regions to obtain multiple features, employing the multi-channel feature-map layer before pooling in the pre-trained ResNet, or building the backbone from other advanced neural network architectures.

Fine-tuning. Indicator F specifies that the borrowed backbone is fine-tuned during training. In most methods, the pre-trained backbone is frozen and the extracted visual features are used directly as training samples, so those methods are evaluated in the same feature space. By contrast, methods that fine-tune the backbone jointly with the proposed model operate in different feature spaces; their evaluation therefore cannot be considered strictly the same setting as that of methods without fine-tuning.
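The difference between the two settings can be sketched abstractly. The snippet below is an illustrative pure-Python stand-in for a training step (not any surveyed method's code): parameters are plain numbers, and a dummy gradient update is applied only to the parameters marked trainable, mirroring the frozen-backbone versus fine-tuned (indicator F) setups.

```python
# Minimal sketch of frozen vs fine-tuned backbone settings (illustrative only).
def train_step(params, trainable, lr=0.1, grad=1.0):
    """Apply a dummy gradient step only to parameters marked trainable."""
    return {name: (v - lr * grad if name in trainable else v)
            for name, v in params.items()}

params = {"backbone.w": 1.0, "embed_head.w": 1.0}

# Without indicator F: backbone frozen, only the embedding head learns,
# so all such methods share the same visual feature space.
frozen = train_step(params, trainable={"embed_head.w"})

# With indicator F: backbone updated jointly, yielding a feature space
# different from that of the frozen-backbone methods.
finetuned = train_step(params, trainable={"backbone.w", "embed_head.w"})

print(frozen["backbone.w"])     # unchanged
print(finetuned["backbone.w"])  # updated
```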

Additional knowledge. Indicator K denotes that information not commonly included in the benchmarks is leveraged to improve model performance. Note that the pre-trained deep neural network is not counted as additional knowledge, since it is a common setting in zero-shot learning. Such additional knowledge includes taxonomy knowledge in the form of hierarchical labels, correlations between attributes (either manually defined or captured by word embedding models trained on extra text corpora), captured gaze points, and data augmentation techniques.

Compared with the embedding methods of Table 2 from the same period, most of the generative methods of Table 3 appear to achieve better performance. Strictly speaking, however, training the classifier on samples generated from unseen semantics, as generative models do, can be regarded as employing additional unseen information that embedding methods do not use. Their performance difference may therefore stem from this subtle difference in setting. To construct rigorous comparisons, we advocate evaluating embedding and generative methods separately. Moreover, the current best models of the two families under the inductive scenario, i.e., IPN [54] and CE-GZSL [25], in fact perform quite similarly, so we believe embedding and generative methods are of equal importance in zero-shot learning.
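The "Average ranking" column in these tables can be reproduced under one plausible reading of the caption: rank the reported H values within each dataset (1 = best) and average each method's ranks over the datasets on which it reports results, skipping unreported ("–") entries. The H values below are a small hypothetical subset purely for illustration:

```python
# Illustrative sketch of the "Average ranking" column.
def average_rankings(h_table):
    """h_table: method -> {dataset: H}; unreported datasets are absent keys."""
    ranks = {m: [] for m in h_table}
    datasets = {d for per_ds in h_table.values() for d in per_ds}
    for d in datasets:
        # Methods that reported on this dataset, best H first.
        reported = sorted((m for m in h_table if d in h_table[m]),
                          key=lambda m: -h_table[m][d])
        for rank, m in enumerate(reported, start=1):
            ranks[m].append(rank)
    # Average each method's ranks over the datasets it reported on.
    return {m: sum(r) / len(r) for m, r in ranks.items()}

# Hypothetical H values (not taken from the tables):
h_table = {
    "A": {"AwA2": 72.9, "CUB": 66.3, "aPY": 47.6},
    "B": {"AwA2": 63.1, "CUB": 50.5, "aPY": 44.0, "SUN": 35.3},
    "C": {"AwA2": 45.3, "CUB": 47.0},
}
print(average_rankings(h_table))  # A ranks first wherever it reports
```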

Table 3.  Comparisons of generative methods on AwA2, CUB, aPY and SUN. "Average ranking" denotes the mean of the ranks of the H values over the four datasets; "–" denotes results that were not reported; I, ST and T represent the inductive, semantic transductive, and transductive training scenarios, respectively. A numeric superscript distinguishes different implementation setups of the same method.
Method  Scenario  Extra | AwA2: ACCU ACCS H | CUB: ACCU ACCS H | aPY: ACCU ACCS H | SUN: ACCU ACCS H | Average ranking
    f-CLSWGAN (2018) [105] I 43.7 57.7 49.7 42.6 36.6 39.4 18.0
    SRGAN (2019) [117] I 31.3 60.9 41.3 22.3 78.4 34.8 22.1 38.3 27.4 15.3
    LisGAN (2019) [49] I 46.5 57.9 51.6 42.9 37.8 40.2 17.0
    GDAN (2019) [28] I 32.1 67.5 43.5 39.3 66.7 49.5 30.4 75.0 43.4 38.1 89.9 53.4 11.0
    CADA-VAE (2019) [81] I 55.8 75.0 63.9 51.6 53.5 52.4 47.2 35.7 40.6 15.7
    f-VAEGAN-D21 (2019) [106] I 57.6 70.6 63.5 48.4 60.1 53.6 45.1 38.0 41.3 15.0
    f-VAEGAN-D22 (2019) [106] I F 57.1 76.1 65.2 63.2 75.6 68.9 50.1 37.8 43.1 9.3
    ZSML (2020) [91] I 58.9 74.6 65.8 60.0 52.1 55.7 36.3 46.6 40.9 10.0
    DE-VAE (2020) [64] I 58.8 78.9 67.4 52.5 56.3 54.3 45.9 36.9 40.9 12.0
    DR-VAE (2021) [50] I 56.9 80.2 66.6 51.1 58.2 54.4 36.6 47.6 41.4 11.7
    M-VAE (2021) [5] I 61.3 72.4 66.4 57.1 62.9 59.8 42.4 58.7 49.2 8.3
    DGN (2021) [111] I 60.1 76.4 67.3 53.8 61.9 57.6 36.5 61.7 45.9 48.3 37.4 42.1 8.5
    DCRGAN (2021) [116] I 55.8 66.8 60.8 37.2 71.7 49.0 47.1 38.5 42.4 6.3
    CE-GZSL (2021) [25] I 63.1 78.6 70.0 63.9 66.8 65.3 48.8 38.6 43.1 7.0
    TGMZ (2021) [59] I   K 64.1 77.3 70.1 60.3 56.8 58.5 34.8 77.1 48.0 6.0
    CKL+TR (2021) [107] I   K 61.2 92.6 73.7 57.8 50.2 53.7 30.8 78.9 44.3 8.0
    APN+f-VAEGAN-D2 (2020) [112] I B F K 62.2 69.5 65.6 65.7 74.9 70.0 49.4 39.2 43.7 8.0
    AFGN (2022) [82] ST B   68.1 82.9 74.7 69.8 77.1 73.2 53.1 45.9 49.2 3.7
    f-VAEGAN-D23 (2019) [106] T 84.8 88.6 86.7 61.4 65.1 63.2 60.6 41.9 49.6 4.3
    f-VAEGAN-D24 (2019) [106] T F 86.3 88.7 87.5 73.8 81.4 77.3 54.2 41.8 47.2 3.0
    STHS-WGAN (2021) [7] T 94.9 92.3 93.6 77.4 74.5 75.9 67.5 44.8 53.9 1.3


Moreover, as shown in these two tables, methods with modified or fine-tuned backbones outperform their original counterparts published in the same year. In particular, the effectiveness of fine-tuning has been verified for the embedding method DVBE [65] and the generative method f-VAEGAN-D2 [106]. Fine-tuning yields absolute increments of 2.4%, 12.0%, 3.4%, and 2.1% in the H values of DVBE on AwA2, CUB, aPY, and SUN, respectively. Similar improvements can be observed for f-VAEGAN-D2 under both the inductive and transductive scenarios. These results imply that fine-tuning the backbone generally benefits generalized zero-shot learning, especially on the CUB benchmark.
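The DVBE increments can be verified directly from the two DVBE rows of Table 2 with a quick arithmetic check:

```python
# H values copied from Table 2: DVBE^1 (frozen backbone) vs DVBE^2 (indicator F).
dvbe_frozen    = {"AwA2": 67.0, "CUB": 56.5, "aPY": 41.8, "SUN": 40.7}
dvbe_finetuned = {"AwA2": 69.4, "CUB": 68.5, "aPY": 45.2, "SUN": 42.8}

# Absolute increment in H from fine-tuning, per dataset.
gains = {d: round(dvbe_finetuned[d] - dvbe_frozen[d], 1) for d in dvbe_frozen}
print(gains)  # {'AwA2': 2.4, 'CUB': 12.0, 'aPY': 3.4, 'SUN': 2.1}
```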

The most outstanding embedding and generative methods under the inductive scenario often utilize additional knowledge: the extra information helps construct concepts of unseen classes from knowledge of the seen classes. The contribution of the employed additional knowledge is not fully reflected in these comparison tables; one can refer to each relevant paper for more details.

When the methods from all scenarios are compared together, for both the embedding and generative families, the transductive methods STHS-S2V and STHS-WGAN [7] attain the highest H values on most of the benchmarks. The unlabelled data carrying unseen-class attributes provide detailed target guidance for transferring categorical knowledge, making the transductive scenario the easiest generalized case. Taking TCN [36] as the closest semantic-transductive counterpart of RN [88] (it accesses the unseen attributes during training), absolute improvements of 18.1% and 5.3% in H are achieved on AwA2 and CUB, respectively. Moreover, the gap between the performances of LFGAA [57] under the semantic transductive and inductive scenarios also confirms the contribution of unseen attributes to training models for generalized zero-shot learning.

In this section, the type of classifier and the number of synthesized pseudo samples used for training are not collated, as the impact of these implementation details on model performance is uncertain when models are structured differently or applied to different databases. Instead, we focus on the implementation details that commonly lead to explicit performance changes among the current representative methods. On the one hand, we acknowledge the contribution of methods that adopt additional knowledge or modifications; on the other hand, numerical comparisons among methods with different implementation settings may not be rigorous and could lead to a misleading assessment of model capability. We advocate that researchers compare methods under the same implementation settings, and that all additional operations and/or auxiliary knowledge, being critically important, be stated clearly and explicitly for fair and precise evaluation.

In this article, we have provided a comprehensive survey of image classification with zero-shot learning, with a main focus on implementation issues. As methods have steadily improved, different problem settings and diverse experimental setups have emerged; we have therefore examined three implementation details that can boost the performance of zero-shot learning: whether the backbone structure is modified, whether fine-tuning is conducted, and whether additional knowledge is used. By annotating these experimental details, we have compiled a more careful comparison among various zero-shot methodologies. While generative methods appear to outperform embedding methods overall, we argue that the performance difference may be due to the different settings, suggesting that it is fairer to compare the two families separately. Moreover, we observe that the current best models of both families perform quite similarly under the inductive scenario; thus we believe embedding and generative methods are of equal importance in zero-shot learning.

The work was partially supported by the following: National Natural Science Foundation of China under no. 61876155; Jiangsu Science and Technology Programme (Natural Science Foundation of Jiangsu Province) under no. BE2020006-4; Key Program Special Fund in XJTLU under nos. KSF-T-06 and KSF-E-26.

    All authors declare no conflicts of interest in this paper.



    [1] EI Hendouzi A, Bourouhou A (2020) Solar Photovoltaic Power Forecasting. J Electr Comput Eng 2020: 1–21. https://doi.org/10.1155/2020/8819925 doi: 10.1155/2020/8819925
[2] Ürkmez M, Kallesøe C, Dimon Bendtsen J, et al. (2022) Day-ahead PV power forecasting for control applications. IECON 2022, 48th Annual Conference of the IEEE Industrial Electronics Society, Brussels, Belgium, 1–6. https://doi.org/10.1109/IECON49645.2022.9968709
    [3] Cheng S, Prentice IC, Huang Y, et al. (2022) Data-driven surrogate model with latent data assimilation: Application to wildfire forecasting. J Comput Phys 464: 111302. https://doi.org/10.1016/J.JCP.2022.111302 doi: 10.1016/J.JCP.2022.111302
    [4] Cheng S, Jin Y, Harrison SP, et al. (2022) Parameter Flexible Wildfire Prediction Using Machine Learning Techniques: Forward and Inverse Modelling. Remote Sens 14: 3228. https://doi.org/10.3390/RS14133228 doi: 10.3390/RS14133228
    [5] Zhong C, Cheng S, Kasoar M, et al. (2023) Reduced-order digital twin and latent data assimilation for global wildfire prediction. Nat Hazard Earth Sys 23: 1755–1768. https://doi.org/10.5194/NHESS-23-1755-2023 doi: 10.5194/NHESS-23-1755-2023
    [6] Gupta P, Singh R (2021) PV power forecasting based on data-driven models: a review. Int J Sustain Eng 14: 1733–1755. https://doi.org/10.1080/19397038.2021.1986590 doi: 10.1080/19397038.2021.1986590
    [7] López Santos M, García-Santiago X, Echevarría Camarero F, et al. (2022) Application of Temporal Fusion Transformer for Day-Ahead PV Power Forecasting. Energies 15: 5232. https://doi.org/10.3390/EN15145232 doi: 10.3390/EN15145232
    [8] Kanchana W, Sirisukprasert S (2020) PV Power Forecasting with Holt-Winters Method. 2020 8th International Electrical Engineering Congress (IEECON), 1–4. https://doi.org/10.1109/IEECON48109.2020.229517
    [9] Dhingra S, Gruosso G, Gajani GS (2023) Solar PV Power Forecasting and Ageing Evaluation Using Machine Learning Techniques. IECON 2023 49th Annual Conference of the IEEE Industrial Electronics Society, 1–6. https://doi.org/10.1109/IECON51785.2023.10312446
[10] Hanif MF, Naveed MS, Metwaly M, et al. (2024) Advancing solar energy forecasting with modified ANN and light GBM learning algorithms. AIMS Energy 12: 350–386. https://doi.org/10.3934/ENERGY.2024017 doi: 10.3934/ENERGY.2024017
[11] Hanif MF, Siddique MU, Si J, et al. (2024) Enhancing Solar Forecasting Accuracy with Sequential Deep Artificial Neural Network and Hybrid Random Forest and Gradient Boosting Models across Varied Terrains. Adv Theory Simul 7: 2301289. https://doi.org/10.1002/ADTS.202301289 doi: 10.1002/ADTS.202301289
    [12] Musafa A, Priyadi A, Lystianingrum V, et al. (2023) Stored Energy Forecasting of Small-Scale Photovoltaic-Pumped Hydro Storage System Based on Prediction of Solar Irradiance, Ambient Temperature, and Rainfall Using LSTM Method. IECON 2023 49th Annual Conference of the IEEE Industrial Electronics, 1–6. https://doi.org/10.1109/IECON51785.2023.10311982
    [13] Konstantinou M, Peratikou S, Charalambides AG (2021) Solar Photovoltaic Forecasting of Power Output Using LSTM Networks. Atmosphere 12: 124. https://doi.org/10.3390/ATMOS12010124 doi: 10.3390/ATMOS12010124
    [14] Jasiński M, Leonowicz Z, Jasiński J, et al. (2023) PV Advancements & Challenges: Forecasting Techniques, Real Applications, and Grid Integration for a Sustainable Energy Future. 2023 IEEE International Conference on Environment and Electrical Engineering and 2023 IEEE Industrial and Commercial Power Systems Europe (EEEIC/I & CPS Europe), Spain, 1–5. https://doi.org/10.1109/EEEIC/ICPSEUROPE57605.2023.10194796
    [15] Cantillo-Luna S, Moreno-Chuquen R, Celeita D, et al. (2023) Deep and Machine Learning Models to Forecast Photovoltaic Power Generation. Energies 16: 4097. https://doi.org/10.3390/EN16104097 doi: 10.3390/EN16104097
    [16] Kaushik AR, Padmavathi S, Gurucharan KS, et al. (2023) Performance Analysis of Regression Models in Solar PV Forecasting. 2023 3rd International Conference on Artificial Intelligence and Signal Processing (AISP), India, 1–5. https://doi.org/10.1109/AISP57993.2023.10134943
    [17] Halabi LM, Mekhilef S, Hossain M (2018) Performance evaluation of hybrid adaptive neuro-fuzzy inference system models for predicting monthly global solar radiation. Appl Energy 213: 247–261. https://doi.org/10.1016/J.APENERGY.2018.01.035 doi: 10.1016/J.APENERGY.2018.01.035
    [18] Zhang G, Wang X, Du Z (2015) Research on the Prediction of Solar Energy Generation based on Measured Environmental Data. Int J U e-Service Sci Technol 8: 385–402. https://doi.org/10.14257/IJUNESST.2015.8.5.37 doi: 10.14257/IJUNESST.2015.8.5.37
    [19] Peng Q, Zhou X, Zhu R, et al. (2023) A Hybrid Model for Solar Radiation Forecasting towards Energy Efficient Buildings. 2023 7th International Conference on Green Energy and Applications (ICGEA), 7–12. https://doi.org/10.1109/ICGEA57077.2023.10125987
    [20] Salisu S, Mustafa MW, Mustapha M (2018) Predicting Global Solar Radiation in Nigeria Using Adaptive Neuro-Fuzzy Approach. Recent Trends in Information and Communication Technology. IRICT 2017. Lecture Notes on Data Engineering and Communications Technologies, 5: 513–521. https://doi.org/10.1007/978-3-319-59427-9_54
    [21] Kaur A, Nonnenmacher L, Pedro HTC, et al. (2016) Benefits of solar forecasting for energy imbalance markets. Renewable Energy 86: 819–830. https://doi.org/10.1016/J.RENENE.2015.09.011 doi: 10.1016/J.RENENE.2015.09.011
    [22] Yang D, Li W, Yagli GM, et al. (2021) Operational solar forecasting for grid integration: Standards, challenges, and outlook. Sol Energy 224: 930–937. https://doi.org/10.1016/J.SOLENER.2021.04.002 doi: 10.1016/J.SOLENER.2021.04.002
    [23] Shi G, Eftekharnejad S (2016) Impact of solar forecasting on power system planning. 2016 North American Power Symposium (NAPS), 1–6. https://doi.org/10.1109/NAPS.2016.7747909
    [24] Shi J, Guo J, Zheng S (2012) Evaluation of hybrid forecasting approaches for wind speed and power generation time series. Renewable Sustainable Energy Rev 16: 3471–3480. https://doi.org/10.1016/j.rser.2012.02.044 doi: 10.1016/j.rser.2012.02.044
    [25] Mohanty S, Patra PK, Sahoo SS, et al. (2017) Forecasting of solar energy with application for a growing economy like India: Survey and implication. Renewable Sustainable Energy Rev 78: 539–553. https://doi.org/10.1016/J.RSER.2017.04.107 doi: 10.1016/J.RSER.2017.04.107
    [26] Sweeney C, Bessa RJ, Browell J, et al. (2020) The future of forecasting for renewable energy. Wiley Interdiscip Rev Energy Environ 9: e365. https://doi.org/10.1002/WENE.365 doi: 10.1002/WENE.365
    [27] Brancucci Martinez-Anido C, Botor B, Florita AR, et al. (2016) The value of day-ahead solar power forecasting improvement. Sol Energy 129: 192–203. https://doi.org/10.1016/J.SOLENER.2016.01.049 doi: 10.1016/J.SOLENER.2016.01.049
    [28] Inman RH, Pedro HTC, Coimbra CFM (2013) Solar forecasting methods for renewable energy integration. Prog Energy Combust Sci 39: 535–576. https://doi.org/10.1016/J.PECS.2013.06.002 doi: 10.1016/J.PECS.2013.06.002
    [29] Cui M, Zhang J, Hodge BM, et al. (2018) A Methodology for Quantifying Reliability Benefits from Improved Solar Power Forecasting in Multi-Timescale Power System Operations. IEEE T Smart Grid 9: 6897–6908. https://doi.org/10.1109/TSG.2017.2728480 doi: 10.1109/TSG.2017.2728480
    [30] Wang H, Lei Z, Zhang X, et al. (2019) A review of deep learning for renewable energy forecasting. Energy Convers Manage 198: 111799. https://doi.org/10.1016/J.ENCONMAN.2019.111799 doi: 10.1016/J.ENCONMAN.2019.111799
    [31] Aupke P, Kassler A, Theocharis A, et al. (2021) Quantifying Uncertainty for Predicting Renewable Energy Time Series Data Using Machine Learning. Eng Proc 5: 50. https://doi.org/10.3390/ENGPROC2021005050 doi: 10.3390/ENGPROC2021005050
    [32] Rajagukguk RA, Ramadhan RAA, Lee HJ (2020) A Review on Deep Learning Models for Forecasting Time Series Data of Solar Irradiance and Photovoltaic Power. Energies 13: 6623. https://doi.org/10.3390/EN13246623 doi: 10.3390/EN13246623
    [33] SETO 2020—Artificial Intelligence Applications in Solar Energy. Available from: https://www.energy.gov/eere/solar/seto-2020-artificial-intelligence-applications-solar-energy.
    [34] Freitas S, Catita C, Redweik P, et al. (2015) Modelling solar potential in the urban environment: State-of-the-art review. Renewable Sustainable Energy Rev 41: 915–931. https://doi.org/10.1016/J.RSER.2014.08.060 doi: 10.1016/J.RSER.2014.08.060
    [35] Gürtürk M, Ucar F, Erdem M (2022) A novel approach to investigate the effects of global warming and exchange rate on the solar power plants. Energy 239: 122344. https://doi.org/10.1016/J.ENERGY.2021.122344 doi: 10.1016/J.ENERGY.2021.122344
    [36] Gaye B, Zhang D, Wulamu A (2021) Improvement of Support Vector Machine Algorithm in Big Data Background. Math Probl Eng 2021: 5594899. https://doi.org/10.1155/2021/5594899 doi: 10.1155/2021/5594899
    [37] Yogambal Jayalakshmi N, Shankar R, Subramaniam U, et al. (2021) Novel Multi-Time Scale Deep Learning Algorithm for Solar Irradiance Forecasting. Energies 14: 2404. https://doi.org/10.3390/EN14092404 doi: 10.3390/EN14092404
    [38] Benti NE, Chaka MD, Semie AG (2023) Forecasting Renewable Energy Generation with Machine Learning and Deep Learning: Current Advances and Future Prospects. Sustainability 15: 7087. https://doi.org/10.3390/SU15097087 doi: 10.3390/SU15097087
    [39] Li J, Ward JK, Tong J, et al. (2016) Machine learning for solar irradiance forecasting of photovoltaic system. Renewable Energy 90: 542–553. https://doi.org/10.1016/J.RENENE.2015.12.069 doi: 10.1016/J.RENENE.2015.12.069
    [40] Long H, Zhang Z, Su Y (2014) Analysis of daily solar power prediction with data-driven approaches. Appl Energy 126: 29–37. https://doi.org/10.1016/J.APENERGY.2014.03.084 doi: 10.1016/J.APENERGY.2014.03.084
    [41] Jebli I, Belouadha FZ, Kabbaj MI, et al. (2021) Prediction of solar energy guided by pearson correlation using machine learning. Energy 224: 120109. https://doi.org/10.1016/J.ENERGY.2021.120109 doi: 10.1016/J.ENERGY.2021.120109
    [42] Khandakar A, Chowdhury MEH, Kazi MK, et al. (2019) Machine Learning Based Photovoltaics (PV) Power Prediction Using Different Environmental Parameters of Qatar. Energies 12: 2782. https://doi.org/10.3390/EN12142782 doi: 10.3390/EN12142782
    [43] Kim SG, Jung JY, Sim MK (2019) A Two-Step Approach to Solar Power Generation Prediction Based on Weather Data Using Machine Learning. Sustainability 11: 1501. https://doi.org/10.3390/SU11051501 doi: 10.3390/SU11051501
    [44] Gutiérrez L, Patiño J, Duque-Grisales E (2021) A Comparison of the Performance of Supervised Learning Algorithms for Solar Power Prediction. Energies 14: 4424. https://doi.org/10.3390/EN14154424 doi: 10.3390/EN14154424
    [45] Wang Z, Xu Z, Zhang Y, et al. (2020) Optimal Cleaning Scheduling for Photovoltaic Systems in the Field Based on Electricity Generation and Dust Deposition Forecasting. IEEE J Photovolt 10: 1126–1132. https://doi.org/10.1109/JPHOTOV.2020.2981810 doi: 10.1109/JPHOTOV.2020.2981810
    [46] Massaoudi M, Chihi I, Sidhom L, et al. (2021) An Effective Hybrid NARX-LSTM Model for Point and Interval PV Power Forecasting. IEEE Access 9: 36571–36588. https://doi.org/10.1109/ACCESS.2021.3062776 doi: 10.1109/ACCESS.2021.3062776
    [47] Arora I, Gambhir J, Kaur T (2021) Data Normalisation-Based Solar Irradiance Forecasting Using Artificial Neural Networks. Arab J Sci Eng 46: 1333–1343. https://doi.org/10.1007/S13369-020-05140-Y/METRICS doi: 10.1007/S13369-020-05140-Y/METRICS
    [48] Alipour M, Aghaei J, Norouzi M, et al. (2020) A novel electrical net-load forecasting model based on deep neural networks and wavelet transform integration. Energy 205: 118106. https://doi.org/10.1016/J.ENERGY.2020.118106 doi: 10.1016/J.ENERGY.2020.118106
    [49] Zolfaghari M, Golabi MR (2021) Modeling and predicting the electricity production in hydropower using conjunction of wavelet transform, long short-term memory and random forest models. Renewable Energy 170: 1367–1381. https://doi.org/10.1016/J.RENENE.2021.02.017 doi: 10.1016/J.RENENE.2021.02.017
    [50] Li FF, Wang SY, Wei JH (2018) Long term rolling prediction model for solar radiation combining empirical mode decomposition (EMD) and artificial neural network (ANN) techniques. J Renewable Sustainable Energy 10: 013704. https://doi.org/10.1063/1.4999240 doi: 10.1063/1.4999240
    [51] Wang S, Guo Y, Wang Y, et al. (2021) A Wind Speed Prediction Method Based on Improved Empirical Mode Decomposition and Support Vector Machine. IOP Conference Series: Earth and Environmental Science, IOP Publishing. 680: 012012. https://doi.org/10.1088/1755-1315/680/1/012012
    [52] Moreno SR, dos Santos Coelho L (2018) Wind speed forecasting approach based on Singular Spectrum Analysis and Adaptive Neuro Fuzzy Inference System. Renewable Energy 126: 736–754. https://doi.org/10.1016/J.RENENE.2017.11.089 doi: 10.1016/J.RENENE.2017.11.089
    [53] Zhang Y, Le J, Liao X, et al. (2019) A novel combination forecasting model for wind power integrating least square support vector machine, deep belief network, singular spectrum analysis and locality-sensitive hashing. Energy 168: 558–572. https://doi.org/10.1016/J.ENERGY.2018.11.128 doi: 10.1016/J.ENERGY.2018.11.128
    [54] Espinar B, Aznarte JL, Girard R, et al. (2010) Photovoltaic Forecasting: A state of the art. 5th European PV-hybrid and mini-grid conference. OTTI-Ostbayerisches Technologie-Transfer-Institut.
    [55] Moreno-Munoz A, De La Rosa JJG, Posadillo R, et al. (2008) Very short term forecasting of solar radiation. 2008 33rd IEEE Photovoltaic Specialists Conference, San Diego, CA, USA. https://doi.org/10.1109/PVSC.2008.4922587
    [56] Anderson D, Leach M (2004) Harvesting and redistributing renewable energy: on the role of gas and electricity grids to overcome intermittency through the generation and storage of hydrogen. Energy Policy 32: 1603–1614. https://doi.org/10.1016/S0301-4215(03)00131-9 doi: 10.1016/S0301-4215(03)00131-9
    [57] Zhang J, Zhao L, Deng S, et al. (2017) A critical review of the models used to estimate solar radiation. Renewable Sustainable Energy Rev 70: 314–329. https://doi.org/10.1016/J.RSER.2016.11.124 doi: 10.1016/J.RSER.2016.11.124
    [58] Coimbra CFM, Kleissl J, Marquez R (2013) Overview of Solar-Forecasting Methods and a Metric for Accuracy Evaluation. Sol Energy Forecast Resour Assess, 171–194. https://doi.org/10.1016/B978-0-12-397177-7.00008-5 doi: 10.1016/B978-0-12-397177-7.00008-5
    [59] Miller SD, Rogers MA, Haynes JM, et al. (2018) Short-term solar irradiance forecasting via satellite/model coupling. Sol Energy 168: 102–117. https://doi.org/10.1016/J.SOLENER.2017.11.049 doi: 10.1016/J.SOLENER.2017.11.049
    [60] Kumari P, Toshniwal D (2021) Deep learning models for solar irradiance forecasting: A comprehensive review. J Cleaner Prod 318: 128566. https://doi.org/10.1016/J.JCLEPRO.2021.128566 doi: 10.1016/J.JCLEPRO.2021.128566
    [61] Hassan GE, Youssef ME, Mohamed ZE, et al. (2016) New Temperature-based Models for Predicting Global Solar Radiation. Appl Energy 179: 437–450. https://doi.org/10.1016/J.APENERGY.2016.07.006 doi: 10.1016/J.APENERGY.2016.07.006
    [62] Angstrom A (1924) Solar and terrestrial radiation. Report to the international commission for solar research on actinometric investigations of solar and atmospheric radiation. Q J R Meteorol Soc 50: 121–126. https://doi.org/10.1002/QJ.49705021008 doi: 10.1002/QJ.49705021008
    [63] Samuel TDMA (1991) Estimation of global radiation for Sri Lanka. Sol Energy 47: 333–337. https://doi.org/10.1016/0038-092X(91)90026-S doi: 10.1016/0038-092X(91)90026-S
    [64] Ögelman H, Ecevit A, Tasdemiroǧlu E (1984) A new method for estimating solar radiation from bright sunshine data. Sol Energy 33: 619–625. https://doi.org/10.1016/0038-092X(84)90018-5 doi: 10.1016/0038-092X(84)90018-5
    [65] Badescu V, Gueymard CA, Cheval S, et al. (2013) Accuracy analysis for fifty-four clear-sky solar radiation models using routine hourly global irradiance measurements in Romania. Renewable Energy 55: 85–103. https://doi.org/10.1016/J.RENENE.2012.11.037 doi: 10.1016/J.RENENE.2012.11.037
    [66] Mecibah MS, Boukelia TE, Tahtah R, et al. (2014) Introducing the best model for estimation the monthly mean daily global solar radiation on a horizontal surface (Case study: Algeria). Renewable Sustainable Energy Rev 36: 194–202. https://doi.org/10.1016/J.RSER.2014.04.054 doi: 10.1016/J.RSER.2014.04.054
    [67] Hargreaves GH, Samani ZA (1982) Estimating Potential Evapotranspiration. J Irrig Drain Div 108: 225–230. https://doi.org/10.1061/JRCEA4.0001390 doi: 10.1061/JRCEA4.0001390
    [68] Bristow KL, Campbell GS (1984) On the relationship between incoming solar radiation and daily maximum and minimum temperature. Agric For Meteorol 31: 159–166. https://doi.org/10.1016/0168-1923(84)90017-0 doi: 10.1016/0168-1923(84)90017-0
    [69] Chen JL, He L, Yang H, et al. (2019) Empirical models for estimating monthly global solar radiation: A most comprehensive review and comparative case study in China. Renewable Sustainable Energy Rev 108: 91–111. https://doi.org/10.1016/j.rser.2019.03.033 doi: 10.1016/j.rser.2019.03.033
    [70] Chen Y, Zhang S, Zhang W, et al. (2019) Multifactor spatio-temporal correlation model based on a combination of convolutional neural network and long short-term memory neural network for wind speed forecasting. Energy Convers Manage 185: 783–799. https://doi.org/10.1016/j.enconman.2019.02.01 doi: 10.1016/j.enconman.2019.02.01
    [71] Siddiqui TA, Bharadwaj S, Kalyanaraman S (2019) A Deep Learning Approach to Solar-Irradiance Forecasting in Sky-Videos. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), 2166–2174. https://doi.org/10.1109/WACV.2019.00234
    [72] Nie Y, Li X, Paletta Q, et al. (2024) Open-source sky image datasets for solar forecasting with deep learning: A comprehensive survey. Renewable Sustainable Energy Rev 189: 113977. https://doi.org/10.1016/j.rser.2023.113977 doi: 10.1016/j.rser.2023.113977
    [73] SkyImageNet, 2024. Available from: https://github.com/SkyImageNet.
    [74] Brahma B, Wadhvani R (2020) Solar Irradiance Forecasting Based on Deep Learning Methodologies and Multi-Site Data. Symmetry 12: 1–20. https://doi.org/10.3390/sym12111830 doi: 10.3390/sym12111830
    [75] Paletta Q, Terrén-Serrano G, Nie Y, et al. (2023) Advances in solar forecasting: Computer vision with deep learning. Adv Appl Energy 11: 100150. https://doi.org/10.1016/j.adapen.2023.100150 doi: 10.1016/j.adapen.2023.100150
    [76] Ghimire S, Deo RC, Raj N, et al. (2019) Deep solar radiation forecasting with convolutional neural network and long short-term memory network algorithms. Appl Energy 253: 113541. https://doi.org/10.1016/J.APENERGY.2019.113541 doi: 10.1016/J.APENERGY.2019.113541
    [77] Elsaraiti M, Merabet A (2022) Solar Power Forecasting Using Deep Learning Techniques. IEEE Access 10: 31692–31698. https://doi.org/10.1109/ACCESS.2022.3160484 doi: 10.1109/ACCESS.2022.3160484
    [78] Reikard G (2009) Predicting solar radiation at high resolutions: A comparison of time series forecasts. Sol Energy 83: 342–349. https://doi.org/10.1016/J.SOLENER.2008.08.007 doi: 10.1016/J.SOLENER.2008.08.007
    [79] Yang D, Jirutitijaroen P, Walsh WM (2012) Hourly solar irradiance time series forecasting using cloud cover index. Sol Energy 86: 3531–3543. https://doi.org/10.1016/J.SOLENER.2012.07.029 doi: 10.1016/J.SOLENER.2012.07.029
    [80] Jaihuni M, Basak JK, Khan F, et al. (2020) A Partially Amended Hybrid Bi-GRU—ARIMA Model (PAHM) for Predicting Solar Irradiance in Short and Very-Short Terms. Energies 13: 435. https://doi.org/10.3390/EN13020435 doi: 10.3390/EN13020435
    [81] Verbois H, Huva R, Rusydi A, et al. (2018) Solar irradiance forecasting in the tropics using numerical weather prediction and statistical learning. Sol Energy 162: 265–277. https://doi.org/10.1016/j.solener.2018.01.007
    [82] Munkhammar J, van der Meer D, Widén J (2019) Probabilistic forecasting of high-resolution clear-sky index time-series using a Markov-chain mixture distribution model. Sol Energy 184: 688–695. https://doi.org/10.1016/j.solener.2019.04.014
    [83] Dong J, Olama MM, Kuruganti T, et al. (2020) Novel stochastic methods to predict short-term solar radiation and photovoltaic power. Renewable Energy 145: 333–346. https://doi.org/10.1016/j.renene.2019.05.073
    [84] Ahmad T, Zhang D, Huang C (2021) Methodological framework for short- and medium-term energy, solar and wind power forecasting with stochastic-based machine learning approach to monetary and energy policy applications. Energy 231: 120911. https://doi.org/10.1016/j.energy.2021.120911
    [85] Box GE, Jenkins GM, Reinsel GC, et al. (2015) Time series analysis: Forecasting and control, John Wiley & Sons.
    [86] Louzazni M, Mosalam H, Khouya A (2020) A non-linear auto-regressive exogenous method to forecast the photovoltaic power output. Sustain Energy Techn 38: 100670. https://doi.org/10.1016/j.seta.2020.100670
    [87] Larson DP, Nonnenmacher L, Coimbra CFM (2016) Day-ahead forecasting of solar power output from photovoltaic plants in the American Southwest. Renewable Energy 91: 11–20. https://doi.org/10.1016/j.renene.2016.01.039
    [88] Sharma V, Yang D, Walsh W, et al. (2016) Short term solar irradiance forecasting using a mixed wavelet neural network. Renewable Energy 90: 481–492. https://doi.org/10.1016/j.renene.2016.01.020
    [89] Kumari P, Toshniwal D (2020) Real-time estimation of COVID-19 cases using machine learning and mathematical models: The case of India. 2020 IEEE 15th International Conference on Industrial and Information Systems, 369–374. https://doi.org/10.1109/ICIIS51140.2020.9342735
    [90] Ahmad MW, Mourshed M, Rezgui Y (2018) Tree-based ensemble methods for predicting PV power generation and their comparison with support vector regression. Energy 164: 465–474. https://doi.org/10.1016/j.energy.2018.08.207
    [91] Wang Z, Wang Y, Zeng R, et al. (2018) Random Forest based hourly building energy prediction. Energy Buildings 171: 11–25. https://doi.org/10.1016/j.enbuild.2018.04.008
    [92] Zou L, Wang L, Lin A, et al. (2016) Estimation of global solar radiation using an artificial neural network based on an interpolation technique in southeast China. J Atmos Sol-Terr Phys 146: 110–122. https://doi.org/10.1016/j.jastp.2016.05.013
    [93] Mellit A, Benghanem M, Kalogirou SA (2006) An adaptive wavelet-network model for forecasting daily total solar-radiation. Appl Energy 83: 705–722. https://doi.org/10.1016/j.apenergy.2005.06.003
    [94] Çelik Ö, Teke A, Yildirim HB (2016) The optimized artificial neural network model with Levenberg–Marquardt algorithm for global solar radiation estimation in Eastern Mediterranean Region of Turkey. J Cleaner Prod 116: 1–12. https://doi.org/10.1016/j.jclepro.2015.12.082
    [95] Rehman S, Mohandes M (2008) Artificial neural network estimation of global solar radiation using air temperature and relative humidity. Energy Policy 36: 571–576. https://doi.org/10.1016/j.enpol.2007.09.033
    [96] Gürel AE, Ağbulut Ü, Biçen Y (2020) Assessment of machine learning, time series, response surface methodology and empirical models in prediction of global solar radiation. J Cleaner Prod 277: 122353. https://doi.org/10.1016/j.jclepro.2020.122353
    [97] Díaz-Gómez J, Parrales A, Álvarez A, et al. (2015) Prediction of global solar radiation by artificial neural network based on a meteorological environmental data. Desalin Water Treat 55: 3210–3217. https://doi.org/10.1080/19443994.2014.939861
    [98] Rocha PAC, Fernandes JL, Modolo AB, et al. (2019) Estimation of daily, weekly and monthly global solar radiation using ANNs and a long data set: a case study of Fortaleza, in Brazilian Northeast region. Int J Energy Environ Eng 10: 319–334. https://doi.org/10.1007/s40095-019-0313-0
    [99] Rezrazi A, Hanini S, Laidi M (2016) An optimisation methodology of artificial neural network models for predicting solar radiation: a case study. Theor Appl Climatol 123: 769–783. https://doi.org/10.1007/s00704-015-1398-x
    [100] Pang Z, Niu F, O'Neill Z (2020) Solar radiation prediction using recurrent neural network and artificial neural network: A case study with comparisons. Renewable Energy 156: 279–289. https://doi.org/10.1016/j.renene.2020.04.042
    [101] Toth E, Brath A, Montanari A (2000) Comparison of short-term rainfall prediction models for real-time flood forecasting. J Hydrol 239: 132–147. https://doi.org/10.1016/S0022-1694(00)00344-9
    [102] Mamoulis N, Seidl T, Pedersen TB, et al. (2009) Advances in Spatial and Temporal Databases, Springer Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02982-0
    [103] Ren J, Ren B, Zhang Q, et al. (2019) A Novel Hybrid Extreme Learning Machine Approach Improved by K Nearest Neighbor Method and Fireworks Algorithm for Flood Forecasting in Medium and Small Watershed of Loess Region. Water 11: 1848. https://doi.org/10.3390/w11091848
    [104] Larose DT, Larose CD (2014) k‐Nearest Neighbor Algorithm. Discovering Knowledge in Data: An Introduction to Data Mining, Second Edition, 149–164. https://doi.org/10.1002/9781118874059.ch7
    [105] Sutton C (2012) Nearest-neighbor methods. WIREs Comput Stat 4: 307–309. https://doi.org/10.1002/wics.1195
    [106] Chen JL, Li GS, Xiao BB, et al. (2015) Assessing the transferability of support vector machine model for estimation of global solar radiation from air temperature. Energy Convers Manage 89: 318–329. https://doi.org/10.1016/j.enconman.2014.10.004
    [107] Shamshirband S, Mohammadi K, Tong CW, et al. (2016) A hybrid SVM-FFA method for prediction of monthly mean global solar radiation. Theor Appl Climatol 125: 53–65.
    [108] Olatomiwa L, Mekhilef S, Shamshirband S, et al. (2015) Potential of support vector regression for solar radiation prediction in Nigeria. Nat Hazards 77: 1055–1068. https://doi.org/10.1007/s11069-015-1641-x
    [109] Ramedani Z, Omid M, Keyhani A, et al. (2014) Potential of radial basis function based support vector regression for global solar radiation prediction. Renewable Sustainable Energy Rev 39: 1005–1011. https://doi.org/10.1016/j.rser.2014.07.108
    [110] Olatomiwa L, Mekhilef S, Shamshirband S, et al. (2015) A support vector machine-firefly algorithm-based model for global solar radiation prediction. Sol Energy 115: 632–644. https://doi.org/10.1016/j.solener.2015.03.015
    [111] Mohammadi K, Shamshirband S, Danesh AS, et al. (2016) Temperature-based estimation of global solar radiation using soft computing methodologies. Theor Appl Climatol 125: 101–112. https://doi.org/10.1007/s00704-015-1487-x
    [112] Hassan MA, Khalil A, Kaseb S, et al. (2017) Potential of four different machine-learning algorithms in modeling daily global solar radiation. Renewable Energy 111: 52–62. https://doi.org/10.1016/j.renene.2017.03.083
    [113] Quej VH, Almorox J, Arnaldo JA, et al. (2017) ANFIS, SVM and ANN soft-computing techniques to estimate daily global solar radiation in a warm sub-humid environment. J Atmos Sol-Terr Phys 155: 62–70. https://doi.org/10.1016/j.jastp.2017.02.002
    [114] Baser F, Demirhan H (2017) A fuzzy regression with support vector machine approach to the estimation of horizontal global solar radiation. Energy 123: 229–240. https://doi.org/10.1016/j.energy.2017.02.008
    [115] Breiman L (2001) Random forests. Mach Learn 45: 5–32. https://doi.org/10.1023/A:1010933404324
    [116] Fernández-Delgado M, Cernadas E, Barro S, et al. (2014) Do we need hundreds of classifiers to solve real world classification problems? J Mach Learn Res 15: 3133–3181.
    [117] Ke G, Meng Q, Finley T, et al. (2017) LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inf Proc Syst, 30.
    [118] Wang Y, Pan Z, Zheng J, et al. (2019) A hybrid ensemble method for pulsar candidate classification. Astrophys Space Sci 364: 139. https://doi.org/10.1007/s10509-019-3602-4
    [119] Si Z, Yang M, Yu Y, et al. (2021) Photovoltaic power forecast based on satellite images considering effects of solar position. Appl Energy 302: 117514. https://doi.org/10.1016/j.apenergy.2021.117514
    [120] Chung J, Gulcehre C, Cho K, et al. (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv: 1412.3555.
    [121] Wang Y, Liao W, Chang Y (2018) Gated Recurrent Unit Network-Based Short-Term Photovoltaic Forecasting. Energies 11: 2163. https://doi.org/10.3390/en11082163
    [122] Pazikadin AR, Rifai D, Ali K, et al. (2020) Solar irradiance measurement instrumentation and power solar generation forecasting based on Artificial Neural Networks (ANN): A review of five years research trend. Sci Total Environ 715: 136848. https://doi.org/10.1016/j.scitotenv.2020.136848
    [123] Wang F, Xuan Z, Zhen Z, et al. (2020) A day-ahead PV power forecasting method based on LSTM-RNN model and time correlation modification under partial daily pattern prediction framework. Energy Convers Manage 212: 112766. https://doi.org/10.1016/j.enconman.2020.112766
    [124] Zhang J, Yan J, Infield D, et al. (2019) Short-term forecasting and uncertainty analysis of wind turbine power based on long short-term memory network and Gaussian mixture model. Appl Energy 241: 229–244. https://doi.org/10.1016/j.apenergy.2019.03.044
    [125] Liu H, Mi X, Li Y, et al. (2019) Smart wind speed deep learning based multi-step forecasting model using singular spectrum analysis, convolutional Gated Recurrent Unit network and Support Vector Regression. Renewable Energy 143: 842–854. https://doi.org/10.1016/j.renene.2019.05.039
    [126] Tealab A (2018) Time series forecasting using artificial neural networks methodologies: A systematic review. Future Comput Inf J 3: 334–340. https://doi.org/10.1016/j.fcij.2018.10.003
    [127] Dong N, Chang JF, Wu AG, et al. (2020) A novel convolutional neural network framework based solar irradiance prediction method. Int J Electr Power Energy Syst 114: 105411. https://doi.org/10.1016/j.ijepes.2019.105411
    [128] Hinton GE, Srivastava N, Krizhevsky A, et al. (2012) Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv: 1207.0580.
    [129] Han Z, Zhao J, Leung H, et al. (2021) A Review of Deep Learning Models for Time Series Prediction. IEEE Sens J 21: 7833–7848. https://doi.org/10.1109/JSEN.2019.2923982
    [130] Shi X, Chen Z, Wang H, et al. (2015) Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Adv Neural Inf Proc Syst, 28.
    [131] van den Oord A, Dieleman S, Zen H, et al. (2016) WaveNet: A Generative Model for Raw Audio. arXiv preprint arXiv: 1609.03499. https://doi.org/10.48550/arXiv.1609.03499
    [132] Bai S, Kolter JZ, Koltun V (2018) An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv preprint arXiv: 1803.01271. https://doi.org/10.48550/arXiv.1803.01271
    [133] Vaswani A, Shazeer N, Parmar N, et al. (2017) Attention Is All You Need. arXiv preprint arXiv: 1706.03762.
    [134] Zang H, Liu L, Sun L, et al. (2020) Short-term global horizontal irradiance forecasting based on a hybrid CNN-LSTM model with spatiotemporal correlations. Renewable Energy 160: 26–41. https://doi.org/10.1016/j.renene.2020.05.150
    [135] Qu J, Qian Z, Pei Y (2021) Day-ahead hourly photovoltaic power forecasting using attention-based CNN-LSTM neural network embedded with multiple relevant and target variables prediction pattern. Energy 232: 120996. https://doi.org/10.1016/j.energy.2021.120996
    [136] Hochreiter S, Schmidhuber J (1997) Long Short-Term Memory. Neural Comput 9: 1735–1780. https://doi.org/10.1162/neco.1997.9.8.1735
    [137] Venkatraman A, Hebert M, Bagnell J (2015) Improving Multi-Step Prediction of Learned Time Series Models. Proceedings of the AAAI Conference on Artificial Intelligence, 29. https://doi.org/10.1609/aaai.v29i1.9590
    [138] Muhammad, Kennedy J, Lim CW (2022) Machine learning and deep learning in phononic crystals and metamaterials—A review. Mater Today Commun 33: 104606. https://doi.org/10.1016/j.mtcomm.2022.104606
    [139] Yao G, Lei T, Zhong J (2019) A review of Convolutional-Neural-Network-based action recognition. Pattern Recogn Lett 118: 14–22. https://doi.org/10.1016/j.patrec.2018.05.018
    [140] Akram MW, Li G, Jin Y, et al. (2019) CNN based automatic detection of photovoltaic cell defects in electroluminescence images. Energy 189: 116319. https://doi.org/10.1016/j.energy.2019.116319
    [139] Yao G, Lei T, Zhong J (2019) A review of Convolutional-Neural-Network-based action recognition. Pattern Recogn Lett 118: 14–22. https://doi.org/10.1016/J.PATREC.2018.05.018 doi: 10.1016/J.PATREC.2018.05.018
    [140] Akram MW, Li G, Jin Y, et al. (2019) CNN based automatic detection of photovoltaic cell defects in electroluminescence images. Energy 189: 116319. https://doi.org/10.1016/J.ENERGY.2019.116319 doi: 10.1016/J.ENERGY.2019.116319
    [141] Bejani MM, Ghatee M (2021) A systematic review on overfitting control in shallow and deep neural networks. Artif Intell Rev 54: 6391–6438. https://doi.org/10.1007/s10462-021-09975-1 doi: 10.1007/s10462-021-09975-1
    [142] McCann MT, Jin KH, Unser M (2017) Convolutional neural networks for inverse problems in imaging: A review. IEEE Signal Proc Mag 34: 85–95. https://doi.org/10.1109/MSP.2017.2739299 doi: 10.1109/MSP.2017.2739299
    [143] Qian C, Xu B, Chang L, et al. (2021) Convolutional neural network based capacity estimation using random segments of the charging curves for lithium-ion batteries. Energy 227: 120333. https://doi.org/10.1016/J.ENERGY.2021.120333 doi: 10.1016/J.ENERGY.2021.120333
    [144] Liu Y, Guan L, Hou C, et al. (2019) Wind Power Short-Term Prediction Based on LSTM and Discrete Wavelet Transform. Appl Sci 9: 1108. https://doi.org/10.3390/APP9061108 doi: 10.3390/APP9061108
    [145] Husein M, Chung IY (2019) Day-Ahead Solar Irradiance Forecasting for Microgrids Using a Long Short-Term Memory Recurrent Neural Network: A Deep Learning Approach. Energies 12: 1856. https://doi.org/10.3390/EN12101856 doi: 10.3390/EN12101856
    [146] Zhao Z, Chen W, Wu X, et al. (2017) LSTM network: a deep learning approach for short-term traffic forecast. IET Intell Transp Syst 11: 68–75. https://doi.org/10.1049/IET-ITS.2016.0208 doi: 10.1049/IET-ITS.2016.0208
    [147] Suresh V, Janik P, Rezmer J, et al. (2020) Forecasting Solar PV Output Using Convolutional Neural Networks with a Sliding Window Algorithm. Energies 13: 723. https://doi.org/10.3390/EN13030723 doi: 10.3390/EN13030723
    [148] Zameer A, Jaffar F, Shahid F, et al. (2023) Short-term solar energy forecasting: Integrated computational intelligence of LSTMs and GRU. PLoS One 18: e0285410. https://doi.org/10.1371/journal.pone.0285410 doi: 10.1371/journal.pone.0285410
    [149] Bommasani R, Hudson DA, Adeli E, et al. (2021) On the Opportunities and Risks of Foundation Models. arXiv preprint arXiv: 2108.07258. https://doi.org/10.48550/arXiv.2108.07258
    [150] Devlin J (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv: 1810.04805.
    [151] Mann B, Ryder N, Subbiah M, et al. (2020) Language Models are Few-Shot Learners. arXiv preprint arXiv: 2005.14165, 1.
    [152] Radford A, Kim JW, Hallacy C, et al. (2021) Learning Transferable Visual Models from Natural Language Supervision. International conference on machine learning. PMLR.
    [153] Child R, Gray S, Radford A, et al. (2019) Generating Long Sequences with Sparse Transformers. arXiv preprint arXiv: 1904.10509. https://doi.org/10.48550/arXiv.1904.10509
    [154] Kitaev N, Kaiser Ł, Levskaya A (2020) Reformer: The Efficient Transformer. arXiv preprint arXiv: 2001.04451. https://doi.org/10.48550/arXiv.2001.04451
    [155] Beltagy I, Peters ME, Cohan A (2020) Longformer: The Long-Document Transformer. arXiv preprint arXiv: 2004.05150. https://doi.org/10.48550/arXiv.2004.05150
    [156] Wang S, Li BZ, Khabsa M, et al. (2020) Linformer: Self-Attention with Linear Complexity. arXiv preprint arXiv: 2006.04768. https://doi.org/10.48550/arXiv.2006.04768
    [157] Rae JW, Potapenko A, Jayakumar SM, et al. (2020) Compressive Transformers for Long-Range Sequence Modelling. arXiv preprint arXiv: 1911.05507. https://doi.org/10.48550/arXiv.1911.05507
    [158] Dai Z, Yang Z, Yang Y, et al. (2019) Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2978–2988, Florence, Italy. Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1285
    [159] Zhou H, Zhang S, Peng J, et al. (2021) Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proceedings of the AAAI Conference on Artificial Intelligence, 35: 11106–11115. https://doi.org/10.1609/aaai.v35i12.17325
    [160] Hanif MF, Mi J (2024) Harnessing AI for solar energy: Emergence of transformer models. Appl Energy 369: 123541. https://doi.org/10.1016/j.apenergy.2024.123541
    [161] Hussain A, Khan ZA, Hussain T, et al. (2022) A Hybrid Deep Learning-Based Network for Photovoltaic Power Forecasting. Complexity. https://doi.org/10.1155/2022/7040601
    [162] Vennila C, Titus A, Sudha TS, et al. (2022) Forecasting Solar Energy Production Using Machine Learning. Int J Photoenergy 2022: 7797488. https://doi.org/10.1155/2022/7797488
    [163] So D, Oh J, Leem S, et al. (2023) A Hybrid Ensemble Model for Solar Irradiance Forecasting: Advancing Digital Models for Smart Island Realization. Electronics 12: 2607. https://doi.org/10.3390/electronics12122607
    [164] He Y, Liu Y, Shao S, et al. (2019) Application of CNN-LSTM in Gradual Changing Fault Diagnosis of Rod Pumping System. Math Probl Eng 2019: 4203821. https://doi.org/10.1155/2019/4203821
    [165] Huang CJ, Kuo PH (2018) A Deep CNN-LSTM Model for Particulate Matter (PM2.5) Forecasting in Smart Cities. Sensors 18: 2220. https://doi.org/10.3390/s18072220
    [166] Cao K, Kim H, Hwang C, et al. (2018) CNN-LSTM Coupled Model for Prediction of Waterworks Operation Data. J Inf Process Syst 14: 1508–1520. https://doi.org/10.3745/JIPS.02.0104
    [167] Swapna G, Soman KP, Vinayakumar R (2018) Automated detection of diabetes using CNN and CNN-LSTM network and heart rate signals. Procedia Comput Sci 132: 1253–1262. https://doi.org/10.1016/j.procs.2018.05.041
    [168] Jalali SMJ, Ahmadian S, Kavousi-Fard A, et al. (2022) Automated Deep CNN-LSTM Architecture Design for Solar Irradiance Forecasting. IEEE Trans Syst Man Cybernetics Syst 52: 54–65. https://doi.org/10.1109/TSMC.2021.3093519
    [169] Lim SC, Huh JH, Hong SH, et al. (2022) Solar Power Forecasting Using CNN-LSTM Hybrid Model. Energies 15: 8233. https://doi.org/10.3390/en15218233
    [170] Covas E (2020) Transfer Learning in Spatial-Temporal Forecasting of the Solar Magnetic Field. Astron Nachr 341: 384–394. https://doi.org/10.1002/asna.202013690
    [171] Sheng H, Ray B, Chen K, et al. (2020) Solar Power Forecasting Based on Domain Adaptive Learning. IEEE Access 8: 198580–198590. https://doi.org/10.1109/ACCESS.2020.3034100
    [172] Ren X, Wang Y, Cao Z, et al. (2023) Feature Transfer and Rapid Adaptation for Few-Shot Solar Power Forecasting. Energies 16: 6211. https://doi.org/10.3390/en16176211
    [173] Zhou S, Zhou L, Mao M, et al. (2020) Transfer Learning for Photovoltaic Power Forecasting with Long Short-Term Memory Neural Network. 2020 IEEE International Conference on Big Data and Smart Computing (BigComp), Busan, Korea (South), 125–132. https://doi.org/10.1109/BIGCOMP48618.2020.00-87
    [174] Soleymani S, Mohammadzadeh S (2023) Comparative Analysis of Machine Learning Algorithms for Solar Irradiance Forecasting in Smart Grids. arXiv preprint arXiv: 2310.13791. https://doi.org/10.48550/arXiv.2310.13791
    [175] Sutarna N, Tjahyadi C, Oktivasari P, et al. (2023) Machine Learning Algorithm and Modeling in Solar Irradiance Forecasting. 2023 6th International Conference of Computer and Informatics Engineering (IC2IE), Lombok, Indonesia, 221–225. https://doi.org/10.1109/IC2IE60547.2023.10330942
    [176] Bamisile O, Oluwasanmi A, Ejiyi C, et al. (2022) Comparison of machine learning and deep learning algorithms for hourly global/diffuse solar radiation predictions. Int J Energy Res 46: 10052–10073. https://doi.org/10.1002/er.6529
    [177] Sahaya Lenin D, Teja Reddy R, Velaga V (2023) Solar Irradiance Forecasting Using Machine Learning. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), Delhi, India, 1–7. https://doi.org/10.1109/ICCCNT56998.2023.10307660
    [178] Syahab AS, Hermawan A, Avianto D (2023) Global Horizontal Irradiance Prediction using the Algorithm of Moving Average and Exponential Smoothing. JISA 6: 74–81. https://doi.org/10.31326/jisa.v6i1.1649
    [179] Aljanad A, Tan NML, Agelidis VG, et al. (2021) Neural Network Approach for Global Solar Irradiance Prediction at Extremely Short-Time-Intervals Using Particle Swarm Optimization Algorithm. Energies 14: 1213. https://doi.org/10.3390/en14041213
    [180] Mbah OM, Madueke CI, Umunakwe R, et al. (2022) Extreme Gradient Boosting: A Machine Learning Technique for Daily Global Solar Radiation Forecasting on Tilted Surfaces. J Eng Sci 9: E1–E6. https://doi.org/10.21272/jes.2022.9(2).e1
    [181] Cha J, Kim MK, Lee S, et al. (2021) Investigation of Applicability of Impact Factors to Estimate Solar Irradiance: Comparative Analysis Using Machine Learning Algorithms. Appl Sci 11: 8533. https://doi.org/10.3390/app11188533
    [182] Reddy KR, Ray PK (2022) Solar Irradiance Forecasting using FFNN with MIG Feature Selection Technique. 2022 International Conference on Intelligent Controller and Computing for Smart Power (ICICCSP), Hyderabad, India, 1–5. https://doi.org/10.1109/ICICCSP53532.2022.9862335
    [183] Chandola D, Gupta H, Tikkiwal VA, et al. (2020) Multi-step ahead forecasting of global solar radiation for arid zones using deep learning. Procedia Comput Sci 167: 626–635. https://doi.org/10.1016/j.procs.2020.03.329
    [184] Yang Y, Tang Z, Li Z, et al. (2023) Dual-Path Information Fusion and Twin Attention-Driven Global Modeling for Solar Irradiance Prediction. Sensors 23: 7469. https://doi.org/10.3390/s23177469
    [185] Meng F, Zou Q, Zhang Z, et al. (2021) An intelligent hybrid wavelet-adversarial deep model for accurate prediction of solar power generation. Energy Rep 7: 2155–2164. https://doi.org/10.1016/j.egyr.2021.04.019
    [186] Kartini UT, Hariyati, Aribowo W, et al. (2022) Development Hybrid Model Deep Learning Neural Network (DL-NN) for Probabilistic Forecasting Solar Irradiance on Solar Cells to Improve Economics Value Added. 2022 Fifth International Conference on Vocational Education and Electrical Engineering (ICVEE), Surabaya, Indonesia, 151–156. https://doi.org/10.1109/ICVEE57061.2022.9930352
    [187] Singla P, Duhan M, Saroha S (2022) A dual decomposition with error correction strategy based improved hybrid deep learning model to forecast solar irradiance. Energy Sources Part A 44: 1583–1607. https://doi.org/10.1080/15567036.2022.2056267
    [188] Marinho FP, Rocha PAC, Neto ARR, et al. (2023) Short-Term Solar Irradiance Forecasting Using CNN-1D, LSTM and CNN-LSTM Deep Neural Networks: A Case Study with the Folsom (USA) Dataset. J Sol Energy Eng 145: 041002. https://doi.org/10.1115/1.4056122
    [189] Kumari P, Toshniwal D (2021) Long short term memory-convolutional neural network based deep hybrid approach for solar irradiance forecasting. Appl Energy 295: 117061. https://doi.org/10.1016/j.apenergy.2021.117061
    [190] Elizabeth Michael N, Mishra M, Hasan S, et al. (2022) Short-Term Solar Power Predicting Model Based on Multi-Step CNN Stacked LSTM Technique. Energies 15: 2150. https://doi.org/10.3390/en15062150
    [191] Srivastava RK, Gupta A (2023) Short term solar irradiation forecasting using Deep neural network with decomposition methods and optimized by grid search algorithm. E3S Web Conf 405. https://doi.org/10.1051/e3sconf/202340502011
    [192] Ziyabari S, Zhao Z, Du L, et al. (2023) Multi-Branch ResNet-Transformer for Short-Term Spatio-Temporal Solar Irradiance Forecasting. IEEE Trans Ind Appl 59: 5293–5303. https://doi.org/10.1109/TIA.2023.3285202
    [193] Carneiro TC, De Carvalho PCM, Dos Santos HA, et al. (2022) Review on Photovoltaic Power and Solar Resource Forecasting: Current Status and Trends. J Sol Energy Eng 144: 010801. https://doi.org/10.1115/1.4051652
    [194] Chaibi M, Benghoulam ELM, Tarik L, et al. (2021) An Interpretable Machine Learning Model for Daily Global Solar Radiation Prediction. Energies 14: 7367. https://doi.org/10.3390/en14217367
    [195] Mason L, de González AB, García-Closas M, et al. (2023) Interpretable, non-mechanistic forecasting using empirical dynamic modeling and interactive visualization. PLoS One 18: e0277149. https://doi.org/10.1371/journal.pone.0277149
    [196] Rafati A, Joorabian M, Mashhour E, et al. (2021) High dimensional very short-term solar power forecasting based on a data-driven heuristic method. Energy 219: 119647. https://doi.org/10.1016/j.energy.2020.119647
    [197] Wang H, Cai R, Zhou B, et al. (2020) Solar irradiance forecasting based on direct explainable neural network. Energy Convers Manage 226: 113487. https://doi.org/10.1016/j.enconman.2020.113487
    [198] Theocharides S, Makrides G, Livera A, et al. (2020) Day-ahead photovoltaic power production forecasting methodology based on machine learning and statistical post-processing. Appl Energy 268: 115023. https://doi.org/10.1016/j.apenergy.2020.115023
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
