
Citation: Tú Nguyen-Dumont, Jenna Stewart, Ingrid Winship, Melissa C. Southey. Rare genetic variants: making the connection with breast cancer susceptibility[J]. AIMS Genetics, 2015, 2(4): 281-292. doi: 10.3934/genet.2015.4.281
With the advent of technologies that generate large-scale, high-throughput data, a much clearer understanding of the genomic mechanisms behind gene regulation has been gained. Researchers have found, unexpectedly, that noncoding RNAs far outnumber protein-coding genes, and that these noncoding regions play important roles in determining the complexity observed in the human genome [1,2]. Among these noncoding transcripts, long noncoding RNAs (lncRNAs), defined as noncoding RNAs at least 200 nucleotides in length, have attracted considerable attention. Certain lncRNAs appear to act locally, while others have more distal regulatory effects, even acting across multiple chromosomes [3]. Many studies have identified specific functions of particular lncRNAs, including roles in embryonic mechanisms, cell-cycle functions, innate immunity, and disease processes. However, thousands of lncRNAs still have no identified function [1,3,4,5,6]. Studies that have characterized the functions of a relatively small number of lncRNAs [7] have shown that lncRNA function is highly cell-type-specific: one lncRNA may inhibit a particular gene in one cell type while promoting the same gene in another. This phenomenon makes it even more difficult to identify lncRNA functions on a large scale. Because of this specificity, researchers have proposed that future lncRNA studies be performed on specific cell types to identify particular regulatory mechanisms.
One of the most prominent and intriguing applications of lncRNA regulatory studies comes from cancer research [8,9]. It has been shown that lncRNAs are associated with numerous diseases, especially cancer. Because lncRNA regulatory functions are highly cell-type-specific and the genetic information of cancer cells is irregular, studying lncRNA regulation in specific cancer types may provide insight into the genomic regulation of common cancer cells. In a few documented cases, specific lncRNAs have been shown to be significantly differentially expressed in specific cancer types, such as prostate cancer and breast cancer [1]. For these reasons, it seems appropriate to further investigate lncRNA-gene interactions in particular cancer cell types.
The wealth of available gene expression datasets provides an opportunity to computationally identify co-expressed gene modules (CEMs), each defined as a highly structured expression pattern on a specific gene set [10,11]. The genes in a CEM tend to be functionally related or co-regulated by the same transcriptional regulatory signals (e.g., transcription factors or lncRNAs) under a specific condition or in a particular disease cell type. Successful derivation of CEMs may therefore enable a higher-level interpretation of large-scale gene expression data, improve functional annotation of condition-specific gene activities, and facilitate inference of gene regulatory relationships, thereby providing a better mechanistic understanding of complex diseases.
The computational identification of CEMs can be addressed by biclustering [12], a two-dimensional data mining technique that simultaneously identifies genes co-expressed under a subset of conditions. … a high proportion of enriched biclusters on real datasets. In this study, we attempt to identify new lncRNA-gene interactions and transcription factor-lncRNA partnerships from cancer RNA-seq data using a biclustering approach. The biclustering method allows the identification of particular expression patterns across multiple datasets, indicating networks of lncRNA-gene interactions, and provides a framework for future lncRNA interaction studies. We applied this method to two TCGA breast cancer RNA-seq datasets to generate CEMs based on known lncRNA-gene interactions. The predicted CEMs were then linked to lncRNAs through an enrichment p-value, from which new lncRNA-gene relationships were inferred. Evaluation of the predictions showed that the pipeline can recover some target genes of a given lncRNA, although its performance still leaves room for improvement. We further conducted a TF motif analysis on the predicted CEMs and suggest potential regulatory cooperation between TFs and lncRNAs. The original data, code, results, and supplementary data can be downloaded from https://github.com/IvesG/sGavin.git.
Two TCGA (The Cancer Genome Atlas) breast cancer RNA-seq datasets, one from normal samples (referred to as normal data) and the other from tumor samples (referred to as tumor data), were downloaded from https://portal.gdc.cancer.gov/. The normal and tumor data consist of 113 and 1,091 samples, respectively, and 112 of the 113 normal samples come from patients who are also represented among the tumor samples. Both datasets contain 60,483 genes, including 19,824 protein-coding genes and 7,399 long intergenic noncoding RNA (lincRNA) genes. All RNA-seq values are upper-quartile-normalized FPKM (UQ-FPKM) values.
A total of 1,081 experimentally validated lncRNA-associated regulatory entries, describing regulatory relationships among 258 lncRNAs and 571 genes, were downloaded from LncReg [13]. All of these relationships were manually curated from PubMed with a focus on data generated by laboratory methods, and they can be categorized either by regulatory relationship (up, down, active, inactive) or by regulatory mechanism (transcriptional, post-transcriptional, translational, post-translational).
Since we focus on lncRNA-gene interactions, the LncReg relationships were filtered to retain only those describing genes regulated by lncRNAs with specified species information (restricted to Homo sapiens and Mus musculus). This resulted in 925 relationships for downstream analysis, covering interactions between 309 unique human genes and 103 human lncRNAs, and between 199 mouse genes and 100 mouse lncRNAs. These 925 relationships comprise 714 transcriptional, 28 post-transcriptional, 23 translational, 41 post-translational, 1 combined transcriptional and translational, and 118 unspecified regulations.
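The filtering step can be sketched in Python as follows. This is a minimal illustration only: the record layout (`LncRegEntry` and its fields) and the direction label `"lncRNA->gene"` are hypothetical stand-ins for the LncReg export, and the toy records are invented placeholders, not LncReg data.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical record layout; the real LncReg table may name and encode
# these columns differently.
@dataclass(frozen=True)
class LncRegEntry:
    lncrna: str
    gene: str
    species: str    # "" when the species is not given
    direction: str  # "lncRNA->gene" when the lncRNA regulates the gene
    mechanism: str  # e.g. "transcription", "post-translation", "NS"

KEPT_SPECIES = {"Homo sapiens", "Mus musculus"}

def filter_entries(entries):
    """Keep lncRNA -> gene entries annotated as human or mouse."""
    return [
        e for e in entries
        if e.direction == "lncRNA->gene" and e.species in KEPT_SPECIES
    ]

def mechanism_counts(entries):
    """Tally the regulatory-mechanism labels of the retained entries."""
    return Counter(e.mechanism for e in entries)

# Toy placeholder records (not real LncReg entries).
toy = [
    LncRegEntry("LNC_A", "GENE_1", "Homo sapiens", "lncRNA->gene", "transcription"),
    LncRegEntry("LNC_B", "GENE_2", "Mus musculus", "lncRNA->gene", "NS"),
    LncRegEntry("LNC_C", "GENE_3", "", "lncRNA->gene", "post-transcription"),
    LncRegEntry("LNC_D", "GENE_4", "Homo sapiens", "gene->lncRNA", "transcription"),
]
kept = filter_entries(toy)
print(len(kept), dict(mechanism_counts(kept)))
```

The third toy record is dropped for lacking a species and the fourth for pointing in the wrong direction; applied to a real export, `mechanism_counts` would produce the kind of breakdown listed above.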
As the LncReg table [13] provides only gene symbols, while the RNA-seq dataset uses Ensembl IDs as gene identifiers, we used Ensembl BioMart [14] to map gene symbols to Ensembl IDs for all genes and lncRNAs. We then used BioMart to retrieve human orthologs of the mouse genes: all 199 mouse genes had a human ortholog, and 38 of these overlapped with the original human genes. For convenience, we denote the human genes, the mouse genes that do not overlap with the human genes, the human lncRNAs, and the mouse lncRNAs that do not overlap with the human lncRNAs as HG, MG, HL, and ML, respectively.
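Once every entity is expressed as a human identifier, the HG/MG/HL/ML grouping amounts to set differences. A minimal sketch, assuming the ortholog maps have already been exported from BioMart into Python dictionaries; the function name `categorize` and the toy identifiers are hypothetical, and we assume lncRNA overlap is decided by the same kind of ID mapping, which the text does not spell out.

```python
def categorize(human_genes, mouse_gene_orthologs, human_lncrnas, mouse_lncrna_orthologs):
    """Split LncReg entities into the HG, MG, HL and ML groups.

    The *_orthologs arguments map a mouse identifier to its human ortholog
    (e.g. an Ensembl ID taken from a BioMart export). Mouse entries whose
    human ortholog is already among the human entries are dropped, so the
    gene groups (HG, MG) and the lncRNA groups (HL, ML) do not overlap.
    """
    hg = set(human_genes)
    mg = set(mouse_gene_orthologs.values()) - hg
    hl = set(human_lncrnas)
    ml = set(mouse_lncrna_orthologs.values()) - hl
    return hg, mg, hl, ml

# Toy identifiers, not real Ensembl IDs.
hg, mg, hl, ml = categorize(
    human_genes={"HG_1", "HG_2"},
    mouse_gene_orthologs={"mg_1": "HG_1", "mg_3": "HG_3"},
    human_lncrnas={"HL_1"},
    mouse_lncrna_orthologs={"ml_1": "HL_1", "ml_2": "HL_2"},
)
print(sorted(mg), sorted(ml))
```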
We combined the normal and tumor RNA-seq datasets and extracted expression values for all HG, MG, HL, and ML entries, the remaining protein-coding genes (PC, i.e., protein-coding genes other than HG and MG), and the remaining lincRNAs (linc, i.e., lincRNAs other than HL and ML). With genes as rows and conditions as columns, this yields the RNA-seq expression matrix on which biclustering is performed to detect CEMs.
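Assembling the combined matrix can be sketched as below. This assumes each dataset has already been loaded into a dictionary keyed by Ensembl ID with one value per sample; the function name, the column order (normal samples first), and the choice to skip genes absent from either dataset are our assumptions.

```python
def build_matrix(normal, tumor, groups):
    """Stack normal and tumor samples into one genes x samples matrix.

    normal, tumor: dict mapping gene ID -> list of expression values
                   (one value per sample).
    groups: list of (label, gene IDs) pairs in the desired row order,
            e.g. HG, MG, HL, ML, PC, linc.
    Genes missing from either dataset, or already placed, are skipped.
    """
    row_ids, row_groups, matrix = [], [], []
    seen = set()
    for label, genes in groups:
        for gene in genes:
            if gene in seen or gene not in normal or gene not in tumor:
                continue
            seen.add(gene)
            row_ids.append(gene)
            row_groups.append(label)
            # Columns: all normal samples first, then all tumor samples.
            matrix.append(list(normal[gene]) + list(tumor[gene]))
    return row_ids, row_groups, matrix

# Toy values, not TCGA data.
normal = {"G1": [1.0, 2.0], "G2": [0.5, 0.5]}
tumor = {"G1": [3.0], "G2": [4.0], "G3": [9.0]}
groups = [("HG", ["G1"]), ("PC", ["G2", "G3"])]   # G3 lacks normal values
ids, labels, matrix = build_matrix(normal, tumor, groups)
print(ids, labels, matrix)
```

Keeping the group label next to each row makes it easy to pull out the seed rows (HG and MG, or HG, MG, HL and ML) later.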
QUBIC is a biclustering tool designed for co-expression analysis of genes based on their expression patterns under multiple conditions. It can generally identify all statistically significant groups, or biclusters, of genes with similar expression patterns under at least a specified number of experimental conditions, and it tends to be more sensitive and more specific than other biclustering tools [15]. We used QUBIC's quantile-based discretization to generate a qualitative representation matrix from the RNA-seq expression matrix. We then extracted from this matrix the rows of the known lncRNA-regulated genes (HG and MG) as seed 1, and the HG, MG, HL, and ML rows as seed 2. Biclustering analysis was then performed on each of the two seeds to predict CEMs in the qualitative representation matrix.
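The idea behind quantile-based discretization can be illustrated with a simplified three-level version. This is a sketch, not QUBIC's code: QUBIC's own discretization also supports more than one up/down level and may treat ties and thresholds differently, and the default `q = 0.06` mirrors the value we understand QUBIC uses, so treat it as an assumption.

```python
def quantile(sorted_values, p):
    """Linear-interpolation quantile of an already sorted list (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 < len(sorted_values):
        return sorted_values[i] + frac * (sorted_values[i + 1] - sorted_values[i])
    return sorted_values[i]

def discretize_row(row, q=0.06):
    """Map one gene's expression values to -1 (low), 0 or +1 (high).

    Values below the gene's q-quantile become -1, values above its
    (1 - q)-quantile become +1, and everything else becomes 0.
    """
    s = sorted(row)
    low, high = quantile(s, q), quantile(s, 1 - q)
    return [-1 if v < low else 1 if v > high else 0 for v in row]

def discretize(matrix, q=0.06):
    """Apply the per-gene discretization to every row of a genes x samples matrix."""
    return [discretize_row(row, q) for row in matrix]

print(discretize_row([5, 1, 9, 2, 7, 3, 10, 4, 8, 6], q=0.2))
```

Thresholds are computed per gene (per row), so a gene is labeled "up" or "down" relative to its own expression range, which keeps highly and lowly expressed genes comparable.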
For each identified CEM, we calculated the p-value of its enrichment in genes regulated by a given lncRNA using the hypergeometric distribution [16]:
$$
\Pr(r \mid N, K, n) = \frac{\binom{K}{r}\binom{N-K}{n-r}}{\binom{N}{n}}
$$
where $r$ is the number of genes in a CEM of size $n$ that are regulated by a given lncRNA, $N$ is the total number of known lncRNA-regulated genes in the whole genome, and $K$ is the number of genes regulated by that lncRNA in the whole genome.
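A direct transcription of the hypergeometric formula, plus the upper-tail sum commonly used as an enrichment p-value, is sketched below. The text gives only the point probability, so treating the p-value as $\Pr(X \ge r)$ is our assumption; in SciPy the same tail would be `scipy.stats.hypergeom.sf(r - 1, N, K, n)`.

```python
from math import comb

def hypergeom_pmf(r, N, K, n):
    """Pr(r | N, K, n): chance that exactly r of the n CEM genes are among
    the K known targets of a lncRNA, when the CEM is drawn from N genes."""
    return comb(K, r) * comb(N - K, n - r) / comb(N, n)

def enrichment_pvalue(r, N, K, n):
    """Upper-tail p-value Pr(X >= r), summing the pmf from r to min(K, n)."""
    if not (0 <= n <= N and 0 <= K <= N):
        raise ValueError("need 0 <= n <= N and 0 <= K <= N")
    return sum(hypergeom_pmf(i, N, K, n) for i in range(r, min(K, n) + 1))

# Toy numbers: 4 CEM genes, all 4 among the 5 targets of a lncRNA,
# out of 20 known lncRNA-regulated genes.
print(enrichment_pvalue(4, N=20, K=5, n=4))   # equals 5 / 4845
```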
We assumed that, if the known target genes of a given lncRNA are highly covered by a CEM with a significant p-value, the other genes in this CEM are also likely to be regulated by that lncRNA. Thus, we used the smallest p-value over all candidate lncRNAs as the p-value of the current bicluster, and predicted regulatory relationships between the corresponding lncRNA and the genes in the bicluster.
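The assignment rule can be sketched as follows: score every candidate lncRNA against one CEM, keep the smallest p-value, and report the CEM genes that are not yet known targets as predictions. The upper-tail p-value convention, the function names, and the toy identifiers are our assumptions, not the authors' code.

```python
from math import comb

def upper_tail(r, N, K, n):
    """Hypergeometric Pr(X >= r) (upper-tail convention, an assumption)."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(r, min(K, n) + 1)) / comb(N, n)

def assign_lncrna(cem_genes, targets_by_lncrna, N):
    """Return (lncRNA, p-value, predicted new targets) for one CEM.

    cem_genes: set of gene IDs in the bicluster (requires len <= N).
    targets_by_lncrna: dict lncRNA -> set of its known target genes.
    N: total number of known lncRNA-regulated genes.
    """
    if not targets_by_lncrna:
        raise ValueError("no candidate lncRNAs")
    n = len(cem_genes)
    best = None
    for lnc, targets in targets_by_lncrna.items():
        r = len(cem_genes & targets)
        p = upper_tail(r, N, len(targets), n)
        if best is None or p < best[1]:
            best = (lnc, p)
    lnc, p = best
    # CEM genes not yet known as targets become predicted targets.
    return lnc, p, cem_genes - targets_by_lncrna[lnc]

# Toy example: LNC_A's three targets all fall in the CEM, LNC_B's none.
result = assign_lncrna(
    {"G1", "G2", "G3", "G9"},
    {"LNC_A": {"G1", "G2", "G3"}, "LNC_B": {"G4", "G5", "G6", "G7", "G8"}},
    N=20,
)
print(result)
```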
To evaluate the performance of the new method in predicting new relationships between lncRNAs and genes, we repeatedly split seed2 at random into two equal-sized parts, seedpart1 and seedpart2. Biclustering analysis was performed on seedpart2 to predict CEMs, and we then determined which part of seedpart1 is covered by the CEMs generated from seedpart2. The cover ratio is the size of this covered part divided by the size of seedpart1. We also calculated p-values for the coverage rates to assess their statistical significance.
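The validation loop can be sketched as follows. This assumes the biclustering step is wrapped in a callable (`run_biclustering` is a hypothetical stand-in for the actual QUBIC run) and reads the cover ratio as the covered part of seedpart1 divided by the size of seedpart1. With an odd number of genes, this sketch drops one gene so that the halves stay equal, a detail the text does not specify.

```python
import random

def split_in_half(genes, rng):
    """Randomly split a collection of genes into two equal-sized halves."""
    shuffled = sorted(genes)          # sort first so a fixed seed is reproducible
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:2 * half])

def cover_ratio(held_out, cems):
    """Fraction of held-out genes that fall inside any predicted CEM."""
    covered = held_out & set().union(*cems)
    return len(covered) / len(held_out)

def evaluate(seed_genes, run_biclustering, rounds=10, seed=0):
    """Repeat the split / bicluster / cover-ratio procedure.

    run_biclustering receives seedpart2 and must return a list of CEMs
    (sets of gene IDs).
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(rounds):
        part1, part2 = split_in_half(seed_genes, rng)
        cems = run_biclustering(part2)
        ratios.append(cover_ratio(part1, cems))
    return ratios

genes = {f"G{i}" for i in range(10)}
# A dummy biclusterer that returns one CEM containing every gene.
print(evaluate(genes, lambda part: [set(genes)], rounds=3))
```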
We chose several significant CEMs with fewer than 100 genes or conditions for TF motif analysis. The gene list of each CEM was input separately to the findMotifs.pl subroutine of HOMER [17]. findMotifs.pl first automatically retrieves the upstream promoter sequences of a given length and then performs motif finding on these promoters. For each run of findMotifs.pl, we let the program output at most 5 top-ranking motifs, i.e., up to 5 motifs were discovered for each CEM. To evaluate the validity of the discovered motifs, findMotifs.pl automatically compares the discovered motif profiles with the motif profiles archived in JASPAR [18] v2018 (http://jaspar.genereg.net/) under its default parameter settings. For each discovered motif similar to at least one JASPAR motif, we present its motif logo together with information on the most similar JASPAR motif.
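A run for one CEM can be scripted as sketched below. The settings are assumptions, not the authors' recorded parameters: `-start -300`, `-end 50`, and `-len 8,10,12` are what we understand to be HOMER's defaults, using `-S 5` (number of de novo motifs to optimize) is our guess at how "at most 5 motifs" could be imposed, and the promoter set name `human` assumes the human promoter package is installed. Check the `findMotifs.pl` help of your HOMER version before relying on these flags.

```python
import shutil
import subprocess
from pathlib import Path

def homer_command(gene_list, out_dir, organism="human", max_motifs=5,
                  start=-300, end=50, lengths=(8, 10, 12), threads=4):
    """Build a findMotifs.pl call for one CEM's gene list.

    gene_list: text file with one gene ID per line.
    -start/-end set the promoter window relative to the TSS, -len the motif
    lengths, -S the number of de novo motifs, -p the number of threads.
    """
    return [
        "findMotifs.pl", str(gene_list), organism, str(out_dir),
        "-start", str(start), "-end", str(end),
        "-len", ",".join(str(k) for k in lengths),
        "-S", str(max_motifs), "-p", str(threads),
    ]

def run_homer(gene_ids, workdir, name):
    """Write the gene list and run HOMER if findMotifs.pl is on the PATH."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    gene_file = workdir / f"{name}.txt"
    gene_file.write_text("\n".join(gene_ids) + "\n")
    cmd = homer_command(gene_file, workdir / f"{name}_homer")
    if shutil.which("findMotifs.pl") is not None:
        subprocess.run(cmd, check=True)
    return cmd  # returned either way so the call can be inspected

print(" ".join(homer_command("cem1_genes.txt", "cem1_homer")))
```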
All known interactions between lncRNAs and genes are shown in Figure 1A. The related data can be downloaded from the data/LncReg0419 directory of https://github.com/IvesG/sGavin.git, and further details are given in data/readme.txt. In Figure 1A, dark-blue nodes represent lncRNAs and light-blue nodes represent proteins; pink edges represent interactions documented in Homo sapiens, green edges those documented in Mus musculus, and orange edges those documented in both species. Edge labels indicate the regulatory mechanism: PTL (post-translational regulation), TC (transcriptional regulation), PTC (post-transcriptional regulation), TL (translational regulation), and NS (not specified). The distribution of these labels is shown in Figure 1B; more than three quarters (714/925) of the relationships were identified at the transcriptional level. Other edge labels indicate the regulatory relationship (down, up, active, or inactive), whose distribution is shown in Figure 1C. Down-regulation (575) is more frequent than up-regulation (308), and active/inactive relationships are rare (4.5%).
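The percentages quoted above follow from the counts given in the Methods. A quick check, assuming (as the 4.5% figure implies) that the relationships not labeled down or up are exactly the active/inactive ones:

```python
# Counts quoted in the text.
mechanisms = {"TC": 714, "PTC": 28, "TL": 23, "PTL": 41, "TC & TL": 1, "NS": 118}
relations = {"down": 575, "up": 308}
total = 925

assert sum(mechanisms.values()) == total  # the mechanism labels cover all 925

tc_share = mechanisms["TC"] / total
# Assumes every relationship carries exactly one of down/up/active/inactive.
active_inactive = total - sum(relations.values())
ai_share = active_inactive / total

print(f"transcriptional: {tc_share:.1%}")                      # 77.2%
print(f"active/inactive: {active_inactive} ({ai_share:.1%})")  # 42 (4.5%)
```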
Figure 1E shows the distribution of the number of genes regulated by each lncRNA: most lncRNAs (~78%) regulate fewer than 5 genes. Figure 1D gives the details: each point relates a number of regulated genes (horizontal coordinate) to the number of lncRNAs that regulate that many genes (vertical coordinate); e.g., the point (4, 14) in Figure 1D indicates that there are 14 lncRNAs, each regulating 4 genes. The lncRNAs that regulate more genes in Figure 1D correspond to the denser regions of Figure 1A.
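The quantities behind Figures 1D and 1E can be computed from the list of (lncRNA, gene) pairs. A minimal sketch with toy identifiers; each point pairs a number of regulated genes with the number of lncRNAs that regulate that many genes, which is the reading used in the (4, 14) example.

```python
from collections import Counter

def genes_per_lncrna(pairs):
    """Number of distinct target genes for each lncRNA in (lncRNA, gene) pairs."""
    targets = {}
    for lnc, gene in pairs:
        targets.setdefault(lnc, set()).add(gene)
    return {lnc: len(genes) for lnc, genes in targets.items()}

def degree_points(pairs):
    """(number of genes regulated, number of lncRNAs regulating that many genes)."""
    return sorted(Counter(genes_per_lncrna(pairs).values()).items())

def share_below(pairs, threshold=5):
    """Fraction of lncRNAs regulating fewer than `threshold` genes."""
    counts = genes_per_lncrna(pairs).values()
    return sum(c < threshold for c in counts) / len(counts)

toy_pairs = [("LNC_A", "G1"), ("LNC_A", "G2"), ("LNC_B", "G1"),
             ("LNC_C", "G3"), ("LNC_C", "G3")]   # duplicate pair counted once
print(degree_points(toy_pairs))   # [(1, 2), (2, 1)]
print(share_below(toy_pairs, 2))
```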
Quantile-based discretization followed by biclustering analysis identified a number of CEMs; the CEM identification procedure is illustrated in Figure 2C. Figure 2A shows the number of CEMs obtained from seed1 and seed2 with max(min)-based [19] (QUBIC1.0 [15,20]) and KL-based (QUBIC2.0 [21]) biclustering. The distributions of the numbers of genes and conditions per CEM are shown in Figure 2D and Figure 2E. In these figures, the label seed1_1genes, for instance, denotes results from QUBIC1.0 applied to seed1. To better illustrate the distributions, Figure 2B restricts both the number of genes and the number of conditions to below 100. Figures 2D and 2E show that the KL-based biclustering method tends to generate CEMs containing fewer genes but more conditions than the max(min)-based method [22]. The differences between the results from seed1 and seed2 (in the distributions of gene and condition sizes as well as in the number of predicted CEMs) are subtle.
Figure 3A shows the proportion of CEMs with significant p-values (below a pre-selected cutoff), and Figure 3B shows the proportions of biclusters whose number of unique enriched lncRNAs falls into each category (2, 3, or more than 3). In these figures, seed1_qubic1 denotes results obtained with quantile discretization and max(min)-based biclustering [23] on seed1, and seed1_qubic2 denotes results obtained with quantile discretization and KL-based biclustering on seed1. Figure 3A shows that most CEMs have p-values above 0.001, and seed1_qubic1 appears to yield more significant p-values. With a cutoff of 0.00001, few CEMs remain (less than 7%). In Figure 3B, the majority (more than 70%) of CEMs have more than 3 enriched lncRNAs, reaching around 85% for seed1_qubic1.
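The two summaries in Figure 3 can be sketched as follows. The input format (one p-value and one enriched-lncRNA count per CEM), the function name, and the extra "<2" bin for CEMs outside the three categories named in the text are our assumptions; the toy values are invented.

```python
from collections import Counter

def summarize_cems(cems, cutoff=1e-3):
    """cems: list of (p_value, number of unique enriched lncRNAs) per bicluster.

    Returns the fraction of CEMs with p below `cutoff` and the fraction of
    CEMs in each enriched-lncRNA category ("2", "3", ">3", or "<2").
    """
    total = len(cems)
    significant = sum(p < cutoff for p, _ in cems) / total

    def category(k):
        if k > 3:
            return ">3"
        if k in (2, 3):
            return str(k)
        return "<2"

    bins = Counter(category(k) for _, k in cems)
    return significant, {name: count / total for name, count in bins.items()}

print(summarize_cems([(1e-6, 5), (0.01, 4), (0.2, 2), (5e-4, 3)]))
```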
For validation, we randomly split the HG + MG genes into two equal parts 10 times and obtained the 10 corresponding cover ratios to check the accuracy of the predicted genes. The results are shown in Table 1: all cover ratios are below 25%, with an average of 16.25%. We further calculated p-values for the coverage rates. These results indicate that, although the coverage rates leave considerable room for improvement, they are statistically significant (all p-values are below 0.01).
Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Ratio | 14.00% | 19.60% | 14.00% | 15.70% | 14.50% | 21.30% | 9.80% | 24.30% | 14.00% | 15.30% |
P-value | 7.58E-07 | 5.14E-14 | 7.58E-07 | 8.11E-09 | 2.56E-07 | 1.24E-16 | 5.37E-03 | 1.27E-21 | 7.58E-07 | 2.65E-08 |
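The summary statistics quoted for Table 1 can be rechecked directly from its values:

```python
# Cover ratios (%) and p-values transcribed from Table 1.
ratios = [14.00, 19.60, 14.00, 15.70, 14.50, 21.30, 9.80, 24.30, 14.00, 15.30]
pvalues = [7.58e-07, 5.14e-14, 7.58e-07, 8.11e-09, 2.56e-07,
           1.24e-16, 5.37e-03, 1.27e-21, 7.58e-07, 2.65e-08]

mean_ratio = sum(ratios) / len(ratios)
print(f"mean cover ratio: {mean_ratio:.2f}%")   # 16.25%
print(f"max cover ratio:  {max(ratios):.2f}%")  # 24.30%
print(f"largest p-value:  {max(pvalues):.2e}")  # 5.37e-03
```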
Since lncRNAs play important regulatory roles, they are likely to cooperate with transcription factors [23,24]. We therefore analyzed the TF binding sites in the promoters of CEM genes [19,22,25]. As described in the Methods section, we chose five CEMs for TF motif analysis; their gene lists contain 3, 6, 14, 20, and 21 genes. The predicted motifs and their comparison with JASPAR motifs are listed in Table 2. In Table 2, the second column gives the name of the lncRNA associated with each CEM and the p-value of the association; the third column contains the motif consensus found by HOMER; the fourth column gives the TF names of the most similar JASPAR motifs; and the fifth column gives the similarity scores. These TFs may cooperate with the corresponding lncRNAs. All lncRNA p-values in the second column are below 0.01, and the smallest is that of HOTAIR. Supplementary Table S1, which includes the logos of the discovered motifs and the functions of the corresponding TFs, can be downloaded from the GitHub repository.
CEM | lncRNA (p-value) | HOMER motif | JASPAR TF | Score |
1 | FOXCUT (3.6e-3) | AACCAVTTHDCG | TFCP2 | 0.64 |
| | TCCTATCACACR | MEIS2 | 0.62 |
| | TTTTHAAAGGGG | CHR | 0.67 |
| | ARTGGTTGTWGA | FOXJ2 | 0.58 |
| | GCAATCTCGC | IRF4 | 0.66 |
2 | ANCR (1.1e-3) | AGGGTGACAG | SPZ1 | 0.80 |
| | GGTATCTTAC | GATA5 | 0.64 |
| | CTCATAGGAG | GCM1 | 0.65 |
| | TAAGTGAAAG | PRDM1 | 0.86 |
| | CTTTTGGAAC | CHR | 0.65 |
3 | 250-280 (2.2e-4) | WYTRTCTTTGCG | RXR | 0.61 |
| | TCTTACGG | ELK1 | 0.71 |
| | GGCAAGGA | SD | 0.76 |
| | GAGGTATGTT | TEAD1 | 0.70 |
| | TGCCGGGAGCGT | POL | 0.64 |
4 | HOXD-AS1 (6.1e-3) | CTCGAGTAGG | PB0114 | 0.63 |
| | GCCCCCTGCA | PB0076 | 0.74 |
| | ACGYMYATKYCC | GFY | 0.59 |
| | AGCGGGTT | PH | 0.68 |
| | AGGCGCCGCGCC | SP1 | 0.69 |
5 | HOTAIR (5e-6) | TGGCGCAGCGCG | PB | 0.67 |
| | GTACAACTTT | PB | 0.66 |
| | CMTSTGTCWCYK | NeuroG2 | 0.66 |
| | GTGATCCATT | RHOXF1 | 0.68 |
| | GGTMGRRGTGMW | TBX20 | 0.58 |
To further evaluate the biological significance of the identified CEMs, we tested the genes in each CEM for enrichment in Gene Ontology terms and KEGG pathways using the clusterProfiler package from Bioconductor (R) with a q-value cutoff of 0.05. The GO terms and KEGG pathways in which the CEMs are enriched are listed in Table 3 and Table 4, respectively. Supplementary Table S2, with further details including the original and adjusted p-values, the proportion of matched genes, and gene IDs, can be downloaded from the GitHub repository.
LncRNA | ID | Description | q-value |
FOXCUT | GO:0004714 | transmembrane receptor protein tyrosine kinase activity | 1.2937E-02 |
| GO:0033613 | activating transcription factor binding | 1.2937E-02 |
| GO:0019199 | transmembrane receptor protein kinase activity | 1.2937E-02 |
| GO:0001085 | RNA polymerase II transcription factor binding | 2.0767E-02 |
250-280 | GO:0003735 | structural constituent of ribosome | 3.1600E-06 |
| GO:0003729 | mRNA binding | 1.1140E-03 |
| GO:0008483 | transaminase activity | 1.1140E-03 |
| GO:0048027 | mRNA 5'-UTR binding | 1.1140E-03 |
| GO:0016769 | transferase activity, transferring nitrogenous groups | 1.1140E-03 |
| GO:0045182 | translation regulator activity | 1.5826E-03 |
| GO:0030170 | pyridoxal phosphate binding | 1.5826E-03 |
| GO:0070279 | vitamin B6 binding | 1.5826E-03 |
| GO:0019843 | rRNA binding | 1.5826E-03 |
| GO:0019842 | vitamin binding | 3.1903E-03 |
HOXD-AS1 | GO:0004714 | transmembrane receptor protein tyrosine kinase activity | 1.2937E-02 |
| GO:0033613 | activating transcription factor binding | 1.2937E-02 |
| GO:0019199 | transmembrane receptor protein kinase activity | 1.2937E-02 |
| GO:0001085 | RNA polymerase II transcription factor binding | 2.0767E-02 |
HOTAIR | GO:0005109 | frizzled binding | 1.9097E-03 |
| GO:0001227 | transcriptional repressor activity, RNA polymerase II transcription regulatory region sequence-specific binding | 1.9097E-03 |
| GO:0001664 | G-protein coupled receptor binding | 1.9686E-03 |
| GO:0001078 | transcriptional repressor activity, RNA polymerase II core promoter proximal region sequence-specific binding | 1.1196E-02 |
| GO:0008201 | heparin binding | 1.2885E-02 |
| GO:0005539 | glycosaminoglycan binding | 1.8790E-02 |
| GO:1901681 | sulfur compound binding | 1.9016E-02 |
| GO:0045236 | CXCR chemokine receptor binding | 2.1785E-02 |
| GO:0008301 | DNA binding, bending | 2.1785E-02 |
| GO:0001223 | transcription coactivator binding | 2.1785E-02 |
| GO:0042813 | Wnt-activated receptor activity | 2.1785E-02 |
| GO:0035198 | miRNA binding | 2.2807E-02 |
| GO:0017147 | Wnt-protein binding | 2.5258E-02 |
| GO:1990841 | promoter-specific chromatin binding | 2.5258E-02 |
| GO:0000982 | transcription factor activity, RNA polymerase II core promoter proximal region sequence-specific binding | 2.5258E-02 |
| GO:0001221 | transcription cofactor binding | 2.5587E-02 |
LncRNA | ID | Description | q-value |
FOXCUT | hsa05216 | Thyroid cancer | 8.8587E-03 |
| hsa04510 | Focal adhesion | 8.8587E-03 |
| hsa05205 | Proteoglycans in cancer | 8.8587E-03 |
| hsa05218 | Melanoma | 1.7683E-02 |
| hsa05214 | Glioma | 1.7683E-02 |
| hsa04151 | PI3K-Akt signaling pathway | 1.8745E-02 |
| hsa05215 | Prostate cancer | 1.8745E-02 |
| hsa01522 | Endocrine resistance | 1.8745E-02 |
| hsa04919 | Thyroid hormone signaling pathway | 2.2317E-02 |
| hsa04152 | AMPK signaling pathway | 2.2317E-02 |
| hsa04068 | FoxO signaling pathway | 2.4442E-02 |
| hsa04550 | Signaling pathways regulating pluripotency of stem cells | 2.5129E-02 |
| hsa05224 | Breast cancer | 2.5129E-02 |
| hsa04218 | Cellular senescence | 2.7471E-02 |
| hsa05225 | Hepatocellular carcinoma | 2.7471E-02 |
| hsa04530 | Tight junction | 2.7471E-02 |
250-280 | hsa03010 | Ribosome | 2.9900E-05 |
| hsa01210 | 2-Oxocarboxylic acid metabolism | 3.7322E-03 |
| hsa00220 | Arginine biosynthesis | 3.7322E-03 |
| hsa00250 | Alanine, aspartate and glutamate metabolism | 4.7849E-03 |
| hsa01230 | Biosynthesis of amino acids | 7.9156E-03 |
HOXD-AS1 | hsa05216 | Thyroid cancer | 8.8587E-03 |
| hsa04510 | Focal adhesion | 8.8587E-03 |
| hsa05205 | Proteoglycans in cancer | 8.8587E-03 |
| hsa05218 | Melanoma | 1.7683E-02 |
| hsa05214 | Glioma | 1.7683E-02 |
| hsa04151 | PI3K-Akt signaling pathway | 1.8745E-02 |
| hsa05215 | Prostate cancer | 1.8745E-02 |
| hsa01522 | Endocrine resistance | 1.8745E-02 |
| hsa04919 | Thyroid hormone signaling pathway | 2.2317E-02 |
| hsa04152 | AMPK signaling pathway | 2.2317E-02 |
| hsa04068 | FoxO signaling pathway | 2.4442E-02 |
| hsa04550 | Signaling pathways regulating pluripotency of stem cells | 2.5129E-02 |
| hsa05224 | Breast cancer | 2.5506E-02 |
| hsa04218 | Cellular senescence | 2.7471E-02 |
| hsa05225 | Hepatocellular carcinoma | 2.7471E-02 |
| hsa04530 | Tight junction | 2.7471E-02 |
HOTAIR | hsa04310 | Wnt signaling pathway | 5.7295E-03 |
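The GO and KEGG terms above are filtered on multiple-testing-adjusted significance. clusterProfiler's output includes a BH-adjusted p-value (`p.adjust`) and a separate `qvalue` column; as a self-contained illustration of how such an adjustment works, the sketch below implements the Benjamini-Hochberg procedure. It is not clusterProfiler's code, and its output need not match the q-values in the tables exactly.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Same procedure as R's p.adjust(p, method = "BH"): each sorted p-value
    is scaled by m / rank, then a running minimum is taken from the largest
    p-value downwards so the adjusted values stay monotone and <= 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))  # approximately [0.02, 0.04, 0.04, 0.02]
```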
In this study, we developed a method for elucidating lncRNA-gene and transcription factor-lncRNA interactions using a biclustering approach, and applied it to two TCGA breast cancer RNA-seq datasets. The biclustering method allows the identification of particular expression patterns across multiple datasets, indicating networks of lncRNA-gene interactions, and provides a framework for future lncRNA interaction studies. Admittedly, the prediction performance is still far from satisfactory, which is not unexpected since we used only RNA-seq data. The interaction mechanisms between lncRNAs and genes are far more complex, and more data types are needed to capture the whole picture. We plan to include other data, such as proteomics and chromatin accessibility information, to improve the predictions. In addition, the evaluation of the relationships between lncRNAs and predicted CEMs could be improved, e.g., by using adjusted or overall p-values in place of the raw p-values used in this study. In terms of applications, we will work on more specific examples of the regulatory functions of particular lncRNAs and propose hypothesized mechanisms for these functions. Further analysis of the differences in lncRNA-related genes between tumor and normal samples could also provide more information on the processes and mechanisms of cancer occurrence and development, e.g., for determining the stage of developed tumors, which we will address in future research.
This work was supported by the National Natural Science Foundation of China (NSFC) [61772313 and 61432010], the Young Scholars Program of Shandong University [YSPSDU, 2015WLJH19], the Innovation Method Fund of China (2018IM020200), the Shanghai Municipal Science and Technology Major Project (2018SHZDZX01), and ZHANGJIANGLAB. Qin Ma's work was supported by an R01 Award from the National Institute of General Medical Sciences of the National Institutes of Health [GM131399-01]. This work used the Extreme Science and Engineering Discovery Environment (XSEDE), which is supported by the National Science Foundation [ACI-1548562]. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
All authors declare no conflicts of interest in this paper.
[1] |
Easton DF, Pharoah PD, Antoniou AC, et al. (2015) Gene-panel sequencing and the prediction of breast-cancer risk. N Engl J Med 372: 2243-2257. doi: 10.1056/NEJMsr1501341
![]() |
[2] |
Desmond A, Kurian AW, Gabree M, et al. (2015) Clinical Actionability of Multigene Panel Testing for Hereditary Breast and Ovarian Cancer Risk Assessment. JAMA Oncol 1: 943-951. doi: 10.1001/jamaoncol.2015.2690
![]() |
[3] |
Campuzano O, Sarquella-Brugada G, Mademont-Soler I, et al. (2014) Identification of Genetic Alterations, as Causative Genetic Defects in Long QT Syndrome, Using Next Generation Sequencing Technology. PLoS One 9: e114894. doi: 10.1371/journal.pone.0114894
![]() |
[4] |
Kapoor NS, Curcio LD, Blakemore CA, et al. (2015) Multigene Panel Testing Detects Equal Rates of Pathogenic BRCA1/2 Mutations and has a Higher Diagnostic Yield Compared to Limited BRCA1/2 Analysis Alone in Patients at Risk for Hereditary Breast Cancer. Ann Surg Oncol 22: 3282-3288. doi: 10.1245/s10434-015-4754-2
![]() |
[5] |
Antoniou A, Pharoah PDP, Narod S, et al. (2003) Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case Series unselected for family history: a combined analysis of 22 studies. Am J Hum Genet 72: 1117-1130. doi: 10.1086/375033
![]() |
[6] |
Lovelock PK, Spurdle AB, Mok MT, et al. (2007) Identification of BRCA1 missense substitutions that confer partial functional activity: potential moderate risk variants? Breast Cancer Res 9: R82. doi: 10.1186/bcr1826
![]() |
[7] |
Spurdle AB, Whiley PJ, Thompson B, et al. (2012) BRCA1 R1699Q variant displaying ambiguous functional abrogation confers intermediate breast and ovarian cancer risk. J Med Genet 49: 525-532. doi: 10.1136/jmedgenet-2012-101037
![]() |
[8] |
Michailidou K, Hall P, Gonzalez-Neira A, et al. (2013) Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 45: 353-361, 361e351-352. doi: 10.1038/ng.2563
![]() |
[9] | Meeks HD, Song H, Michailidou K, et al. (2016) BRCA2 Polymorphic Stop Codon K3326X and the Risk of Breast, Prostate, and Ovarian Cancers. J Natl Cancer Inst 108. |
[10] |
Antoniou AC, Casadei S, Heikkinen T, et al. (2014) Breast-cancer risk in families with mutations in PALB2. N Engl J Med 371: 497-506. doi: 10.1056/NEJMoa1400382
![]() |
[11] |
Southey MC, Teo ZL, Dowty JG, et al. (2010) A PALB2 mutation associated with high risk of breast cancer. Breast Cancer Res 12: R109. doi: 10.1186/bcr2796
![]() |
[12] | Southey MC, Teo ZL, Winship I (2013) PALB2 and breast cancer: ready for clinical translation! Appl Clin Genet 6: 43-52. |
[13] |
Teo ZL, Park DJ, Provenzano E, et al. (2013) Prevalence of PALB2 mutations in Australasian multiple-case breast cancer families. Breast Cancer Res 15: R17. doi: 10.1186/bcr3392
![]() |
[14] | Teo ZL, Sawyer SD, James PA, et al. (2013) The incidence of PALB2 c.3113G>A in women with a strong family history of breast and ovarian cancer attending familial cancer centres in Australia. Fam Cancer 12: 587-595. |
[15] |
Wong MW, Nordfors C, Mossman D, et al. (2011) BRIP1, PALB2, and RAD51C mutation analysis reveals their relative importance as genetic susceptibility factors for breast cancer. Breast Cancer Res Treat 127: 853-859. doi: 10.1007/s10549-011-1443-0
![]() |
[16] | Dansonka-Mieszkowska A, Kluska A, Moes J, et al. (2010) A novel germline PALB2 deletion in Polish breast and ovarian cancer patients. BMC Med Genet 11: 20. |
[17] |
Foulkes WD, Ghadirian P, Akbari MR, et al. (2007) Identification of a novel truncating PALB2 mutation and analysis of its contribution to early-onset breast cancer in French-Canadian women. Breast Cancer Res 9: R83. doi: 10.1186/bcr1828
![]() |
[18] |
Erkko H, Xia B, Nikkila J, et al. (2007) A recurrent mutation in PALB2 in Finnish cancer families. Nature 446: 316-319. doi: 10.1038/nature05609
![]() |
[19] |
Tischkowitz M, Capanu M, Sabbaghian N, et al. (2012) Rare germline mutations in PALB2 and breast cancer risk: a population-based study. Hum Mutat 33: 674-680. doi: 10.1002/humu.22022
![]() |
[20] |
Plon SE, Eccles DM, Easton D, et al. (2008) Sequence variant classification and reporting: recommendations for improving the interpretation of cancer susceptibility genetic test results. Hum Mutat 29: 1282-1291. doi: 10.1002/humu.20880
![]() |
[21] | Scott CL, Jenkins MA, Southey MC, et al. (2003) Average age-specific cumulative risk of breast cancer according to type and site of germline mutations in BRCA1 and BRCA2 estimated from multiple-case breast cancer families attending Australian family cancer clinics. Hum Genet 112: 542-551. |
[22] | Tavtigian SV, Oefner PJ, Babikyan D, et al. (2009) Rare, evolutionarily unlikely missense substitutions in ATM confer increased risk of breast cancer. Am J Hum Genet 85: 427-446. doi: 10.1016/j.ajhg.2009.08.018 |
[23] | Le Calvez-Kelm F, Lesueur F, Damiola F, et al. (2011) Rare, evolutionarily unlikely missense substitutions in CHEK2 contribute to breast cancer susceptibility: results from a breast cancer family registry case-control mutation-screening study. Breast Cancer Res 13: R6. doi: 10.1186/bcr2810 |
[24] | Thusberg J, Vihinen M (2009) Pathogenic or not? And if so, then how? Studying the effects of missense mutations using bioinformatics methods. Hum Mutat 30: 703-714. |
[25] | Zuckerkandl E (1965) [Remarks on the evolution of polynucleotides compared to that of polypeptides]. Bull Soc Chim Biol (Paris) 47: 1729-1730. |
[26] | Jukes TH, King JL (1971) Deleterious mutations and neutral substitutions. Nature 231: 114-115. doi: 10.1038/231114a0 |
[27] | Jordan DM, Ramensky VE, Sunyaev SR (2010) Human allelic variation: perspective from protein function, structure, and evolution. Curr Opin Struct Biol 20: 342-350. doi: 10.1016/j.sbi.2010.03.006 |
[28] | Ng PC, Henikoff S (2001) Predicting deleterious amino acid substitutions. Genome Res 11: 863-874. doi: 10.1101/gr.176601 |
[29] | Ferrer-Costa C, Gelpi JL, Zamakola L, et al. (2005) PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics 21: 3176-3178. doi: 10.1093/bioinformatics/bti486 |
[30] | Chun S, Fay JC (2009) Identification of deleterious mutations within three human genomes. Genome Res 19: 1553-1561. doi: 10.1101/gr.092619.109 |
[31] | Adzhubei IA, Schmidt S, Peshkin L, et al. (2010) A method and server for predicting damaging missense mutations. Nat Methods 7: 248-249. doi: 10.1038/nmeth0410-248 |
[32] | Tavtigian SV, Deffenbaugh AM, Yin L, et al. (2006) Comprehensive statistical study of 452 BRCA1 missense substitutions with classification of eight recurrent substitutions as neutral. J Med Genet 43: 295-305. |
[33] | Bromberg Y, Rost B (2007) SNAP: predict effect of non-synonymous polymorphisms on function. Nucleic Acids Res 35: 3823-3835. doi: 10.1093/nar/gkm238 |
[34] | Goldgar DE, Easton DF, Deffenbaugh AM, et al. (2004) Integrated evaluation of DNA sequence variants of unknown clinical significance: application to BRCA1 and BRCA2. Am J Hum Genet 75: 535-544. doi: 10.1086/424388 |
[35] | Easton DF, Deffenbaugh AM, Pruss D, et al. (2007) A systematic genetic assessment of 1,433 sequence variants of unknown clinical significance in the BRCA1 and BRCA2 breast cancer-predisposition genes. Am J Hum Genet 81: 873-883. doi: 10.1086/521032 |
[36] | Goldgar DE, Easton DF, Byrnes GB, et al. (2008) Genetic evidence and integration of various data sources for classifying uncertain variants into a single model. Hum Mutat 29: 1265-1272. doi: 10.1002/humu.20897 |
[37] | Spurdle AB, Lakhani SR, Healey S, et al. (2008) Clinical classification of BRCA1 and BRCA2 DNA sequence variants: the value of cytokeratin profiles and evolutionary analysis--a report from the kConFab Investigators. J Clin Oncol 26: 1657-1663. doi: 10.1200/JCO.2007.13.2779 |
[38] | Spurdle AB, Healey S, Devereau A, et al. (2012) ENIGMA--evidence-based network for the interpretation of germline mutant alleles: an international initiative to evaluate risk and clinical significance associated with sequence variation in BRCA1 and BRCA2 genes. Hum Mutat 33: 2-7. doi: 10.1002/humu.21628 |
[39] | Vallee MP, Francy TC, Judkins MK, et al. (2012) Classification of missense substitutions in the BRCA genes: a database dedicated to Ex-UVs. Hum Mutat 33: 22-28. doi: 10.1002/humu.21629 |
[40] | Couch FJ, Rasmussen LJ, Hofstra R, et al. (2008) Assessment of functional effects of unclassified genetic variants. Hum Mutat 29: 1314-1326. doi: 10.1002/humu.20899 |
[41] | Wu K, Hinson SR, Ohashi A, et al. (2005) Functional evaluation and cancer risk assessment of BRCA2 unclassified variants. Cancer Res 65: 417-426. |
[42] | Mitui M, Nahas SA, Du LT, et al. (2009) Functional and computational assessment of missense variants in the ataxia-telangiectasia mutated (ATM) gene: mutations with increased cancer risk. Hum Mutat 30: 12-21. doi: 10.1002/humu.20805 |
[43] | Roeb W, Higgins J, King MC (2012) Response to DNA damage of CHEK2 missense mutations in familial breast cancer. Hum Mol Genet 21: 2738-2744. doi: 10.1093/hmg/dds101 |
[44] | Kato S, Han SY, Liu W, et al. (2003) Understanding the function-structure and function-mutation relationships of p53 tumor suppressor protein by high-resolution missense mutation analysis. Proc Natl Acad Sci U S A 100: 8424-8429. doi: 10.1073/pnas.1431692100 |
[45] | Iversen ES, Jr., Couch FJ, Goldgar DE, et al. (2011) A computational method to classify variants of uncertain significance using functional assay data with application to BRCA1. Cancer Epidemiol Biomarkers Prev 20: 1078-1088. doi: 10.1158/1055-9965.EPI-10-1214 |
[46] | Guidugli L, Pankratz VS, Singh N, et al. (2013) A classification model for BRCA2 DNA binding domain missense variants based on homology-directed repair activity. Cancer Res 73: 265-275. |
[47] | Rahman N (2014) Mainstreaming genetic testing of cancer predisposition genes. Clin Med 14: 436-439. doi: 10.7861/clinmedicine.14-4-436 |
[48] | http://apps.ccge.medschl.cam.ac.uk/consortia/bcac/index.html. |
[49] | French JD, Ghoussaini M, Edwards SL, et al. (2013) Functional variants at the 11q13 risk locus for breast cancer regulate cyclin D1 expression through long-range enhancers. Am J Hum Genet 92: 489-503. doi: 10.1016/j.ajhg.2013.01.002 |
[50] | Goldgar DE, Healey S, Dowty JG, et al. (2011) Rare variants in the ATM gene and risk of breast cancer. Breast Cancer Res 13: R73. doi: 10.1186/bcr2919 |
[51] | Gorringe KL, Choong DY, Visvader JE, et al. (2008) BARD1 variants are not associated with breast cancer risk in Australian familial breast cancer. Breast Cancer Res Treat 111: 505-509. doi: 10.1007/s10549-007-9799-x |
[52] | Seal S, Thompson D, Renwick A, et al. (2006) Truncating mutations in the Fanconi anemia J gene BRIP1 are low-penetrance breast cancer susceptibility alleles. Nat Genet 38: 1239-1241. doi: 10.1038/ng1902 |
[53] | Schrader KA, Masciari S, Boyd N, et al. (2011) Germline mutations in CDH1 are infrequent in women with early-onset or familial lobular breast cancers. J Med Genet 48: 64-68. doi: 10.1136/jmg.2010.079814 |
[54] | Bogdanova N, Schurmann P, Waltes R, et al. (2008) NBS1 variant I171V and breast cancer risk. Breast Cancer Res Treat 112: 75-79. doi: 10.1007/s10549-007-9820-4 |
[55] | Tommiska J, Seal S, Renwick A, et al. (2006) Evaluation of RAD50 in familial breast cancer predisposition. Int J Cancer 118: 2911-2916. doi: 10.1002/ijc.21738 |
[56] | Rennert G, Lejbkowicz F, Cohen I, et al. (2012) MutYH mutation carriers have increased breast cancer risk. Cancer 118: 1989-1993. doi: 10.1002/cncr.26506 |
[57] | Sharif S, Moran A, Huson SM, et al. (2007) Women with neurofibromatosis 1 are at a moderately increased risk of developing breast cancer and should be considered for early screening. J Med Genet 44: 481-484. doi: 10.1136/jmg.2007.049346 |
[58] | Pradella LM, Evangelisti C, Ligorio C, et al. (2014) A novel deleterious PTEN mutation in a patient with early-onset bilateral breast cancer. BMC Cancer 14: 70. doi: 10.1186/1471-2407-14-70 |
[59] | Figer A, Kaplan A, Frydman M, et al. (2002) Germline mutations in the PTEN gene in Israeli patients with Bannayan-Riley-Ruvalcaba syndrome and women with familial breast cancer. Clin Genet 62: 298-302. doi: 10.1034/j.1399-0004.2002.620407.x |
[60] | Meindl A, Hellebrand H, Wiek C, et al. (2010) Germline mutations in breast and ovarian cancer pedigrees establish RAD51C as a human cancer susceptibility gene. Nat Genet 42: 410-414. doi: 10.1038/ng.569 |
[61] | Loveday C, Turnbull C, Ruark E, et al. (2012) Germline RAD51C mutations confer susceptibility to ovarian cancer. Nat Genet 44: 475-476; author reply 476. doi: 10.1038/ng.2224 |
[62] | Boardman LA, Couch FJ, Burgart LJ, et al. (2000) Genetic heterogeneity in Peutz-Jeghers syndrome. Hum Mutat 16: 23-30. |
[63] | Evans DG, Birch JM, Thorneycroft M, et al. (2002) Low rate of TP53 germline mutations in breast cancer/sarcoma families not fulfilling classical criteria for Li-Fraumeni syndrome. J Med Genet 39: 941-944. doi: 10.1136/jmg.39.12.941 |
[64] | Mouchawar J, Korch C, Byers T, et al. (2010) Population-based estimate of the contribution of TP53 mutations to subgroups of early-onset breast cancer: Australian Breast Cancer Family Study. Cancer Res 70: 4795-4800. doi: 10.1158/0008-5472.CAN-09-0851 |
[65] | Park DJ, Lesueur F, Nguyen-Dumont T, et al. (2012) Rare mutations in XRCC2 increase the risk of breast cancer. Am J Hum Genet 90: 734-739. doi: 10.1016/j.ajhg.2012.02.027 |
[66] | Park DJ, Tao K, Le Calvez-Kelm F, et al. (2014) Rare mutations in RINT1 predispose carriers to breast and Lynch syndrome-spectrum cancers. Cancer Discov 4: 804-815. doi: 10.1158/2159-8290.CD-14-0212 |
[67] | Kiiski JI, Pelttari LM, Khan S, et al. (2014) Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proc Natl Acad Sci U S A 111: 15172-15177. doi: 10.1073/pnas.1407909111 |
1. | Sen Yang, Yan Wang, Shuangquan Zhang, Xuemei Hu, Qin Ma, Yuan Tian, NCResNet: Noncoding Ribonucleic Acid Prediction Based on a Deep Resident Network of Ribonucleic Acid Sequences, 2020, 11, 1664-8021, 10.3389/fgene.2020.00090 | |
2. | Lijun Dou, Xiaoling Li, Hui Ding, Lei Xu, Huaikun Xiang, Is There Any Sequence Feature in the RNA Pseudouridine Modification Prediction Problem?, 2020, 19, 21622531, 293, 10.1016/j.omtn.2019.11.014 | |
3. | Guangmin Liang, Jin Wu, Lei Xu, A prognosis-related based method for miRNA selection on liver hepatocellular carcinoma prediction, 2021, 91, 14769271, 107433, 10.1016/j.compbiolchem.2020.107433 | |
4. | Hua-Sheng Chiu, Sonal Somvanshi, Ting-Wen Chen, Pavel Sumazin, 2021, Chapter 22, 978-1-0716-1696-3, 263, 10.1007/978-1-0716-1697-0_22 | |
5. | Juexin Wang, Yan Wang, Towards Machine Learning in Molecular Biology, 2020, 17, 1551-0018, 2822, 10.3934/mbe.2020156 | |
6. | Consolata Gakii, Paul O. Mireji, Richard Rimiru, Graph Based Feature Selection for Reduction of Dimensionality in Next-Generation RNA Sequencing Datasets, 2022, 15, 1999-4893, 21, 10.3390/a15010021 |
Group | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Ratio | 14.00% | 19.60% | 14.00% | 15.70% | 14.50% | 21.30% | 9.80% | 24.30% | 14.00% | 15.30% |
P-value | 7.58E-07 | 5.14E-14 | 7.58E-07 | 8.11E-09 | 2.56E-07 | 1.24E-16 | 5.37E-03 | 1.27E-21 | 7.58E-07 | 2.65E-08 |
No. | lncRNA (p-value) | Homer motif | JASPAR TF | Score |
1 | FOXCUT (3.6e-3) | AACCAVTTHDCG | TFCP2 | 0.64 |
| | TCCTATCACACR | MEIS2 | 0.62 |
| | TTTTHAAAGGGG | CHR | 0.67 |
| | ARTGGTTGTWGA | FOXJ2 | 0.58 |
| | GCAATCTCGC | IRF4 | 0.66 |
2 | ANCR (1.1e-3) | AGGGTGACAG | SPZ1 | 0.80 |
| | GGTATCTTAC | GATA5 | 0.64 |
| | CTCATAGGAG | GCM1 | 0.65 |
| | TAAGTGAAAG | PRDM1 | 0.86 |
| | CTTTTGGAAC | CHR | 0.65 |
3 | 250-280 (2.2e-4) | WYTRTCTTTGCG | RXR | 0.61 |
| | TCTTACGG | ELK1 | 0.71 |
| | GGCAAGGA | SD | 0.76 |
| | GAGGTATGTT | TEAD1 | 0.70 |
| | TGCCGGGAGCGT | POL | 0.64 |
4 | HOXD-AS1 (6.1e-3) | CTCGAGTAGG | PB0114 | 0.63 |
| | GCCCCCTGCA | PB0076 | 0.74 |
| | ACGYMYATKYCC | GFY | 0.59 |
| | AGCGGGTT | PH | 0.68 |
| | AGGCGCCGCGCC | SP1 | 0.69 |
5 | HOTAIR (5e-6) | TGGCGCAGCGCG | PB | 0.67 |
| | GTACAACTTT | PB | 0.66 |
| | CMTSTGTCWCYK | NeuroG2 | 0.66 |
| | GTGATCCATT | RHOXF1 | 0.68 |
| | GGTMGRRGTGMW | TBX20 | 0.58 |
LncRNA | ID | Description | q-value |
FOXCUT | GO:0004714 | transmembrane receptor protein tyrosine kinase activity | 1.2937E-02 |
| GO:0033613 | activating transcription factor binding | 1.2937E-02 |
| GO:0019199 | transmembrane receptor protein kinase activity | 1.2937E-02 |
| GO:0001085 | RNA polymerase Ⅱ transcription factor binding | 2.0767E-02 |
250-280 | GO:0003735 | structural constituent of ribosome | 3.1600E-06 |
| GO:0003729 | mRNA binding | 1.1140E-03 |
| GO:0008483 | transaminase activity | 1.1140E-03 |
| GO:0048027 | mRNA 5'-UTR binding | 1.1140E-03 |
| GO:0016769 | transferase activity, transferring nitrogenous groups | 1.1140E-03 |
| GO:0045182 | translation regulator activity | 1.5826E-03 |
| GO:0030170 | pyridoxal phosphate binding | 1.5826E-03 |
| GO:0070279 | vitamin B6 binding | 1.5826E-03 |
| GO:0019843 | rRNA binding | 1.5826E-03 |
| GO:0019842 | vitamin binding | 3.1903E-03 |
HOXD-AS1 | GO:0004714 | transmembrane receptor protein tyrosine kinase activity | 1.2937E-02 |
| GO:0033613 | activating transcription factor binding | 1.2937E-02 |
| GO:0019199 | transmembrane receptor protein kinase activity | 1.2937E-02 |
| GO:0001085 | RNA polymerase Ⅱ transcription factor binding | 2.0767E-02 |
HOTAIR | GO:0005109 | frizzled binding | 1.9097E-03 |
| GO:0001227 | transcriptional repressor activity, RNA polymerase Ⅱ transcription regulatory region sequence-specific binding | 1.9097E-03 |
| GO:0001664 | G-protein coupled receptor binding | 1.9686E-03 |
| GO:0001078 | transcriptional repressor activity, RNA polymerase Ⅱ core promoter proximal region sequence-specific binding | 1.1196E-02 |
| GO:0008201 | heparin binding | 1.2885E-02 |
| GO:0005539 | glycosaminoglycan binding | 1.8790E-02 |
| GO:1901681 | sulfur compound binding | 1.9016E-02 |
| GO:0045236 | CXCR chemokine receptor binding | 2.1785E-02 |
| GO:0008301 | DNA binding, bending | 2.1785E-02 |
| GO:0001223 | transcription coactivator binding | 2.1785E-02 |
| GO:0042813 | Wnt-activated receptor activity | 2.1785E-02 |
| GO:0035198 | miRNA binding | 2.2807E-02 |
| GO:0017147 | Wnt-protein binding | 2.5258E-02 |
| GO:1990841 | promoter-specific chromatin binding | 2.5258E-02 |
| GO:0000982 | transcription factor activity, RNA polymerase Ⅱ core promoter proximal region sequence-specific binding | 2.5258E-02 |
| GO:0001221 | transcription cofactor binding | 2.5587E-02 |
LncRNA | ID | Description | q-value |
FOXCUT | hsa05216 | Thyroid cancer | 8.8587E-03 |
| hsa04510 | Focal adhesion | 8.8587E-03 |
| hsa05205 | Proteoglycans in cancer | 8.8587E-03 |
| hsa05218 | Melanoma | 1.7683E-02 |
| hsa05214 | Glioma | 1.7683E-02 |
| hsa04151 | PI3K-Akt signaling pathway | 1.8745E-02 |
| hsa05215 | Prostate cancer | 1.8745E-02 |
| hsa01522 | Endocrine resistance | 1.8745E-02 |
| hsa04919 | Thyroid hormone signaling pathway | 2.2317E-02 |
| hsa04152 | AMPK signaling pathway | 2.2317E-02 |
| hsa04068 | FoxO signaling pathway | 2.4442E-02 |
| hsa04550 | Signaling pathways regulating pluripotency of stem cells | 2.5129E-02 |
| hsa05224 | Breast cancer | 2.5129E-02 |
| hsa04218 | Cellular senescence | 2.7471E-02 |
| hsa05225 | Hepatocellular carcinoma | 2.7471E-02 |
| hsa04530 | Tight junction | 2.7471E-02 |
250-280 | hsa03010 | Ribosome | 2.9900E-05 |
| hsa01210 | 2-Oxocarboxylic acid metabolism | 3.7322E-03 |
| hsa00220 | Arginine biosynthesis | 3.7322E-03 |
| hsa00250 | Alanine, aspartate and glutamate metabolism | 4.7849E-03 |
| hsa01230 | Biosynthesis of amino acids | 7.9156E-03 |
HOXD-AS1 | hsa05216 | Thyroid cancer | 8.8587E-03 |
| hsa04510 | Focal adhesion | 8.8587E-03 |
| hsa05205 | Proteoglycans in cancer | 8.8587E-03 |
| hsa05218 | Melanoma | 1.7683E-02 |
| hsa05214 | Glioma | 1.7683E-02 |
| hsa04151 | PI3K-Akt signaling pathway | 1.8745E-02 |
| hsa05215 | Prostate cancer | 1.8745E-02 |
| hsa01522 | Endocrine resistance | 1.8745E-02 |
| hsa04919 | Thyroid hormone signaling pathway | 2.2317E-02 |
| hsa04152 | AMPK signaling pathway | 2.2317E-02 |
| hsa04068 | FoxO signaling pathway | 2.4442E-02 |
| hsa04550 | Signaling pathways regulating pluripotency of stem cells | 2.5129E-02 |
| hsa05224 | Breast cancer | 2.5506E-02 |
| hsa04218 | Cellular senescence | 2.7471E-02 |
| hsa05225 | Hepatocellular carcinoma | 2.7471E-02 |
| hsa04530 | Tight junction | 2.7471E-02 |
HOTAIR | hsa04310 | Wnt signaling pathway | 5.7295E-03 |