Input
This example uses a human gene expression table.
1. Species selection
2. Data type for analysis
In this example, the genes in the predefined gene set file are identified by gene symbols, so "Gene Symbol" is selected as the data type for analysis.
3. Predefined gene set files
3.1 Predefined gene set file
The predefined gene set is the set of genes, of unknown function, to be analyzed. The file must meet the following requirements: it must be in TXT format (you can open the data in Excel and save it as "Text file (*.txt)"). The first column is the gene ID, followed by one column per sample holding that sample's expression values. Only two groups of samples can be analyzed. Columns are separated by tabs. A sample file is shown below:
Gene Name | ALL_1 | ALL_2 | ALL_3 | AML_1 | AML_2 | AML_3
TACC2 | | | | | |
DYT1 | | | | | |
32385_at | | | | | |
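Before uploading, it can help to check that a file matches the layout above (gene identifier in column 1, one tab-separated column per sample, one numeric value per cell). The following is a minimal Python sketch under that assumption; the function name `read_expression_table` is hypothetical and is not part of the tool, and the expression values in the example are made up for illustration.

```python
import csv
import io


def read_expression_table(text):
    """Parse a tab-separated expression table laid out as described above.

    Column 1 holds the gene identifier; every following column is one sample.
    Returns (sample_names, {gene: [expression values]}).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    samples = [name.strip() for name in header[1:]]
    table = {}
    for row in reader:
        if not row or not row[0].strip():
            continue  # ignore blank lines, e.g. a trailing empty line
        if len(row) != len(header):
            raise ValueError(
                f"gene {row[0]!r}: expected {len(header)} columns, got {len(row)}"
            )
        table[row[0].strip()] = [float(value) for value in row[1:]]
    return samples, table


# Made-up values, for illustration only.
example = (
    "Gene Name\tALL_1\tALL_2\tAML_1\tAML_2\n"
    "TACC2\t5.1\t4.8\t2.0\t2.3\n"
    "DYT1\t1.2\t1.0\t3.4\t3.9\n"
)
samples, table = read_expression_table(example)
print(samples)        # ['ALL_1', 'ALL_2', 'AML_1', 'AML_2']
print(table["DYT1"])  # [1.2, 1.0, 3.4, 3.9]
```

A row with a missing or extra cell raises an error instead of being silently misaligned with the sample names.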
3.2 Group file
The group file describes which group each sample in the predefined gene set file belongs to. The file must meet the following requirements:
It must be in TXT format (you can open the data in Excel and save it as "Text File (*.txt)"). The first column is the sample name and the second column is the group name, separated by a tab. Note: this file is still required even if the experiment has no replicates (one sample per group). A sample file is shown below:
ALL_1 | ALL
ALL_2 | ALL
ALL_3 | ALL
AML_1 | AML
AML_2 | AML
AML_3 | AML
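Sample names in the group file must match the column headers of the expression file exactly, and the data must end up in exactly two groups. A minimal Python sketch of that consistency check follows; `read_group_table` and `check_groups` are hypothetical helpers, not the tool's own validation. A name such as `ALL-1` that does not match the header `ALL_1` would be reported as missing.

```python
def read_group_table(text):
    """Map each sample to its group from a two-column, tab-separated file."""
    groups = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected 'sample<TAB>group'")
        groups[fields[0].strip()] = fields[1].strip()
    return groups


def check_groups(expression_samples, groups):
    """Check that every sample is grouped and that exactly two groups remain."""
    missing = [s for s in expression_samples if s not in groups]
    if missing:
        raise ValueError(f"samples missing from the group file: {missing}")
    found = sorted({groups[s] for s in expression_samples})
    if len(found) != 2:
        raise ValueError(f"expected exactly two groups, found {found}")
    return found


groups = read_group_table("ALL_1\tALL\nALL_2\tALL\nAML_1\tAML\nAML_2\tAML\n")
print(check_groups(["ALL_1", "ALL_2", "AML_1", "AML_2"], groups))  # ['ALL', 'AML']
```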
3.3 Comparison group file
This file defines which groups are compared. The file must meet the following requirements:
It must be in TXT format (you can open the data in Excel and save it as "Text file (*.txt)").
By default, the control group is in the first column and the treatment group in the second; for example, ALL (control) in the first column and AML (treatment) in the second. The treatment group is compared against the control group. A sample file is shown below:
ALL | AML
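The comparison file fixes the direction of the comparison: treatment relative to control. The Python sketch below, with hypothetical helper names, reads each non-blank line as a (control, treatment) pair and computes a log2 fold change in that direction, so a positive value means higher expression in the treatment group. Using the ratio of group means is an assumption made for illustration.

```python
import math
from statistics import mean


def read_comparisons(text, known_groups):
    """Return a list of (control, treatment) pairs, one per non-blank line."""
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) < 2:
            raise ValueError("expected 'control<TAB>treatment'")
        control, treatment = fields[0], fields[1]
        for group in (control, treatment):
            if group not in known_groups:
                raise ValueError(f"unknown group: {group!r}")
        pairs.append((control, treatment))
    return pairs


def log2_fold_change(control_values, treatment_values):
    """Treatment compared with control: positive means higher in treatment."""
    return math.log2(mean(treatment_values) / mean(control_values))


print(read_comparisons("ALL\tAML\n", {"ALL", "AML"}))  # [('ALL', 'AML')]
print(log2_fold_change([2.0, 2.0], [8.0, 8.0]))        # 2.0
```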
4. Parameter selection
In this example, all parameters are left at their recommended or default values.
Output
Based on the input files and the selected parameters, the program outputs the raw results of the GSEA software. You can view the ES plots and leading-edge genes of the top n significantly enriched gene sets.
1. Table
Table: GSEA results summary
Upregulated in class: the group in which the gene set is highly expressed (e.g., the table indicates high expression in the ALL group)
GeneSet: the pregenerated gene set against which the score is computed
Enrichment Score (ES): the enrichment score of the pregenerated gene set in the user's (custom) ranked gene list
Normalized Enrichment Score (NES): the ES normalized to account for differences in gene set size
Nominal p-value: the p-value obtained by a permutation test, used to judge the reliability of the result
FDR q-value: the p-value adjusted for multiple hypothesis testing with the false discovery rate (FDR) method
FWER p-value: the p-value corrected for the family-wise error rate, a more conservative correction than FDR
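The nominal p-value compares the observed ES with a null distribution of ES values obtained by permutation. In the GSEA method, only the part of the null distribution with the same sign as the observed ES is used. The Python sketch below shows that one-sided comparison; it assumes the null ES values have already been computed (for example, by recomputing the ES after shuffling sample labels), and the null values in the example are hypothetical.

```python
def nominal_p_value(observed_es, null_es):
    """One-sided nominal p-value of an observed ES against permutation nulls.

    Only null ES values with the same sign as the observed ES are used.
    """
    if observed_es >= 0:
        same_sign = [e for e in null_es if e >= 0]
        as_extreme = [e for e in same_sign if e >= observed_es]
    else:
        same_sign = [e for e in null_es if e < 0]
        as_extreme = [e for e in same_sign if e <= observed_es]
    if not same_sign:
        raise ValueError("no null ES values with the same sign as the observed ES")
    return len(as_extreme) / len(same_sign)


# Hypothetical null ES values, e.g. from ES recomputed after shuffling sample labels.
print(nominal_p_value(0.5, [0.1, 0.6, -0.3, 0.7, 0.2]))  # 0.5
```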
2. ES plot
Top red box: the running ES curve as the score accumulates along the ranked gene list
Middle red box: the positions of the members of the target gene set (marked by black bars) within the full ranked gene list
Bottom box: the value of the ranking metric (here, the log2 fold change) for each gene, with genes sorted by this metric from high to low
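The curve in the top panel is a running sum over the ranked list: it steps up at each gene set member and down at each non-member, and the ES is the point of maximum deviation from zero. The Python sketch below follows the weighted running-sum statistic described in the original GSEA publication, with weight exponent `p=1` assumed as the default; the function name and the toy data are hypothetical.

```python
def running_enrichment(ranked_genes, scores, gene_set, p=1.0):
    """Running ES along a ranked list (weighted running-sum statistic).

    ranked_genes: genes sorted by the ranking metric, high to low.
    scores: the ranking metric for each gene, in the same order.
    Hits step up by |score|**p / (sum of |score|**p over hits);
    misses step down by 1 / (number of misses).
    Returns (curve, es), where es is the point of maximum deviation from zero.
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("the gene set must cover some, but not all, ranked genes")
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all gene set members have a ranking score of zero")
    curve, running = [], 0.0
    for s, hit in zip(scores, hits):
        running += abs(s) ** p / hit_norm if hit else -1.0 / n_misses
        curve.append(running)
    es = max(curve, key=abs)
    return curve, es


genes = ["A", "B", "C", "D"]
metric = [2.0, 1.0, -1.0, -2.0]  # e.g. log2 fold changes, sorted high to low
curve, es = running_enrichment(genes, metric, {"A", "B"})
print(round(es, 3))  # 1.0
```

Because the hit steps sum to 1 and the miss steps sum to -1, the curve always returns to zero at the end of the list.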
3. Interpretation of GSEA detailed results
Column 1: Gene name
Column 2: Gene name as given in the gene set
Column 5: Position of the gene in the custom ranked list
Column 6: Ranking metric value of the gene, such as the fold change
Column 7: Running (accumulated) ES value
Column 8: Whether the gene belongs to the core genes; "Yes" marks genes that contribute most to the ES of the gene set
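The genes marked "Yes" in column 8 correspond to the leading-edge subset defined in GSEA: for a positive ES, the gene set members ranked at or before the peak of the running ES curve; for a negative ES, those at or after it. A minimal Python sketch of that rule, taking a precomputed running ES curve as input (the function name and toy curves are hypothetical):

```python
def leading_edge(ranked_genes, gene_set, curve):
    """Gene set members marked "Yes" (core enrichment) in a GSEA-style result.

    For a positive ES, these are the members ranked at or before the peak of
    the running ES curve; for a negative ES, those at or after it.
    """
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    window = ranked_genes[: peak + 1] if curve[peak] >= 0 else ranked_genes[peak:]
    members = set(gene_set)
    return [gene for gene in window if gene in members]


genes = ["A", "B", "C", "D"]
print(leading_edge(genes, {"A", "B"}, [0.667, 1.0, 0.5, 0.0]))    # ['A', 'B']
print(leading_edge(genes, {"C", "D"}, [-0.5, -1.0, -0.667, 0.0]))  # ['C', 'D']
```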
4. Interpretation of heat map results
The heat map shows the expression of the genes in the gene set across all samples. Each column represents a sample and each row a gene; colors from blue to red indicate expression from low to high.
5. Interpretation of table results
Column 1: Gene set name
Column 2: Link; paste it into a web browser to open the details of the gene set
Column 3: Number of genes in the gene set
Columns 4 and 5: Enrichment score (ES) and normalized ES value (Z-score method)
Column 6: Nominal p-value
Column 7: p-value corrected with the FDR method, i.e., the q-value
Column 8: Names of the genes enriched in this gene set
For the principles of GSEA and a detailed walkthrough of the analysis, see:
https://www.omicshare.com/forum/forum.php?mod=viewthread&tid=5044&highlight=GSEA
1. Function
Screening differentially expressed genes with a fixed threshold tends to discard genes with weak changes, so the resulting list of differential genes can be very small, and the contribution of a given gene set to the phenotype cannot be clearly determined across pathways. Gene Set Enrichment Analysis (GSEA), which does not require genes to be filtered in advance, can explain the regulation of a functional unit (pathway, GO term, or other) more comprehensively.
Dynamic GSEA analysis keeps the original ES plot and adds a heat map below it, showing the distribution of the predefined gene set and its ES scores across different functional units (pathways, GO terms, or others) at the same time, so that users can quickly pick out the functional units of interest.
2. Species selection
Select the species you are studying.
3. Data type
Gene Denovo's GSEA analysis supports two gene identifier types: gene ID and gene symbol. Choose the input type that matches your data.
4. Database selection
GSEA is a gene set-based enrichment analysis method. A gene set is a group of genes classified in advance by function or by some other principle; for example, a gene set can be all the genes in one pathway or one GO term. Gene Denovo offers different gene set collections to choose from, depending on the user's input.
Because each database covers different species, the databases available depend on the species selected. When the input is gene IDs, KEGG, GO, DO, and Reactome can be selected.
The DO (Disease Ontology) database describes gene function and disease-related information. Its biomedical purpose is to provide consistent terminology for human disease phenotypes and the associated medical vocabulary and disease concepts. The DO database can be selected only when the species is human; it is unavailable for all other species.
The Reactome database is a collection of human reactions and biological pathways. It has since been extended to animal species such as mouse and rat, but it still includes no plant species, so it cannot be selected for plants.
When the input is gene symbols, the MSigDB database can be selected as needed. MSigDB, the gene set database provided on the official GSEA website, contains 8 collections:
H: Hallmark gene sets, each condensed from multiple known gene sets;
C1: Positional gene sets corresponding to the cytogenetic bands on each human chromosome, with subcategories based on chromosome number;
C2: Curated gene sets from known databases;
C3: Regulatory gene sets, including miRNA target genes and genes sharing transcription factor binding sites;
C4: Computational gene sets predicted by software, mainly cancer-related;
C5: GO gene sets;
C6: Oncogenic gene sets;
C7: Immunologic gene sets.
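The database availability rules in this section amount to a small decision table. The Python sketch below encodes them as stated here: gene ID input offers KEGG and GO, plus DO for human and Reactome for non-plant species; gene symbol input offers MSigDB. The function name and the `id_type` labels are hypothetical, and treating every non-plant species as covered by Reactome is a simplification; the tool's real species lists may differ.

```python
def selectable_databases(id_type, species, is_plant):
    """Databases offered for a given input type, following the rules above.

    id_type: "gene_id" or "symbol" (hypothetical labels).
    Assumption: Reactome is offered for every non-plant species.
    """
    if id_type == "symbol":
        return ["MSigDB"]
    if id_type != "gene_id":
        raise ValueError(f"unknown id_type: {id_type!r}")
    databases = ["KEGG", "GO"]
    if species == "human":
        databases.append("DO")        # DO is offered only for human
    if not is_plant:
        databases.append("Reactome")  # Reactome includes no plant species
    return databases


print(selectable_databases("gene_id", "human", is_plant=False))
# ['KEGG', 'GO', 'DO', 'Reactome']
print(selectable_databases("gene_id", "rice", is_plant=True))  # ['KEGG', 'GO']
```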
5. Selection of analysis parameters
5.1 Gene ranking method
One of the following methods can be selected to calculate a value for each gene; the genes are then ranked by that value:
Signal2Noise: the difference between the group means, normalized by the groups' standard deviations
tTest: the t statistic between the two groups
Ratio_of_Classes: the ratio of mean expression between the groups (fold change)
log2_Ratio_of_Classes: the log2 of the fold change between the groups
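The four ranking methods can be written as short formulas over the group means (mu) and sample standard deviations (sigma) of one gene in groups A and B, using the names the GSEA software uses for them: Signal2Noise = (muA - muB) / (sigmaA + sigmaB), tTest = (muA - muB) / sqrt(sigmaA^2/nA + sigmaB^2/nB), Ratio_of_Classes = muA / muB, and log2_Ratio_of_Classes = log2(muA / muB). The Python sketch below computes them for one gene. It is an illustration only: the GSEA software is documented to apply a floor to very small standard deviations, which this sketch omits, and which group the tool treats as A is not stated here.

```python
import math
from statistics import mean, stdev


def rank_metric(group_a, group_b, method="Signal2Noise"):
    """Ranking value of one gene from its expression in two groups.

    group_a, group_b: expression values in each group (the two
    standard-deviation-based methods need at least two replicates per group).
    """
    mu_a, mu_b = mean(group_a), mean(group_b)
    if method == "Signal2Noise":
        return (mu_a - mu_b) / (stdev(group_a) + stdev(group_b))
    if method == "tTest":
        se = math.sqrt(stdev(group_a) ** 2 / len(group_a)
                       + stdev(group_b) ** 2 / len(group_b))
        return (mu_a - mu_b) / se
    if method == "Ratio_of_Classes":
        return mu_a / mu_b
    if method == "log2_Ratio_of_Classes":
        return math.log2(mu_a / mu_b)
    raise ValueError(f"unknown method: {method!r}")


a, b = [4.0, 6.0], [1.0, 3.0]
for m in ("Signal2Noise", "tTest", "Ratio_of_Classes", "log2_Ratio_of_Classes"):
    print(m, round(rank_metric(a, b, m), 3))
# Signal2Noise 1.061, tTest 2.121, Ratio_of_Classes 2.5, log2_Ratio_of_Classes 1.322
```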
5.2 Gene set size range
Gene sets with fewer genes than the minimum (default 15) or more genes than the maximum (default 500) are filtered out and excluded from the analysis.
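The size filter can be sketched in a few lines of Python. The sketch assumes that a gene set's size is counted over the members that also appear in the uploaded expression data, as the GSEA desktop software does; if the tool counts all listed members instead, drop the intersection. The function name and the toy gene sets are hypothetical.

```python
def filter_by_size(gene_sets, genes_in_data, min_size=15, max_size=500):
    """Split gene sets into kept and excluded by their size.

    Assumption: size is counted over members that also appear in the data.
    Returns two dicts mapping gene set name -> size.
    """
    data = set(genes_in_data)
    kept, excluded = {}, {}
    for name, members in gene_sets.items():
        size = len(set(members) & data)
        target = kept if min_size <= size <= max_size else excluded
        target[name] = size
    return kept, excluded


sets = {"tiny": ["A"], "medium": ["A", "B", "C"], "large": ["A", "B", "C", "D", "E"]}
kept, excluded = filter_by_size(sets, {"A", "B", "C", "D", "E"}, min_size=2, max_size=4)
print(kept)      # {'medium': 3}
print(excluded)  # {'tiny': 1, 'large': 5}
```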
5.3 Number of ES plots to output
The program outputs ES plots for the specified number of gene sets; by default, it outputs the results for the top 20 significantly enriched pathways. By default, the GSEA software uses p-value < 0.05 and q-value < 0.25 as significance thresholds. To keep the analysis fast, the custom value must be less than 100.
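Choosing which gene sets get an ES plot combines the significance thresholds with the requested count. The Python sketch below keeps gene sets with p < 0.05 and q < 0.25, then takes the top n. Ranking by the absolute normalized enrichment score is an assumption, since the ordering the tool uses is not stated; the record keys and gene set names are hypothetical.

```python
def top_enriched(results, n=20, p_cutoff=0.05, q_cutoff=0.25):
    """Names of the top-n significant gene sets for ES plotting.

    results: list of dicts with hypothetical keys "name", "nes", "p", "q".
    Assumption: significant sets are ordered by |NES|, largest first.
    """
    if not 0 < n < 100:
        raise ValueError("the number of ES plots must be between 1 and 99")
    significant = [r for r in results if r["p"] < p_cutoff and r["q"] < q_cutoff]
    significant.sort(key=lambda r: abs(r["nes"]), reverse=True)
    return [r["name"] for r in significant[:n]]


results = [
    {"name": "SET_A", "nes": 1.8, "p": 0.01, "q": 0.05},
    {"name": "SET_B", "nes": -2.1, "p": 0.001, "q": 0.02},
    {"name": "SET_C", "nes": 1.5, "p": 0.03, "q": 0.40},  # fails the q-value cutoff
]
print(top_enriched(results, n=2))  # ['SET_B', 'SET_A']
```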
6. Modification
6.1 Parameter adjustment
Switch to a different database to display the ES plots of the predefined gene set in that database.
6.2 Interactive ES overview plot
The chart is interactive; use the toolbar buttons to adjust the graph:
Zoom in: enlarges the graph.
Zoom out: shrinks the graph. This only changes the on-screen view and does not affect the size of the exported graph.
Edit: modifies the graph globally or personalizes details such as color, title, font, and transparency.
Export: exports the graph; choose the image format (SVG or PNG) and set the size (in px). Click "Graphics Preview" to preview the graph, and click "Download Pictures" to save it to the browser's default download location.
6.3 Graphics preview
1. The running ES curve of the predefined gene set in each functional unit (pathway, GO term, or other); functional units are distinguished by color.
2. The full name of each functional unit (pathway, GO term, or other).
3. Below the plot: the positions of the predefined gene set members within each functional unit.