
gsea






















Documentation

For the principle behind GSEA and a detailed walkthrough of the tool, see:

https://www.omicshare.com/forum/forum.php?mod=viewthread&tid=5044&highlight=GSEA

1. Function

Traditional enrichment analysis based on the hypergeometric test requires a set of significantly differentially expressed genes as input. However, selecting differential genes with a fixed threshold tends to discard genes with weak changes. The resulting list of differential genes can be very small, and the contribution of a given gene set to the phenotype cannot be clearly determined across pathways.

Gene Set Enrichment Analysis (GSEA) does not require genes to be pre-filtered. It compensates for the tendency of traditional enrichment analysis to miss genes with small effects and to use the available information incompletely, and so gives a fuller account of how a functional unit (a pathway, GO term, or other gene set) is regulated.

2. Species type

3. Analyze data

In the GSEA analysis provided by Gene Denovo, two gene identifier types are supported: gene ID and gene symbol. Users can choose whichever matches their input data.

4. Database selection

GSEA is a gene set-based enrichment analysis method. A gene set is a group of genes that have been classified together in advance according to function or some other principle. A gene set can be, for example, all the genes in a pathway or a GO term. Gene Denovo offers different gene sets to choose from, depending on the data provided by the user.

Because each database covers different species, the databases available depend on the species selected. When the input uses gene IDs, the KEGG, GO, DO and Reactome databases can be selected.

The DO (Disease Ontology) database mainly describes gene function in relation to disease. Its purpose is to provide biomedicine with consistent terminology for human diseases, phenotype characteristics and related medical vocabulary. The DO database can be selected only when the species is human; it is unavailable for all other species.

The Reactome database is a collection of human reactions and biological pathways. The database has since been extended to animal species such as mouse and rat, but plant species are still not included. Therefore, Reactome cannot be selected for plant species.

When the input uses gene symbols, gene sets can be selected from the MSigDB database as needed. MSigDB is the gene set database provided on the official GSEA website. It contains 8 collections:

H: Hallmark gene sets, each summarizing multiple known gene sets;

C1: Positional gene sets, corresponding to the cytoband regions on each human chromosome. Secondary classification based on non-chromosomal coding;

C2: Curated gene sets from known databases;

C3: Regulatory target gene sets, including miRNA target genes and transcription factor binding regions;

C4: Computational gene sets predicted by software, mainly those associated with cancer;

C5: GO gene sets;

C6: Oncogenic gene sets;

C7: Immunologic gene sets.

5. Selection of analysis parameters

5.1 Ranking method for the predefined gene set

One of the following methods can be selected to calculate a score for each gene; the genes are then ranked by that score:

Signal2Noise: the difference between the group means, normalized by the standard deviations.

tTest: the t statistic between the two groups.

Ratio_of_Classes: the ratio of mean expression between the groups, that is, the fold change.

log2_Ratio_of_Classes: the log2 of the fold change between the groups.
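As an illustration of the four ranking methods, the sketch below computes each score for a single gene from its expression values in two groups. It uses the commonly cited definitions of these metrics; the function name `ranking_metrics` and the choice of which group is the numerator are assumptions for illustration, and any adjustment the GSEA software applies to the standard deviations is not reproduced here.

```python
import math
from statistics import mean, stdev


def ranking_metrics(a, b):
    """Return four ranking scores for one gene.

    a, b: expression values of the gene in group A and group B
    (hypothetical inputs; each group needs at least two samples
    so that a sample standard deviation can be computed).
    """
    mu_a, mu_b = mean(a), mean(b)
    sd_a, sd_b = stdev(a), stdev(b)  # sample standard deviations
    return {
        # difference of means scaled by the sum of standard deviations
        "Signal2Noise": (mu_a - mu_b) / (sd_a + sd_b),
        # two-sample t statistic with unequal variances
        "tTest": (mu_a - mu_b) / math.sqrt(sd_a**2 / len(a) + sd_b**2 / len(b)),
        # fold change between group means
        "Ratio_of_Classes": mu_a / mu_b,
        # log2 of the fold change
        "log2_Ratio_of_Classes": math.log2(mu_a / mu_b),
    }


if __name__ == "__main__":
    # Made-up expression values, for illustration only.
    print(ranking_metrics([8, 10, 12], [4, 5, 6]))
```

The ratio-based scores assume positive, non-log-transformed expression values; with log-scale data a difference of means would be the more natural choice.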

5.2 Gene set size range

A gene set is filtered out and excluded from the analysis if it contains fewer genes than the minimum (default 15) or more genes than the maximum (default 500).
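The size filter can be sketched as follows. This is a minimal illustration, not the tool's implementation: the function name `filter_gene_sets` is hypothetical, and counting only the genes that also appear in the expression data is an assumption.

```python
def filter_gene_sets(gene_sets, expressed_genes, min_size=15, max_size=500):
    """Keep gene sets whose size lies within [min_size, max_size].

    gene_sets: dict mapping gene set name -> iterable of gene names.
    expressed_genes: gene names present in the expression table.
    Assumption: the size is counted after restricting each gene set
    to genes that are present in the expression table.
    """
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in gene_sets.items():
        overlap = set(genes) & expressed
        if min_size <= len(overlap) <= max_size:
            kept[name] = sorted(overlap)
    return kept


if __name__ == "__main__":
    # Synthetic gene names, for illustration only.
    genes = [f"g{i}" for i in range(600)]
    sets = {"small": genes[:10], "ok": genes[:20], "big": genes[:501]}
    print(list(filter_gene_sets(sets, genes)))
```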

5.3 Number of ES plots to output

ES plots are drawn for the top N significantly enriched gene sets, where N is the number set here. By default, plots are output for the top 20 significantly enriched gene sets. GSEA uses p-value < 0.05 and q-value < 0.25 as the default significance thresholds. To keep the analysis fast, the custom value must be less than 100.
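The selection of gene sets to plot can be sketched as below. The thresholds and the limit of 100 come from the text above; the function name `top_significant`, the record keys `p`, `q` and `nes`, and the ordering by absolute normalized enrichment score are assumptions made for illustration.

```python
def top_significant(results, n=20, p_cut=0.05, q_cut=0.25):
    """Return up to n significant gene sets for plotting.

    results: list of dicts with hypothetical keys
             "name", "nes", "p" (nominal p-value) and "q" (FDR q-value).
    Assumption: significant sets are ordered by |NES|, largest first.
    """
    if not 0 < n < 100:
        raise ValueError("the number of ES plots must be between 1 and 99")
    significant = [r for r in results if r["p"] < p_cut and r["q"] < q_cut]
    significant.sort(key=lambda r: abs(r["nes"]), reverse=True)
    return significant[:n]


if __name__ == "__main__":
    # Made-up result records, for illustration only.
    demo = [
        {"name": "set_a", "nes": 1.8, "p": 0.01, "q": 0.10},
        {"name": "set_b", "nes": -2.1, "p": 0.00, "q": 0.02},
        {"name": "set_c", "nes": 1.2, "p": 0.20, "q": 0.40},
    ]
    print([r["name"] for r in top_significant(demo, n=2)])
```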

 

Input

The example data is a human gene expression table.

1. Species selection

 

2. Analyze data

In this example, the gene names in the predefined gene set file are gene symbols, so gene symbol is selected as the data type.

   

 

3. Predefined gene set files

3.1 Predefined gene files

The predefined gene file contains the genes to be analyzed, whose functions are not yet known. The file requirements are as follows: the input table must be in TXT format. You can open the data in Excel and save it as "Text file (TAB delimited)(*.txt)".

The first column is the gene ID. The remaining columns are the samples, each headed by the sample name and containing the corresponding expression values. Only two groups of samples can be analyzed.

Columns are separated by tabs. The sample file is as follows:

(In this example, Signal2Noise is selected for ranking the predefined gene set.)

Gene Name	ALL_1	ALL_2	ALL_3	AML_1	AML_2	AML_3
TACC2	...	...	...	...	...	...
DYT1	...	...	...	...	...	...
32385_at	...	...	...	...	...	...

3.2 Group files

The group file describes which group each sample in the predefined gene file belongs to. The file requirements are as follows:

The input table must be in TXT format. You can open the data in Excel and save it as "Text File (*.txt)".

The first column of the file is the sample name, and the second column is the group name. Columns are separated by tabs. Note: this file is still required even if the experiment has no replicates (one sample per group). The sample file is as follows:

ALL_1	ALL
ALL_2	ALL
ALL_3	ALL
AML_1	AML
AML_2	AML
AML_3	AML
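Before uploading, the two files can be checked against each other with a short script. The sketch below is only an illustration of the format rules described above; the function names are hypothetical, and the header line passed in the usage example mirrors the sample file.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-delimited text into a list of non-empty rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]


def load_groups(group_text):
    """Map sample name -> group name and require exactly two groups."""
    groups = {}
    for row in read_tsv(group_text):
        if len(row) != 2:
            raise ValueError(f"expected 2 tab-separated columns, got: {row}")
        sample, group = row
        groups[sample] = group
    if len(set(groups.values())) != 2:
        raise ValueError("exactly two groups of samples are required")
    return groups


def check_expression_header(expr_text, groups):
    """Return the sample names from the header row of the expression table,
    after checking that each one is listed in the group file."""
    header = read_tsv(expr_text)[0]
    samples = header[1:]  # first column holds the gene ID
    missing = [s for s in samples if s not in groups]
    if missing:
        raise ValueError(f"samples missing from the group file: {missing}")
    return samples


if __name__ == "__main__":
    group_text = "ALL_1\tALL\nALL_2\tALL\nALL_3\tALL\nAML_1\tAML\nAML_2\tAML\nAML_3\tAML\n"
    header_text = "Gene Name\tALL_1\tALL_2\tALL_3\tAML_1\tAML_2\tAML_3\n"
    groups = load_groups(group_text)
    print(check_expression_header(header_text, groups))
```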

3.3 Compare group files

The compare group file defines which groups are compared. The file requirements are as follows:

The input table must be in TXT format. You can open the data in Excel and save it as "Text file (*.txt)".

By default, the control group comes first and the treatment group second; for example, the first column is ALL (control) and the second column is AML (treatment). The comparison is made as treatment group versus control group. The sample file is as follows:

ALL	AML

 

 

4. Parameter selection

In this example, all parameters are left at their recommended or default values.

 

Output

 


Based on the input files and the selected parameters, the program outputs the original analysis results of the GSEA software. You can view the ES plots and leading-edge genes of the top n significantly enriched gene sets. Open the output folder to view the summary results as a web page.

1 Reading the table

Table: GSEA results summary

Upregulated in class: the group in which the gene set is highly expressed (for example, the table indicates high expression in the ALL group)

GeneSet: the predefined gene set that is being scored

Enrichment Score (ES): the enrichment score of the gene set in the ranked gene list

Normalized Enrichment Score: the normalized ES value

Nominal p-value: p-value obtained by permutation test, used to judge the reliability of the result

FDR q-value: p-value corrected for multiple hypothesis testing by the FDR method

FWER p-value: p-value adjusted to control the family-wise error rate, a more conservative correction than FDR

2 ES illustration

Above the red area: the running ES curve, showing how the ES value changes as it accumulates along the ranked gene list

Within the red area: the positions of the members of the target gene set (marked by black bars) in the ranked list of all genes

Below the red area: the actual value of the ranking metric (here the log2 of the fold change) for the genes, sorted by that metric from high to low
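The running ES curve in the upper panel can be reproduced in a few lines. The sketch below follows the weighted running-sum statistic commonly described for GSEA: the sum rises at each gene set member in proportion to the magnitude of its ranking metric and falls by a constant step at every other gene, and the ES is the largest deviation from zero. The function name `running_es` and the `weight` parameter are illustrative assumptions; this is not the tool's own code.

```python
def running_es(ranked, gene_set, weight=1.0):
    """Compute the running enrichment curve and the ES.

    ranked: list of (gene, metric) pairs sorted by metric, high to low.
    gene_set: iterable of gene names in the target gene set.
    weight: exponent applied to |metric| for hits (0 gives equal steps).
    Returns (es, curve), where curve[i] is the running sum after gene i.
    """
    members = set(gene_set)
    hit_weights = [abs(m) ** weight if g in members else 0.0 for g, m in ranked]
    n_hits = sum(1 for g, _ in ranked if g in members)
    n_miss = len(ranked) - n_hits
    total_hit = sum(hit_weights)
    if n_hits == 0 or n_miss == 0 or total_hit == 0:
        raise ValueError("need both hits and misses with non-zero hit weight")

    running, curve = 0.0, []
    for (gene, _), w in zip(ranked, hit_weights):
        if gene in members:
            running += w / total_hit   # step up at a gene set member
        else:
            running -= 1.0 / n_miss    # step down at any other gene
        curve.append(running)

    es = max(curve, key=abs)           # largest deviation from zero
    return es, curve


if __name__ == "__main__":
    # Made-up ranked list, for illustration only.
    ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0)]
    es, curve = running_es(ranked, {"a", "b"})
    print(es, curve)
```

A gene set whose members cluster at the top of the ranked list gives a positive ES; one whose members cluster at the bottom gives a negative ES.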

 

 

3 Interpretation of GSEA detailed results

Column 1: Gene name

Column 2: Gene name as given in the gene set

Column 5: The position of the gene in the ranked gene list

Column 6: The ranking metric value of the gene, such as the fold change

Column 7: The running (accumulated) ES value

Column 8: Whether the gene is a core gene; "Yes" marks a gene that makes a major contribution to the ES of the gene set

 

4 Interpretation of heat map results

The figure shows the expression of the genes in the gene set across all samples.

Each column represents a sample and each row represents a gene. Expression from low to high is shown by color from blue to red.

5 Interpretation of table results

Column 1: Gene set name

Column 2: Link; paste it into a browser to view details of the gene set

Column 3: Number of genes in the gene set

Columns 4~5: Enrichment score (ES) and corrected ES value (Z score method)

Column 6: Original p-value

Column 7: p-value corrected by the FDR method, that is, the q-value

Column 8: Names of the genes that are enriched in this gene set