  chisq

1. Function:

The chi-square test is a versatile hypothesis test for count data and belongs to the non-parametric tests. This tool computes the Pearson chi-square statistic. It is mainly used to compare two or more groups, or to test the association between two categorical variables. The basic idea is to measure how well the observed counts agree with the expected (theoretical) counts, that is, the goodness of fit. Based on the input data, the tool tests whether the distribution of a categorical variable is associated with group membership across two or more groups.
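As a rough illustration of the Pearson statistic described above, the sketch below computes the expected count of each cell as (row total × column total) / n and sums (O - E)² / E over all cells. It is a minimal stand-alone Python sketch, not the tool's own code; the function names `expected_counts` and `pearson_chi_square` are hypothetical.

```python
def expected_counts(table):
    """Expected (theoretical) count of every cell, assuming no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all cells."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# EPCAM sample table: rows are groups, columns are categories
epcam = [[12, 24], [6, 18]]
print(round(pearson_chi_square(epcam), 4))  # 0.4762
```

No continuity correction is applied here; this is the plain Pearson statistic.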

2. Scope of application:

Association analysis of count data. For example: whether the expression of a particular gene differs significantly between cancer types, or whether the postoperative recurrence rate of cancer patients is related to sex, age, tumor site, surgical method, or other factors. For association questions of this kind, the test is applied to the observed counts in each category.

3. Applicable conditions:

(1) The data come from a random sample

(2) The expected (theoretical) frequencies must not be too small. For comparing two independent samples (a 2×2 table), three cases are distinguished:

① If all expected counts T ≥ 5 and the total sample size n ≥ 40, use the Pearson chi-square test;

② If any expected count satisfies 1 ≤ T < 5 and n ≥ 40, use the continuity-corrected chi-square test;

③ If any expected count T < 1, or n < 40, use Fisher's exact test.
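The three rules above can be written as a small decision function. This is a hypothetical Python sketch (`choose_2x2_test` is an illustrative name, not part of the tool) that only decides which test applies; it does not itself run Fisher's exact test or the continuity correction.

```python
def expected_counts(table):
    """Expected count of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_2x2_test(table):
    """Pick a test for a 2x2 table following rules 1-3 above.

    Returns "pearson", "continuity" (continuity-corrected chi-square)
    or "fisher". An expected count of exactly 1 is treated as the
    continuity-correction case.
    """
    n = sum(sum(row) for row in table)
    t_min = min(min(row) for row in expected_counts(table))
    if n < 40 or t_min < 1:
        return "fisher"       # rule 3: some T < 1, or n < 40
    if t_min < 5:
        return "continuity"   # rule 2: some 1 <= T < 5, n >= 40
    return "pearson"          # rule 1: all T >= 5, n >= 40


print(choose_2x2_test([[12, 24], [6, 18]]))  # pearson
print(choose_2x2_test([[3, 7], [8, 30]]))    # continuity
print(choose_2x2_test([[2, 3], [4, 5]]))     # fisher
```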

(3) Conditions for applying the chi-square test to an R×C table (R ≥ 3, C ≥ 3):

① No more than 1/5 of the cells may have an expected count T < 5;

② No cell may have an expected count T < 1.
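A minimal check of the two R×C conditions, again as a hypothetical Python sketch (`rxc_conditions_met` is an illustrative name, not the tool's API). It uses the same expected-count formula, (row total × column total) / n, and does not check the table dimensions.

```python
def expected_counts(table):
    """Expected count of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def rxc_conditions_met(table):
    """True if at most 1/5 of cells have T < 5 and no cell has T < 1."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < 5)
    return small <= len(cells) / 5 and all(e >= 1 for e in cells)


print(rxc_conditions_met([[10, 12, 15], [8, 9, 11], [14, 13, 10]]))  # True
print(rxc_conditions_met([[1, 0, 20], [0, 1, 20], [5, 5, 5]]))       # False
```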

4. Input:

Note: Rows are the groups and columns are the categories of the categorical variable; the test asks whether the distribution of that variable differs significantly between groups. The tool supports 2×2, 2×C, and R×C tables.

Example:

Test whether the expression of a particular gene differs significantly between cancer types:

                              Positive    Negative
Small Cell Lung Cancer        12          24
Non-small cell lung cancer    56          18
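The exact upload format is not specified here. Assuming each file is plain text with tab-separated columns, a header row whose first cell holds the table label (possibly empty) followed by the category names, and one row per group, a minimal Python parser might look like this. Both the tab-separated layout and the name `parse_count_table` are assumptions for illustration only.

```python
def parse_count_table(text):
    """Parse a tab-separated count table (assumed, hypothetical layout).

    First line: table label (may be empty), then the category names.
    Following lines: group name, then one integer count per category.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split("\t")
    label = header[0].strip()
    categories = [c.strip() for c in header[1:]]
    groups, counts = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        groups.append(fields[0].strip())
        row = [int(x) for x in fields[1:]]
        if len(row) != len(categories):
            raise ValueError(
                f"row {fields[0]!r} has {len(row)} counts, expected {len(categories)}"
            )
        counts.append(row)
    return label, categories, groups, counts


example = (
    "\tPositive\tNegative\n"
    "Small Cell Lung Cancer\t12\t24\n"
    "Non-small cell lung cancer\t56\t18\n"
)
label, categories, groups, counts = parse_count_table(example)
print(categories)  # ['Positive', 'Negative']
print(counts)      # [[12, 24], [56, 18]]
```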

To run several chi-square tests, click "Add File" and select each data file to be tested; up to 20 files can be uploaded. If a wrong file was uploaded, click "reduce files" to remove it.

5. Output:

There are two kinds of output file: a result file for each input file, and a summary table of all test results named ChisqResults.txt. The summary file has 5 columns:

The first column is the table name

The second column is df, the degrees of freedom

The third column is the chi-square value

The fourth column is pvalue, the p-value: the probability of obtaining a chi-square value at least as large as the observed one if there is no real difference between the groups. A result is generally considered significant when pvalue < 0.05.

The fifth column is the significance mark: * means significant (P < 0.05), ** means highly significant (P < 0.01), *** means extremely significant (P < 0.001), and --- means not significant (P ≥ 0.05).
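For an R×C table the degrees of freedom are (R - 1)(C - 1), so every 2×2 table has df = 1. With one degree of freedom the chi-square statistic is the square of a standard normal variable, and the p-value can be computed with the Python standard library alone as erfc(sqrt(x / 2)). The sketch below (hypothetical helper names, not the tool's code) shows this together with the significance marks listed above; for other df a chi-square survival function such as `scipy.stats.chi2.sf` would be needed instead.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of an R x C contingency table."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with df = 1."""
    return math.erfc(math.sqrt(chi_square / 2))


def significance_mark(p):
    """Significance mark for the fifth output column."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "---"


print(degrees_of_freedom(2, 2))                  # 1
print(round(p_value_df1(3.841458820694124), 4))  # 0.05
print(significance_mark(0.49))                   # ---
```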

1. Sample data

Example 1:

Test whether the expression of the functional gene EPCAM differs between small cell lung cancer and non-small cell lung cancer.

Sample data:

EPCAM                         Positive    Negative
Small Cell Lung Cancer        12          24
Non-small cell lung cancer    6           18

Example 2:

Test whether the expression of the functional gene CEACAM6 differs between small cell lung cancer and non-small cell lung cancer.

Sample data:

CEACAM6                       Positive    Negative
Small Cell Lung Cancer        12          24
Non-small cell lung cancer    56          18

2. Parameter setting

3. Result output

The ChisqResults.txt file summarizes both tests. The results show no significant difference in EPCAM expression between the two cancers (Pvalue = 0.49 > 0.05).

The expression of CEACAM6 differs significantly between the two cancers (Pvalue = 1.79e-05 < 0.05).
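Both sample tables satisfy the condition for the plain Pearson test (all expected counts ≥ 5, n ≥ 40), and the reported p-values are consistent with an uncorrected Pearson chi-square test with df = 1. The hypothetical Python sketch below recomputes them using the 2×2 shortcut n(ad - bc)² / [(a + b)(c + d)(a + c)(b + d)] for the table [[a, b], [c, d]]; it prints p-values of roughly 0.49 and 1.8e-05.

```python
import math


def pearson_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi_square):
    """Upper-tail p-value for df = 1: erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(chi_square / 2))


# (a, b, c, d) = (SCLC positive, SCLC negative, NSCLC positive, NSCLC negative)
samples = [("EPCAM", (12, 24, 6, 18)), ("CEACAM6", (12, 24, 56, 18))]
for gene, cells in samples:
    stat = pearson_2x2(*cells)
    print(gene, "df=1", f"chisq={stat:.3f}", f"p={p_value_df1(stat):.3g}")
```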