Estimate effect size of GO over-representation — estimate_go

A crude estimate of the effect size of GO over-representation. Given that a term is already known to be over-represented, this function estimates how strongly over-represented it is. It should be run after get_enriched_go.

estimate_go_overrep(obj, pwf, gene2cat)

Arguments

obj: data.frame containing goseq results as generated by get_enriched_go or goseq.
pwf: data.frame as used in get_enriched_go or goseq.
gene2cat: data.frame as used in get_enriched_go or goseq.

Value

Returns obj with an additional column, adj_overrep, calculated for each GO term as:

(numDEInCat / numInCat) / (avgTermWeight / avgNonTermWeight) / (totalDEFeatures / totalFeatures)

where:

numDEInCat is the number of differentially expressed genes (a.k.a. proteins) assigned to that GO term.
numInCat is the total number of genes (a.k.a. proteins) annotated to that GO term.
avgTermWeight is the average pwf$pwf value of the differentially expressed genes assigned to that GO term.
avgNonTermWeight is the average pwf$pwf value of all the other genes supplied in pwf.
totalDEFeatures is the total number of differentially expressed genes indicated in pwf.
totalFeatures is the total number of genes in pwf, i.e. its number of rows.
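
The calculation above can be sketched in base R for a single GO term. This is an illustration only, not the package's implementation. The helper name adj_overrep_sketch, the toy data and the gene2cat column names (gene, category) are hypothetical. It assumes pwf follows the goseq layout (gene identifiers as row names, with DEgenes and pwf columns), and it reads "all the other genes" as every gene in pwf that is not a differentially expressed member of the term.

```r
# Hypothetical helper: recomputes the documented formula for one GO term.
adj_overrep_sketch <- function(term, pwf, gene2cat) {
  # Genes annotated to the term that are also present in pwf
  term_genes <- intersect(
    unique(gene2cat$gene[gene2cat$category == term]),
    rownames(pwf)
  )

  is_de      <- pwf$DEgenes == 1
  in_term    <- rownames(pwf) %in% term_genes
  de_in_term <- is_de & in_term

  numDEInCat       <- sum(de_in_term)
  numInCat         <- sum(in_term)
  avgTermWeight    <- mean(pwf$pwf[de_in_term])
  # Assumption: "other genes" = everything except the DE genes in the term
  avgNonTermWeight <- mean(pwf$pwf[!de_in_term])
  totalDEFeatures  <- sum(is_de)
  totalFeatures    <- nrow(pwf)

  (numDEInCat / numInCat) /
    (avgTermWeight / avgNonTermWeight) /
    (totalDEFeatures / totalFeatures)
}

# Toy inputs; every value here is made up for illustration.
pwf <- data.frame(
  DEgenes   = c(1, 1, 0, 1, 0, 0, 0, 0),
  pwf       = c(0.30, 0.30, 0.20, 0.25, 0.20, 0.15, 0.15, 0.15),
  row.names = paste0("P", 1:8)
)
gene2cat <- data.frame(
  gene     = c("P1", "P2", "P3", "P4"),
  category = c("GO:0000001", "GO:0000001", "GO:0000001", "GO:0000002"),
  stringsAsFactors = FALSE
)

result <- adj_overrep_sketch("GO:0000001", pwf, gene2cat)
print(result)
```

In this toy case the term has 3 genes, 2 of them differentially expressed, so the raw ratio 2/3 is divided by the weight ratio 0.30 / (1.10 / 6) and by the background rate 3/8, giving 88/81, roughly 1.09.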