The Gene Ontology (GO) (Gene Ontology Consortium 2008) is a multidisciplinary initiative created with the aim of providing a controlled vocabulary of terms for describing and annotating gene product data. GO is a component of the Open Biological and Biomedical Ontologies (OBO) for shared use of vocabularies across different biological and medical domains.
GO covers three domains:
- Cellular component (C), which corresponds to the parts of a cell or its extracellular environment.
- Molecular function (F), which collects the elemental activities of a gene product at the molecular level, such as binding or catalysis.
- Biological process (P), which describes operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units, i.e. cells, tissues, organs and organisms.
By default, the tool that processes BLAST outputs in the GPRO pipeline “BLAST and HMM search Pipeline plus GO-annotation” automatically adds GO annotations to your BLAST results. However, if you do not need to perform BLAST searches, you can use this tool to add GO terms and KEGG enzyme codes (EC) to your data, provided that the data are summarized row by row in a CSV file with at least one additional column of sequence IDs (such as those of GenBank, UniProt, InterPro, etc.) that GPRO can process and associate with their respective GO IDs and terms.
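The row-by-row association described above can be sketched in Python. This is only an illustration of the idea, not GPRO's implementation: the function `append_go_columns`, the lookup table `GO_LOOKUP`, the placeholder sequence ID `ID_0001` and the output column names are all assumptions made for the example.

```python
import csv
import io

# Hypothetical lookup table: sequence ID -> list of (GO ID, GO term).
# The ID is a placeholder and the pairing is illustrative only.
GO_LOOKUP = {
    "ID_0001": [
        ("GO:0003824", "catalytic activity"),
        ("GO:0005739", "mitochondrion"),
    ],
}


def append_go_columns(csv_text, id_column, lookup):
    """Return CSV text with two extra columns holding GO IDs and GO terms.

    Rows whose sequence ID is not in the lookup get empty GO columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + ["GO IDs", "GO terms"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        hits = lookup.get(row[id_column], [])
        # Several GO annotations for one row are joined with ';'.
        row["GO IDs"] = ";".join(go_id for go_id, _ in hits)
        row["GO terms"] = ";".join(term for _, term in hits)
        writer.writerow(row)
    return out.getvalue()


example = "gene,seq_id\ngeneA,ID_0001\ngeneB,ID_9999\n"
annotated = append_go_columns(example, "seq_id", GO_LOOKUP)
print(annotated)
```

The sketch only requires that each row carries one recognizable sequence ID in a known column, which is the same precondition stated above for the CSV file.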
The EC is a number assigned to a type of enzyme according to the standardized enzyme nomenclature scheme found in ENZYME, the enzyme nomenclature database, and in KEGG, the Kyoto Encyclopedia of Genes and Genomes. InterProScan (Quevillon et al. 2005) is a tool, available at EBI, that scans sequences against InterPro, an integrated database of predictive protein signatures used for the classification and automatic annotation of proteins and genomes. To read more about the GO initiative, go to geneontology.org.
The Clusters of Orthologous Groups (COGs) of prokaryotic proteins and its eukaryote-specific version (EuKaryotic Orthologous Groups, KOG) (Tatusov et al. 2003) are two collections of proteins classified into groups of orthologs from different species (or of paralogs derived from the duplication of a single gene within a genome).
Append GO terms
Once a CSV containing your gene data has been uploaded to the worksheet, you can add new columns containing the GO terms, GO IDs, Enzyme Codes (ECs) and InterProScan IDs by clicking on the tab “Append GO terms”.
Evidence code weights
GPRO follows a GO annotation algorithm inspired by the one previously applied by BLAST2GO (Conesa et al. 2005). Using this tab, you can assign distinct weights to the evidence codes of your GO annotation.
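As a rough illustration of how evidence code weights can influence an annotation, the sketch below scores each candidate GO term as the hit similarity multiplied by the weight of its evidence code, and keeps the best score per term. This is a simplified assumption loosely based on the general idea of weighting evidence codes. It is not GPRO's actual algorithm, and the weight values, the threshold and the function name `score_go_terms` are hypothetical.

```python
# Hypothetical weights per evidence code (illustrative values only).
EVIDENCE_WEIGHTS = {"IDA": 1.0, "ISS": 0.75, "IEA": 0.5}


def score_go_terms(hits, weights, threshold):
    """Score candidate GO terms and keep those reaching the threshold.

    hits: iterable of (go_id, evidence_code, similarity_percent).
    The score of a term is the maximum of similarity * weight over its hits;
    evidence codes missing from `weights` count as weight 0.
    """
    best = {}
    for go_id, code, similarity in hits:
        score = similarity * weights.get(code, 0.0)
        if score > best.get(go_id, 0.0):
            best[go_id] = score
    return {go_id: s for go_id, s in best.items() if s >= threshold}


hits = [
    ("GO:0003824", "IEA", 90.0),  # high similarity, low-weight evidence
    ("GO:0003824", "IDA", 60.0),  # lower similarity, experimental evidence
    ("GO:0005515", "IEA", 80.0),
]
accepted = score_go_terms(hits, EVIDENCE_WEIGHTS, threshold=55.0)
print(accepted)
```

With these example values, the experimentally supported hit outweighs the electronically inferred one even though its similarity is lower, which is the kind of effect that changing the weights is meant to control.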
GO annotation results can also be visualized as a directed acyclic graph (DAG) by selecting the option Display Graph, available in the submenu of the Annotation tab. You can click on any GO term within the DAG.
GO depth statistics
By selecting the tab “Annotation” -> right submenu “GO Depth statistics”, you can obtain bar or pie charts for any of the three domains (“Cellular component”, “Biological process” or “Molecular function”), with distinct filters for number of sequences, distance decay, node score, DAG level, graphic type and colour. A graphical representation will appear in the working space layout of GPRO (below). By right-clicking on the image, you can export it as an image or as a matrix table in a CSV file for further graphical representation with any other tool (e.g. Excel).
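To make the notion of DAG level more concrete, the following sketch computes a level for every term of a toy DAG and counts how many sequences are annotated with each term at a chosen level, which is the kind of matrix a bar or pie chart could be drawn from. It assumes that the level of a term is its shortest distance to the root, which may differ from the definition GPRO uses. The term names, the annotations and both function names are hypothetical, and annotations are not propagated to ancestor terms.

```python
from collections import Counter, deque

# Toy DAG given as child -> list of parents (placeholder term names).
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],  # a term may have more than one parent
    "D": ["C"],
}


def dag_levels(parents, root):
    """Return {term: level}, with level = shortest distance to the root."""
    children = {}
    for term, term_parents in parents.items():
        for parent in term_parents:
            children.setdefault(parent, []).append(term)
    levels = {root: 0}
    queue = deque([root])
    while queue:  # breadth-first search from the root
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in levels:
                levels[child] = levels[term] + 1
                queue.append(child)
    return levels


def sequences_per_term(annotations, levels, level):
    """Count sequences annotated with each term located at `level`."""
    counts = Counter()
    for terms in annotations.values():
        for term in terms:
            if levels.get(term) == level:
                counts[term] += 1
    return dict(counts)


levels = dag_levels(PARENTS, "root")
annotations = {"seq1": {"A", "C"}, "seq2": {"A"}, "seq3": {"B", "D"}}
level1_counts = sequences_per_term(annotations, levels, level=1)
print(levels)
print(level1_counts)
```

The resulting counts can be written out as a two-column table (term, number of sequences) and plotted with any external tool.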