The Universal Protein Resource (UniProt) is a well-known database of protein sequences and cross-reference data.
UniProt comprises three components: (1) the UniProt Knowledgebase (UniProtKB), (2) the UniProt Reference Clusters (UniRef), and (3) the UniProt Archive (UniParc).
U-MAGE integrates data from UniProtKB (excluding fragment sequences) and from UniRef50.
UniRef50 clusters are built from UniRef90 clusters, which are in turn built from UniRef100 clusters. The number in each name is the percentage similarity threshold between a recruited protein and the seed sequence of its cluster.
For more information regarding UniRef creation, please go to the UniRef web page or click here.
The Gene Ontology (GO) is an important bioinformatics project whose main purpose is to standardize the representation of genes and gene products across species. GO provides an up-to-date, controlled vocabulary of terms divided into three main hierarchies: Function, Process, and Component.
U-MAGE considers the propagation of Functional GO terms only, following "is_a" relationships and excluding annotations with the 'NOT' qualifier.
Specialists from the Gene Ontology Annotation (GOA) project propagate GO terms to UniProtKB identifiers in well-studied model organisms by manual or electronic methods, assigning an evidence code to each annotation.
Among all available evidence codes, only Inferred from Electronic Annotation (IEA) is not assigned by a curator. Manually-assigned evidence codes are grouped in four categories: experimental, computational analysis, author statements, and curatorial statements.
A guide to all displayed GO evidence codes can be found here.
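The annotation filter described above (Functional terms only, no 'NOT' qualifier, IEA as the only non-curated evidence code) can be illustrated with a minimal sketch. The Annotation record and the function names below are hypothetical and are not taken from the U-MAGE code. The single-letter aspect codes and the pipe-separated qualifier field are assumed to follow the GO annotation file conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical record for one GO annotation of a UniProtKB entry."""
    uniprot_ac: str
    go_id: str
    aspect: str      # assumed: "F" (Function), "P" (Process), "C" (Component)
    qualifier: str   # assumed: "" or pipe-separated values such as "NOT"
    evidence: str    # GO evidence code, e.g. "IEA" or "IDA"


def is_usable(annotation):
    """Keep Functional annotations that do not carry the 'NOT' qualifier."""
    qualifiers = annotation.qualifier.split("|") if annotation.qualifier else []
    return annotation.aspect == "F" and "NOT" not in qualifiers


def select_functional(annotations):
    """Return the usable annotations, preserving their input order."""
    return [a for a in annotations if is_usable(a)]


def is_electronic(annotation):
    """IEA is the only evidence code that is not assigned by a curator."""
    return annotation.evidence == "IEA"
```

Splitting the qualifier on "|" before testing for 'NOT' avoids accidentally matching a qualifier that merely contains those letters.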
PANTHER (Protein Analysis Through Evolutionary Relationships) is a curated database of gene functions from 82 organisms, classified by expert biologists.
One of the classifications used is based on terms derived from GO. PANTHER protein data can be linked to the UniProtKB database.
Local matrices were built for UniRef50 entries, storing BLAST coverage values (alignment size / protein size) from all-against-all alignments, so that each alignment has a value in both directions. In other words, the matrices represent how much each protein in a UniRef50 cluster is related to all the others.
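As an illustration of the bi-directional coverage values, the sketch below builds such a matrix from protein sizes and pairwise alignment sizes. The function name, the input layout, and the choice to set the diagonal to 1.0 are assumptions made for this example, not the actual U-MAGE implementation.

```python
def coverage_matrix(lengths, alignments):
    """Build a bi-directional coverage matrix for one cluster.

    lengths:    dict mapping sequence id -> protein size
    alignments: dict mapping an unordered pair (id_a, id_b) -> alignment size
    Returns a nested dict where matrix[row][col] is the coverage of `row`
    when aligned to `col`, i.e. alignment size / size of `row`.
    """
    ids = sorted(lengths)
    # Pairs without an alignment keep a coverage of 0.0.
    matrix = {a: {b: 0.0 for b in ids} for a in ids}
    for (a, b), size in alignments.items():
        if a == b:
            continue  # self-alignments are handled below
        matrix[a][b] = size / lengths[a]   # coverage of a against b
        matrix[b][a] = size / lengths[b]   # coverage of b against a
    for a in ids:
        matrix[a][a] = 1.0  # assumption: a sequence fully covers itself
    return matrix
```

A single alignment therefore yields two values: a short protein aligned along most of its length to a longer one has high coverage in its own row, while the longer protein has lower coverage in the opposite direction.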
We constructed matrices for every UniRef50 cluster containing at least one sequence from any of the 82 PANTHER organisms (version 8.0).
For a list of all 82 organisms, please click here.
For a given UniProt_ac input, related UniRef50 entries above an optional coverage cutoff are selected and their GO terms compared.
Figure 1 below shows a portion of the UniRef50_Q9BYK8 matrix. The cells highlighted in ocean in the first row are those where the coverage of Q9BYK8 compared to four other sequences is higher than 80%. Additionally, the cells highlighted in orange in the first column represent the opposite relation, that is, the coverage of the other sequences compared to Q9BYK8, where it is above 80%.
Rows are used to select the sequences for the first table on the Results page, and columns to select the sequences for the second table.
The first table shows GO terms that are present in the input sequence but absent from the other sequences related to it above the given coverage cutoff.
The second table shows GO terms that are present in other sequences but absent from the input sequence, when the former are related to the latter above the given coverage cutoff.
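The row and column selection described above can be sketched as two set differences. This is a minimal illustration under stated assumptions: the function name and data layout are hypothetical, the matrix is a nested dictionary where matrix[row][col] holds the coverage of row compared to col, and the cutoff is applied as a strict "higher than" comparison.

```python
def divergent_terms(query, matrix, go_terms, cutoff=0.8):
    """Compare the GO terms of `query` with those of related sequences.

    matrix:   nested dict, matrix[row][col] = coverage of row against col
    go_terms: dict mapping sequence id -> set of Functional GO term ids
    Returns (first_table, second_table), each mapping a related sequence id
    to a set of divergent GO terms.
    """
    query_terms = go_terms.get(query, set())
    first_table = {}   # terms of the query that are absent from the other
    second_table = {}  # terms of the other that are absent from the query
    for other in matrix:
        if other == query:
            continue
        other_terms = go_terms.get(other, set())
        # Row of the query: coverage of the query compared to the other.
        if matrix[query].get(other, 0.0) > cutoff:
            first_table[other] = query_terms - other_terms
        # Column of the query: coverage of the other compared to the query.
        if matrix[other].get(query, 0.0) > cutoff:
            second_table[other] = other_terms - query_terms
    return first_table, second_table
```

Because coverage is directional, a sequence can appear in one table and not in the other.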
Whenever no Functional GO terms are found, the sequence UniProt_ac identifier will be displayed separately.
All divergent Functional GO terms among sequences are displayed, and colors are used to identify GO terms from the same hierarchy.
Blue lines are displayed above GO terms in both tables whenever the term is leaf-most. A term is considered leaf-most when, among all GO terms associated with the query UniProt sequence, no more specific term is found; this does not mean that the term has no children in the complete GO hierarchy.
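The leaf-most test can be sketched as follows, assuming the "is_a" relationships are available as a mapping from each term to its direct parents. The function names and the term identifiers used with them are hypothetical, and this is one possible way to perform the comparison, not necessarily the one used by U-MAGE.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following is_a parents."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def leaf_most(terms, parents):
    """Return the terms that have no more specific term within `terms`.

    Only the terms annotated to one sequence are compared with each other,
    so a leaf-most term may still have children in the complete hierarchy.
    """
    terms = set(terms)
    covered = set()
    for term in terms:
        covered |= ancestors(term, parents)
    return terms - covered
```

For example, with the toy hierarchy child -> mid -> root, a sequence annotated only with mid and root has mid as its leaf-most term, even though mid has a child in the full hierarchy.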
Green lines are displayed beneath GO terms in the first table and beneath the UniProt_ac in the second table whenever the term was manually assigned (EXP | IDA | IPI | IMP | IGI | IEP evidence codes).
The letter 'P' (for "Parent") is displayed whenever more general terms than the one represented in the column are present; the complete list is visible on mouse-over. Conversely, the letter 'C' (for "Child") is displayed whenever more specific terms are present, indicating that this propagation may not be essential. Again, the complete list is visible on mouse-over. See Figure 2.
Figure 1: Example matrix for UniRef50_Q9BYK8. GO term propagation 'to' (ocean) and 'from' (orange), considering an 80% coverage cutoff.
Figure 2: U-MAGE propagation example for 6 functional hierarchies. Blue lines above a GO cell indicate a leaf-most GO term, and green lines below a cell indicate manually assigned terms. GO term colors represent the same direct hierarchy. A: Example of the first table on the Results page. GO:0005102 is not leaf-most but was manually assigned; GO:0005131 and GO:0005143 are leaf-most but computationally assigned; GO:0019901 and GO:0042169 are leaf-most and manually assigned. B: Example of the second table on the Results page. All GO terms (GO:0043560, GO:0031702, GO:0033130, GO:0051428, GO:0008022 and GO:0043548) are leaf-most and manually assigned for Q62689. Pink GO terms have the same Parent term, "Receptor binding", in both tables.
Developed by Rafael Guedes
Please send questions to firstname.lastname@example.org
This web page is best viewed with Google Chrome.