- What is GTOP?
GTOP is a database of analyses of proteins identified by various
genome projects. It relies mainly on sequence homology analyses and
makes extensive use of information on three-dimensional structures.
We use the following methods:
- Prediction of 3D structure
Sequence homology search of PDB using Reverse PSI-BLAST.
- Functional predictions (family classifications)
Sequence homology search of Swiss-Prot, a well-annotated
sequence database, using BLAST.
- Other analytical methods
We are also carrying out the following analyses:
- Motif analysis (PROSITE)
- Family classification (Pfam)
- Prediction of transmembrane helix domains (SOSUI)
- Prediction of coiled-coil regions (Multicoil)
- Repetitive sequence analysis (RepAlign)
Please take a look at the Organism section for a list of the species.
- Results of analysis
For each gene, a screen display like the one below is presented.
This figure shows the Homo sapiens ENSG00000196876.3 gene
(view the ENSG00000196876.3 page).
- Top of the view
At the top of the view, the species name (1), the gene name used in GTOP (2),
the annotation derived from the genome project (3), and the Swiss-Prot entry (4) are shown.
- Homologs --- 5
The presence or absence of homologs in the organisms analyzed by GTOP is shown. The organisms are grouped into the three kingdoms (archaebacteria, eubacteria, and eukaryotes) and viruses.
Each fraction represents [the number of species with homologs] / [the total number of species]. Clicking one of the kingdom names displays a list of species
and homologs.
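The fraction shown for each group can be illustrated with a short sketch. This is not GTOP's own code; the function name, the species lists, and the data layout are all hypothetical and serve only to show how [the number of species with homologs] / [the total number of species] is formed.

```python
def homolog_fraction(species_with_homologs, analyzed_species):
    """Format 'species with homologs / total species' for one group."""
    analyzed = set(analyzed_species)
    # Count only species that belong to this group and have a homolog.
    found = analyzed & set(species_with_homologs)
    return f"{len(found)}/{len(analyzed)}"


# Hypothetical example data (not taken from GTOP).
analyzed = {
    "eukaryotes": ["H. sapiens", "M. musculus", "S. cerevisiae"],
    "viruses": ["phage lambda"],
}
with_homologs = ["H. sapiens", "S. cerevisiae"]

summary = {
    group: homolog_fraction(with_homologs, species)
    for group, species in analyzed.items()
}
print(summary)
```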
- Icons for 3D structures --- 6
The small icons in the illustration are structures of
representative proteins in the superfamilies to which the protein is predicted
to belong. The numeral to the right of each icon is the SCOP family number.
- Bar-Display --- 7
- SECSTR: Predicted secondary structures based on Reverse PSI-BLAST are presented. --- A
Alpha-helices are shown in magenta, while beta-sheets are shown in yellow.
- PSIPRED: The secondary structure predicted by PSIPRED is displayed in the same way as SECSTR. --- B
Alpha-helices are shown in magenta, while beta-sheets are shown in yellow.
- DISOPRED: Intrinsically disordered regions predicted by the DISOPRED program. --- C
Disordered regions are shown in gray.
- BLT:PDB: Regions aligned by BLAST against PDB. --- D
The region of similarity, the PDB code, the E-value, and the sequence identity are shown.
In this example, the 550-785 residue region of thrA has similarity to PDB:1ebfA with an E-value and identity of 2e-36 and 36.8%, respectively.
- RPS:PDB: Regions predicted by Reverse PSI-BLAST from the PDB database. --- E
The regions of similarity, the PDB code, the E-value, and the sequence identity are shown.
- RPS:SCOP: Regions predicted by Reverse PSI-BLAST from the SCOP database. --- F
The regions of similarity, the PDB code, the E-value, and the sequence identity are shown.
- HMM:SCOP: Regions predicted by Hidden Markov Model from the SCOP database. --- G
The regions of similarity, PDB code, SCOP code, distribution of the SCOP family in the genomes (clickable asterisk), E-value, and sequence identity are shown.
- RPS:PFM: Pfam domains predicted by Reverse PSI-BLAST from the Pfam database. --- H
The regions of similarity, the Pfam code, the Pfam description, the E-value, and the sequence identity are shown.
- HMM:PFM: Pfam domains predicted by Hidden Markov Model. --- I
The regions of similarity, Pfam code, distribution of the Pfam domain in the genomes (clickable asterisk), Pfam name, E-value, sequence identity (%), and the fraction of the Pfam domain aligned (length of similarity region / length of Pfam domain) are shown.
- BLT:SWISS: Regions aligned by BLAST against Swiss-Prot. --- J
The regions of similarity, the Swiss-Prot code, the E-value, and the sequence identity are shown.
- PROS: Regions predicted by PROSITE search. --- K
Information on the PROSITE motif is shown.
- TM: The transmembrane helix domain(s) predicted by SOSUI. --- L
- COIL: Coiled-coil region(s) predicted by Multicoil. --- M
- REPEAT: Repeat sequences. --- N
- SEG: Low complexity regions predicted by SEG. --- O
- EXONS: Exon boundaries. The total number of exons and the region of each exon with phase are shown in brackets. --- P
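Each bar in the display carries the same kind of record: a region, a target code, an E-value, and a sequence identity. The sketch below models the BLT:PDB example above (residues 550-785, PDB:1ebfA, E-value 2e-36, identity 36.8%) and the "length of similarity region / length of Pfam domain" ratio reported for HMM:PFM. The class, its field names, and the domain length of 472 are assumptions made for illustration; they are not GTOP's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One bar of the display (hypothetical record layout)."""
    track: str       # display row, e.g. "BLT:PDB"
    start: int       # first aligned residue, 1-based inclusive
    end: int         # last aligned residue, 1-based inclusive
    target: str      # PDB, SCOP, Pfam, or Swiss-Prot code
    evalue: float
    identity: float  # sequence identity in percent

    @property
    def length(self):
        # Both ends are inclusive, hence the +1.
        return self.end - self.start + 1


def covered_fraction(hit, domain_length):
    """Length of similarity region / length of the full domain."""
    return hit.length / domain_length


# Values taken from the BLT:PDB example in the text.
hit = Hit("BLT:PDB", 550, 785, "1ebfA", 2e-36, 36.8)
print(hit.length)

# The domain length 472 is a made-up value used only to show the ratio.
print(covered_fraction(hit, 472))
```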
- Bottom of the view
- Function:
Functional classifications by Gene Ontology are displayed. --- 8
- SeqInfo:
The information from the bar display is shown on the amino acid sequence. --- 9
- AminoSeq:
The amino acid sequence in FASTA format. --- 10
- See neighboring genes:
A list of neighboring ORFs is displayed. This
information is useful when considering operons. --- 11
- Links:
Links to other related databases. --- 12
- Abbreviations:
Abbreviations used in this display. They are also used in the master file (see below) as headers. --- 13
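For readers who want to process the AminoSeq output, the following is a minimal sketch of reading FASTA-formatted text: a header line starting with ">" followed by one or more sequence lines. The function name and the sample record are hypothetical, and the sequence shown is not a GTOP entry.

```python
def parse_fasta(text):
    """Return {header: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Made-up record for illustration only.
sample = ">example_protein hypothetical\nMKTAYIAKQR\nQISFVKSHFS\n"
print(parse_fasta(sample))
```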
- Viewing predicted 3D structure
If you click a PDB code (1b3qA in the above example), you can view
a structure alignment
(view the 1b3qA alignment).
To display the corresponding 3D structure, you have three choices:
images, the Chime plug-in, and RasMol.
To view predicted 3D structures with the Chime plug-in or RasMol, you must configure
your own computer. The procedure is described
on the page How to view 3D structure.
When information on the exon-intron structures is available, marks in different colors are displayed
to distinguish exons.


When the status of [Exon Display] is [ON], the three-dimensional structure is displayed in the same color scheme
as in the alignment screen.
- Master file
In the GTOP database, analytical results are stored as master files. The contents of the master file are shown after the
graphical representation. Please refer to What is master file ? for details.
- Search engines
Full-text search of the master file is available. In the ORGANISM box, you can select a kingdom or viruses,
as well as a particular organism. Up to three query words can be specified.
Only entries containing all of the specified words are returned. Wildcards can be used to search with partial words.
The following four kinds of search can also be conducted:
- Advanced Keyword Search
You can specify the organism(s) in which to carry out this search.
- Family Ranking Search
You can get a list of frequently appearing families in each organism,
in terms of SCOP, Pfam, Prosite, or Membrane.
- Sequence homology search
You can enter an amino acid sequence to search all the
protein sequences in GTOP for its homologs.
- Sequence Text Search
You can enter an amino acid sequence to search all the
protein sequences in GTOP for those containing the exact sequence.
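The matching rules described above (up to three query words, all of which must be present, with wildcards for partial words, and exact-sequence containment for the Sequence Text Search) can be sketched as follows. This is an illustration only, not GTOP's implementation: the function names are hypothetical, and the use of "*" as the wildcard character and of case-insensitive matching are assumptions, since the text does not specify them.

```python
from fnmatch import fnmatchcase


def matches_all(entry_text, query_words):
    """True if every query word matches some word of the entry."""
    if not 1 <= len(query_words) <= 3:
        raise ValueError("between one and three query words are allowed")
    tokens = entry_text.lower().split()
    # AND semantics: each query word must match at least one token.
    # fnmatchcase gives shell-style wildcards such as "kin*".
    return all(
        any(fnmatchcase(token, word.lower()) for token in tokens)
        for word in query_words
    )


def contains_sequence(protein_sequence, query_sequence):
    """True if the protein contains the exact query sequence."""
    return query_sequence.upper() in protein_sequence.upper()


# Made-up entry text and sequence for illustration.
entry = "aspartate kinase homoserine dehydrogenase"
print(matches_all(entry, ["kin*", "homoserine"]))
print(contains_sequence("MKTAYIAKQR", "AYIA"))
```

A homology search, by contrast, would also report similar but non-identical sequences, which simple substring matching cannot do.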
- Summary page
From the summary page, which presents various statistical data on whole genomes,
you can obtain a list of all the proteins, a list of the proteins whose
structures are predicted by PSI-BLAST, and lists of the numbers of
predicted structures belonging to SCOP, Pfam, and PROSITE
families. Links to the corresponding genes are provided.
- Miscellaneous