The results in the tables below report the performance of PROTEUS2's combined predictors as compared to other programs.
However, since each of the four 1D predictors used both
de novo predictions and homology-based methods, we also assessed:
- the performance of the de novo predictors alone,
- the performance of the homology-based structure predictors alone, and
- the performance of the combined predictors.
When measuring the performance of any 3D-to-2D mapping prediction, the standard approach is to iteratively remove each
sequence from the database and to perform the prediction with that sequence. This prevents simply predicting the structure
of the query protein using the query itself.
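The leave-one-out procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PROTEUS2's own code: the `leave_one_out` function, the dictionary-based database, and the `toy_predict` stand-in are hypothetical names introduced here, and the query is excluded by identifier only.

```python
from typing import Callable, Dict

# Hypothetical types: the database maps a sequence ID to its sequence, and a
# predictor takes (query sequence, database) and returns a prediction string.
Database = Dict[str, str]
Predictor = Callable[[str, Database], str]


def leave_one_out(database: Database, predict: Predictor) -> Dict[str, str]:
    """Predict every entry using a database from which that entry is removed."""
    predictions: Dict[str, str] = {}
    for query_id, query_seq in database.items():
        # Hold the query out so it cannot serve as its own template.
        reduced = {k: v for k, v in database.items() if k != query_id}
        predictions[query_id] = predict(query_seq, reduced)
    return predictions


def toy_predict(query_seq: str, database: Database) -> str:
    """Stand-in predictor: reports which entries were available as templates."""
    return ",".join(sorted(database))


if __name__ == "__main__":
    db = {"a": "MKT", "b": "MKV", "c": "GGA"}
    for query_id, available in leave_one_out(db, toy_predict).items():
        print(query_id, "was predicted from", available)
```

A real evaluation would also need to decide whether near-identical sequences stored under other identifiers should be held out; that choice is not specified in the text above.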
It is also important to report the per-residue accuracy (Q2), as well as the percentage of query proteins that returned an answer
(coverage).
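As a rough illustration of these two measures, the sketch below computes a two-state per-residue accuracy (Q2) and a coverage percentage. The label alphabet (`S` for a residue inside the feature, anything else for outside), the use of `None` for a query that returned no answer, and the function names are assumptions made for this example.

```python
from typing import Optional, Sequence


def q2(predicted: str, observed: str, feature: str = "S") -> float:
    """Percentage of residues whose two-state label (feature or not) agrees."""
    if len(predicted) != len(observed):
        raise ValueError("label strings must have equal length")
    if not observed:
        raise ValueError("label strings must not be empty")
    # A residue counts as correct when both strings agree on feature/non-feature.
    matches = sum((p == feature) == (o == feature)
                  for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Percentage of queries for which a prediction was returned (not None)."""
    if not predictions:
        raise ValueError("at least one query is required")
    answered = sum(p is not None for p in predictions)
    return 100.0 * answered / len(predictions)


if __name__ == "__main__":
    print(q2("SSSS----", "SSS-----"))          # 7 of 8 residues agree
    print(coverage(["SS--", None, "----", None]))  # 2 of 4 queries answered
```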
For signal peptide, transmembrane helix and transmembrane beta-barrel predictions, we evaluated both the per-residue prediction
accuracy (Q2) and the ability of the predictors to correctly identify proteins with or without these structural features
(sensitivity/specificity).
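For the protein-level test, sensitivity and specificity can be computed from per-protein yes/no calls. The sketch below uses the standard definitions (true positives over all actual positives, true negatives over all actual negatives); the function name and the boolean encoding are assumptions for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(predicted: Sequence[bool],
                            actual: Sequence[bool]) -> Tuple[float, float]:
    """Return (sensitivity %, specificity %) for per-protein feature calls."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have equal length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    tn = sum(1 for p, a in zip(predicted, actual) if not p and not a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative protein")
    sensitivity = 100.0 * tp / (tp + fn)  # share of true feature proteins found
    specificity = 100.0 * tn / (tn + fp)  # share of feature-free proteins rejected
    return sensitivity, specificity


if __name__ == "__main__":
    calls = [True, False, False, False, False, True]
    truth = [True, True, False, False, False, False]
    print(sensitivity_specificity(calls, truth))
```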
Secondary structure predictions of soluble proteins were evaluated using only the Q3 and SOV scores.
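A minimal sketch of the Q3 score follows, assuming three-state strings over `H` (helix), `E` (strand) and `C` (coil) and a simple average over proteins. The state letters and function names are assumptions for this example; SOV, which scores segment overlap rather than individual residues, is not covered here.

```python
from typing import Iterable, Tuple

STATES = frozenset("HEC")  # assumed three-state alphabet: helix, strand, coil


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues assigned the correct one of three states."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    if not (set(predicted) <= STATES and set(observed) <= STATES):
        raise ValueError("labels must be drawn from H, E and C")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def mean_q3(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average the per-protein Q3 over (predicted, observed) pairs."""
    scores = [q3(pred, obs) for pred, obs in pairs]
    if not scores:
        raise ValueError("at least one protein is required")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    test_set = [("HHHEEC", "HHHECC"), ("CCCC", "CCCC")]
    print(round(mean_q3(test_set), 2))
```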
An assessment of PROTEUS2's performance for homology modeling was also performed and compared with structures generated by
3D-JigSaw and SWISS-MODEL.
Table 1a: To assess PROTEUS2's signal peptide prediction, a data set of 2587 complete protein sequences
with experimentally confirmed signal peptides and a data set of 16,618 cytoplasmic proteins (with no signal
peptides in their sequence) were extracted from the PPT-DB (1). The complete list of 2587 proteins with signal peptide
annotation (in pseudo-FASTA format) is accessible here (Gram+, Gram-, Eukaryotic). The complete list of 16,618
protein sequences without signal peptides (in FASTA format) is accessible here. The signal peptide set
included proteins from each of the three major classes of organisms (Gram+, Gram-, and Eukaryote). PROTEUS2 was compared
against SubLoc (2) and SignalP 3.0 (3), using their default values, by calculating the per-residue prediction accuracy (Q2).
Signal Peptide Prediction Performance (PPT-DB SPdb data set) |
Program or Server | Q2 (Gram-) | Q2 (Gram+) |
PROTEUS2 | 95% | 94% |
SubLoc | 91% | 86% |
SignalP 3.0 | 96% | 97% |
Table 1b: To assess transmembrane helix prediction accuracy, PROTEUS2 was assessed against the 2247 proteins
(globular and transmembrane) used in TMH-Benchmark (4). The complete list of 2247 proteins with membrane annotations is
available here. In this assessment, we compared the performance of PROTEUS2 to TMHMM (5), HMMTOP (6), and
DAS (7) using the per-residue prediction accuracy (Q2), as well as by measuring the number of false positives (the number of
non-membrane proteins that the program identified as having a transmembrane region).
Transmembrane Helix Prediction Performance (TMH Benchmark test set) |
Program or Server | Q2 | # False positives |
PROTEUS2 | 91% | 0 |
TMHMM | 80% | 1 |
HMMTOP | 80% | 6 |
DAS | 72% | 16 |
Table 1c: Because there is some uncertainty in the transmembrane assignments for some of the TMH Benchmark's
high-resolution data set, we performed a second evaluation using the experimentally confirmed data set of transmembrane
helices derived from PPT-DB (1). Specifically, a data set of 275 complete protein sequences with experimentally confirmed
transmembrane helices was extracted from the PPT-DB (1). As a negative control, a data set of 16,618 globular
(non-membrane) proteins was also extracted from the PPT-DB (1). The complete list of 275 proteins with membrane helix
annotations (in pseudo-FASTA format) is accessible here. The complete
list of 16,618 non-membrane protein sequences (in FASTA format) is accessible here. In this
assessment, we compared the performance of PROTEUS2 to TMHMM (5) using the per-residue prediction accuracy (Q2), as well as
by measuring the number of false positives (the number of non-membrane proteins that the program identified as having
a transmembrane region).
Transmembrane Helix Prediction Performance (PPT-DB-TMH test set) |
Program or Server | Q2 | # False neg. (TMH vs. glob) |
PROTEUS2 | 87% | 0 |
TMHMM | 82% | 8 |
Table 1d: The assessment of transmembrane beta-barrel detection was done using an experimentally
determined set of 49 transmembrane beta-barrel proteins and 16,618 water-soluble, globular proteins obtained from PPT-DB (1).
The complete list of 49 proteins with membrane barrel annotation (in pseudo-FASTA format) is accessible
here. The complete list of 16,618 non-barrel protein sequences (in FASTA format) is accessible
here. PROTEUS2 was compared against TMB-Hunt (8) for its ability to distinguish membrane barrel proteins
from non-membrane proteins, using sensitivity and specificity measures.
Transmembrane Beta Barrel Detection Performance (PPT-DB "All" protein data set) |
Program or Server | Sensitivity | Specificity |
PROTEUS2 | 100% | 100% |
TMB-Hunt | 78% | 99% |
Table 1e: The assessment of transmembrane beta-barrel beta sheet detection was done using a set of
49 experimentally determined transmembrane beta-barrel proteins obtained from PPT-DB (1). The complete list of 49
proteins with membrane barrel annotation (in pseudo-FASTA format) is available here.
In this assessment, PROTEUS2 was compared against PRED-TMBB (9) using the per-residue prediction accuracy (Q2).
Transmembrane Beta Strand Prediction Performance (PPT-DB-TMB test set) |
Program or Server | Q2 |
Combined (PROTEUS2) | 86% |
PRED-TMBB | 73% |
Table 1f: The assessment of PROTEUS2's performance on secondary structure prediction for globular (non-membrane)
proteins was done using two approaches: 1) a "blind" test and comparison on the latest EVA (10) training set
(1644 sequence-unique proteins) and 2) an analysis of 125 randomly chosen proteins that were recently solved by X-ray
and NMR. The complete list of 1644 proteins with sequence and secondary structure annotation (in pseudo-FASTA format) is
accessible here. The complete list of 125 randomly selected protein
sequences with sequence and secondary structure annotation (in pseudo-FASTA format) is accessible
here. The 125-protein set was chosen to simulate a more realistic case of
predicting the secondary structure of sequences found in a proteome (which tend not to be sequence-unique). In both cases,
the Q3 and SOV scores were calculated for each protein in the test sets. Results were compared to Porter (11), PSIPred (12),
PHD (13), and JNET (14).
Non-membrane Secondary Structure Prediction Performance (EVA Test Set) |
Program or Server | Q3 (%) | SOV (%) |
PROTEUS2 | 81 | 82 |
Porter | 77 | 76 |
JNET | 72 | 73 |
PSIPred | 77 | 78 |
Non-membrane Secondary Structure Prediction Performance (Test Set of 125) |
Program or Server | Q3 (%) | SOV (%) |
PROTEUS2 | 88 | 90 |
Porter | 76 | 81 |
JNET | 73 | 77 |
PSIPred | 76 | 78 |
Table 1g: The assessment of PROTEUS2's performance on homology modeling was made by comparing the program
to 3D-JigSaw (16) and SWISS-MODEL (15). In the first case, 37 proteins with sequence identities ranging from 21.2% to 99.2%
were modeled using PROTEUS2 and 3D-JigSaw (using default parameters). In the second case, 33 proteins with a similar range
of sequence identities were modeled using PROTEUS2 and SWISS-MODEL (also using default parameters). In each case, identical
template structures were used for the pairwise comparisons. The resulting structures were compared using backbone RMSD and
all-atom RMSD values measured after quaternion superposition. The PROTEUS2 values in the table below are the means from the
SWISS-MODEL comparison (Table 6); on the 3D-JigSaw comparison set, PROTEUS2 averaged 2.00 Å (all atoms) and 1.04 Å (CA)
(Table 7).
Homology Modeling Performance |
Program or Server | RMSD All (Å) | RMSD CA (Å) |
PROTEUS2 | 1.83 | 0.99 |
Swiss-Model | 1.62 | 0.86 |
3D-JigSaw | 1.94 | 0.97 |
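For readers unfamiliar with the RMSD values reported here, the sketch below shows the final step of the calculation only: the root-mean-square deviation between two equal-length coordinate lists that are assumed to have been superposed already. The superposition step itself (done by quaternion fitting in the source) is deliberately omitted, and the function name and tuple-based coordinate type are assumptions for this example.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]  # x, y, z in Å


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD in Å between matched atoms of two already-superposed structures."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between each matched atom pair.
    total = sum((mx - rx) ** 2 + (my - ry) ** 2 + (mz - rz) ** 2
                for (mx, my, mz), (rx, ry, rz) in zip(model, reference))
    return math.sqrt(total / len(model))


if __name__ == "__main__":
    model_ca = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference_ca = [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)]
    print(round(rmsd(model_ca, reference_ca), 3))
```

Passing only CA atoms gives a backbone-style value, while passing every matched atom gives the all-atom value; which atoms are matched is the caller's choice in this sketch.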
1. Wishart, D.S., Arndt, D., Berjanskii, M., Guo, A.C., Shi, Y., Shrivastava, S., Zhou, J., Zhou, Y. and Lin, G. (2008)
PPT-DB: the protein property prediction and testing database.
Nucleic Acids Res. 36 (Database issue), D222-229.
2. Hua, S. and Sun, Z. (2001)
Support vector machine approach for protein subcellular localization prediction.
Bioinformatics. 17, 721-728.
3. Bendtsen, J.D., Nielsen, H., von Heijne, G. and Brunak, S. (2004)
Improved prediction of signal peptides: SignalP 3.0.
J. Mol. Biol. 340, 783-795.
4. Kernytsky, A. and Rost, B. (2003)
Static benchmarking of membrane helix predictions.
Nucleic Acids Res. 31, 3642-3654.
5. Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001)
Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.
J. Mol. Biol. 305, 567-580.
6. Tusnády, G.E. and Simon, I. (2001)
The HMMTOP transmembrane topology prediction server.
Bioinformatics. 17, 849-850.
7. Cserzö, M., Wallin, E., Simon, I., von Heijne, G. and Elofsson, A. (1997)
Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method.
Protein Eng. 10, 673-676.
8. Garrow, A.G., Agnew, A. and Westhead, D.R. (2005)
TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins.
Nucleic Acids Res. 33 (Web Server issue), W188-192.
9. Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C. and Hamodrakas, S.J. (2004)
PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins.
Nucleic Acids Res. 32 (Web Server issue), W400-404.
10. Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Fiser, A., Pazos, F., Valencia, A., Sali, A. and Rost, B. (2001)
EVA: continuous automatic evaluation of protein structure prediction servers.
Bioinformatics. 17, 1242-1243.
11. Pollastri, G. and McLysaght, A. (2005)
Porter: a new, accurate server for protein secondary structure prediction.
Bioinformatics. 21, 1719-1720.
12. Jones, D.T. (1999)
Protein secondary structure prediction based on position-specific scoring matrices.
J. Mol. Biol. 292, 195-202.
13. Rost, B., Sander, C. and Schneider, R. (1994)
PHD--an automatic mail server for protein secondary structure prediction.
Comput Appl Biosci. 10, 53-60.
14. Cuff, J.A. and Barton, G.J. (2000)
Application of multiple sequence alignment profiles to improve protein secondary structure prediction.
Proteins. 40, 502-511.
15. Schwede, T., Kopp, J., Guex, N. and Peitsch, M.C. (2003)
SWISS-MODEL: An automated protein homology-modeling server.
Nucleic Acids Res. 31, 3381-3385.
16. Bates, P.A., Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2001)
Enhancement of Protein Modelling by Human Intervention in Applying the Automatic Programs 3D-JIGSAW and 3D-PSSM.
Proteins. Suppl 5, 39-46.
Table 2: Signal peptide prediction performance assessed using: 1) the percent coverage for
the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent
coverage indicates the percentage of query sequences in the SPdb data set that had sufficiently good quality
homologues (after the query sequence was removed from the database) to predict the presence and location of
a signal peptide. For this assessment, a data set of 2587 complete protein sequences with experimentally
confirmed signal peptides was extracted from the PPT-DB (SPdb data set). The signal peptide set included proteins
from each of the three major classes of organisms (Gram+, Gram- and Eukaryotes). In this assessment, we
compared the performance of PROTEUS2 to SignalP 3.0.
Program or Server | Q2 (Gram+) (%) | Q2 (Gram-) (%) | Q2 (Eukaryotes) (%) |
3D-2D Mapping* | 93.5 (42.4) | 94.4 (43.1) | 93.6 (76.3) |
PredictSP | 94.0 | 95.0 | 77.4 |
Combined (PROTEUS2) | 93.7 | 94.6 | 90.0 |
SignalP 3.0 | 96.0 | 97.0 | 99.0 |
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
Table 3: Transmembrane beta-barrel prediction performance assessed using: 1) the percent
coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy.
The percent coverage indicates the percentage of query sequences in the PPT-DB transmembrane barrel data
set that had sufficiently good quality homologues (after the query sequence was removed from the database)
to predict the presence and location of membrane beta strands. The assessment of transmembrane beta-barrel
detection was done using a set of 49 experimentally determined transmembrane beta-barrel proteins extracted
from PPT-DB. In this assessment we compared the performance of PROTEUS2 to PRED-TMBB.
Program or Server | Q2 (%) |
3D-2D Mapping* | 93.2 (56.1) |
Jury-of-Experts/PROTEUS2 | 76.0 |
Combined (PROTEUS2) | 85.7 |
PRED-TMBB | 73.0 |
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
Table 4: Transmembrane helix prediction performance assessed using: 1) the percent coverage
for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent
coverage indicates the percentage of query sequences in the PPT-DB transmembrane helix data set that had
sufficiently good quality homologues (after the query sequence was removed from the database) to predict the
presence and location of membrane helices. The assessment of transmembrane alpha-helix detection was done using
a set of 275 experimentally determined transmembrane alpha-helical proteins extracted from PPT-DB. In this
assessment we compared the performance of PROTEUS2 to TMHMM.
Program or Server | Q2 (%) |
3D-2D Mapping* | 90.5 (61.8) |
TMHMM | 82.2 |
Combined (PROTEUS2) | 87.3 |
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
Table 5: The sensitivity-specificity comparison between different predictors and PROTEUS2.
This assessment tested the ability of different programs to identify whether a protein was globular, contained
a signal peptide, contained a transmembrane beta-barrel, or contained a transmembrane helix.
In total, 16,618 globular proteins, 49 membrane beta-barrel proteins, 275 membrane alpha-helix proteins and
2587 SPdb proteins were extracted from the PPT-DB and submitted to different predictors. For the transmembrane
beta-barrel test, 49+16,618 proteins were analyzed and classified (by TMB-Hunt or PROTEUS2) as being a beta-barrel
protein or a globular protein. For the transmembrane helix test, 275+16,618 proteins were analyzed
and classified (by TMHMM or PROTEUS2) as being a transmembrane helix protein or a globular protein. For the
signal peptide test, 2587+16,618 proteins were analyzed and classified (by SignalP or PROTEUS2) as having a
signal peptide or not.
Sensitivity-Specificity Detection to Identify Transmembrane Beta-Barrel Proteins |
Program or Server | Sensitivity | Specificity |
TMB-Hunt | 78% | 99% |
PROTEUS2 | 100% | 100% |
Sensitivity-Specificity Detection to Identify Transmembrane Alpha-Helix Proteins |
Program or Server | Sensitivity (%) | Specificity (%) |
TMHMM | 97% | 94% |
PROTEUS2 | 100% | 100% |
Sensitivity-Specificity Detection to Identify Signal Peptides |
Program or Server | Sensitivity (%) | Specificity (%) |
PredictSP | 66.4% | 97.2% |
PROTEUS2 | 100% | 100% |
Table 6: An assessment of PROTEUS2's performance for homology modeling. In this test,
33 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and SWISS-MODEL
(both using default parameters). Identical template structures were used for each structure generation
attempt by each program. The resulting structures were compared using backbone RMSD and all-atom RMSD.
Query PDB ID | Template PDB ID | SWISS-MODEL RMSD, all atoms (Å) | SWISS-MODEL RMSD, CA (Å) | Homodeller RMSD, all atoms (Å) | Homodeller RMSD, CA (Å) | Identity (%) |
1PVAA | 1RRO | 1.84 | 0.78 | 1.82 | 0.77 | 47.22 |
1B7DA | 1CN2 | 3.09 | 2.45 | 3.84 | 3.27 | 44.26 |
132L | 1LSY | 0.91 | 0.26 | 0.91 | 0.26 | 99.22 |
1A3D | 4BP2 | 2.78 | 1.08 | 3.47 | 2.76 | 58.12 |
1AAPA | 8PTI | 2.54 | 1.09 | 2.47 | 1.08 | 42.86 |
1CBS | 1CBI | 1.60 | 0.98 | 2.27 | 0.97 | 77.21 |
1CTF | 1DD3 | 1.51 | 0.62 | 1.49 | 0.90 | 70.59 |
1ROPA | 1GTO | 1.11 | 0.52 | 1.12 | 0.52 | 96.43 |
1RZL | 1JTB | 2.74 | 2.31 | 3.17 | 2.33 | 62.64 |
3IL8 | 1MI2 | 3.32 | 2.33 | 3.37 | 2.30 | 42.65 |
2WRPR | 1JHG | 0.55 | 0.19 | 1.98 | 0.19 | 99.01 |
1POH | 1PTF | 2.04 | 1.21 | 2.14 | 1.20 | 35.29 |
1NHKR | 1NDC | 1.49 | 0.78 | 1.94 | 1.28 | 45.83 |
1BPT | 1AAP | 1.17 | 0.37 | 1.29 | 0.37 | 44.64 |
1PZA | 1PMY | 1.49 | 0.83 | 1.43 | 0.82 | 45.00 |
1THBA | 1PBXA | 1.28 | 0.85 | 1.35 | 0.85 | 49.65 |
5HVPB | 1IVPA | 1.72 | 0.76 | 1.89 | 0.78 | 48.48 |
1CRB | 1OPBC | 1.35 | 0.62 | 1.51 | 0.63 | 56.39 |
1FKF | 1YAT | 1.34 | 0.44 | 1.72 | 0.50 | 57.94 |
1PVAA | 1CDP | 1.53 | 0.67 | 1.45 | 0.68 | 62.04 |
1MRJ | 1MOM | 1.16 | 0.54 | 1.12 | 0.54 | 65.04 |
1CAD | 8RXNA | 1.11 | 0.53 | 1.69 | 0.67 | 65.38 |
1TADB | 1GIA | 1.54 | 1.02 | 1.60 | 1.02 | 69.35 |
1HSAA | 2VAAA | 1.51 | 0.85 | 1.5 | 0.85 | 72.63 |
1DHFA | 1DR7 | 1.42 | 0.62 | 1.59 | 0.63 | 75.27 |
8DFR | 2DHFA | 1.29 | 0.57 | 1.28 | 0.57 | 75.27 |
1HNA | 3GSTB | 1.2 | 0.65 | 1.31 | 0.66 | 75.58 |
1ALA | 1AVR | 0.87 | 0.40 | 0.85 | 0.41 | 77.85 |
4P2P | 2BPP | 1.66 | 0.56 | 2.01 | 0.87 | 84.55 |
135L | 1HHL | 1.11 | 0.58 | 1.03 | 0.58 | 86.82 |
1EMY | 1YMC | 1.26 | 0.39 | 1.21 | 0.40 | 87.58 |
2CHF | 1CHN | 1.91 | 1.16 | 1.93 | 1.15 | 97.62 |
1FAFA | 1GH6A | 3.00 | 2.08 | 3.72 | 2.53 | 31.65 |
1ETB1 | 1TTCA | 0.65 | 0.23 | 0.78 | 0.22 | 98.31 |
Mean | | 1.62 | 0.86 | 1.83 | 0.99 | 66.13 |
Table 7: An assessment of PROTEUS2's performance for homology modeling. In this test,
37 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and 3D-JigSaw (both
using default parameters). Identical template structures were used for each structure generation attempt
by each program. The resulting structures were compared using backbone RMSD and all-atom RMSD.
Query PDB ID | Template PDB ID | 3D-JigSaw RMSD, all atoms (Å) | 3D-JigSaw RMSD, CA (Å) | Homodeller RMSD, all atoms (Å) | Homodeller RMSD, CA (Å) | Identity (%) |
132LA | 1IORA | 1.37 | 0.15 | 1.33 | 0.14 | 98.45 |
135LA | 1IORA | 1.41 | 0.15 | 1.29 | 0.13 | 93.02 |
1A3DA | 1L8SA | 1.57 | 0.61 | 3.18 | 2.77 | 57.14 |
1AAPA | 1BPIA | 1.99 | 0.42 | 2.23 | 0.40 | 44.64 |
1ALAA | 1AVRA | 1.09 | 0.41 | 1.35 | 0.41 | 77.85 |
1AZRA | 1JVOL | 1.17 | 0.33 | 0.55 | 0.33 | 97.66 |
1B7DA | 1CN2A | 2.37 | 1.39 | 3.83 | 3.27 | 44.26 |
1CADA | 8RXNA | 1.42 | 0.54 | 1.92 | 0.65 | 65.38 |
1CBSA | 1CBIA | 1.66 | 0.98 | 2.36 | 0.97 | 77.21 |
1CRBA | 1CBIA | 2.72 | 1.72 | 2.99 | 1.81 | 41.04 |
1CTFA | 1DD3A | 1.44 | 0.74 | 1.78 | 0.89 | 70.59 |
1DHFA | 1DR7A | 1.63 | 0.64 | 1.81 | 0.63 | 75.27 |
1EMYA | 1YMCA | 1.23 | 0.36 | 1.48 | 0.37 | 87.58 |
1ETB1 | 1ETA1 | 5.56 | 3.23 | 1.23 | 0.28 | 99.15 |
1FAFA | 1GH6A | 3.56 | 2.77 | 3.51 | 2.52 | 31.65 |
1FKFA | 1FKKA | 1.02 | 0.35 | 1.31 | 0.35 | 97.2 |
1HNAA | 4GTUH | 1.63 | 0.73 | 1.69 | 0.73 | 82.49 |
1HSAA | 2BCKD | 1.68 | 0.56 | 1.75 | 0.57 | 88.77 |
1MRJA | 1MOMA | 1.42 | 0.55 | 1.38 | 0.54 | 65.04 |
1NHKR | 1PKUC | 1.55 | 0.80 | 2.04 | 1.55 | 45.14 |
1NOAA | 1AKPA | 3.30 | 2.57 | 3.58 | 3.13 | 38.05 |
1POHA | 1PTFA | 2.06 | 1.13 | 2.28 | 1.21 | 35.29 |
1PVAA | 1CDPA | 1.25 | 0.42 | 1.62 | 0.67 | 62.04 |
1PZAA | 1PMYA | 1.45 | 0.83 | 1.68 | 0.85 | 45.00 |
1ROPA | 1GTOA | 2.13 | 1.36 | 2.19 | 1.37 | 96.43 |
1RZLA | 1JTBA | 2.73 | 2.09 | 3.36 | 2.34 | 62.64 |
1TADB | 1GIAA | 1.71 | 0.85 | 1.73 | 0.85 | 69.35 |
1THBA | 1FHJC | 1.18 | 0.45 | 1.40 | 0.44 | 82.98 |
2CHFA | 1CHNA | 1.89 | 1.14 | 1.81 | 1.15 | 97.62 |
2CROA | 2OR1R | 2.38 | 0.67 | 1.92 | 0.68 | 52.38 |
2GBPA | 3GBPA | 3.72 | 3.51 | 4.13 | 3.51 | 94.43 |
2LALA | 2B7YC | 1.24 | 0.36 | 1.09 | 0.36 | 82.32 |
2OZ9R | 1JHGA | 1.61 | 0.18 | 1.98 | 0.19 | 99.01 |
2YCCA | 1YEBA | 1.58 | 0.33 | 1.25 | 0.34 | 89.81 |
3IL8A | 1MSHB | 3.40 | 2.09 | 2.80 | 1.56 | 44.12 |
4AZUA | 1JVOL | 1.12 | 0.24 | 0.48 | 0.24 | 99.22 |
4P2PA | 2BPPA | 1.66 | 0.55 | 2.31 | 0.81 | 84.55 |
5PTIA | 1P2MD | 1.92 | 0.59 | 1.36 | 0.41 | 96.55 |
Mean | | 1.94 | 0.97 | 2.00 | 1.04 | 72.93 |