PROTEUS Structure Prediction Server 2.0
Comprehensive 2D and 3D structure predictions
 

Evaluation Details

The results in the tables below report the performance of PROTEUS2's combined predictors as compared to other programs. However, since each of the four 1D predictors used both de novo predictions and homology-based methods, we also assessed:
  1. the performance of the de novo predictors alone,
  2. the performance of the homology-based structure predictors alone, and
  3. the performance of the combined predictors.

When measuring the performance of any 3D-to-2D mapping prediction, the standard approach is to iteratively remove each sequence from the database and to perform the prediction with that sequence as the query. This prevents the structure of the query protein from simply being predicted from the query itself. It is also important to report the per-residue accuracy (Q2), as well as the percentage of query proteins that returned an answer (coverage). For signal peptide, transmembrane helix and transmembrane beta-barrel predictions, we evaluated both the per-residue prediction accuracy (Q2) and the ability of the predictors to correctly identify proteins with or without these structural features (sensitivity/specificity). Secondary structure predictions of soluble proteins were evaluated using only the Q3 and SOV scores. An assessment of PROTEUS2's performance for homology modeling was also performed and compared with structures generated by 3D-JigSaw and SWISS-MODEL.
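The leave-one-out procedure, the per-residue accuracy and the coverage described above can be sketched as follows. This is a minimal illustration, not the PROTEUS2 code: the function names, the one-letter state strings and the convention that a prediction of None means "no usable homologue was found" are assumptions made for the example, and `predict` stands for any hypothetical predictor supplied by the caller.

```python
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

Entry = Tuple[str, str]  # (sequence, observed per-residue states)


def q_score(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("state strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def coverage(predictions: Sequence[Optional[str]]) -> float:
    """Percentage of queries for which the predictor returned an answer."""
    if not predictions:
        raise ValueError("no predictions given")
    answered = sum(p is not None for p in predictions)
    return 100.0 * answered / len(predictions)


def leave_one_out(
    database: Dict[str, Entry],
    predict: Callable[[str, Dict[str, Entry]], Optional[str]],
) -> Iterator[Tuple[str, Optional[str], str]]:
    """Predict each entry using a copy of the database that excludes it."""
    for name, (sequence, observed) in database.items():
        reduced = {k: v for k, v in database.items() if k != name}
        yield name, predict(sequence, reduced), observed


# Hypothetical two-state strings: S = signal peptide residue, - = other.
print(q_score("SSSS------", "SSSSS-----"))     # 9 of 10 residues agree
print(coverage(["SS--", None, "S---", None]))  # 2 of 4 queries answered
```

With two states the score is Q2; the same function gives Q3 when the strings use three secondary structure states.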


Table 1a: To assess PROTEUS2's signal peptide prediction, a data set of 2587 complete protein sequences with experimentally confirmed signal peptides as well as a data set of 16,618 cytoplasmic proteins (with no signal peptides in their sequence) was extracted from the PPT-DB (1). The complete list of 2587 proteins with signal peptide annotation (in pseudo-FASTA format) is accessible here (Gram+, Gram-, Eukaryotic). The complete list of 16,618 non-signal containing protein sequences (in FASTA format) is accessible here. The signal peptide set included proteins from each of the three major classes of organisms (Gram+, Gram-, and Eukaryote). PROTEUS2 was compared against SubLoc (2) and SignalP 3.0 (3), using their default values, by calculating the per-residue prediction accuracy (Q2).
Signal Peptide Prediction Performance (PPT-DB SPdb data set)
Program or Server    Q2 (Gram-)    Q2 (Gram+)
PROTEUS2             95%           94%
SubLoc               91%           86%
SignalP 3.0          96%           97%

Table 1b: To assess transmembrane helix prediction accuracy, PROTEUS2 was assessed against the 2247 proteins (globular and transmembrane) used in TMH-Benchmark (4). The complete list of 2247 proteins with membrane annotations is available here. In this assessment we compared the performance of PROTEUS2 to TMHMM (5), HMMTOP (6) and DAS (7) using the per-residue prediction accuracy (Q2), as well as by measuring the number of false positives (the number of proteins identified by the program as having a transmembrane region that are actually non-membrane proteins).
Transmembrane Helix Prediction Performance (TMH Benchmark test set)
Program or Server    Q2     # False positives
PROTEUS2             91%    0
TMHMM                80%    1
HMMTOP               80%    6
DAS                  72%    16

Table 1c: Because there is some uncertainty in the transmembrane assignments for some of the TMH Benchmark's high-resolution data set, we performed a second evaluation using the experimentally confirmed data set of transmembrane helices derived from PPT-DB (1). Specifically, a data set of 275 complete protein sequences with experimentally confirmed transmembrane helices was extracted from the PPT-DB (1). As a negative control, a data set of 16,618 globular (non-membrane) proteins was also extracted from the PPT-DB (1). The complete list of 275 proteins with membrane helix annotations (in pseudo-FASTA format) is accessible here. The complete list of 16,618 non-membrane protein sequences (in FASTA format) is accessible here. In this assessment, we compared the performance of PROTEUS2 to TMHMM (5) using the per-residue prediction accuracy (Q2), as well as by measuring the number of false positives (the number of proteins identified by the program as having a transmembrane region that are actually non-membrane proteins).
Transmembrane Helix Prediction Performance (PPT-DB-TMH test set)
Program or Server    Q2     # False neg. (TMH vs. glob)
PROTEUS2             87%    0
TMHMM                82%    8

Table 1d: The assessment of transmembrane beta-barrel detection was done using an experimentally determined set of 49 transmembrane beta-barrel proteins and 16,618 water-soluble, globular proteins obtained from PPT-DB (1). The complete list of 49 proteins with membrane barrel annotation (in pseudo-FASTA format) is accessible here. The complete list of 16,618 non-barrel protein sequences (in FASTA format) is accessible here. PROTEUS2 was compared against TMB-Hunt (8) for its ability to distinguish membrane barrel proteins from non-membrane proteins, using sensitivity and specificity measures.
Transmembrane Beta Barrel Detection Performance (PPT-DB "All" protein data set)
Program or Server    Sensitivity    Specificity
PROTEUS2             100%           100%
TMB-Hunt             78%            99%

Table 1e: The assessment of transmembrane beta-barrel beta sheet detection was done using a set of 49 experimentally determined transmembrane beta-barrel proteins obtained from PPT-DB (1). The complete list of 49 proteins with membrane barrel annotation (in pseudo-FASTA format) is available here. In this assessment, PROTEUS2 was compared against Pred-TMBB (9) using the per-residue prediction accuracy (Q2).
Transmembrane Beta Strand Prediction Performance (PPT-DB-TMB test set)
Program or Server      Q2
Combined (PROTEUS2)    86%
PRED-TMBB              73%
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).

Table 1f: The assessment of PROTEUS2's performance on globular proteins or non-membrane secondary structure prediction was done using two approaches: 1) through a "blind" test and comparison on the latest EVA (10) training set (1644 sequence-unique proteins) and 2) through analysis of 125 randomly chosen proteins that were recently solved by X-ray and NMR. The complete list of 1644 proteins with sequence and secondary structure annotation (in pseudo-FASTA format) is accessible here. The complete list of 125 randomly selected protein sequences with sequence and secondary structure annotation (in pseudo-FASTA format) is accessible here. The 125 protein set was chosen to simulate a more realistic case of predicting the secondary structure of sequences found in a proteome (which tend not to be sequence-unique). In both cases, the Q3 and SOV scores were calculated for each protein in the test sets. Results were compared to Porter (11), PSIPred (12), PHD (13), and JNET (14).
Non-membrane Secondary Structure Prediction Performance (EVA Test Set)
Program or Server    Q3 (%)    SOV (%)
PROTEUS2             81        82
Porter               77        76
JNET                 72        73
PSIPred              77        78
Non-membrane Secondary Structure Prediction Performance (Test Set of 125)
Program or Server    Q3 (%)    SOV (%)
PROTEUS2             88        90
Porter               76        81
JNET                 73        77
PSIPred              76        78

Table 1g: PROTEUS2's homology modeling performance was assessed by comparing the program to 3D JigSaw (16) and Swiss-Model (15). In the first case, 37 proteins with sequence identities ranging from 21.2% to 99.2% were modeled using PROTEUS2 and 3D JigSaw (both with default parameters). In the second case, 33 proteins with a similar range of sequence identities were modeled using PROTEUS2 and Swiss-Model (also with default parameters). In each case, identical template structures were used for the pairwise comparisons. The resulting structures were compared using backbone RMSD and all-atom RMSD values measured after quaternion superposition.
Homology Modeling Performance
Program or Server    RMSD All (Å)    RMSD CA (Å)
PROTEUS2             1.83            0.99
Swiss-Model          1.62            0.86
3D-JigSaw            1.94            0.97
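
The RMSD values above are root-mean-square deviations between matched atoms of a model and its reference structure. The sketch below shows only the RMSD formula; it assumes the two coordinate sets have already been superposed and are listed in the same atom order, so the quaternion superposition step mentioned in the caption is deliberately left out. The function name and the coordinates are illustrative and not taken from PROTEUS2.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], reference: Sequence[Point]) -> float:
    """Root-mean-square deviation (in the units of the input, e.g. Å)
    between two equally long, already superposed coordinate lists."""
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xm, ym, zm), (xr, yr, zr) in zip(model, reference):
        squared += (xm - xr) ** 2 + (ym - yr) ** 2 + (zm - zr) ** 2
    return math.sqrt(squared / len(model))


# Made-up coordinates: every atom is displaced by exactly 1 Å along x.
model_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference_atoms = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(rmsd(model_atoms, reference_atoms))
```

Passing only the CA atoms gives the backbone (CA) figure, while passing every matched atom gives the all-atom figure.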

References for Table 1
  1. Wishart, D.S., Arndt, D., Berjanskii, M., Guo, A.C., Shi, Y., Shrivastava, S., Zhou, J., Zhou, Y. and Lin, G. (2008) PPT-DB: the protein property prediction and testing database. Nucleic Acids Res. 36 (Database issue), D222-229.
  2. Hua, S. and Sun, Z. (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics. 17, 721-728.
  3. Bendtsen, J.D., Nielsen, H., von Heijne, G. and Brunak, S. (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783-795.
  4. Kernytsky, A. and Rost, B. (2003) Static benchmarking of membrane helix predictions. Nucleic Acids Res. 31, 3642-3654.
  5. Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580.
  6. Tusnády, G.E. and Simon, I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics. 17, 849-850.
  7. Cserzö, M., Wallin, E., Simon, I., von Heijne, G. and Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673-676.
  8. Garrow, A.G., Agnew, A. and Westhead, D.R. (2005) TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins. Nucleic Acids Res. 33 (Web Server issue), W188-192.
  9. Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C. and Hamodrakas, S.J. (2004) PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins. Nucleic Acids Res. 32 (Web Server issue), W400-404.
  10. Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Fiser, A., Pazos, F., Valencia, A., Sali, A. and Rost, B. (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics. 17, 1242-1243.
  11. Pollastri, G. and McLysaght, A. (2005) Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics. 21, 1719-1720.
  12. Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202.
  13. Rost, B., Sander, C. and Schneider, R. (1994) PHD--an automatic mail server for protein secondary structure prediction. Comput Appl Biosci. 10, 53-60.
  14. Cuff, J.A. and Barton, G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 40, 502-511.
  15. Schwede, T., Kopp, J., Guex, N. and Peitsch, M.C. (2003) SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 31, 3381-3385.
  16. Bates, P.A., Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2001) Enhancement of Protein Modelling by Human Intervention in Applying the Automatic Programs 3D-JIGSAW and 3D-PSSM. Proteins. Suppl 5, 39-46.

Table 2: Signal peptide prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the SPdb data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of a signal peptide. For this assessment a data set of 2587 complete protein sequences with experimentally confirmed signal peptides was extracted from the PPT-DB (SPdb data set). The signal peptide set included proteins from each of the three major classes of organisms (Gram+, Gram- and Eukaryotes). In this assessment, we compared the performance of PROTEUS2 to SignalP 3.0.
Program or Server      Q2 (Gram+) (%)    Q2 (Gram-) (%)    Q2 (Eukaryotes) (%)
3D-2D Mapping*         93.5 (42.4)       94.4 (43.1)       93.6 (76.3)
PredictSP              94.0              95.0              77.4
Combined (PROTEUS2)    93.7              94.6              90.0
SignalP 3.0            96.0              97.0              99.0
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).

Table 3: Transmembrane beta-barrel prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the PPT-DB transmembrane barrel data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of membrane beta strands. The assessment of transmembrane beta-barrel detection was done using a set of 49 experimentally determined transmembrane beta-barrel proteins extracted from PPT-DB. In this assessment we compared the performance of PROTEUS2 to PRED-TMBB.
Program or Server           Q2 (%)
3D-2D Mapping*              93.2 (56.1)
Jury-of-Experts/PROTEUS2    76.0
Combined (PROTEUS2)         85.7
PRED-TMBB                   73.0
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
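
One way to read the relationship between the rows of Table 3 is as a coverage-weighted average: the 3D-2D mapping result is used for the fraction of queries that have a homologue, and the de novo (Jury-of-Experts) result for the rest. The sketch below checks that arithmetic. It is an assumption about how the combined score arises, not a description of the PROTEUS2 implementation, and it treats the per-protein coverage as if it were a per-residue weight. The function name is illustrative.

```python
def coverage_weighted_q2(q2_mapping: float, coverage_pct: float, q2_de_novo: float) -> float:
    """Blend two Q2 scores, weighting the mapping score by its coverage."""
    if not 0.0 <= coverage_pct <= 100.0:
        raise ValueError("coverage must be a percentage between 0 and 100")
    fraction = coverage_pct / 100.0
    return fraction * q2_mapping + (1.0 - fraction) * q2_de_novo


# Values taken from Table 3: mapping Q2 93.2 at 56.1% coverage, de novo Q2 76.0.
estimate = coverage_weighted_q2(93.2, 56.1, 76.0)
print(f"{estimate:.2f}")
```

Under these assumptions the estimate falls within about 0.1 of the reported combined Q2 of 85.7.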

Table 4: Transmembrane helix prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the PPT-DB transmembrane helix data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of membrane helices. The assessment of transmembrane alpha-helix detection was done using a set of 275 experimentally determined transmembrane alpha-helical proteins extracted from PPT-DB. In this assessment we compared the performance of PROTEUS2 to TMHMM.
Program or Server      Q2 (%)
3D-2D Mapping*         90.5 (61.8)
TMHMM                  82.2
Combined (PROTEUS2)    87.3
* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).

Table 5: The sensitivity-specificity comparison between different predictors and PROTEUS2. This assessment tested the ability of different programs to identify whether a protein was globular, contained a signal peptide, contained a transmembrane beta-barrel, or contained a transmembrane helix. In total, 16,618 globular proteins, 49 membrane beta-barrel proteins, 275 membrane alpha-helix proteins and 2587 SPdb proteins were extracted from the PPT-DB and submitted to the different predictors. For the transmembrane beta-barrel test, 49+16,618 proteins were analyzed and classified (by TMB-Hunt or PROTEUS2) as being a beta-barrel protein or a globular protein. For the transmembrane helix test, 275+16,618 proteins were analyzed and classified (by TMHMM or PROTEUS2) as being a transmembrane helix protein or a globular protein. For the signal peptide test, 2587+16,618 proteins were analyzed and classified (by SignalP or PROTEUS2) as having a signal peptide or not.
Sensitivity-Specificity Detection to Identify Transmembrane Beta-Barrel Proteins
Program or Server    Sensitivity    Specificity
TMB-Hunt             78%            99%
PROTEUS2             100%           100%
Sensitivity-Specificity Detection to Identify Transmembrane Alpha-Helix Proteins
Program or Server    Sensitivity (%)    Specificity (%)
TMHMM                97%                94%
PROTEUS2             100%               100%
Sensitivity-Specificity Detection to Identify Signal Peptides
Program or Server    Sensitivity (%)    Specificity (%)
PredictSP            66.4%              97.2%
PROTEUS2             100%               100%
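
Sensitivity and specificity in these tables follow the usual definitions: sensitivity is the share of true feature-containing proteins that are detected, and specificity is the share of globular (negative) proteins that are correctly rejected. The sketch below computes both from confusion-matrix counts. The function name is illustrative, and the counts in the example are hypothetical values chosen to match the size of the beta-barrel test (49 positives, 16,618 negatives), not results reported by any program.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) as percentages.

    tp/fn: feature-containing proteins detected / missed.
    tn/fp: globular proteins correctly rejected / wrongly flagged.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative protein")
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for a test with 49 positives and 16,618 negatives.
sens, spec = sensitivity_specificity(tp=40, fn=9, tn=16500, fp=118)
print(f"sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

Because the negative set is far larger than the positive set, a small drop in specificity corresponds to many more misclassified proteins than a similar drop in sensitivity.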

Table 6: An assessment of PROTEUS2's performance for homology modeling. In this test, 33 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and SWISS-MODEL (both using default parameters). For each protein, both programs were given the same template structure. The resulting structures were compared using backbone RMSD and all-atom RMSD.
PDB ID               RMSD, Swiss Model     RMSD, Homodeller      Identity
Query    Template    All atoms    CA       All atoms    CA       %
1PVAA    1RRO        1.84         0.78     1.82         0.77     47.22
1B7DA    1CN2        3.09         2.45     3.84         3.27     44.26
132L     1LSY        0.91         0.26     0.91         0.26     99.22
1A3D     4BP2        2.78         1.08     3.47         2.76     58.12
1AAPA    8PTI        2.54         1.09     2.47         1.08     42.86
1CBS     1CBI        1.60         0.98     2.27         0.97     77.21
1CTF     1DD3        1.51         0.62     1.49         0.90     70.59
1ROPA    1GTO        1.11         0.52     1.12         0.52     96.43
1RZL     1JTB        2.74         2.31     3.17         2.33     62.64
3IL8     1MI2        3.32         2.33     3.37         2.30     42.65
2WRPR    1JHG        0.55         0.19     1.98         0.19     99.01
1POH     1PTF        2.04         1.21     2.14         1.20     35.29
1NHKR    1NDC        1.49         0.78     1.94         1.28     45.83
1BPT     1AAP        1.17         0.37     1.29         0.37     44.64
1PZA     1PMY        1.49         0.83     1.43         0.82     45.00
1THBA    1PBXA       1.28         0.85     1.35         0.85     49.65
5HVPB    1IVPA       1.72         0.76     1.89         0.78     48.48
1CRB     1OPBC       1.35         0.62     1.51         0.63     56.39
1FKF     1YAT        1.34         0.44     1.72         0.50     57.94
1PVAA    1CDP        1.53         0.67     1.45         0.68     62.04
1MRJ     1MOM        1.16         0.54     1.12         0.54     65.04
1CAD     8RXNA       1.11         0.53     1.69         0.67     65.38
1TADB    1GIA        1.54         1.02     1.60         1.02     69.35
1HSAA    2VAAA       1.51         0.85     1.5          0.85     72.63
1DHFA    1DR7        1.42         0.62     1.59         0.63     75.27
8DFR     2DHFA       1.29         0.57     1.28         0.57     75.27
1HNA     3GSTB       1.2          0.65     1.31         0.66     75.58
1ALA     1AVR        0.87         0.40     0.85         0.41     77.85
4P2P     2BPP        1.66         0.56     2.01         0.87     84.55
135L     1HHL        1.11         0.58     1.03         0.58     86.82
1EMY     1YMC        1.26         0.39     1.21         0.40     87.58
2CHF     1CHN        1.91         1.16     1.93         1.15     97.62
1FAFA    1GH6A       3.00         2.08     3.72         2.53     31.65
1ETB1    1TTCA       0.65         0.23     0.78         0.22     98.31
Mean                 1.62         0.86     1.83         0.99     66.13

Table 7: An assessment of PROTEUS2's performance for homology modeling. In this test, 37 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and 3D JigSaw (both using default parameters). For each protein, both programs were given the same template structure. The resulting structures were compared using backbone RMSD and all-atom RMSD.
PDB ID               RMSD, 3D JigSaw       RMSD, Homodeller      Identity
Query    Template    All atoms    CA       All atoms    CA       %
132LA    1IORA       1.37         0.15     1.33         0.14     98.45
135LA    1IORA       1.41         0.15     1.29         0.13     93.02
1A3DA    1L8SA       1.57         0.61     3.18         2.77     57.14
1AAPA    1BPIA       1.99         0.42     2.23         0.40     44.64
1ALAA    1AVRA       1.09         0.41     1.35         0.41     77.85
1AZRA    1JVOL       1.17         0.33     0.55         0.33     97.66
1B7DA    1CN2A       2.37         1.39     3.83         3.27     44.26
1CADA    8RXNA       1.42         0.54     1.92         0.65     65.38
1CBSA    1CBIA       1.66         0.98     2.36         0.97     77.21
1CRBA    1CBIA       2.72         1.72     2.99         1.81     41.04
1CTFA    1DD3A       1.44         0.74     1.78         0.89     70.59
1DHFA    1DR7A       1.63         0.64     1.81         0.63     75.27
1EMYA    1YMCA       1.23         0.36     1.48         0.37     87.58
1ETB1    1ETA1       5.56         3.23     1.23         0.28     99.15
1FAFA    1GH6A       3.56         2.77     3.51         2.52     31.65
1FKFA    1FKKA       1.02         0.35     1.31         0.35     97.2
1HNAA    4GTUH       1.63         0.73     1.69         0.73     82.49
1HSAA    2BCKD       1.68         0.56     1.75         0.57     88.77
1MRJA    1MOMA       1.42         0.55     1.38         0.54     65.04
1NHKR    1PKUC       1.55         0.80     2.04         1.55     45.14
1NOAA    1AKPA       3.30         2.57     3.58         3.13     38.05
1POHA    1PTFA       2.06         1.13     2.28         1.21     35.29
1PVAA    1CDPA       1.25         0.42     1.62         0.67     62.04
1PZAA    1PMYA       1.45         0.83     1.68         0.85     45.00
1ROPA    1GTOA       2.13         1.36     2.19         1.37     96.43
1RZLA    1JTBA       2.73         2.09     3.36         2.34     62.64
1TADB    1GIAA       1.71         0.85     1.73         0.85     69.35
1THBA    1FHJC       1.18         0.45     1.40         0.44     82.98
2CHFA    1CHNA       1.89         1.14     1.81         1.15     97.62
2CROA    2OR1R       2.38         0.67     1.92         0.68     52.38
2GBPA    3GBPA       3.72         3.51     4.13         3.51     94.43
2LALA    2B7YC       1.24         0.36     1.09         0.36     82.32
2OZ9R    1JHGA       1.61         0.18     1.98         0.19     99.01
2YCCA    1YEBA       1.58         0.33     1.25         0.34     89.81
3IL8A    1MSHB       3.40         2.09     2.80         1.56     44.12
4AZUA    1JVOL       1.12         0.24     0.48         0.24     99.22
4P2PA    2BPPA       1.66         0.55     2.31         0.81     84.55
5PTIA    1P2MD       1.92         0.59     1.36         0.41     96.55
Mean                 1.94         0.97     2.00         1.04     72.93