Documentation for PROTEUS2 Structure Prediction Server

The results in the tables below report the performance of PROTEUS2's combined predictors as compared to other programs. However, since each of the four 1D predictors used both de novo predictions and homology-based methods, we also assessed:

the performance of the de novo predictors alone,
the performance of the homology-based structure predictors alone, and
the performance of the combined predictors.

When measuring the performance of any 3D-to-2D mapping prediction, the standard approach is to iteratively remove each sequence from the database and to perform the prediction with that sequence. This prevents simply predicting the structure of the query protein using the query itself. It is also important to report the per-residue accuracy (Q2), as well as the percentage of query proteins that returned an answer (coverage). For signal peptide, transmembrane helix and transmembrane beta-barrel predictions, we evaluated both the per-residue prediction accuracy (Q2) as well as the ability of the predictors to correctly identify proteins with or without these structural features (sensitivity/specificity). Secondary structure predictions of soluble proteins were evaluated using only the Q3 and SOV scores. An assessment of PROTEUS2's performance for homology modeling was also performed and compared with structures generated by 3D-JigSaw and SWISS-MODEL.
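The per-residue accuracy and coverage measures described above can be sketched as follows. This is a minimal illustration, not the evaluation code used for PROTEUS2: the function names are hypothetical, the accuracy is assumed to be pooled over all residues in the test set (rather than averaged per protein), and a prediction of `None` is assumed to mean that the predictor returned no answer.

```python
def per_residue_accuracy(pairs):
    """Percentage of residues whose predicted state matches the observed state.

    `pairs` is an iterable of (predicted, observed) label strings, one pair per
    protein. With two states this corresponds to Q2, with three states to Q3.
    Assumption: residues are pooled over all proteins before dividing.
    """
    correct = 0
    total = 0
    for predicted, observed in pairs:
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed strings must have equal length")
        correct += sum(p == o for p, o in zip(predicted, observed))
        total += len(observed)
    if total == 0:
        raise ValueError("no residues to evaluate")
    return 100.0 * correct / total


def coverage(predictions):
    """Percentage of query proteins for which a prediction was returned."""
    if not predictions:
        raise ValueError("no queries to evaluate")
    answered = sum(1 for p in predictions if p is not None)
    return 100.0 * answered / len(predictions)


# Toy example: 'S' marks a signal peptide residue, '-' any other residue.
q2 = per_residue_accuracy([("SSS-----", "SSSS----")])
cov = coverage(["SSS-----", None, "--------", "SS------"])
print(q2, cov)
```

In the toy example, seven of eight residues agree and three of four queries returned an answer.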

Table 1a: To assess PROTEUS2's signal peptide prediction, a data set of 2587 complete protein sequences with experimentally confirmed signal peptides as well as a data set of 16,618 cytoplasmic proteins (with no signal peptides in their sequence) was extracted from the PPT-DB (1). The complete list of 2587 proteins with signal peptide annotation (in pseudo-FASTA format) is accessible here (Gram+, Gram-, Eukaryotic). The complete list of 16,618 non-signal containing protein sequences (in FASTA format) is accessible here. The signal peptide set included proteins from each of the three major classes of organisms (Gram+, Gram-, and Eukaryote). PROTEUS2 was compared against SubLoc (2) and SignalP 3.0 (3), using their default values, by calculating the per-residue prediction accuracy (Q2).

Signal Peptide Prediction Performance (PPT-DB SPdb data set)
Program or Server	Q2 (Gram-)	Q2 (Gram+)
PROTEUS2	95%	94%
SubLoc	91%	86%
SignalP 3.0	96%	97%

Table 1b: To assess transmembrane helix prediction accuracy, PROTEUS2 was assessed against the 2247 proteins (globular and transmembrane) used in TMH-Benchmark (4). The complete list of 2247 proteins with membrane annotations is available here. In this assessment, we compared the performance of PROTEUS2 to TMHMM (5), HMMTOP (6), and DAS (7) using the per-residue prediction accuracy (Q2), as well as by measuring the number of false positives (number of proteins identified as having a transmembrane region by the program but actually being non-membrane proteins).

Transmembrane Helix Prediction Performance (TMH Benchmark test set)
Program or Server	Q2	# False positives
PROTEUS2	91%	0
TMHMM	80%	1
HMMTOP	80%	6
DAS	72%	16

Table 1c: Because there is some uncertainty in the transmembrane assignments for some of the TMH Benchmark's high-resolution data set, we performed a second evaluation using the experimentally confirmed data set of transmembrane helices derived from PPT-DB (1). Specifically, a data set of 275 complete protein sequences with experimentally confirmed transmembrane helices was extracted from the PPT-DB (1). As a negative control, a data set of 16,618 globular (non-membrane) proteins was also extracted from the PPT-DB (1). The complete list of 275 proteins with membrane helix annotations (in pseudo-FASTA format) is accessible here. The complete list of 16,618 non-membrane protein sequences (in FASTA format) is accessible here. In this assessment, we compared the performance of PROTEUS2 to TMHMM (5) using the per-residue prediction accuracy (Q2), as well as by measuring the number of false positives (number of proteins identified as having a transmembrane region by the program but actually being non-membrane proteins).

Transmembrane Helix Prediction Performance (PPT-DB-TMH test set)
Program or Server	Q2	# False neg. (TMH vs. glob)
PROTEUS2	87%	0
TMHMM	82%	8

Table 1d: The assessment of transmembrane beta-barrel detection was done using an experimentally determined set of 49 transmembrane beta-barrel and 16,618 water-soluble, globular proteins obtained from PPT-DB (1). The complete list of 49 proteins with membrane barrel annotation (in pseudo-FASTA format) is accessible here. The complete list of 16,618 non-barrel protein sequences (in FASTA format) is accessible here. PROTEUS2 was compared against TMB-Hunt (8) for its ability to identify membrane barrel from non-membrane proteins using sensitivity and specificity measures.

Transmembrane Beta Barrel Detection Performance (PPT-DB "All" protein data set)
Program or Server	Sensitivity	Specificity
PROTEUS2	100%	100%
TMB-Hunt	78%	99%

Table 1e: The assessment of transmembrane beta-barrel beta sheet detection was done using a set of 49 experimentally determined transmembrane beta-barrel proteins obtained from PPT-DB (1). The complete list of 49 proteins with membrane barrel annotation (in pseudo-FASTA format) is available here. In this assessment, PROTEUS2 was compared against Pred-TMBB (9) using the per-residue prediction accuracy (Q2).

Transmembrane Beta Strand Prediction Performance (PPT-DB-TMB test set)
Program or Server	Q2
Combined (PROTEUS2)	86%
PRED-TMBB	73%

Table 1f: The assessment of PROTEUS2's performance on globular proteins or non-membrane secondary structure prediction was done using two approaches: 1) through a "blind" test and comparison on the latest EVA (10) training set (1644 sequence-unique proteins) and 2) through analysis of 125 randomly chosen proteins that were recently solved by X-ray and NMR. The complete list of 1644 proteins with sequence and secondary structure annotation (in pseudo-FASTA format) is accessible here. The complete list of 125 randomly selected protein sequences with sequence and secondary structure annotation (in pseudo-FASTA format) is accessible here. The 125 protein set was chosen to simulate a more realistic case of predicting the secondary structure of sequences found in a proteome (which tend not to be sequence-unique). In both cases, the Q3 and SOV scores were calculated for each protein in the test sets. Results were compared to Porter (11), PSIPred (12), PHD (13), and JNET (14).

Non-membrane Secondary Structure Prediction Performance (EVA Test Set)
Program or Server	Q3 (%)	SOV (%)
PROTEUS2	81	82
Porter	77	76
JNET	72	73
PSIPred	77	78

Non-membrane Secondary Structure Prediction Performance (Test Set of 125)
Program or Server	Q3 (%)	SOV (%)
PROTEUS2	88	90
Porter	76	81
JNET	73	77
PSIPred	76	78

Table 1g: PROTEUS2's homology modeling performance was assessed by comparing the program to 3D-JigSaw (15) and Swiss-Model (16). In the first case, 37 proteins with sequence identities ranging from 21.2% to 99.2% were modeled using PROTEUS2 and 3D-JigSaw (both with default parameters). In the second case, 33 proteins with a similar range of sequence identities were modeled using PROTEUS2 and Swiss-Model (also with default parameters). In each case, identical template structures were used for the pairwise comparisons. The resulting structures were compared using backbone RMSD and all-atom RMSD values measured after quaternion superposition.

Homology Modeling Performance
Program or Server	RMSD All (Å)	RMSD CA (Å)
PROTEUS2	1.83	0.99
Swiss-Model	1.62	0.86
3D-JigSaw	1.94	0.97
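
An RMSD "measured after quaternion superposition" can be illustrated with the sketch below. It follows the general quaternion formulation, in which the largest eigenvalue of a symmetric 4x4 matrix built from the two centered coordinate sets gives the minimum RMSD directly. This is not PROTEUS2's own code: the function names are hypothetical, atoms are assumed to be listed in the same order in both structures, and the eigenvalue is found here with a simple shifted power iteration chosen only to keep the example dependency-free.

```python
import math


def _center(coords):
    """Translate coordinates so that their centroid sits at the origin."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in coords]


def superposed_rmsd(a, b, iterations=500):
    """Minimum RMSD between two equally ordered point sets after optimal
    rigid-body superposition (quaternion formulation)."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    a, b = _center(a), _center(b)
    # Cross-covariance terms s[i][j] = sum over atoms of a_i * b_j.
    s = [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    # Symmetric 4x4 matrix; its largest eigenvalue belongs to the quaternion
    # of the optimal rotation.
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    ga = sum(x * x + y * y + z * z for x, y, z in a)
    gb = sum(x * x + y * y + z * z for x, y, z in b)
    shift = (ga + gb) / 2.0  # bounds every eigenvalue in absolute value
    if shift == 0.0:
        return 0.0
    # Power iteration on (key + shift * I); the start vector is arbitrary.
    v = [0.8, 0.5, 0.3, 0.1]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        w = [sum(key[i][j] * v[j] for j in range(4)) + shift * v[i]
             for i in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    largest = sum(v[i] * key[i][j] * v[j] for i in range(4) for j in range(4))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * largest) / len(a)))


# A rotated and translated copy of the same points should superpose exactly.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0),
         (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
moved = [(-y + 5.0, x - 3.0, z + 2.0) for x, y, z in model]
print(superposed_rmsd(model, moved))
```

Passing only the CA atoms gives a backbone-style value, and passing all atoms gives the all-atom value. The fixed iteration count is a simplification, and a production implementation would normally use a library eigenvalue solver instead.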

1. Wishart, D.S., Arndt, D., Berjanskii, M., Guo, A.C., Shi, Y., Shrivastava, S., Zhou, J., Zhou, Y. and Lin, G. (2008) PPT-DB: the protein property prediction and testing database. Nucleic Acids Res. 36 (Database issue), D222-229.
2. Hua, S. and Sun, Z. (2001) Support vector machine approach for protein subcellular localization prediction. Bioinformatics. 17, 721-728.
3. Bendtsen, J.D., Nielsen, H., von Heijne, G. and Brunak, S. (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol. 340, 783-795.
4. Kernytsky, A. and Rost, B. (2003) Static benchmarking of membrane helix predictions. Nucleic Acids Res. 31, 3642-3654.
5. Krogh, A., Larsson, B., von Heijne, G. and Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580.
6. Tusnády, G.E. and Simon, I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics. 17, 849-850.
7. Cserzö, M., Wallin, E., Simon, I., von Heijne, G. and Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673-676.
8. Garrow, A.G., Agnew, A. and Westhead, D.R. (2005) TMB-Hunt: a web server to screen sequence sets for transmembrane beta-barrel proteins. Nucleic Acids Res. 33 (Web Server issue), W188-192.
9. Bagos, P.G., Liakopoulos, T.D., Spyropoulos, I.C. and Hamodrakas, S.J. (2004) PRED-TMBB: a web server for predicting the topology of beta-barrel outer membrane proteins. Nucleic Acids Res. 32 (Web Server issue), W400-404.
10. Eyrich, V.A., Marti-Renom, M.A., Przybylski, D., Madhusudhan, M.S., Fiser, A., Pazos, F., Valencia, A., Sali, A. and Rost, B. (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics. 17, 1242-1243.
11. Pollastri, G. and McLysaght, A. (2005) Porter: a new, accurate server for protein secondary structure prediction. Bioinformatics. 21, 1719-1720.
12. Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202.
13. Rost, B., Sander, C. and Schneider, R. (1994) PHD--an automatic mail server for protein secondary structure prediction. Comput. Appl. Biosci. 10, 53-60.
14. Cuff, J.A. and Barton, G.J. (2000) Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Proteins. 40, 502-511.
15. Bates, P.A., Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2001) Enhancement of protein modelling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins. Suppl 5, 39-46.
16. Schwede, T., Kopp, J., Guex, N. and Peitsch, M.C. (2003) SWISS-MODEL: an automated protein homology-modeling server. Nucleic Acids Res. 31, 3381-3385.

Table 2: Signal peptide prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the SPdb data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of a signal peptide. For this assessment, a data set of 2587 complete protein sequences with experimentally confirmed signal peptides was extracted from the PPT-DB (SPdb data set). The signal peptide set included proteins from each of the three major classes of organisms (Gram+, Gram- and Eukaryotes). In this assessment, we compared the performance of PROTEUS2 to SignalP 3.0.

Program or Server	Q2 (Gram+) (%)	Q2 (Gram-) (%)	Q2 (Eukaryotes) (%)
3D-2D Mapping*	93.5 (42.4)	94.4 (43.1)	93.6 (76.3)
PredictSP	94.0	95.0	77.4
Combined (PROTEUS2)	93.7	94.6	90.0
SignalP 3.0	96.0	97.0	99.0

* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
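
The leave-one-out procedure behind the coverage figures (each query is removed from the database before a homolog is searched for) can be sketched as below. This is only an illustration under loud assumptions: the toy `identity` function and the `min_identity` threshold are hypothetical stand-ins for a real homology search and its quality cutoff, and the annotation of the best remaining hit is simply copied to the query.

```python
def identity(a, b):
    """Toy similarity: fraction of identical positions in equal-length sequences.

    Hypothetical stand-in for a real alignment-based homology search.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def leave_one_out(database, min_identity=0.5):
    """Map each query to the annotation of its best homolog, or None.

    `database` maps a protein name to a (sequence, annotation) pair. The query
    itself is skipped, so it can never be predicted from its own entry.
    """
    results = {}
    for name, (sequence, _) in database.items():
        best_name, best_score = None, 0.0
        for other, (other_sequence, _) in database.items():
            if other == name:
                continue  # hold the query out of the database
            score = identity(sequence, other_sequence)
            if score > best_score:
                best_name, best_score = other, score
        if best_name is not None and best_score >= min_identity:
            results[name] = database[best_name][1]
        else:
            results[name] = None  # no usable homolog: counts against coverage
    return results


toy_db = {
    "a": ("MKKLLA", "SSSS--"),
    "b": ("MKKLLV", "SSS---"),
    "c": ("GGGGGG", "------"),
}
mapped = leave_one_out(toy_db)
answered = sum(1 for annotation in mapped.values() if annotation is not None)
print(mapped, 100.0 * answered / len(mapped))
```

In this toy database, "a" and "b" find each other as homologs while "c" finds none, so two of the three queries return an answer.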

Table 3: Transmembrane beta-barrel prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the PPT-DB transmembrane barrel data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of membrane beta strands. The assessment of transmembrane beta-barrel detection was done using a set of 49 experimentally determined transmembrane beta-barrel proteins extracted from PPT-DB. In this assessment we compared the performance of PROTEUS2 to PRED-TMBB.

Program or Server	Q2 (%)
3D-2D Mapping*	93.2 (56.1)
Jury-of-Experts/PROTEUS2	76.0
Combined (PROTEUS2)	85.7
PRED-TMBB	73.0

* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).
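
As a rough consistency check on the figures in Table 3, the combined score can be compared with a coverage-weighted average of the two component scores. This is an assumption made for illustration, not a description of how PROTEUS2 combines its predictors: it supposes that the mapping result is used whenever a homolog is found, that the de novo predictor is used otherwise, and that every protein contributes equally. The function name is hypothetical.

```python
def coverage_weighted_q2(mapping_q2, coverage_pct, fallback_q2):
    """Estimate a combined Q2 as a coverage-weighted average (assumption)."""
    covered = coverage_pct / 100.0
    return covered * mapping_q2 + (1.0 - covered) * fallback_q2


# Values taken from Table 3: 3D-2D mapping 93.2 (coverage 56.1), de novo 76.0.
estimate = coverage_weighted_q2(93.2, 56.1, 76.0)
print(f"{estimate:.1f}")
```

The estimate comes out at about 85.6, close to the 85.7 reported for the combined predictor. Exact agreement is not expected, because Q2 is a per-residue measure and proteins differ in length.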

Table 4: Transmembrane helix prediction performance assessed using: 1) the percent coverage for the 3D-2D mapping process, and 2) the Q2 score, which is the per-residue prediction accuracy. The percent coverage indicates the percentage of query sequences in the PPT-DB transmembrane helix data set that had sufficiently good quality homologues (after the query sequence was removed from the database) to predict the presence and location of membrane helices. The assessment of transmembrane alpha-helix detection was done using a set of 275 experimentally determined transmembrane alpha-helical proteins extracted from PPT-DB. In this assessment we compared the performance of PROTEUS2 to TMHMM.

Program or Server	Q2 (%)
3D-2D Mapping*	90.5 (61.8)
TMHMM	82.2
Combined (PROTEUS2)	87.3

* The number in parentheses indicates the percentage of sequences in the sample that had matching homologs in the database (coverage).

Table 5: Sensitivity-specificity comparison between PROTEUS2 and other predictors. This assessment tested the ability of different programs to identify whether a protein was globular, contained a signal peptide, contained a transmembrane beta-barrel, or contained a transmembrane helix. In total, 16,618 globular proteins, 49 membrane beta-barrel proteins, 275 membrane alpha-helix proteins and 2587 SPdb proteins were extracted from the PPT-DB and submitted to the different predictors. For the transmembrane beta-barrel test, 49 + 16,618 proteins were analyzed and classified (by TMB-Hunt or PROTEUS2) as being a beta-barrel protein or a globular protein. For the transmembrane helix test, 275 + 16,618 proteins were analyzed and classified (by TMHMM or PROTEUS2) as being a transmembrane helix protein or a globular protein. For the signal peptide test, 2587 + 16,618 proteins were analyzed and classified (by SignalP or PROTEUS2) as having a signal peptide or not.

Sensitivity-Specificity Detection to Identify Transmembrane Beta-Barrel Proteins
Program or Server	Sensitivity	Specificity
TMB-Hunt	78%	99%
PROTEUS2	100%	100%

Sensitivity-Specificity Detection to Identify Transmembrane Alpha-Helix Proteins
Program or Server	Sensitivity	Specificity
TMHMM	97%	94%
PROTEUS2	100%	100%

Sensitivity-Specificity Detection to Identify Signal Peptides
Program or Server	Sensitivity	Specificity
PredictSP	66.4%	97.2%
PROTEUS2	100%	100%
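
Sensitivity and specificity for these whole-protein classification tests follow the usual definitions, sketched below. The function names are hypothetical and the counts in the example are illustrative only: they show how a set of 275 positive proteins with 8 misses would translate into a sensitivity value, not how the figures in the tables were actually derived.

```python
def sensitivity(true_positives, false_negatives):
    """Percentage of proteins that have the feature and were identified as such."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no positive proteins in the test set")
    return 100.0 * true_positives / total


def specificity(true_negatives, false_positives):
    """Percentage of proteins that lack the feature and were identified as such."""
    total = true_negatives + false_positives
    if total == 0:
        raise ValueError("no negative proteins in the test set")
    return 100.0 * true_negatives / total


# Illustrative counts: 275 positives with 8 missed, 16,618 negatives with none
# wrongly flagged.
print(round(sensitivity(275 - 8, 8), 1))
print(specificity(16618, 0))
```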

Table 6: An assessment of PROTEUS2's performance for homology modeling. In this test, 33 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and SWISS-MODEL (both using default parameters). For each query, both programs used the same template structure. The resulting structures were compared using backbone RMSD and all-atom RMSD.

Query PDB ID	Template PDB ID	Swiss-Model RMSD, all atoms (Å)	Swiss-Model RMSD, CA (Å)	Homodeller RMSD, all atoms (Å)	Homodeller RMSD, CA (Å)	Identity (%)
1PVAA	1RRO	1.84	0.78	1.82	0.77	47.22
1B7DA	1CN2	3.09	2.45	3.84	3.27	44.26
132L	1LSY	0.91	0.26	0.91	0.26	99.22
1A3D	4BP2	2.78	1.08	3.47	2.76	58.12
1AAPA	8PTI	2.54	1.09	2.47	1.08	42.86
1CBS	1CBI	1.60	0.98	2.27	0.97	77.21
1CTF	1DD3	1.51	0.62	1.49	0.90	70.59
1ROPA	1GTO	1.11	0.52	1.12	0.52	96.43
1RZL	1JTB	2.74	2.31	3.17	2.33	62.64
3IL8	1MI2	3.32	2.33	3.37	2.30	42.65
2WRPR	1JHG	0.55	0.19	1.98	0.19	99.01
1POH	1PTF	2.04	1.21	2.14	1.20	35.29
1NHKR	1NDC	1.49	0.78	1.94	1.28	45.83
1BPT	1AAP	1.17	0.37	1.29	0.37	44.64
1PZA	1PMY	1.49	0.83	1.43	0.82	45.00
1THBA	1PBXA	1.28	0.85	1.35	0.85	49.65
5HVPB	1IVPA	1.72	0.76	1.89	0.78	48.48
1CRB	1OPBC	1.35	0.62	1.51	0.63	56.39
1FKF	1YAT	1.34	0.44	1.72	0.50	57.94
1PVAA	1CDP	1.53	0.67	1.45	0.68	62.04
1MRJ	1MOM	1.16	0.54	1.12	0.54	65.04
1CAD	8RXNA	1.11	0.53	1.69	0.67	65.38
1TADB	1GIA	1.54	1.02	1.60	1.02	69.35
1HSAA	2VAAA	1.51	0.85	1.5	0.85	72.63
1DHFA	1DR7	1.42	0.62	1.59	0.63	75.27
8DFR	2DHFA	1.29	0.57	1.28	0.57	75.27
1HNA	3GSTB	1.2	0.65	1.31	0.66	75.58
1ALA	1AVR	0.87	0.40	0.85	0.41	77.85
4P2P	2BPP	1.66	0.56	2.01	0.87	84.55
135L	1HHL	1.11	0.58	1.03	0.58	86.82
1EMY	1YMC	1.26	0.39	1.21	0.40	87.58
2CHF	1CHN	1.91	1.16	1.93	1.15	97.62
1FAFA	1GH6A	3.00	2.08	3.72	2.53	31.65
1ETB1	1TTCA	0.65	0.23	0.78	0.22	98.31

Mean		1.62	0.86	1.83	0.99	66.13

Table 7: An assessment of PROTEUS2's performance for homology modeling. In this test, 37 proteins with sequence identities ranging from 21% to 99% were modeled using PROTEUS2 and 3D-JigSaw (both using default parameters). For each query, both programs used the same template structure. The resulting structures were compared using backbone RMSD and all-atom RMSD.

Query PDB ID	Template PDB ID	3D-JigSaw RMSD, all atoms (Å)	3D-JigSaw RMSD, CA (Å)	Homodeller RMSD, all atoms (Å)	Homodeller RMSD, CA (Å)	Identity (%)
132LA	1IORA	1.37	0.15	1.33	0.14	98.45
135LA	1IORA	1.41	0.15	1.29	0.13	93.02
1A3DA	1L8SA	1.57	0.61	3.18	2.77	57.14
1AAPA	1BPIA	1.99	0.42	2.23	0.40	44.64
1ALAA	1AVRA	1.09	0.41	1.35	0.41	77.85
1AZRA	1JVOL	1.17	0.33	0.55	0.33	97.66
1B7DA	1CN2A	2.37	1.39	3.83	3.27	44.26
1CADA	8RXNA	1.42	0.54	1.92	0.65	65.38
1CBSA	1CBIA	1.66	0.98	2.36	0.97	77.21
1CRBA	1CBIA	2.72	1.72	2.99	1.81	41.04
1CTFA	1DD3A	1.44	0.74	1.78	0.89	70.59
1DHFA	1DR7A	1.63	0.64	1.81	0.63	75.27
1EMYA	1YMCA	1.23	0.36	1.48	0.37	87.58
1ETB1	1ETA1	5.56	3.23	1.23	0.28	99.15
1FAFA	1GH6A	3.56	2.77	3.51	2.52	31.65
1FKFA	1FKKA	1.02	0.35	1.31	0.35	97.2
1HNAA	4GTUH	1.63	0.73	1.69	0.73	82.49
1HSAA	2BCKD	1.68	0.56	1.75	0.57	88.77
1MRJA	1MOMA	1.42	0.55	1.38	0.54	65.04
1NHKR	1PKUC	1.55	0.80	2.04	1.55	45.14
1NOAA	1AKPA	3.30	2.57	3.58	3.13	38.05
1POHA	1PTFA	2.06	1.13	2.28	1.21	35.29
1PVAA	1CDPA	1.25	0.42	1.62	0.67	62.04
1PZAA	1PMYA	1.45	0.83	1.68	0.85	45.00
1ROPA	1GTOA	2.13	1.36	2.19	1.37	96.43
1RZLA	1JTBA	2.73	2.09	3.36	2.34	62.64
1TADB	1GIAA	1.71	0.85	1.73	0.85	69.35
1THBA	1FHJC	1.18	0.45	1.40	0.44	82.98
2CHFA	1CHNA	1.89	1.14	1.81	1.15	97.62
2CROA	2OR1R	2.38	0.67	1.92	0.68	52.38
2GBPA	3GBPA	3.72	3.51	4.13	3.51	94.43
2LALA	2B7YC	1.24	0.36	1.09	0.36	82.32
2OZ9R	1JHGA	1.61	0.18	1.98	0.19	99.01
2YCCA	1YEBA	1.58	0.33	1.25	0.34	89.81
3IL8A	1MSHB	3.40	2.09	2.80	1.56	44.12
4AZUA	1JVOL	1.12	0.24	0.48	0.24	99.22
4P2PA	2BPPA	1.66	0.55	2.31	0.81	84.55
5PTIA	1P2MD	1.92	0.59	1.36	0.41	96.55

Mean		1.94	0.97	2.00	1.04	72.93