Homology Modeling

The following document gives in-depth information about homology modeling. A modeling tutorial using DS Modeling (Accelrys) can be found here.

STEP 4: Model evaluation

Errors in comparative modelling arise mainly from the lack of templates and from decreasing sequence identity between the target and the templates. These errors fall into five categories:

  • Errors in side-chain packing. These arise mainly from sequence divergence and are critical when they occur in regions involved in protein function.
  • Shifts of correctly aligned residues. These also arise from divergence of the templates: the overall fold is preserved, but the scaffold is locally displaced.
  • Regions without a template. These occur in local regions where the target sequence cannot be aligned to any parent of known structure. Such regions belong to the SVRs (structurally variable regions), and their structure is derived from general databases; the resulting conformational diversity produces the largest errors in the model.
  • Errors due to misalignments. These are produced by a shift in the alignment between the target sequence and the templates, and they are the worst source of errors because they remain difficult to detect. One way to detect them is to use multiple alignments that include sequences of unknown structure. However, if the misalignment affects only the target sequence, the multiple alignment does not help. The best way to detect these errors is to check the final model and refine it further.
  • Errors produced by incorrect templates. This problem appears when using distantly related sequences (templates with less than 25% identity). Although it is usually easy to detect, it is a serious problem for models for which no other homologous template is available. Unfortunately, it is difficult to distinguish errors in a model built from an incorrect alignment with the correct template (the previous category) from errors in a model built from an incorrect template.
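
The 25% identity figure in the last category can be checked directly on a pairwise alignment. The following is a minimal sketch, assuming the two sequences are already aligned to equal length with "-" as the gap character; the function names and the convention of counting only ungapped positions are assumptions for illustration, not the definition used by any particular modelling package.

```python
def percent_identity(target: str, template: str, gap: str = "-") -> float:
    """Percent identity over aligned positions where neither sequence has a gap."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    # Keep only the columns in which both sequences carry a residue.
    pairs = [(a, b) for a, b in zip(target, template) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def is_risky_template(target: str, template: str, threshold: float = 25.0) -> bool:
    """True when identity falls below the threshold (25% is the figure quoted above)."""
    return percent_identity(target, template) < threshold
```

Other conventions (for example, dividing by the length of the shorter sequence) give different numbers, so the threshold should be read together with the convention used.
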
Evaluation is critical for testing the candidate models and selecting the best and most accurate one or ones. The environment can also strongly influence the accuracy of the model, particularly if the protein is coordinated to metals or the template is part of a complex with other molecules. Two criteria are used to filter the models: 1) energetic approaches; and 2) experimental data.

In the first step, the model is checked for the correct stereochemistry of a protein polymer. This is done with programs such as PROCHECK, AQUA, SQUID or WHATCHECK, and problems can be fixed with optimization programs based on molecular mechanics such as CHARMM, GROMOS, AMBER, X-PLOR or WHAT IF. This amounts to a final refinement step that has to be applied cautiously, mainly because the optimization is done in the wrong environment (i.e. with no solvation, no ions and not necessarily meaningful side-chain conformations). The refinement is meant only to remove drastic local clashes and consists of a few cycles (100-1000) of steepest descent or conjugate gradient minimization, run until convergence.

The next step is the assessment of the fold, which covers the order and length of the secondary structure elements and the use of energy profiles based on statistical criteria extracted from structural domain classifications. The structure is assigned a Z-score, calculated with fold prediction methodologies, that indicates which regions are wrongly modelled (in statistical terms). The programs VERIFY3D, PROSAII, HARMONY and ANOLEA are among those implementing this approach. In summary, these methods compare the modelled conformation with the expected or standard conformation found in protein structures solved by X-ray crystallography.
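
The short clash-removal minimization mentioned above can be illustrated with a toy steepest descent loop. This is only a sketch under strong assumptions: the "energy" is a hypothetical quadratic penalty on atom pairs closer than d_min, not the force field of CHARMM, GROMOS, AMBER or any other program named here, and the step size and tolerance are arbitrary illustrative values.

```python
import math


def clash_energy_and_gradient(coords, d_min=3.0):
    """Toy penalty: sum of (d_min - d)^2 over atom pairs closer than d_min."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = [coords[i][k] - coords[j][k] for k in range(3)]
            d = math.sqrt(sum(c * c for c in diff))
            overlap = d_min - d
            if overlap > 0.0 and d > 1e-12:
                energy += overlap * overlap
                for k in range(3):
                    # dE/dx_i = -2 * overlap * (x_i - x_j) / d, and the opposite for j
                    g = -2.0 * overlap * diff[k] / d
                    grad[i][k] += g
                    grad[j][k] -= g
    return energy, grad


def steepest_descent(coords, step=0.1, max_cycles=1000, tol=1e-6):
    """Move every atom against its gradient until the penalty drops below tol."""
    coords = [list(p) for p in coords]
    energy, grad = clash_energy_and_gradient(coords)
    for _ in range(max_cycles):
        if energy < tol:
            break
        for p, g in zip(coords, grad):
            for k in range(3):
                p[k] -= step * g[k]
        energy, grad = clash_energy_and_gradient(coords)
    return coords, energy
```

For example, `steepest_descent([(0, 0, 0), (1, 0, 0), (10, 0, 0)])` pushes the first two atoms apart until they are about d_min apart and leaves the third, which clashes with nothing, where it was. As in the real case, only local clashes are relieved; nothing in such a loop can correct a wrong alignment or fold.
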
This approach has drawn some criticism: it is reasonable to expect that the contribution of each residue to the overall energy varies widely, so a wrongly modelled region need not correlate with the value of the potential of mean force in that region. Still, some applications have validated the method by combining it with additional information (secondary structure) to refine the models. The work of Aloy et al. is a clear example in which potentials of mean force detect wrongly modelled regions and support a procedure to improve model building by: 1) distinguishing the wrongly modelled regions; 2) selecting the best model among several candidates; and 3) selecting a candidate refined structure after inclusion of additional information (i.e. secondary structure).
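
The idea of locating wrongly modelled regions from a per-residue energy profile can be sketched as follows. The sketch assumes the per-residue energies have already been produced by an external program, that higher values mean a less favourable environment, and that a window of 5 residues and a Z-score cutoff of 1.5 are reasonable; all of these are illustrative assumptions rather than the settings of PROSAII, ANOLEA or the method of Aloy et al.

```python
from statistics import mean, pstdev


def smooth_profile(energies, window=5):
    """Average each residue's energy with its neighbours (window truncated at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(energies)):
        lo = max(0, i - half)
        hi = min(len(energies), i + half + 1)
        smoothed.append(mean(energies[lo:hi]))
    return smoothed


def flag_suspect_residues(energies, window=5, z_cutoff=1.5):
    """Indices whose smoothed energy lies more than z_cutoff deviations above the mean."""
    smoothed = smooth_profile(energies, window)
    mu = mean(smoothed)
    sigma = pstdev(smoothed)
    if sigma == 0.0:
        return []  # flat profile: nothing stands out
    return [i for i, e in enumerate(smoothed) if (e - mu) / sigma > z_cutoff]
```

Smoothing over a window reflects the criticism above: single-residue values vary widely, so only a sustained stretch of unfavourable energy is treated as a candidate for remodelling.
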

Finally, the work of Lazaridis and Karplus shows that including solvation (environmental) terms improves the classical molecular mechanics energy calculation for detecting wrongly modelled regions. Consequently, the criticism of the potential of mean force does not apply to this approach, which performed as well as statistical functions in discriminating correct from misfolded models.

Experimental evaluation of the model requires site-directed mutagenesis or additional information that is not commonly available. One way to avoid the experiment is to use the knowledge obtained from a widely divergent multiple alignment of related sequences, applying the following criteria:

  • Such a multiple alignment reveals conserved regions shared by all sequences, reducing the length of the SCRs (structurally conserved regions). The common structural regions that survive extensive refinement of the structural alignment often include the active sites plus additional core secondary structure elements that appear to lend structural support to the binding site. These regions conserve the stereospecific interactions involved in ligand binding and catalysis. Structural knowledge of these regions, and the evidence for their presence, grows in importance with the development of structural genomics and provides a way to evaluate the modelled structure, which has to agree with these findings.
  • Residues that are not conserved but are mutually involved in a 3D interaction or function will show clear correlation in their mutations. Correlated mutations have been studied in depth because of their use in ab initio folding and fold prediction, and where sequences have diverged enough the correlation can be used to evaluate whether the target conformation has been modelled correctly.
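
One simple way to quantify correlated mutations is the mutual information between two columns of the multiple alignment. The sketch below is an assumption-laden illustration: mutual information is one of several possible correlation measures and is not necessarily the one used in the work cited here, the alignment is a list of equal-length strings, and the 0.5-bit cutoff is arbitrary.

```python
import math
from collections import Counter


def column(alignment, idx):
    """Residues found at position idx in every aligned sequence."""
    return [seq[idx] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def correlated_pairs(alignment, min_mi=0.5):
    """Column pairs (i, j, MI) whose mutual information reaches min_mi."""
    length = len(alignment[0])
    cols = [column(alignment, i) for i in range(length)]
    pairs = []
    for i in range(length):
        for j in range(i + 1, length):
            mi = mutual_information(cols[i], cols[j])
            if mi >= min_mi:
                pairs.append((i, j, mi))
    return pairs
```

A fully conserved column always scores zero, which matches the text: only non-conserved positions can carry a correlation signal. Pairs that score highly would then be expected to lie close together in a correctly built model, and a large 3D distance between them is a reason to re-examine that region.
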
 

REFERENCES

N. Alexandrov and R. Luethy. (1998). Alignment algorithm for homology modeling and threading. Protein Sci. 7, 254-258.

B. Al-Lazikani, A. Lesk and C. Chothia. (1997). Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. 273, 927-948.

P. Aloy, J. Mas, M. Martí-Renom, E. Querol, F. Avilés and B. Oliva. (2000). Refinement of modelled structures by knowledge based energy profiles and secondary structure prediction: Application to the Human Procarboxypeptidase A2. J Comput-Aided Molec. Des. 14, 83-92.

S. Altschul, T. Madden, A. Schaffer, J. Zhang, Z. Zhang, W. Miller and D. Lipman. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402.

T. Attwood. (2000). The Babel of Bioinformatics. Science 290, 471-473.

A. Bairoch and R. Apweiler. (1997). The SWISS-PROT protein sequence data bank and its supplement TrEMBL. Nucleic Acids Res. 25, 31-36.

G. Barton and M. Sternberg. (1987). A strategy for the rapid multiple alignment of protein sequences; confidence levels from tertiary structure comparisons. J. Mol. Biol. 198, 327-337.

A. Bateman, E. Birney, R. Durbin, S. Eddy, K. Howe and E. Sonnhammer. (2000). The Pfam protein family database. Nucleic Acids Res. 28, 263-266.

P. Bates and M. Sternberg. (1998). From Sequence to Structure. Protein Structure Prediction: A practical approach (M. Sternberg, Ed.), Oxford Univ. Press, Oxford, UK.

P. A. Bates and M. Sternberg. (1999). Model building by comparison at CASP3: Using expert knowledge and computer automation. Proteins: Struct., Func. and Gene. Suppl. 3, 47-54.

J. U. Bowie, R. Lüthy and D. Eisenberg. (1991). A method to identify protein sequences that fold into a known 3D structure. Science 253, 164-170.

B. Brooks, R. Bruccoleri, B. Olafson, D. States, S. Swaminathan and M. Karplus. (1983). CHARMM: a program for macromolecular energy minimization and dynamics calculations. J. Comp. Chem. 4, 187-217.

R. Bruccoleri and M. Karplus. (1987). Prediction of the folding of short polypeptide segments by uniform conformational sampling. Biopolymers 26, 137-138.

A. Brünger. (1992). X-PLOR: A system for X-ray crystallography and NMR. Yale University Press, New Haven.

V. Collura, J. Higo and J. Garnier. (1993). Modeling of protein loops by simulated annealing. Protein Sci. 2, 1502-1510.

R. Copley and P. Bork. (2000). Homology among (βα)8 barrels: implications for the evolution of metabolic pathways. J. Mol. Biol. 303, 627-640.

C. Chothia, A. Lesk, A. Tramontano, M. Levitt, S. Smith-Gill, G. Air, S. Sheriff, E. Padlan, D. Davies, W. Tulip, P. Colman, S. Spinelli, P. Alzari and R. Poljak. (1989). Conformations of Immunoglobulin Hypervariable Regions. Nature 342, 877-883.

S. Chung and S. Subbiah. (1996). A structural explanation for the twilight zone of protein sequence homology. Structure 4, 1123-1127.

C. Deane, Q. Kaas and T. Blundell. (2001). SCORE: predicting the core of protein models. Bioinformatics 17, 541-550.

R. Dima, J. Banavar and A. Maritan. (2000). Scoring functions in protein folding and design. Protein Sci. 9, 812-819.

F. S. Domingues, W. A. Koppensteiner, M. Jaritz, A. Prlic, C. Weichenberger, M. Wiederstein, H. Floeckner, P. Lackner and M. Sippl. (1999). Sustained performance of knowledge-based potentials in fold recognition. Proteins: Struct., Func. & Gene. Suppl. 3, 112-120.

L. Donate, S. Rufino, L. Canard and T. Blundell. (1996). Conformational analysis and clustering of short and medium size loops connecting regular secondary structures. A database for modelling and prediction. Protein Sci. 5, 2600-2616.

M. Dudeck, K. Ramnarayan and J. Ponder. (1998). Protein structure prediction using a combination of sequence homology and global energy minimization: II. Energy functions. J. Comp. Chem. 19, 548-573.

S. Eddy. (1998). Profile hidden Markov models. Bioinformatics 14, 755-763.

K. Fidelis, P. Stern, D. Bacon and J. Moult. (1994). Comparison of systematic search and database methods for constructing segments of protein structure. Protein Eng. 7, 953-960.

D. Fischer and D. Eisenberg. (1996). Protein fold recognition using sequence-derived predictions. Protein Science 5, 947-955.

A. Fiser, R. Do and A. Sali. (2000). Modeling of loops in protein structures. Protein Sci. 9, 1753-1773.

I. Friedberg, T. Kaplan and H. Margalit. (2000). Evaluation of PSI-BLAST alignment accuracy in comparison to structural alignments. Protein Sci. 9, 2278-2284.

D. W. Gatchell, S. Dennis and S. Vajda. (2000). Discrimination of Near-native Protein Structures from Misfolded Models by Empirical Free Energy Functions. Proteins: Struct., Func. & Gene. 41, 518-534.

C. Geourjon, C. Combet, C. Blanchet and G. Deleague. (2001). Identification of related proteins with weak sequence identity using secondary structure information. Protein Sci. 10, 788-797.

O. Gotoh. (1996). Significant improvement in accuracy of multiple sequence alignments by iterative refinements assessed by reference to structural alignments. J. Mol. Biol. 264, 823-838.

J. Greer. (1990). Comparative modeling methods: application to the family of the mammalian serine proteases. Proteins: Struc. Func. and Gene. 7, 317-334.

W. v. Gunsteren, S. Billeter, A. Eising, P. Hünenberger, P. Krüger, A. Mark, W. Scott and I. Tironi. (1996). Biomolecular Simulation: The GROMOS96 Manual and User Guide. Verlag der Fachvereine, Zürich.

R. Hooft, G. Vriend and C. Sander. (1996). Verification of protein structures: side-chain planarity. J. Appl. Crystallogr. 29, 714-716.

X. Huang and W. Miller. (1991). A time-efficient linear-space local similarity algorithm. Adv. Appl. Math. 12, 337-357.

J. Irving, J. Whisstock and A. Lesk. (2001). Protein structural alignments and functional genomics. Proteins: Struc. Func. and Gene. 42, 378-382.

L. Jaroszewski, L. Rychlewski and A. Godzik. (2000). Improving the quality of twilight-zone alignments. Protein Sci. 9, 1487-1496.

A. Jennings, C. Edge and M. Sternberg. (2001). An approach to improving multiple alignments of protein sequences using predicted secondary structure. Protein Eng. 14, 227-231.

D. Jones. (1999). GenTHREADER: an efficient and reliable protein fold recognition method for genomic sequences. J. Mol. Biol. 287, 797-815.

T. A. Jones and S. Thirup. (1986). Using known substructures in protein model building and crystallography. EMBO J. 5, 819-822.

K. Karplus, C. Barrett, M. Cline, M. Diekhans, L. Grate and R. Hughey. (1999). Predicting protein structure using only sequence information. Proteins: Struc. Func. and Gene. Suppl. 3, 121-125.

L. A. Kelley, R. M. MacCallum and M. Sternberg. (2000). Enhanced genome annotation using structural profiles in the program 3D-PSSM. J. Mol. Biol. 299, 499-520.

A. Kidera. (1995). Enhanced conformational sampling in Monte Carlo simulations of proteins: Applications to a constrained peptide. Proc. Natl. Acad. Sci. USA 92, 9886-9889.

P. Koehl and M. Delarue. (1995). A self-consistent mean field approach to simultaneous gap closure and side-chain positioning in protein homology modeling. Nat. Struct. Biol. 2, 163-170.

P. Koehl and M. Delarue. (1996). Mean-field minimization methods for biological macromolecules. Curr. Opin. Struct. Biol. 6, 222-226.

R. Laskowski, M. MacArthur and J. Thornton. (1998). Validation of Protein models derived from experiment. Curr. Opin. Struct. Biol. 5, 631-639.

T. Lazaridis and M. Karplus. (1999). Discrimination of the native from misfolded protein models with an energy function including implicit solvation. J. Mol. Biol. 288, 477-487.

R. Lüthy, J. U. Bowie and D. Eisenberg. (1992). Assessment of protein models with three-dimensional profiles. Nature 356, 83-85.

A. Martin, J. Cheetham and A. Rees. (1989). Modeling antibody hypervariable loops: a combined algorithm. Proc. Natl. Acad. Sci. USA 86, 9268-9272.

A. Martin and J. Thornton. (1996). Structural Families in Loops of Homologous Proteins: Automatic Classification, Modelling and Application to Antibodies. J. Mol. Biol. 263, 800-815.

M. Martí-Renom, J. Mas, P. Aloy, E. Querol, F. Aviles and B. Oliva. (1998). Statistical Analysis of the loop-geometry on a non-redundant database of proteins. J Mol. Mod. 4, 347-354.

M. A. Martí-Renom, A. Stuart, A. Fisher, R. Sánchez, F. Melo and A. Sali. (2000). Comparative protein structure modeling of genes and genomes. Ann. Rev. Biophys. Biomolec. Struc. 29, 291-325.

C. Mattos, G. Petsko and M. Karplus. (1994). Analysis of two residue turns in proteins. J. Mol. Biol. 238, 733-747.

M. McGregor, S. Islam and M. Sternberg. (1987). Analysis of the relationship between side-chain conformation and secondary structure in globular proteins. J. Mol. Biol. 198, 295-310.

F. Melo and E. Feytmans. (1997). Novel knowledge-based mean force potential at atomic level. J. Mol. Biol. 267, 207-222.

F. Melo and E. Feytmans. (1998). Assessing protein structures with a non local atomic interaction energy. J. Mol. Biol. 277, 1141-1152.

V. Morea, A. Tramontano, M. Rustici, C. Chothia and A. Lesk. (1998). Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. 275, 265-294.

B. Morgenstern. (1999). Dialign2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics 15, 211-218.

J. Moult and M. James. (1986). An algorithm for determining the conformation of polypeptide segments in proteins by systematic search. Proteins: Struc. Func. and Gene. 1, 156-163.

N. Nakajima, J. Higo and A. Kidera. (2000). Free energy landscapes of peptides by enhanced conformational sampling. J. Mol. Biol. 296, 197-216.

C. Notredame, D. Higgins and J. Heringa. (2000). T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217.

T. Oldfield. (1992). Squid: a program for the analysis and display of data from crystallography and molecular dynamics. J. Mol. Graph. 10, 247-252.

B. Oliva, P. Bates, E. Querol, F. Avilés and M. Sternberg. (1997). An automatic Classification of the structure of protein loops. J. Mol. Biol. 266, 814-830.

B. Oliva, P. Bates, E. Querol, F. Avilés and M. Sternberg. (1998). Automated Classification of Antibody Complementarity Determining Region 3 of the Heavy Chain (H3) Loops into Canonical Forms and Its Application to Protein Structure Prediction. J. Mol. Biol. 279, 1193-1210.

O. Olmea, B. Rost and A. Valencia. (1999). Effective use of sequence correlation and conservation in fold recognition. J. Mol. Biol. 293, 1221-1239.

A. Panchenko, A. Marchler-Bauer and S. H. Bryant. (2000). Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296, 1319-1331.

K. Pawlowski, A. Bierzynski and A. Godzik. (1996). Structural diversity in a family of homologous proteins. J. Mol. Biol. 258, 349-366.

W. Pearson. (1996). Effective protein sequence comparison. Meth. Enz. 266, 227-258.

W. Pearson and D. Lipman. (1988). Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. USA 85, 2444-2448.

R. Petrella, T. Lazaridis and M. Karplus. (1998). Protein sidechain conformer prediction: a test of the energy function. Folding and Design 3, 353-377.

C. Rapp and R. Friesner. (1999). Prediction of loop geometries using a generalized Born model of solvation effect. Proteins: Struc., Func. and Gene. 35, 173-183.

C. Ring and F. Cohen. (1994). Conformational sampling of loop structures using genetic algorithm. Isr. J. Chem. 34, 245-252.

D. Rosenbach and R. Rosenfeld. (1995). Simultaneous modeling of multiple loops in proteins. Protein Sci. 4, 496-505.

B. Rost. (1999). Twilight zone of proteins sequence alignments. Protein Eng. 12, 85-94.

S. Rufino, L. Donate, L. Canard and T. Blundell. (1997). Predicting the Conformational Class of Short and Medium Size Loops Connecting Regular Secondary Structures: Application to Comparative Modelling. J. Mol. Biol. 267, 352-367.

R. Russell, M. Saqi, R. Sayle, P. Bates and M. Sternberg. (1997). Recognition of analogous and homologous protein folds: analysis of sequence and structure conservation. J. Mol. Biol. 269, 423-439.

R. Russell, P. Sasieni and M. Sternberg. (1998). Supersites within superfolds. Binding site similarity in the absence of homology. J. Mol. Biol. 282, 903-918.

L. Rychlewski, L. Jaroszewski, L. Weizhong and A. Godzik. (2000). Comparison of sequence profiles. Structural prediction with no structure information. Protein Sci. 8, 232-241.

G. Salem, E. Hutchinson, C. Orengo and J. Thornton. (1999). Correlation of observed fold frequency with the occurrence of local structural motifs. J. Mol. Biol. 287, 969-981.

A. Sali and T. Blundell. (1993). Comparative protein modeling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815.

R. Sánchez, U. Pieper, F. Melo, N. Eswar, M. Martí-Renom, M. Madhusudhan, N. Mirkovic and A. Sali. (2000). Protein Structure Modeling for Structural Genomics. Nature Struct. Biol. Suppl. November, 986-990.

R. Sánchez and A. Sali. (1997). Advances in comparative protein structure modeling. Curr. Opin. Struct. Biol. 7, 206-214.

R. Sánchez and A. Sali. (1997). Evaluation of comparative protein structure modeling by MODELLER-3. Proteins: Struc. Func. and Gene. Suppl 1, 50-58.

M. Saqi, R. Russell and M. Sternberg. (1999). Misleading local sequence alignment: implications for comparative modelling. Protein Eng. 11, 627-630.

J. Sauder, J. Arthur and R. Dunbrack. (2000). Large-scale comparison of protein sequence alignment algorithms with structure alignments. Proteins: Struc. Func. and Gene. 40, 6-22.

P. Shenkin, D. Yarmush, R. Fine, H. Wang and C. Levinthal. (1987). Predicting antibody hypervariable loop conformation: I. Ensembles of random conformation for ring-like structures. Biopolymers 26, 2053-2085.

H. Shirai, A. Kidera and H. Nakamura. (1999). H3-rules: identification of CDR-H3 structures in antibodies. FEBS Letters 455, 188-197.

M. Sippl. (1993). Recognition of errors in three-dimensional structures of proteins. Proteins: Struc. Func. and Gene. 17, 355-362.

K. Smith and B. Honig. (1994). Evaluation of the conformational free energies of loops in proteins. Proteins: Struc. Func. and Gene. 18, 119-132.

T. Smith and M. Waterman. (1981). Identification of common molecular subsequences. J. Mol. Biol. 147, 195-197.

M. Sternberg, P. Bates, L. Kelley and R. MacCallum. (1999). Progress in protein structure prediction: assessment of CASP3. Curr. Opin. Struct. Biol. 9, 368-373.

M. Sutcliffe, F. Hayes and T. Blundell. (1987). Knowledge-based modeling of homologous proteins, part II: rules for the conformations of substituted side-chains. Protein Eng. 1, 385-392.

M. Sutcliffe, F. Hayes, D. Carney and T. Blundell. (1987). Knowledge-based modeling of homologous proteins, part I. Three dimensional frameworks derived from the simultaneous superposition of multiple structures. Protein Eng. 1, 377-384.

W. Taylor. (1988). A flexible method to align large numbers of biological sequences. J. Mol. Evol. 28, 161-169.

S. Teichmann, C. Chothia, G. Church and J. Park. (2000). Fast assignments of protein structures to sequences using the intermediate sequence library. Bioinformatics 16, 117-124.

J. Thompson, D. Higgins and T. Gibson. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680.

J. Thompson, F. Plewianiak and O. Poch. (1999). Balibase: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics 15, 87-88.

J. Thompson, F. Plewianiak and O. Poch. (1999). A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res. 27, 2682-2690.

J. Thompson, F. Plewianiak, J. Thierry and O. Poch. (2000). DbClustal: rapid and reliable global multiple alignments of protein sequence detected by database searches. Nucleic Acids Res. 28, 2919-2926.

C. Topham, N. Srinivasan, C. Thorpe, J. Overington and N. Kalsheker. (1994). Comparative modeling of major house dust mite allergen der p I: structure validation using an extended environmental amino acid propensity table. Protein Eng. 7, 869-894.

A. Torda. (1997). Perspectives in protein fold recognition. Curr. Opin. Struct. Biol. 7, 200-205.

A. Tramontano, C. Chothia and A. Lesk. (1989). Structural determinants of the conformations of medium sized loops in proteins. Proteins: Struc. Func. and Gene. 6, 382-394.

S. Vajda and C. DeLisi. (1990). Determining minimum energy conformations of polypetides by dynamic programming. Biopolymers 29, 1755-1772.

M. Vasquez. (1996). Modeling side-chain conformation. Curr. Opin. Struct. Biol. 6, 217-221.

H. W. v. Vlijmen and M. Karplus. (1997). PDB-based protein loop prediction: parameters for selection and methods for optimization. J. Mol. Biol. 267, 975-1001.

J. Wojcik, J. Mornon and J. Chomilier. (1999). New efficient statistical sequence-dependent structure prediction of short to medium-sized protein loops based on an exhaustive loop classification. J. Mol. Biol. 289, 1469-1490.



     

 


This site is maintained by Arzhang Fallahi
Last Updated: August 2, 2004