ConSurf Quick Help

 

PDB ID

Each structure in the PDB is represented by a 4-character alphanumeric identifier, assigned upon its deposition. For example, 1bxl and 1d66 are the identification codes of the PDB entries for the Bcl-Xl / Bak complex and for the Gal4 (residues 1 - 65) complex with 19mer DNA, respectively.

For more information check the PDB home page
 

User-Provided PDB file

The PSI-BLAST search for homologues is done using the sequence extracted from the SEQRES records, or from the ATOM records if the SEQRES records are missing. The ATOM records are essential for a ConSurf run; therefore they must be included in the user-provided PDB file.
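
The fallback described above can be sketched in a few lines of Python. This is only an illustration, not ConSurf's own code: the function names are hypothetical, the column positions follow the standard PDB format, and non-standard residues are simply written as "X".

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequence(pdb_lines, chain):
    """Sequence of one chain as listed in the SEQRES records."""
    residues = []
    for line in pdb_lines:
        # Chain identifier is in column 12; residue names fill columns 20-70.
        if line.startswith("SEQRES") and line[11:12] == chain:
            residues.extend(line[19:70].split())
    return "".join(THREE_TO_ONE.get(name, "X") for name in residues)


def atom_sequence(pdb_lines, chain):
    """Sequence of one chain as observed in the ATOM records."""
    sequence = []
    previous = None
    for line in pdb_lines:
        # Chain identifier is in column 22; residue name in columns 18-20.
        if line.startswith("ATOM") and line[21:22] == chain:
            residue_key = line[22:27]  # residue number plus insertion code
            if residue_key != previous:
                sequence.append(THREE_TO_ONE.get(line[17:20].strip(), "X"))
                previous = residue_key
    return "".join(sequence)


def query_sequence(pdb_lines, chain):
    """Prefer SEQRES; fall back to ATOM when SEQRES is missing."""
    return seqres_sequence(pdb_lines, chain) or atom_sequence(pdb_lines, chain)
```

For a file opened with open(path), query_sequence(list(handle), "A") would return the chain A sequence as a one-letter string.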
 

Chain Identifier

To run ConSurf, you should specify a chain identifier (A, B, C, etc.) as it appears in the corresponding field of the PDB file. If no chain is specified in the PDB file, please type 'none' in the chain field.

One way to get the chain identifier is to display the molecule in FirstGlance in Jmol and click on the chain of interest. The chain identifier is reported in the message box in the lower-left frame.

You can also find the chain identifiers of a standard PDB file in the third field (column 12) of the SEQRES records or in the sixth field (column 22) of the ATOM records. The chain identifier may be any single legal character, including a blank character, which is used if there is only one chain.

For more information check the PDB File Format Contents Guide
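
As a quick way to list the chain identifiers without a viewer, the column positions given above can be read directly. The following is a minimal sketch, not part of ConSurf: the function name is hypothetical, and reporting a blank identifier as 'none' simply mirrors the value the chain field expects in that case.

```python
def chain_identifiers(pdb_lines):
    """Return the chain identifiers found in SEQRES and ATOM records,
    in order of first appearance. A blank (or missing) identifier is
    reported as 'none'."""
    found = []
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain = line[11:12]  # column 12 of a SEQRES record
        elif line.startswith("ATOM"):
            chain = line[21:22]  # column 22 of an ATOM record
        else:
            continue
        label = chain if chain.strip() else "none"
        if label not in found:
            found.append(label)
    return found
```

For example, with open("structure.pdb") as handle: print(chain_identifiers(handle)) would print the identifiers for a local file (the file name is a placeholder).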
 

E-value (PSI-BLAST)

The Expectation value (E-value) is a parameter that describes the number of hits one can "expect" to see just by chance when searching a database of a particular size. It decreases exponentially with the Score (S) that is assigned to a match between two sequences. Essentially, the E-value describes the random background noise that exists for matches between sequences. The lower the E-value, the more significant the score. The E-value is used as a convenient way to create a significance threshold for reporting results. For example, an E-value of 1 assigned to a hit means that in a database of the current size one might expect to see 1 match with a similar score simply by chance. Using a higher E-value cutoff will probably yield more hits, but on average these hits will be more distant from the query sequence.
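
The exponential dependence on the score is commonly written in the Karlin-Altschul form E = K * m * n * exp(-lambda * S), where m is the query length and n is the database length. The sketch below only illustrates this relationship. The function name is hypothetical, and the default values of K and lambda are placeholders: the real values depend on the scoring matrix and gap penalties and are not taken from PSI-BLAST.

```python
import math


def expected_hits(score, query_length, database_length, k=0.1, lam=0.3):
    """Number of chance hits expected with at least this raw score.

    k and lam are placeholder values for illustration only.
    """
    return k * query_length * database_length * math.exp(-lam * score)


# A higher score gives a lower E-value; a larger database gives a higher one.
low_score = expected_hits(50, query_length=200, database_length=1_000_000)
high_score = expected_hits(60, query_length=200, database_length=1_000_000)
print(high_score < low_score)
```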

This field is irrelevant for a user-provided MSA file.
 

Maximum Number of Homologues

The maximum number of homologues, from those found by PSI-BLAST (with the given E-value), to be included in the calculation. In order to include all the homologues, replace the default value with the word "all".

This field is irrelevant for a user-provided MSA file.
 

User-Provided Multiple Sequence Alignment (MSA)

ConSurf accepts external MSAs in the seven formats supported by CLUSTAL W.
These are: NBRF/PIR, EMBL/SwissProt, Pearson (Fasta), GDE, Clustal, GCG/MSF and RSF. For additional information, see the Format Converter links (use the "View in browser" choice).

If you provide an external MSA file, you are required to fill in the "Query sequence name in MSA file" field in the ConSurf form. This is the name of the sequence in the provided MSA file that corresponds to the query chain of the selected PDB structure.

If you provide an external MSA file in Fasta format, please use the "-" sign as the only gap symbol, as this is the only standard gap sign that ConSurf accepts.

User-Provided TREE file

If you provide an external multiple sequence alignment (MSA) file, ConSurf can also accept a corresponding external phylogenetic tree in Newick (Phylip) format, for example: Tree File.
The names of the sequences in the tree file must be identical to the names of the sequences in the MSA file.
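
Before uploading, it can be useful to confirm that the two files use the same names. The sketch below is a rough check under stated assumptions, not ConSurf's own validation: the function names are hypothetical, the MSA is assumed to be in Fasta format with the sequence name taken as the first word after ">", and the Newick parsing is a simple pattern match that does not handle quoted names or bracketed comments.

```python
import re


def fasta_names(fasta_text):
    """Sequence names in a Fasta alignment (first word of each header)."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            names.add(line[1:].split()[0])
    return names


def newick_leaf_names(newick_text):
    """Leaf names in a Newick tree: labels that directly follow '(' or ','.

    Labels of internal nodes follow ')' and are therefore not collected.
    """
    return set(re.findall(r"[(,]\s*([^\s():,;]+)", newick_text))


def name_mismatches(fasta_text, newick_text):
    """Return (names only in the MSA, names only in the tree), sorted."""
    msa = fasta_names(fasta_text)
    tree = newick_leaf_names(newick_text)
    return sorted(msa - tree), sorted(tree - msa)


msa = ">seqA\nAC-D\n>seqB\nACED\n>seqD\nA-ED\n"
tree = "(seqA:0.1,(seqB:0.2,seqC:0.3):0.4);"
print(name_mismatches(msa, tree))
```

Two empty lists mean that every name appears in both files.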


 

Graphic Visualization

The target protein chain is represented as a space-filling model with the conservation grades color-coded onto each amino acid van-der-Waals surface. All other chains in the PDB file are displayed in backbone representation and all ligands are presented in ball-and-stick representation.

More on this topic can be read here.


 

Reliability of the results

Calculation quality
The quality of the results depends on many parameters, but the two most important ones are the quality and the number of the homologues. If the input list of homologues does not include a minimal number of sufficiently close homologues, the quality of the multiple sequence alignment (MSA) decreases, which affects the quality of the phylogenetic tree and the rest of the calculation. On the other hand, if the homologues used are too remote, the conservation signal may weaken because of the background noise added to the MSA. We therefore recommend performing several runs with an increasing number of homologues (10, 30, 50, 100, etc.) and comparing the conserved regions in the protein. Generally, reliable conserved regions will progressively cluster. In any case, we recommend a minimum of 10 homologues for any ConSurf run.

Position-specific quality
PSI-BLAST is a local alignment engine, which means that it finds homologues that are generally shorter than the query. For this reason the multiple sequence alignment (MSA) usually includes many non-informative regions, or gaps. The quality of the calculation for each amino acid position, i.e., each column in the MSA, depends on the total number of informative (non-gapped) residues in the column. We report the number of sequences that are available for computation at each position, referred to as "MSA DATA", in the output file "Amino Acid Conservation Score". When there are fewer than 6 non-gapped residues at a certain position, we consider the conservation score at this position unreliable, and it is marked in the output files. When the Bayesian method is used for calculating the conservation scores, a confidence interval is obtained for each score estimate. The high and low values of the interval are assigned color grades according to the 1-9 coloring scheme. If the interval is equal to or larger than 4 color grades, the conservation score is regarded as unreliable and is also marked in the output files.
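
The two reliability rules can be sketched as follows. This is an illustration only and not ConSurf's implementation: the names are hypothetical, every row of the alignment (including the query) is assumed to count towards the non-gapped total, and the interval size is assumed to be the difference between the high and low color grades.

```python
GAP = "-"
MIN_INFORMATIVE = 6      # fewer non-gapped residues than this is unreliable
MAX_INTERVAL_SIZE = 4    # an interval this large (in grades) is unreliable


def informative_counts(alignment):
    """Number of non-gapped residues in each column of an aligned MSA.

    `alignment` is a list of equal-length strings, one per sequence.
    """
    length = len(alignment[0])
    return [sum(1 for seq in alignment if seq[col] != GAP)
            for col in range(length)]


def is_unreliable(n_informative, low_grade=None, high_grade=None):
    """Apply both rules; the interval rule is skipped if no grades are given."""
    if n_informative < MIN_INFORMATIVE:
        return True
    if low_grade is not None and high_grade is not None:
        return abs(high_grade - low_grade) >= MAX_INTERVAL_SIZE
    return False


alignment = ["ACD", "ACD", "AC-", "AC-", "A--", "A--"]
counts = informative_counts(alignment)
print(counts, [is_unreliable(n) for n in counts])
```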
 

Minimal Requirements for a Successful ConSurf Run
  • A protein structure in PDB format and the chain identifier.
  • An acceptable length difference between the SEQRES-derived sequence (NSEQRES) or the MSA-extracted sequence (NMSA) and the ATOM-derived sequence (NATOM):
    • When NSEQRES or NMSA < NATOM: The maximal difference allowed is 10%. For example, if there are 100 residues in the ATOM list, the job is rejected if the SEQRES has fewer than 90 residues.
    • When NSEQRES or NMSA > NATOM: The maximal difference allowed is 80%. For example, if SEQRES lists 100 residues, there must be at least 20 residues in the ATOM records, or else the job is rejected.
  • Identity of at least 60% between the ATOM-derived sequence and the SEQRES- or MSA-extracted sequence (as calculated by CLUSTAL W).
  • At least 2 homologous sequences. By default the homologues are found automatically (obtained from PSI-BLAST). Please note that we consider the conservation scores unreliable when fewer than 6 homologues are used for the calculation.
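
The numeric requirements above can be checked in advance with a short script. The following is a sketch under assumptions, not the server's actual test: the function names are hypothetical, and the percentages are taken relative to the longer of the two sequences, which is the reading that reproduces both worked examples in the list (100 ATOM residues need at least 90 in SEQRES; 100 SEQRES residues need at least 20 in ATOM).

```python
def length_difference_ok(n_ref, n_atom):
    """n_ref is NSEQRES or NMSA; n_atom is NATOM. Integer arithmetic only."""
    if n_ref <= 0 or n_atom <= 0:
        return False
    if n_ref < n_atom:
        # Reference shorter than ATOM: at most 10% of NATOM may be missing.
        return (n_atom - n_ref) * 100 <= 10 * n_atom
    # Reference longer than (or equal to) ATOM: at most 80% of it may be absent.
    return (n_ref - n_atom) * 100 <= 80 * n_ref


def check_requirements(n_ref, n_atom, identity_percent, n_homologues):
    """Return (errors, warnings) as two lists of short messages."""
    errors, warnings = [], []
    if not length_difference_ok(n_ref, n_atom):
        errors.append("length difference too large")
    if identity_percent < 60:
        errors.append("sequence identity below 60%")
    if n_homologues < 2:
        errors.append("fewer than 2 homologues")
    elif n_homologues < 6:
        warnings.append("fewer than 6 homologues: scores considered unreliable")
    return errors, warnings


print(check_requirements(n_ref=100, n_atom=95, identity_percent=98.0,
                         n_homologues=4))
```

An empty error list means that the listed numeric requirements are met; a warning does not block the run.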