ConSurf Quick Help
Each structure in the PDB is represented
by a 4-character alphanumeric identifier, assigned upon its deposition. For
example, 1bxl and 1d66 are the identification codes of the PDB entries for the Bcl-Xl / Bak
complex and the Gal4 (residues 1 - 65) complex with 19mer DNA, respectively.
For more information check the PDB
User-Provided PDB file
The search for homologues is done using the sequence extracted from the SEQRES
record, or from the ATOM
record if the SEQRES record is missing. The ATOM record is required
for a ConSurf run; therefore it must be included in the user-provided PDB file.
To run ConSurf you should specify a chain identifier
(A, B, C, etc., as given in the corresponding field of the PDB file). If no chain is specified in the PDB file, please type 'none' in the chain field.
One way to get the chain identifier is to view
the molecule in
FirstGlance in Jmol and click on the chain of interest. The chain identifier is
reported in the message box in the lower left frame.
You can also find the chain identifiers of a
PDB file in the third field (column 12) of the SEQRES records or the fifth
field (column 22) of the ATOM records. The chain identifier may be any
single legal character, including a blank character, which is used if there
is only one chain.
For more information check the PDB File Format Contents Guide
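The column positions given above can be read directly from a PDB file. The following is a minimal sketch, not part of ConSurf itself; the function name `chain_ids` is hypothetical, and it assumes the file follows the fixed-column PDB layout in which column 12 of SEQRES records and column 22 of ATOM records hold the chain identifier.

```python
def chain_ids(pdb_lines):
    """Collect chain identifiers from SEQRES and ATOM records.

    Returns two lists (SEQRES chains, ATOM chains) in order of first appearance.
    A blank identifier is returned as a single space character.
    """
    seqres_chains, atom_chains = [], []
    for line in pdb_lines:
        if line.startswith("SEQRES") and len(line) >= 12:
            chain = line[11]  # column 12 (1-based) is index 11 (0-based)
            if chain not in seqres_chains:
                seqres_chains.append(chain)
        elif line.startswith("ATOM") and len(line) >= 22:
            chain = line[21]  # column 22 (1-based) is index 21 (0-based)
            if chain not in atom_chains:
                atom_chains.append(chain)
    return seqres_chains, atom_chains
```

With a file on disk, the function could be called as `chain_ids(open("my_structure.pdb"))`, where the file name is a placeholder. If the only identifier returned is a blank, the chain field of the form would be filled with 'none', as described above.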
The Expectation value (E-value) is a parameter that
describes the number of hits one can "expect" to see just by chance when
searching a database of a particular size. It decreases exponentially with
the Score (S) that is assigned to a match between two sequences. Essentially,
the E-value describes the random background noise that exists for matches
between sequences. The lower the E-value, the more significant the match.
The Expectation value is used as a convenient way to create a significance
threshold for reporting results. For example, the meaning of an E-value
of 1 assigned to a hit is that in a database of the current size one might
expect to see 1 match with a similar score simply by chance. Using a higher
E-value will probably yield more hits, but their distance from the query
sequence will increase.
This field is irrelevant for a user-provided MSA.
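To illustrate the exponential dependence of the E-value on the score, the sketch below uses the standard BLAST relation between the E-value and the bit score, E = m * n * 2^(-S'), where m is the query length, n is the total database length and S' is the bit score. This formula and the function name `expected_hits` are not taken from the ConSurf documentation; they are included here only as an assumption-labelled illustration of the general behaviour.

```python
def expected_hits(bit_score, query_length, database_length):
    """Expected number of chance hits for a given bit score.

    Uses E = m * n * 2**(-S'), the usual BLAST bit-score relation
    (an assumption here, not something stated in the ConSurf help).
    """
    return query_length * database_length * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Each additional bit of score halves the expected number of chance hits.
    for score in (20, 30, 40, 50):
        print(score, expected_hits(score, query_length=250, database_length=10**9))
```

Under this relation, raising the E-value cutoff admits lower-scoring hits, which is why more, but more distant, homologues are collected.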
Maximum Number of Homologues
The maximum number of homologues, out of those found
by PSI-BLAST (with the given E-value), to be included in the calculation.
In order to include all the homologues, replace the default value with
the word "all".
This field is irrelevant for a user-provided MSA.
User-Provided Multiple Sequence Alignment
ConSurf accepts external MSAs in the following 7 formats:
NBRF/PIR, EMBL/SwissProt, Pearson (Fasta),
GDE, Clustal, GCG/MSF and RSF. For additional information see Format
Converter (use "View in browser" choice) links.
If you provide an external MSA file,
you are required to fill in the "Query sequence name in MSA file" field in
the ConSurf form. This is the name of the sequence in the provided MSA file
that corresponds to the query chain of the selected PDB structure.
In case you provide an external MSA file
in Fasta format, please use the "-" sign as the only gap symbol, as
this is the only standard gap sign that
User-Provided TREE file
If you provide an external
multiple sequence alignment (MSA) file, ConSurf can also accept a corresponding external phylogenetic tree in Newick (Phylip) format, for example: Tree File.
The names of the sequences in the tree file must be identical to the names of the sequences in the MSA file.
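Before uploading, it may help to check that the tree and the MSA use the same sequence names. The following is a rough sketch under several assumptions: the MSA is in Fasta format, the full header line after ">" is taken as the sequence name, and the Newick leaf names are unquoted and contain no parentheses, commas, colons or semicolons. All function names are hypothetical and unrelated to the ConSurf server code.

```python
import re


def newick_leaf_names(newick):
    """Return the set of leaf names in a simple, unquoted Newick string.

    Leaf labels directly follow "(" or ","; labels of internal nodes
    follow ")" and are therefore not matched.
    """
    found = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return {name.strip() for name in found if name.strip()}


def fasta_names(fasta_text):
    """Return the set of sequence names (header lines without '>')."""
    return {
        line[1:].strip()
        for line in fasta_text.splitlines()
        if line.startswith(">")
    }


def name_mismatches(newick, fasta_text):
    """Return (names only in the tree, names only in the MSA), both sorted."""
    tree, msa = newick_leaf_names(newick), fasta_names(fasta_text)
    return sorted(tree - msa), sorted(msa - tree)
```

Two empty lists mean that every name in the tree has an identical counterpart in the MSA and vice versa.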
The target protein chain is
represented as a space-filling model with the conservation grades
color-coded onto the van der Waals surface of each amino acid. All other
chains in the PDB file are displayed in backbone representation and all
ligands are presented in ball-and-stick representation.
More on this topic can be read here.
Reliability of the results
Minimal Requirements for a Successful
The quality of the results depends on many
parameters, but the two most important ones are the quality and number
of the homologues. If the input homologues list does not include a
minimal number of close enough homologues, the quality of the multiple
sequence alignment (MSA) decreases, influencing the quality of the
phylogenetic tree and the rest of the calculation. On the other hand,
if homologues that are too remote are used, the conservation signal(s)
may decrease due to background noise added to the MSA. Therefore we
recommend performing several runs with an increasing number of homologues (10,
30, 50, 100, etc.), and comparing the conserved regions in the protein.
Generally, reliable conserved regions will progressively cluster. In
any case, we recommend a minimum of 10 homologues for any ConSurf run.
Position specific quality
PSI-BLAST is a local alignment engine,
which means that it finds homologues that are generally shorter than
the query. For this reason the multiple sequence alignment (MSA)
usually includes many non-informative regions, or gaps. Obviously, the
quality of the calculation for each amino acid position, i.e., column
in the MSA, depends on the total number of informative non-gapped
residues in the column. We report the number of sequences that are
available for computation at each position, referred to as "MSA DATA",
in the output file "Amino Acid Conservation Score". When there
are fewer than 6 non-gapped residues in a certain position, we consider
the conservation score in this position unreliable, and it is marked
in the output files. When using the Bayesian method for the
conservation score calculations, a confidence interval for the score
estimates is obtained. The high and low values of the interval are
assigned color grades according to the 1-9 coloring scheme. If the
interval is equal to or larger than 4 color grades, the conservation
scores are regarded as unreliable, and are also marked in the output files.
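The two reliability criteria described above can be sketched as follows. This is an illustration only, not the ConSurf implementation: the function names are hypothetical, "-" is assumed to be the only gap symbol, all sequences are assumed to have the same aligned length, and the width of the confidence interval is assumed to be the difference between its high and low color grades.

```python
def unreliable_positions(alignment, min_residues=6, gap="-"):
    """Return 0-based MSA columns with fewer than min_residues non-gapped residues."""
    flagged = []
    for col in range(len(alignment[0])):
        # Count the sequences that contribute a residue (not a gap) at this column.
        informative = sum(1 for seq in alignment if seq[col] != gap)
        if informative < min_residues:
            flagged.append(col)
    return flagged


def interval_unreliable(low_grade, high_grade, max_span=4):
    """True if the confidence interval spans max_span or more color grades (1-9 scale)."""
    return abs(high_grade - low_grade) >= max_span
```

For example, in an alignment of seven sequences, a column in which two sequences have gaps contains only five informative residues and would be flagged.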
A protein structure in PDB format and the
The allowed length difference between the
SEQRES-derived sequence (NSEQRES) or the MSA-extracted
sequence (NMSA) and the ATOM-derived sequence (NATOM):
- When NSEQRES or NMSA < NATOM: The maximal difference allowed is 10%. For
example, if there are 100 residues in the ATOM list, the job is
rejected if the SEQRES has fewer than 90 residues.
- When NSEQRES or NMSA > NATOM: The maximal difference allowed is 80%. For
example, if SEQRES lists 100 residues, there must be at least 20
residues in the ATOM records, or else the job is rejected.
Identity of at least 60% between the
ATOM-derived sequence and the SEQRES- or MSA-extracted sequence (as
calculated by CLUSTALW).
At least 2 homologous sequences. By
default the homologues are found automatically (obtained from
PSI-BLAST). Please note that we consider the conservation scores
unreliable when fewer than 6 homologues are used for the calculation.
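The numeric requirements listed above can be combined into a simple pre-submission check. The sketch below is an interpretation, not the server's code: the function name and messages are hypothetical, and the percentage is assumed to be measured relative to the longer of the two sequences, which is what the two worked examples in the text imply. The identity value is taken as an input, since computing it requires an alignment (the server uses CLUSTALW).

```python
def check_minimal_requirements(n_atom, n_other, identity_percent, n_homologues):
    """Return a list of problems; an empty list means all checks passed.

    n_atom  -- length of the ATOM-derived sequence
    n_other -- length of the SEQRES-derived or MSA-extracted sequence
    """
    problems = []
    # Integer arithmetic avoids floating-point rounding at the thresholds.
    if n_other < n_atom and (n_atom - n_other) * 100 > 10 * n_atom:
        problems.append("SEQRES/MSA sequence is more than 10% shorter than ATOM sequence")
    elif n_other > n_atom and (n_other - n_atom) * 100 > 80 * n_other:
        problems.append("ATOM sequence is more than 80% shorter than SEQRES/MSA sequence")
    if identity_percent < 60:
        problems.append("identity between ATOM and SEQRES/MSA sequence is below 60%")
    if n_homologues < 2:
        problems.append("fewer than 2 homologous sequences")
    return problems
```

For instance, `check_minimal_requirements(100, 89, 95, 20)` reports the length problem, while `check_minimal_requirements(100, 90, 95, 20)` returns an empty list. Passing these checks does not imply reliable scores: as noted above, scores based on fewer than 6 homologues are considered unreliable.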