Purpose: xalign is a graphical program which does multiple sequence alignment based on sequence homology and secondary structure.
Copyright (C) 1994 -
No portion of this program may be incorporated into other
programs or sold for profit without express written consent
of the authors. Funding for this project has been provided
by the Medical Research Council of Canada and the Protein
Engineering Network of Centres of Excellence (Canada).
Even though xalign is a standalone program, we strongly recommend
getting any optional software as outlined on the download page.
Once you have downloaded the software, you then proceed by
uncompressing and untarring the files:
You should then take a look at the README file to understand
what files are being installed and the installation options
you have. After this, type "Install" to put the files in the
appropriate places.
The current version of this software comes with an expiry date.
If your software has expired, check out the website above for
further instructions or new versions.
Each input sequence must contain these minimum attributes:
The number of amino acid codes per line does not matter;
however, it is easier to check your input for correctness if
you decide on a constant number such as 50. Blanks found in
the amino acid sequence are ignored. Alternative
amino acid codes such as 'B', 'X', and 'Z' are
acceptable input, but they will have no scoring value during
the alignment process (unless the amino acid scoring matrix
is changed).
Here is an example input file:
Here is an example input file with secondary structure
information included:
The following section explains how the user can enter
specific knowledge into the alignment process.
Sometimes xalign will insert gaps into an alignment at places
where you think they do not belong. You could change any of the
various gapping penalties in the parameter file, but this will
likely change your entire alignment (which you may already
be happy with). To prevent the program from breaking up a
certain section of amino acids, just type asterisks above
those amino acids. Because the program ignores blanks in
the input sequence, the positions above amino acids without
asterisks must be filled with a placeholder character. In this
case, use a "-" or dash character.
Here is an example of a sequence input file which has this amino
acid clustering:
Another potentially useful tool is the ability to anchor a
certain amino acid in one sequence to a certain amino acid
in another sequence. One can imagine a scenario where a user
knows that two amino acids line up but, because of remote
homology, xalign cannot recognize the significance of that
particular match.
To implement this anchoring procedure, the user specifies a
number from 1 to 5 above the amino acid to anchor in
the first sequence. The user then specifies that same
number above the matching amino acid in the second sequence.
Here is an example of anchoring one amino acid in one
sequence to another amino acid in another sequence:
If you do not get a graphical window, check with your system administrator to
make sure the program has been installed and is accessible to you. A
common problem is that your PATH environment variable needs to be
changed to include the location of the installed xalign program.
If you are logged in remotely, then enter the first command in the
console window and the second in your remote login window:
If you click the first button, the sequences are displayed and you
then click the sequence to align to.
Since the multiple sequence alignment algorithm is heuristic, xalign
can generate different alignments depending on the order in which
the sequences are processed.
The default computer algorithm is to align
sequences from most to least homologous,
starting first with those sequences that have structure determined.
You as the user have the choice of selecting the initial sequence to
align to or even deciding the complete order for processing sequences.
This freedom is provided mainly for experimentation. In most cases,
the best alignment results when you select the option
that lets the computer decide the alignment order.
The printing of pairwise alignments is an option for the
user in the xalign.parms file. The bars "|" in the pairwise
alignment indicate amino acids which are identical, and the
asterisks "*" denote amino acids which are similar. The
current amino acid number is printed at the end of each
sequence line.
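As an illustration of the marker line described above, here is a minimal Python sketch. It is not taken from xalign: the similarity groups in SIMILAR_GROUPS are hypothetical stand-ins (xalign takes similarity from its own scoring matrix), and the sketch assumes gaps are written as "-".

```python
# Hypothetical similarity groups, for illustration only. xalign derives
# similarity from its amino acid scoring matrix, which is not shown here.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def are_similar(a, b):
    """Return True if both residues fall in one of the illustrative groups."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def marker_line(row_a, row_b):
    """Build the line printed between two aligned rows.

    '|' marks identical residues, '*' marks similar residues, and a blank
    marks everything else, including gap positions (assumed to be '-').
    """
    marks = []
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == "-" or b == "-":
            marks.append(" ")
        elif a == b:
            marks.append("|")
        elif are_similar(a, b):
            marks.append("*")
        else:
            marks.append(" ")
    return "".join(marks)


if __name__ == "__main__":
    top, bottom = "ADKL", "AEKV"
    print(top)
    print(marker_line(top, bottom))
    print(bottom)
```
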
The "percent sequence homology" is calculated as the score of the
current alignment divided by the score of the perfect alignment.
It is not the number of amino acids which match over the length
of the alignment.
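The difference between a score ratio and a simple identity count can be sketched in Python. Everything here is an assumption made for illustration: the toy residue scores (2 for identical, 1 for similar, 0 otherwise), the gap penalty of 1, the similarity groups, and the reading of "perfect alignment" as the first sequence aligned against itself. The scoring xalign actually uses is defined by its parameter file.

```python
# Hypothetical similarity groups and scores, chosen only for this sketch.
SIMILAR_GROUPS = ("ILVM", "FYW", "KRH", "DE", "ST", "NQ")


def pair_score(a, b):
    """Toy residue score: 2 if identical, 1 if similar, 0 otherwise."""
    if a == b:
        return 2
    if any(a in group and b in group for group in SIMILAR_GROUPS):
        return 1
    return 0


def alignment_score(row_a, row_b, gap_penalty=1):
    """Sum residue scores over aligned columns, charging each gap column."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            total -= gap_penalty
        else:
            total += pair_score(a, b)
    return total


def percent_homology(row_a, row_b):
    """Score of this alignment as a percentage of the assumed perfect score.

    The perfect score is taken to be row_a (gaps removed) aligned to itself.
    """
    reference = row_a.replace("-", "")
    perfect = alignment_score(reference, reference)
    if perfect == 0:
        raise ValueError("reference sequence has no scoring residues")
    return 100.0 * alignment_score(row_a, row_b) / perfect


if __name__ == "__main__":
    # Two of four residues are identical, yet the score ratio is higher
    # because the two mismatches are between similar residues.
    print(percent_homology("ADKL", "AEKV"))
```

Under these toy scores, "ADKL" against "AEKV" has only half of its positions identical but a score ratio of 75 percent, which is the kind of gap between the two measures that the paragraph above warns about.
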
The ranking of pairwise sequences is determined by the
"alignment score". The alignment score
is the percent sequence homology score plus a constant if the sequence
has secondary structure determined.
The order of sequences in the multiple
alignment is based on the sequences which occur first within
the ranked pairwise alignments. Usually it is better to
align sequences from most to least homologous starting first
with those that have structure determined.
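The ranking rule can be sketched as a small Python function. The tuple layout and the value of structure_bonus are hypothetical; the text only says that a constant is added when a sequence has secondary structure determined, without giving its value.

```python
def rank_pairs(pairs, structure_bonus=10.0):
    """Sort pairwise results from highest to lowest alignment score.

    Each entry is (name, percent_homology, has_structure). The alignment
    score is the percent homology plus structure_bonus when has_structure
    is True. The default bonus of 10.0 is an assumed value.
    """
    def alignment_score(entry):
        _, percent, has_structure = entry
        return percent + (structure_bonus if has_structure else 0.0)

    return sorted(pairs, key=alignment_score, reverse=True)


if __name__ == "__main__":
    results = [("seqA", 60.0, False), ("seqB", 55.0, True), ("seqC", 70.0, False)]
    for name, percent, has_structure in rank_pairs(results):
        print(name, percent, has_structure)
```

With the assumed bonus, seqB (55 percent, structure known) ranks above seqA (60 percent, no structure), which mirrors the preference for sequences that have structure determined.
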
Note that the consensus sequence is based on a threshold
percent identity which is set in the parameter file. If the
threshold is reached at a position, that amino acid is printed;
otherwise a dash "-" is printed.
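A threshold-based consensus of this kind might look like the following Python sketch. It assumes that gaps are written as "-", that the percentage is taken over all sequences in the column, and that a column dominated by gaps yields a dash; none of these details are stated in the text, so treat them as assumptions.

```python
from collections import Counter


def consensus(aligned_rows, threshold_percent):
    """Return a consensus string for equal-length aligned rows.

    For each column, the most common character is printed if it is a
    residue and its share of the column reaches threshold_percent;
    otherwise '-' is printed.
    """
    result = []
    for column in zip(*aligned_rows):
        residue, count = Counter(column).most_common(1)[0]
        share = 100.0 * count / len(column)
        if residue != "-" and share >= threshold_percent:
            result.append(residue)
        else:
            result.append("-")
    return "".join(result)


if __name__ == "__main__":
    rows = ["ACD", "ACE", "AGE"]
    print(consensus(rows, 60))
    print(consensus(rows, 100))
```
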
The xalign program was developed to handle the majority of alignment
requests in a reasonable manner. Complicating the relatively
straightforward algorithm to handle special classes of alignments
seemed beyond the intent of the program. Since the tools are available for
the user to correct the errors, the user is encouraged to use them.
The following suggestions can help you use the xalign
program to arrive at the best alignment possible. Some
of these suggestions involve modifying variables (denoted as XALIGN:)
in the xalign.parms file.
The "xalign.parms" parameter file contains default settings for
gap penalties, amino acid similarity and also some useful output options.
The program looks for an "xalign.parms" file in the current directory; if
one does not exist, it uses the one in
$INSTALL/lib/xalign/xalign.parms.
Users who are interested in changing some of the default
settings in order to get better alignments may want to copy
the above file to their current directory and try various
changes.
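The lookup order for the parameter file can be mirrored in a short Python helper, which may be handy for checking which file a run will pick up. The function name is hypothetical, and reading INSTALL from the environment is an assumption; the text only gives the path as $INSTALL/lib/xalign/xalign.parms.

```python
import os


def find_parms_file(current_dir=".", install_root=None):
    """Return the xalign.parms path that the described lookup would use.

    A copy in current_dir wins. Otherwise the installed default under
    <install_root>/lib/xalign is returned, whether or not it exists.
    install_root falls back to the INSTALL environment variable (assumed).
    """
    local_copy = os.path.join(current_dir, "xalign.parms")
    if os.path.isfile(local_copy):
        return local_copy
    if install_root is None:
        install_root = os.environ.get("INSTALL", "")
    return os.path.join(install_root, "lib", "xalign", "xalign.parms")


if __name__ == "__main__":
    print(find_parms_file())
```
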
Last modified: Mar 19, 1997
Questions to:
webmaster@diadem.biochem.ualberta.ca
Overview
This graphical program does a multiple alignment of sequences based on
a comprehensive dynamic programming algorithm. The alignment
is based on amino acid similarity, secondary structure similarity,
and various gapping penalties. These parameters have
been generalized to align the majority of sequences in a
reasonable manner.
Main Screen Snapshot
Capabilities
xalign has all of the following attributes which make it a
very powerful yet relatively easy to use program:
Reference
Constrained multiple sequence alignment using XALIGN
Authors: David Wishart, Robert Boyko, Brian Sykes
in CABIOS Vol. 10 no. 6 1994, pages 687-688
uncompress myfile.tar.Z
tar xvf myfile.tar
Basic Sequence Input
An input file contains two or more sequences to align.
Although there is no maximum number of sequences you can align,
you are limited by the amount of memory on the machine you
are running on.
>CaM Calmodulin - Drosophila melanogaster (1-148)
ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD
MINEVDADGN GTIDFPEFLT MMARKMKDTD SEEEIREAFR VFDKDGNGFI
SAAELRHVMT NLGEKLTDEE VDEMIREANI DGDGQVNYEE FVTMMTSK
>TnC Troponin C, cloned chicken skeletal muscle (1-162)
ASMTDQQAEA RAFLSEEMIA EFKAAFDMFD ADGGGDISTK ELGTVMRMLG
QNPTKEELDA IIEEVDEDGS GTIDFEEFLV MMVRQMKEDA KGKSEEELAN
CFRIFDKNAD GFIDIEELGE ILRATGEHVI EEDIEDLMKD SDKNNDGRID
FDEFLKMMEG VQ
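To check an input file like the one above before running xalign, a small reader can be sketched in Python. It assumes each record starts with a ">" line whose first word is a short name followed by a free-text description, and that the remaining lines hold only sequence codes. It does not handle the secondary structure or constraint lines of the other input styles.

```python
def parse_basic_input(text):
    """Parse '>'-headed records into (name, description, sequence) tuples.

    Blanks inside sequence lines are dropped, matching the rule that
    blanks in the amino acid sequence are ignored.
    """
    records = []
    name, description, chunks = None, "", []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, description, "".join(chunks)))
            parts = line[1:].split(None, 1)
            name = parts[0] if parts else ""
            description = parts[1] if len(parts) > 1 else ""
            chunks = []
        elif name is not None:
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, description, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = (
        ">CaM Calmodulin - Drosophila melanogaster (1-148)\n"
        "ADQLTEEQIA EFKEAFSLFD KDGDGTITTK ELGTVMRSLG QNPTEAELQD\n"
    )
    for name, description, sequence in parse_basic_input(sample):
        print(name, len(sequence), description)
```

Printing the sequence length next to the range given in the header line is a quick way to spot a dropped or duplicated line.
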
To include secondary structure in a sequence, this information
is placed on the line directly below the primary
sequence (upper or lower case letters acceptable). Use "h"
for helical regions, "b" for beta strand, "c" for random
coil, "t" for beta turn and "x" for regions you don't know
or care about.
>CaM Calmodulin - Drosophila melanogaster (1-148)
ADQLTEEQIAEFKEAFSLFDKDGDGTITTKELGTVMRSLGQNPTEAELQD
ccccchhhhhhhhhhhhhhccccccbbbhhhhhhhhhhcccccchhhhhh
MINEVDADGNGTIDFPEFLTMMARKMKDTDSEEEIREAFRVFDKDGNGFI
hhhhhccccccbbbhhhhhhhhhhhhhcccchhhhhhhhhhhhcccccbb
SAAELRHVMTNLGEKLTDEEVDEMIREANIDGDGQVNYEEFVTMMTSK
bhhhhhhhhhhcccccchhhhhhhhhhcccccccbbbhhhhhhhhhcc
>TnC Troponin C, cloned chicken skeletal muscle (1-162)
ASMTDQQAEARAFLSEEMIAEFKAAFDMFDADGGGDISTKELGTVMRMLG
cccchhhhhhhhhcchhhhhhhhhhhhhhccccccbbbhhhhhhhhhhcc
QNPTKEELDAIIEEVDEDGSGTIDFEEFLVMMVRQMKEDAKGKSEEELAN
cccchhhhhhhhhhhccccccbbbhhhhhhhhhhhhhcccccccchhhhh
CFRIFDKNADGFIDIEELGEILRATGEHVIEEDIEDLMKDSDKNNDGRID
hhhhhccccccbbbhhhhhhhhhhhccccchhhhhhhhhhhccccccbbb
FDEFLKMMEGVQ
hhhhhhhhhhcc
Note: If you choose to enter secondary structure information,
then you must enter it for all amino acids.
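A quick consistency check for a sequence line and its structure line could look like this Python sketch. The function name and the wording of the messages are hypothetical; the allowed codes (h, b, c, t, x, in either case) and the one-code-per-residue rule come from the description above.

```python
ALLOWED_CODES = set("hbctx")


def check_structure(sequence, structure):
    """Return a list of problems found in a sequence/structure line pair."""
    problems = []
    if len(structure) != len(sequence):
        problems.append(
            "structure has %d codes for %d residues" % (len(structure), len(sequence))
        )
    unknown = sorted({code for code in structure.lower() if code not in ALLOWED_CODES})
    if unknown:
        problems.append("unknown structure codes: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    print(check_structure("FDEFLKMMEGVQ", "hhhhhhhhhhcc"))
    print(check_structure("FDEFLKMMEGVQ", "hhhhhhhhhhc"))
```
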
Advanced Sequence Input
>unkn1 unknown protein mouse
-------------******-------------------------------
SRTEYDPLKFWPITHYCPHSARKDTYPERFYANMPKLDNQGPLSTYPLST
cchhhhhhhhhchhhhhhhccccccccccbbbbccccccccccccchhhh
---------------
QWPIIVDTASATLMS
hhcbbbbbbcccccc
In the example above, the asterisks over "THYCPH"
ensure that the program does not break up these amino acids
in the alignment.
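Typing the constraint line by hand is error-prone because the asterisks must sit exactly above the residues to protect. A Python sketch that generates the line for one sequence line is shown here; the function name is hypothetical, and it protects only the first occurrence of the segment.

```python
def cluster_line(sequence_line, segment):
    """Return a constraint line with '*' above segment and '-' elsewhere."""
    start = sequence_line.find(segment)
    if start < 0:
        raise ValueError("segment %r not found in sequence line" % segment)
    marks = ["-"] * len(sequence_line)
    for index in range(start, start + len(segment)):
        marks[index] = "*"
    return "".join(marks)


if __name__ == "__main__":
    residues = "SRTEYDPLKFWPITHYCPHSARKDTYPERFYANMPKLDNQGPLSTYPLST"
    print(cluster_line(residues, "THYCPH"))
    print(residues)
```
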
>unkn1 this protein unknown for mouse
---------1----------------------------------------
SRTEYDPLKFWPITHYCPHSARKDTYPERFYANMPKLDNQGPLSTYPLST
cchhhhhhhhhchhhhhhhccccccccccbbbbccccccccccccchhhh
---------------
QWPIIVDTASATLMS
hhcbbbbbbcccccc
>unkn2 some other protein
---------1----------------------------------------
SRSELDPLKFMPLPITYCGHSAREATYPERDDANMPKLENSTGPLQTYPL
------------------
LSYQCPIIVDTAKHLLNS
The anchoring procedure can be applied to any number of
sequences. The same anchoring number can also appear more
than once in a sequence; in that case the program chooses
the anchor which maximizes the total alignment score.
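Before running an anchored alignment, it can help to confirm that each anchor number sits above the intended residue and appears in more than one sequence. The following Python sketch does that for constraint and sequence line pairs; both function names are hypothetical, and the check that an anchor should appear in at least two sequences is an assumption drawn from the description of anchoring one sequence to another.

```python
def anchored_residues(constraint_line, sequence_line):
    """Map each anchor digit (1-5) to the (index, residue) pairs it marks."""
    anchors = {}
    for index, (mark, residue) in enumerate(zip(constraint_line, sequence_line)):
        if mark in "12345":
            anchors.setdefault(mark, []).append((index, residue))
    return anchors


def unmatched_anchors(anchors_per_sequence):
    """Return anchor numbers that appear in only one sequence."""
    counts = {}
    for anchors in anchors_per_sequence:
        for number in anchors:
            counts[number] = counts.get(number, 0) + 1
    return sorted(number for number, count in counts.items() if count < 2)


if __name__ == "__main__":
    first = anchored_residues("---------1----", "SRTEYDPLKFWPIT")
    second = anchored_residues("---------1----", "SRSELDPLKFMPLP")
    print(first, second, unmatched_anchors([first, second]))
```
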
Running xalign
xhost + remoteMachine
setenv DISPLAY hostMachine:0
This allows xalign to run on the remote machine but the
display will go to the host computer.
Output
The output of xalign consists of the following:
Alignment Analysis
First, it is important for the user to realize that the programming
model makes a number of assumptions and simplifications in order
to turn multiple sequence alignment into a mathematical problem.
Second, the user should realize that solving this particular
mathematical problem "perfectly" is impractical for
3 or more sequences.
Alignment Parameters