n2tool - quickly cluster similar DNA sequences using local
n2tool [-anti-sense Yes|No|y|n ] [-index filename ]
[-ini filename ] -seq filename ... [-threshold n ]
N2tool takes files of DNA sequence information and produces
an index file which links similar sequences together in
clusters. N2tool differs from ICAtool in three ways: n2tool
is guaranteed to compare every sequence against every other,
whereas ICAtool is not; n2tool uses a quicker pairwise
comparison algorithm; and n2tool has no query mode, since
that function is carried out by ICAass.
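The idea of comparing every sequence against every other and
linking similar ones into clusters can be sketched as
follows. This is only an illustration, not n2tool's actual
code: the function name cluster, the union-find bookkeeping
and the caller-supplied similar() predicate are all
assumptions made for the example.

```python
from itertools import combinations


def cluster(names, similar):
    """Group names into clusters by linking every similar pair.

    `similar` is a hypothetical predicate supplied by the caller;
    it stands in for whatever pairwise comparison is really used.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    # Every sequence is compared against every other exactly once.
    for a, b in combinations(names, 2):
        if similar(a, b):
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Made-up similarity relation: s1~s2 and s2~s3, s4 matches nothing.
pairs = {("s1", "s2"), ("s2", "s3")}
clusters = cluster(
    ["s1", "s2", "s3", "s4"],
    lambda a, b: (a, b) in pairs or (b, a) in pairs,
)
print(clusters)
```

In this sketch s1 and s3 end up in the same cluster even
though they are not directly similar, because both are
linked through s2.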
Sequences can be spread amongst any number of files. Various
sequence formats are supported, including GenBank, EMBL,
plain (unformatted sequence files), Staden's semi-colon and
Experiment file formats, and also 2 NBRF/FASTA style formats
with the description either on the same line as
'>sequence-name' or on the line immediately following the
sequence name. Extra files of sequences can be added at any
time to increase the number of sequences clustered, without
any penalty of recalculation, but no sequences referenced by
an index should ever be deleted.
N2tool can take its configuration parameters from the
command line, from a user initial configuration file, or
from built-in defaults. Settings are applied in increasing
order of priority: built-in defaults first, then the
configuration file, then finally the command line, with each
later source overriding the earlier ones.
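The override order described above can be sketched as a
layered merge. This is a minimal illustration only: the
function resolve_settings and the example values passed to
it are hypothetical, while the default values are those
documented on this page.

```python
# Built-in defaults as documented on this page.
DEFAULTS = {"anti-sense": "no", "index": "cluster.index", "threshold": 20}


def resolve_settings(config_file=None, command_line=None):
    """Merge settings so that later sources override earlier ones."""
    settings = dict(DEFAULTS)            # 1. built-in defaults
    settings.update(config_file or {})   # 2. configuration file
    settings.update(command_line or {})  # 3. command line wins
    return settings


# Hypothetical values, chosen only to show the precedence.
resolved = resolve_settings(
    config_file={"threshold": 30, "index": "my.index"},
    command_line={"threshold": 25},
)
print(resolved)
```

Here the threshold comes from the command line, the index
name from the configuration file, and the anti-sense setting
from the built-in default.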
-anti-sense Yes|No|y|n
Determines whether sequences should also be compared in the
opposite sense to that in which they were entered. Default
is no.
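Assuming that "opposite sense" means the reverse complement
of the sequence, which is the usual meaning for DNA but is
not stated on this page, the extra comparison can be
pictured as below. The helper names reverse_complement and
orientations are hypothetical.

```python
# Complement table for the four bases plus 'n', in both cases.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the opposite-strand reading of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orientations(seq, anti_sense=False):
    """List the forms of seq that would be compared.

    With anti_sense off (the documented default) only the sequence
    as entered is used; with it on, the reverse complement is added.
    """
    return [seq, reverse_complement(seq)] if anti_sense else [seq]


print(orientations("ATGC", anti_sense=True))
```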
-index filename
Defines the name of the index file, existing or to be
created. Default is "cluster.index" in the current
directory.
-ini filename
Defines the name of the user's initial configuration file.
Default is "ICAtool.ini" in the current directory.
-seq filename1 filename2 filenameN
This flag denotes the start of a list of space separated
filenames which hold DNA sequence information. No default,
-threshold n
When creating a cluster index, this flag sets the
subsequence similarity score (# of matches - # of
mismatches) at which 2 sequences are said to be similar.
Default is 20.
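The score arithmetic can be illustrated for two subsequences
that are already aligned and of equal length. This sketch
shows only the matches-minus-mismatches formula; it does not
reproduce how n2tool finds the subsequences, and treating a
score equal to the threshold as similar is an assumption.
The function names are hypothetical.

```python
DEFAULT_THRESHOLD = 20  # documented default


def similarity_score(a, b):
    """Score two aligned, equal-length subsequences:
    number of matches minus number of mismatches."""
    if len(a) != len(b):
        raise ValueError("subsequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    mismatches = len(a) - matches
    return matches - mismatches


def is_similar(a, b, threshold=DEFAULT_THRESHOLD):
    # Assumption: a score equal to the threshold counts as similar.
    return similarity_score(a, b) >= threshold


first = "ACGT" * 6              # 24 bases
second = "TT" + first[2:]       # same bases with 2 mismatches
print(similarity_score(first, second), is_similar(first, second))
```

With 24 aligned bases and 2 mismatches there are 22 matches,
so the score is 22 - 2 = 20, which just reaches the default
threshold.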
ICAtool.ini
If this file is present then all startup details present in
it will be read. An example would be
cluster.index
If this file is present when in UPDATE mode then any extra
sequences are added to this existing index.
ICAtool(1), ICAass(1), ICAprint(1), ICAstats(1),
ICAmatches(1), tofasta(1), ssort(1), just30(1)
Base ambiguity symbols are not handled properly: use only
'n' or 'N', which are converted to random bases.