Sample Commands for Running Hclust
Loads data from the file "data.txt" and finds tag SNPs using the
default method (hierarchical clustering with a cut-off of 0.5). This
command also reads additional miscellaneous SNP information from
"SNPinfo.txt"; this information is displayed in the output tables
alongside the results of the analysis.
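The default clustering step can be sketched as follows. This is a minimal Python illustration, not the package's R code. It assumes three things: the 0.5 cut-off applies to the dissimilarity 1 - r^2 between SNPs, clusters are joined by complete linkage (so every pair in a cluster has r^2 >= 0.5), and each cluster's tag is the SNP with the highest summed r^2 to its cluster mates. The function names and the tag-picking rule are hypothetical.

```python
from itertools import combinations


def complete_linkage_clusters(r2, cutoff=0.5):
    """Agglomerative clustering with dissimilarity 1 - r^2 (assumed).

    r2 is a symmetric list of lists of pairwise r^2 values (diagonal = 1).
    Clusters are merged until the closest pair is farther apart than `cutoff`.
    Naive O(n^3) search, meant only for small illustrations.
    """
    clusters = [[i] for i in range(len(r2))]

    def dist(a, b):
        # Complete linkage: the largest 1 - r^2 over all cross-cluster pairs.
        return max(1.0 - r2[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters (x < y by construction).
        (i, j), d = min(
            (((x, y), dist(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break  # every remaining merge would exceed the cut height
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters


def pick_tags(r2, clusters):
    # Hypothetical rule: tag = SNP with the highest summed r^2 within its cluster.
    return [max(c, key=lambda i: sum(r2[i][j] for j in c)) for c in clusters]


r2 = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
clusters = complete_linkage_clusters(r2)
print(clusters)                 # [[0, 1, 2], [3]]
print(pick_tags(r2, clusters))  # [1, 3]
```

Complete linkage is used here because it makes the cut-off a guarantee about every pair within a cluster. Whether the package uses this linkage is an assumption.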
Uses data from an R object (either a matrix or a data frame) and finds
tag SNPs using the backward-step algorithm.
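The backward-step idea can be illustrated with a short Python sketch. The package itself is R, and none of these names come from it. The sketch makes three assumptions:

- Each SNP is credited with its highest r^2 to any remaining tag.
- A tag set is scored by the mean of those values.
- Each step drops the tag whose removal lowers that score the least.

The stopping rule, a hypothetical `ntags` target, is also an assumption.

```python
def coverage(r2, tags):
    """Mean over all SNPs of the best r^2 with any tag (assumed criterion)."""
    n = len(r2)
    return sum(max(r2[i][t] for t in tags) for i in range(n)) / n


def backward_step(r2, tags, ntags):
    """Drop tags one at a time until at most `ntags` remain (keeps at least one)."""
    tags = list(tags)
    while len(tags) > max(ntags, 1):
        # Drop the tag whose removal leaves the highest coverage.
        least_useful = max(
            tags, key=lambda t: coverage(r2, [u for u in tags if u != t])
        )
        tags.remove(least_useful)
    return tags


r2 = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(backward_step(r2, [0, 1, 2, 3], ntags=2))  # [1, 3]
```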
Plots the cluster dendrogram to a JPEG file.
Loads pre-specified SNPs (pSNPs) from "psnp.txt" before beginning the
analysis. The pSNPs are automatically chosen as tag SNPs. After the
analysis, tag SNPs are dropped using the backward-step algorithm until at
most 10 remain. Note that pSNPs are never dropped, so if there are more
than 10 pSNPs, the final set of tag SNPs will be exactly the set of pSNPs
(and the ntags constraint cannot be met).
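Protecting pSNPs only changes which tags are eligible to be dropped. The Python sketch below uses hypothetical names and assumes a mean best-r^2 coverage criterion as a stand-in for the package's own. It works in three steps:

1. Merge the pSNPs into the tag set.
2. Treat only non-pSNP tags as candidates for removal.
3. Stop early if nothing droppable remains. This is the case where the ntags cap cannot be met.

```python
def coverage(r2, tags):
    """Mean over all SNPs of the best r^2 with any tag (assumed criterion)."""
    n = len(r2)
    return sum(max(r2[i][t] for t in tags) for i in range(n)) / n


def backward_step_with_psnps(r2, tags, psnps, ntags=10):
    """Backward-step elimination that never drops a pre-specified SNP."""
    protected = set(psnps)
    tags = sorted(set(tags) | protected)  # pSNPs are always tag SNPs
    while len(tags) > max(ntags, 1):
        droppable = [t for t in tags if t not in protected]
        if not droppable:
            break  # only pSNPs remain, so the ntags cap cannot be met
        least_useful = max(
            droppable, key=lambda t: coverage(r2, [u for u in tags if u != t])
        )
        tags.remove(least_useful)
    return tags


r2 = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
print(backward_step_with_psnps(r2, [0, 1, 2], psnps=[3], ntags=2))  # [1, 3]
print(backward_step_with_psnps(r2, [], psnps=[0, 1, 2], ntags=2))   # [0, 1, 2]
```

In the second call there are more pSNPs than `ntags`, so the pSNPs alone are returned.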
First performs the usual Hclust analysis. Then tag SNPs are dropped using
the backward-step algorithm until stbound is satisfied. If more than 10 tag
SNPs remain at this point, the backward-step algorithm continues until at
most 10 remain. Finally, the tag SNPs are written, by index number and
name, to the text file "tags.txt".
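The two-stage stopping rule can be sketched in Python as below. The exact meaning of stbound is not spelled out here. The sketch therefore takes it as a caller-supplied predicate, `stbound_satisfied`, which is a hypothetical stand-in. The mean best-r^2 coverage used to choose which tag to drop is likewise assumed.

Phase 1 drops tags until the predicate holds. Phase 2 continues only while more than `ntags` tags remain.

```python
def coverage(r2, tags):
    """Mean over all SNPs of the best r^2 with any tag (assumed criterion)."""
    n = len(r2)
    return sum(max(r2[i][t] for t in tags) for i in range(n)) / n


def drop_least_useful(r2, tags):
    """Remove, in place, the tag whose absence costs the least coverage."""
    least_useful = max(tags, key=lambda t: coverage(r2, [u for u in tags if u != t]))
    tags.remove(least_useful)


def two_phase_backward_step(r2, tags, stbound_satisfied, ntags=10):
    tags = list(tags)
    # Phase 1: drop tags until the (hypothetical) stbound predicate holds.
    while len(tags) > 1 and not stbound_satisfied(tags):
        drop_least_useful(r2, tags)
    # Phase 2: if more than ntags remain, keep dropping until the cap is met.
    while len(tags) > max(ntags, 1):
        drop_least_useful(r2, tags)
    return tags


r2 = [
    [1.0, 0.9, 0.6, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.6, 0.7, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
# Placeholder predicate for illustration only; it is not the real stbound test.
placeholder = lambda tags: len(tags) <= 3
print(two_phase_backward_step(r2, [0, 1, 2, 3], placeholder, ntags=2))   # [1, 3]
print(two_phase_backward_step(r2, [0, 1, 2, 3], placeholder, ntags=10))  # [1, 2, 3]
```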
Includes data on the quality of each SNP. The quality score and the
correlation score are used to determine the suitability of each SNP as a
tag. This command also increases the weight of the quality score from the
default of 0.1 to 0.2. See the instructions page
for more information.
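How the two scores are combined is described on the instructions page rather than here. As an illustration only, the Python sketch below assumes a linear blend: the quality score gets weight w and the correlation score gets 1 - w, with both on a 0 to 1 scale. The function names and this formula are assumptions. The sketch shows how raising w from 0.1 to 0.2 can change which SNP ranks first.

```python
def suitability(corr_score, quality, w=0.1):
    """Assumed linear blend: weight w on quality, 1 - w on correlation score."""
    return (1.0 - w) * corr_score + w * quality


def best_snp(snps, w=0.1):
    """Return the most suitable SNP; `snps` maps name -> (corr_score, quality)."""
    return max(snps, key=lambda name: suitability(*snps[name], w=w))


snps = {"rs1": (0.80, 0.20), "rs2": (0.75, 0.60)}  # made-up scores
print(best_snp(snps, w=0.1))  # rs1  (0.740 vs 0.735)
print(best_snp(snps, w=0.2))  # rs2  (0.680 vs 0.720)
```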
By default, SNPclust compares the mean coefficient of determination (r^2)
for each subset during the backward-step algorithm. This command forces
SNPclust to compare the 10th percentile of each subset's r^2 values instead.
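The difference between the two criteria shows up when one SNP is poorly covered. The Python sketch below uses hypothetical names. It summarizes each candidate subset by its per-SNP best r^2 values and compares the mean with the 10th percentile.

The percentile uses linear interpolation, the same rule as R's default `quantile()` (type 7). Whether SNPclust uses that rule is an assumption. The two value lists are made up.

```python
def percentile(values, p):
    """Linear-interpolation percentile (R's default quantile rule, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


def subset_score(best_r2, criterion="mean"):
    """Summarize the best r^2 of each SNP under a candidate tag subset."""
    if criterion == "mean":
        return sum(best_r2) / len(best_r2)
    if criterion == "p10":
        return percentile(best_r2, 0.10)
    raise ValueError(f"unknown criterion: {criterion}")


# Made-up per-SNP best r^2 values for two candidate subsets.
a = [1.0, 1.0, 1.0, 0.9, 0.2]  # mostly excellent, one SNP badly covered
b = [0.8, 0.8, 0.8, 0.8, 0.8]  # uniformly adequate
for crit in ("mean", "p10"):
    winner = "a" if subset_score(a, crit) > subset_score(b, crit) else "b"
    print(crit, winner)  # prints "mean a" then "p10 b"
```

The mean rewards subset a's many near-perfect values. The 10th percentile penalizes its single poorly covered SNP.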
The list of tag SNPs is written to "tags.txt". In addition, a
data matrix is written to "SNPmat.txt". The matrix includes
cluster assignments for each SNP, scores for tag suitability, and a column
indicating which SNPs were chosen as tag SNPs. The SNPs are sorted into
their clusters and ranked by score.
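The row order described above can be reproduced with a two-part sort key. The Python sketch below assumes each row carries a cluster id, a suitability score, and a tag flag. The `SNPRow` type and the rows are hypothetical, and the actual column layout of "SNPmat.txt" may differ.

```python
from dataclasses import dataclass


@dataclass
class SNPRow:
    name: str
    cluster: int
    score: float
    is_tag: bool


def sort_for_output(rows):
    """Group rows by cluster id, then rank by score (highest first) within each."""
    return sorted(rows, key=lambda r: (r.cluster, -r.score))


rows = [
    SNPRow("rs3", 2, 0.40, False),
    SNPRow("rs1", 1, 0.70, False),
    SNPRow("rs2", 1, 0.90, True),
    SNPRow("rs4", 2, 0.80, True),
]
for r in sort_for_output(rows):
    print(r.cluster, r.name, r.score, "tag" if r.is_tag else "")
```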
This command runs SNPclust using the bestN algorithm. First, the default
method is used. Then the best 20 tag SNPs are printed. The tag SNPs are
ranked according to the size of their cluster; ties are broken by quality
score, then by correlation score, and finally at random.
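The bestN ordering amounts to a sort on a compound key. In the Python sketch below, the dictionary fields are hypothetical. The ranking direction (larger cluster, higher quality, higher correlation first) is also an assumption. The final tie-break draws one random number per tag from a seeded generator, so a fixed seed makes the order reproducible.

```python
import random


def best_n(tags, n=20, seed=None):
    """Rank tags by cluster size, then quality, then correlation score, then at random.

    Each tag is a dict with 'name', 'cluster_size', 'quality', and 'corr' keys.
    """
    rng = random.Random(seed)
    keyed = [(tag, rng.random()) for tag in tags]  # one random draw per tag
    keyed.sort(
        key=lambda pair: (
            -pair[0]["cluster_size"],
            -pair[0]["quality"],
            -pair[0]["corr"],
            pair[1],
        )
    )
    return [tag["name"] for tag, _ in keyed[:n]]


tags = [
    {"name": "rsA", "cluster_size": 3, "quality": 0.5, "corr": 0.9},
    {"name": "rsB", "cluster_size": 5, "quality": 0.1, "corr": 0.2},
    {"name": "rsC", "cluster_size": 3, "quality": 0.8, "corr": 0.1},
    {"name": "rsD", "cluster_size": 1, "quality": 0.9, "corr": 0.9},
]
print(best_n(tags, n=3))  # ['rsB', 'rsC', 'rsA']
```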