A computer program for using SNP data for species identification
SNP data is useful for species identification, and the analysis of hybrid individuals and populations. This computer program uses diagnostic SNP loci (i.e., loci with fixed differences between taxa) to estimate the species composition of individuals and populations. It is particularly useful when more than two taxa may have hybridized, because in this situation, interpreting SNP data can be difficult. For a description of the statistical methods implemented by this program, see the citation below.
Clarki is a Windows computer program. Data is stored in two files: a diagnostic alleles file, and a sample file. The diagnostic alleles file lists all the taxa that may have hybridized and the alleles that are present in each taxon at each locus. The sample file contains genotypes of the individuals to analyze.
Diagnostic alleles file
To begin an analysis, the user must first open a diagnostic alleles file. An example is shown below:
Cutthroat diagnostic loci for comparing WCT/YCT/RBT in Montana
RAG1, VIM, Cal, CBR1, P53, Tnsf, PrL2, MT1B, Try-III
WCT C G A A T T G A G
YCT C G A C C C A G T
RBT T T T C C C G A G
The first line of the file is a description of the file that can contain any text. It is required. The second line lists the names of the loci being used, separated by commas. The remaining lines list the diagnostic alleles for each taxon. Each of these lines begins with a taxon ID (WCT, YCT, or RBT in the example above), followed by that taxon's diagnostic alleles in the same order as the locus names. In the example above, westslope cutthroat trout (WCT) have a "C" at locus RAG1. Either spaces or tabs can be used to separate taxon names and alleles. Taxon names cannot contain spaces.
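The file layout described above is simple enough to read with a short script, which can be handy for checking a file before opening it in Clarki. The following Python sketch is not part of Clarki; the function name `parse_diagnostic_alleles` is hypothetical, and skipping blank lines is an assumption rather than documented program behavior.

```python
def parse_diagnostic_alleles(text):
    """Parse the text of a diagnostic alleles file.

    Returns (description, loci, alleles_by_taxon), where alleles_by_taxon
    maps each taxon ID to a {locus name: allele} dictionary.
    """
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 3:
        raise ValueError("need a description line, a locus line, and at least one taxon line")

    description = lines[0].strip()
    # Locus names are separated by commas on the second line.
    loci = [name.strip() for name in lines[1].split(",") if name.strip()]

    alleles_by_taxon = {}
    for line in lines[2:]:
        # split() with no argument accepts spaces or tabs as separators.
        fields = line.split()
        taxon, alleles = fields[0], fields[1:]
        if len(alleles) != len(loci):
            raise ValueError(f"{taxon}: expected {len(loci)} alleles, found {len(alleles)}")
        alleles_by_taxon[taxon] = dict(zip(loci, alleles))
    return description, loci, alleles_by_taxon


EXAMPLE = """\
Cutthroat diagnostic loci for comparing WCT/YCT/RBT in Montana
RAG1, VIM, Cal, CBR1, P53, Tnsf, PrL2, MT1B, Try-III
WCT C G A A T T G A G
YCT C G A C C C A G T
RBT T T T C C C G A G
"""

description, loci, alleles = parse_diagnostic_alleles(EXAMPLE)
print(alleles["WCT"]["RAG1"])  # prints C
```

Because each taxon line is checked against the number of locus names, a line with a missing or extra allele is reported immediately rather than silently shifting the alleles to the wrong loci.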
Sample file
Sample data is stored in a GENEPOP-like format. SNP genotypes can be written using letters (e.g., AT) or numbers. If numbers are used, genotypes can be 2, 4, or 6 digits long (an odd number of digits is not allowed); for example, 11, 0101, and 001001 are all acceptable genotypes. The name of each sample can be listed after the POP identifier. An example file is shown below. This data uses the diagnostic alleles listed above.
Sample species ID data; cutthroat trout from Yellowstone Park
RAG1, VIM, Cal, CBR1, P53, Tnsf, PrL2, MT1B, Try-III
POP Grayling Creek
Trout1, CC GG AA AA TT CT GG AA GT
Trout2, CC GT AT AA CT TT GG AA GG
Trout3, CC GG AT AA TT TT GG AA GG
Trout4, CC GG AA AA TT TT GG AA GG
Trout5, CT GG AT AA TT TT GG AA GG
Trout6, CC GG AA CA CT TT GG AA GG
Trout7, CT GG AA AA CT CT GG AA GG
Trout8, CC GG AT AA TT TT GA AA GG
Trout9, CC GT AA CA CT CT GG GA GT
Trout10, CC GG AA AA TT CT GG GA GG
POP Slough Creek
Trout1, CC GG AA CC CC CC AA GG TT
Trout2, CC GG AA CC CC CC AA GG TT
Trout3, CC GG AA CC CC CC AA GG TT
Trout4, CC GG AA CC CC CC AA GG TT
Trout5, CC GG AA CC CC CC AA GG TT
Trout6, CC GG AA CC CC CC AA GG TT
Trout7, CC GG AA CC CC CC AA GG TT
Trout8, CC GG AA CC CC CC AA GG TT
Trout9, CC GG AT CC CC CC AA GG TT
Trout10, CC GG AA CC CC CC AA GG TT
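The sample file format can be read in a similar way. The Python sketch below is an illustration only, not Clarki's own code: the names `split_genotype` and `parse_sample_file` are hypothetical, and it assumes the locus names sit on a single comma-separated line and that each individual's ID is followed by a comma, as in the example file. The data is a shortened excerpt of that example.

```python
def split_genotype(genotype):
    """Split a genotype such as 'AT', '11', '0101', or '001001' into two alleles."""
    if len(genotype) not in (2, 4, 6):
        raise ValueError(f"genotype {genotype!r} must be 2, 4, or 6 characters long")
    half = len(genotype) // 2
    return genotype[:half], genotype[half:]


def parse_sample_file(text):
    """Return (description, loci, populations).

    populations maps each POP name to {individual ID: [(allele1, allele2), ...]}.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    description = lines[0]
    # Assumption: all locus names are on the second line, separated by commas.
    loci = [name.strip() for name in lines[1].split(",") if name.strip()]

    populations = {}
    current = None
    for line in lines[2:]:
        fields = line.split(None, 1)
        if fields[0].upper() == "POP":
            # The sample name after POP is optional; fall back to a numbered label.
            name = fields[1] if len(fields) > 1 else f"Pop{len(populations) + 1}"
            current = populations.setdefault(name, {})
            continue
        if current is None:
            raise ValueError("individual found before the first POP line")
        individual, _, genotype_text = line.partition(",")
        genotypes = [split_genotype(g) for g in genotype_text.split()]
        if len(genotypes) != len(loci):
            raise ValueError(f"{individual}: expected {len(loci)} genotypes, found {len(genotypes)}")
        current[individual.strip()] = genotypes
    return description, loci, populations


EXAMPLE = """\
Sample species ID data; cutthroat trout from Yellowstone Park
RAG1, VIM, Cal, CBR1, P53, Tnsf, PrL2, MT1B, Try-III
POP Grayling Creek
Trout1, CC GG AA AA TT CT GG AA GT
Trout2, CC GT AT AA CT TT GG AA GG
POP Slough Creek
Trout1, CC GG AA CC CC CC AA GG TT
"""

description, loci, populations = parse_sample_file(EXAMPLE)
print(populations["Grayling Creek"]["Trout2"][1])  # prints ('G', 'T')
print(split_genotype("0101"))                      # prints ('01', '01')
```

Splitting each genotype in half handles letter genotypes and 2-, 4-, and 6-digit numeric genotypes with the same rule.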
Example of output
Here is the output from the program using the example data files shown above.
Result from program 'Clarki' 7/15/2009 12:37:01 PM
DIAGNOSTIC ALLELE FILE
Species ID MS - Cutthroat DIAGNOSTIC.txt
Cutthroat diagnostic markers for empirical test
SAMPLE FILE
Species ID - Cutthroat test - SAMPLE file.txt
Sample species ID data; cutthroat trout from Yellowstone Park
PROPORTIONS OF EACH SAMPLE THAT ARE FROM EACH TAXA
N WCT YCT RBT
Grayling Creek 10 0.81 0.07 0.12
Slough Creek 10 0.99 0.01
Results are tab-delimited so that they can be copied and pasted into a spreadsheet.
Download the files here to obtain Clarki. Clarki runs on the Windows operating system and requires the .NET
Framework to be installed. The .NET Framework is a component of the Microsoft Windows
operating system used to build and run Windows-based applications. If you have a recent
version of Windows, you probably already have .NET installed on your computer. You
can check by clicking Start on your Windows desktop, selecting Control Panel, and
then double-clicking the Add or Remove Programs icon. When that window appears, scroll
through the list of applications. If you see Microsoft .NET Framework 2.0 listed,
the latest version is already installed and you do not need to install it again.
If you do not have .NET already installed on your computer, the easiest way to install it is to update your operating system. This is relatively painless. To begin, open Microsoft Internet Explorer, select Tools --> Windows Update, find Microsoft .NET Framework 2.0, and install it (it will be listed under "Pick updates to install").
To “install” Clarki, download the file Clarki.exe and the accompanying library of functions, Kalinowski_library.dll. Place both of these files in the same folder. Click on the Clarki.exe file to run the program. To “uninstall,” simply delete both of these files.
The following citation should be used when citing Clarki:
- Kalinowski ST (2009) How to use SNPs and other diagnostic diallelic genetic markers to estimate the composition of multi-species hybrids. Conservation Genetics.