Introduction

SNP data is useful for species identification and for the analysis of hybrid individuals and populations. This computer program uses diagnostic SNP loci (i.e., loci with fixed differences between taxa) to estimate the species composition of individuals and populations. It is particularly useful when more than two taxa may have hybridized, because in this situation interpreting SNP data can be difficult. For a description of the statistical methods implemented by this program, see the citation below.

Overview

Clarki is a Windows computer program. Data is stored in two files: a diagnostic alleles file and a sample file. The diagnostic alleles file lists all the taxa that may have hybridized and the alleles that are present in each taxon at each locus. The sample file contains the genotypes of the individuals to analyze.

Diagnostic alleles file

To begin an analysis, the user must first open a diagnostic alleles file. An example is shown below:

Cutthroat diagnostic loci for comparing WCT/YCT/RBT in Montana
RAG1, VIM, Cal,	CBR1,	P53,	Tnsf,	PrL2,	MT1B,	Try-III
WCT	C	G	A	A	T	T	G	A	G
YCT C	G	A	C	C	C	A	G	T
RBT	T	T	T	C	C	C	G	A	G

The first line of the file is a description of the file that can contain any text. It is required. The second line lists the names of the loci being used, separated by commas. The remaining lines list the diagnostic alleles for each taxon. Each of these lines begins with a taxon ID (WCT, YCT, or RBT in the example above), followed by the diagnostic alleles for that taxon, listed in the same order as the locus names. In the example above, westslope cutthroat trout (WCT) have a "C" at locus RAG1. Either spaces or tabs can be used to separate taxon names and alleles. Taxon names cannot contain spaces.
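The file layout described above can be read with a few lines of code. The following Python sketch is an illustration only and is not part of Clarki; the function name parse_diagnostic_alleles is hypothetical, and the sketch assumes that the file follows the rules stated above (a description line, a comma-separated locus line, then one line per taxon).

```python
def parse_diagnostic_alleles(text):
    """Parse the text of a diagnostic alleles file (illustrative sketch).

    Returns (description, loci, alleles), where alleles maps each
    taxon ID to a dictionary of locus name -> diagnostic allele.
    """
    lines = text.splitlines()
    if len(lines) < 3:
        raise ValueError("expected a description, a locus line, and at least one taxon")
    # Line 1: free-text description (required).
    description = lines[0].strip()
    # Line 2: locus names separated by commas; stray spaces or tabs are ignored.
    loci = [name.strip() for name in lines[1].split(",") if name.strip()]
    alleles = {}
    for line in lines[2:]:
        if not line.strip():
            continue  # skip blank lines
        # split() with no argument splits on any run of spaces or tabs.
        taxon, *values = line.split()
        if len(values) != len(loci):
            raise ValueError(f"{taxon}: expected {len(loci)} alleles, found {len(values)}")
        alleles[taxon] = dict(zip(loci, values))
    return description, loci, alleles


# The example file shown above, written as a Python string.
EXAMPLE = """Cutthroat diagnostic loci for comparing WCT/YCT/RBT in Montana
RAG1, VIM, Cal,\tCBR1,\tP53,\tTnsf,\tPrL2,\tMT1B,\tTry-III
WCT\tC\tG\tA\tA\tT\tT\tG\tA\tG
YCT C\tG\tA\tC\tC\tC\tA\tG\tT
RBT\tT\tT\tT\tC\tC\tC\tG\tA\tG
"""

description, loci, alleles = parse_diagnostic_alleles(EXAMPLE)
print(alleles["WCT"]["RAG1"])  # prints C
```

Storing the alleles in a dictionary keyed by taxon and then by locus makes it easy to look up which taxa carry a given allele at a given locus.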

Sample file

Sample data is stored in a GENEPOP-like format. SNP genotypes can be written using letters (e.g., AT) or numbers. If numbers are used, genotypes can be 2, 4, or 6 digits long (an odd number of digits is not allowed). For example, 11, 0101, and 001001 are all acceptable genotypes. The name of each sample can be listed after the POP identifier. An example file is shown below. It uses the diagnostic alleles listed above.

Sample species ID data; cutthroat trout from Yellowstone Park
RAG1, VIM, Cal,	CBR1,	P53,	Tnsf,	PrL2,	MT1B,	Try-III
POP Grayling Creek
Trout1,	CC	GG	AA	AA	TT	CT	GG	AA	GT
Trout2,	CC	GT	AT	AA	CT	TT	GG	AA	GG
Trout3,	CC	GG	AT	AA	TT	TT	GG	AA	GG
Trout4,	CC	GG	AA	AA	TT	TT	GG	AA	GG
Trout5,	CT	GG	AT	AA	TT	TT	GG	AA	GG
Trout6,	CC	GG	AA	CA	CT	TT	GG	AA	GG
Trout7,	CT	GG	AA	AA	CT	CT	GG	AA	GG
Trout8,	CC	GG	AT	AA	TT	TT	GA	AA	GG
Trout9,	CC	GT	AA	CA	CT	CT	GG	GA	GT
Trout10,	CC	GG	AA	AA	TT	CT	GG	GA	GG
POP Slough Creek
Trout1,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout2,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout3,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout4,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout5,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout6,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout7,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout8,	CC	GG	AA	CC	CC	CC	AA	GG	TT
Trout9,	CC	GG	AT	CC	CC	CC	AA	GG	TT
Trout10,	CC	GG	AA	CC	CC	CC	AA	GG	TT
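The sample file format can be read in a similar way. The Python sketch below is an illustration only and is not part of Clarki; the function names split_genotype and parse_sample_file are hypothetical. It assumes that each individual's name is followed by a comma, that genotypes are separated by spaces or tabs, and that letter genotypes are exactly two characters long. How Clarki handles missing data is not described here, so the sketch does not attempt to handle it.

```python
def split_genotype(genotype):
    """Split a genotype such as 'AT', '11', '0101', or '001001' into two alleles."""
    if genotype.isdigit():
        if len(genotype) not in (2, 4, 6):
            raise ValueError(f"numeric genotype {genotype!r} must have 2, 4, or 6 digits")
    elif len(genotype) != 2:
        raise ValueError(f"letter genotype {genotype!r} must have 2 characters")
    half = len(genotype) // 2
    return genotype[:half], genotype[half:]


def parse_sample_file(text):
    """Parse the text of a sample file (illustrative sketch).

    Returns (description, loci, populations), where populations maps each
    population name to a dictionary of individual name -> {locus: (allele, allele)}.
    """
    lines = text.splitlines()
    description = lines[0].strip()
    loci = [name.strip() for name in lines[1].split(",") if name.strip()]
    populations = {}
    current = None
    for line in lines[2:]:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0].upper() == "POP":
            # Text after POP is the sample name; fall back to a numbered name.
            current = " ".join(fields[1:]) or f"Population {len(populations) + 1}"
            populations[current] = {}
            continue
        if current is None:
            raise ValueError("individual listed before the first POP line")
        name, _, rest = line.partition(",")
        genotypes = [split_genotype(g) for g in rest.split()]
        if len(genotypes) != len(loci):
            raise ValueError(f"{name}: expected {len(loci)} genotypes, found {len(genotypes)}")
        populations[current][name.strip()] = dict(zip(loci, genotypes))
    return description, loci, populations


# A shortened version of the example file shown above.
EXAMPLE = """Sample species ID data; cutthroat trout from Yellowstone Park
RAG1, VIM, Cal,\tCBR1,\tP53,\tTnsf,\tPrL2,\tMT1B,\tTry-III
POP Grayling Creek
Trout1,\tCC\tGG\tAA\tAA\tTT\tCT\tGG\tAA\tGT
Trout2,\tCC\tGT\tAT\tAA\tCT\tTT\tGG\tAA\tGG
POP Slough Creek
Trout1,\tCC\tGG\tAA\tCC\tCC\tCC\tAA\tGG\tTT
"""

description, loci, populations = parse_sample_file(EXAMPLE)
print(populations["Grayling Creek"]["Trout1"]["Tnsf"])  # prints ('C', 'T')
print(split_genotype("0101"))  # prints ('01', '01')
```

Individuals are grouped by population because names such as Trout1 are reused in different populations in the example file.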

Example of output

Here is the output from the program using the example data files shown above.

Result from program 'Clarki'
7/15/2009 12:37:01 PM

DIAGNOSTIC ALLELE FILE
Species ID MS - Cutthroat DIAGNOSTIC.txt
Cutthroat diagnostic markers for empirical test

SAMPLE FILE
Species ID - Cutthroat test - SAMPLE file.txt
Sample species ID data; cutthroat trout from Yellowstone Park
PROPORTIONS OF EACH SAMPLE THAT ARE FROM EACH TAXA

                   N     WCT     YCT     RBT
Grayling Creek    10    0.81    0.07    0.12
Slough Creek      10            0.99    0.01

Results are tab-delimited so that they can be copied and pasted into a spreadsheet.

Installation

Download the files here to obtain Clarki. Clarki runs on the Windows operating system, and requires the .NET framework to be installed. The .NET Framework is a component of the Microsoft Windows operating system used to build and run Windows-based applications. If you have a recent version of Windows, you probably already have .NET installed on your computer. You can check by clicking Start on your Windows desktop, selecting Control Panel, and then double-clicking the Add or Remove Programs icon. When that window appears, scroll through the list of applications. If you see Microsoft .NET Framework 2.0 listed, the latest version is already installed and you do not need to install it again.
If you do not have .NET already installed on your computer, the easiest way to install it is to update your operating system. This is relatively painless. To begin, open Internet Explorer, select Tools --> Windows Update, find Microsoft .NET Framework 2.0, and install it (it will be listed under "Pick updates to install").

To “install” Clarki, download the file Clarki.exe and the accompanying library of functions, Kalinowski_library.dll. Place both of these files in the same folder. Click on the Clarki.exe file to run the program. To “uninstall,” simply delete both of these files.

Citation

The following citation should be used when citing Clarki:

  • Kalinowski ST (2009) How to use SNPs and other diagnostic diallelic genetic markers to estimate the composition of multi-species hybrids. Conservation Genetics.