TreeFit.exe
Software for evaluating how well a UPGMA or neighbor-joining tree fits a matrix of genetic distances
Introduction
Evolutionary trees are frequently used to
describe genetic relationships between populations. Hierarchical,
bifurcating trees are a reasonable model for the evolution of DNA sequences
and species, but may or may not be appropriate for describing population structure in populations connected by gene flow. For
example, if populations are arranged in a stepping stone pattern, the genetic relationships between populations will not
follow a hierarchical pattern, and traditional neighbor-joining or UPGMA
trees may not be appropriate tools for describing the structure of such
populations.
The computer program
TreeFit was written to analyze how well a tree fits the genetic data the
tree was calculated from. TreeFit creates neighbor-joining and UPGMA trees
from a genetic distance matrix, and then compares the observed genetic
distance between populations with the genetic distance in the tree. The
similarity between these distances is expressed as R-squared, the familiar
statistic used to summarize the scatter of points around a least-squares
regression line.
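The comparison described above can be sketched in a few lines of Python. This is an illustration only, not TreeFit's own code. It builds a UPGMA tree by repeatedly merging the two clusters with the smallest average distance, takes the tree distance between two populations to be the average distance at which their clusters were merged, and summarizes the fit as R-squared = 1 - SS_residual / SS_total. The exact R-squared formula that TreeFit uses is an assumption here (the manuscript is the authority), the function names and the three-population matrix are hypothetical, and neighbor-joining is not shown.

```python
from itertools import combinations


def average_distance(matrix, a, b):
    """Mean observed distance between the members of clusters a and b."""
    return sum(matrix[i][j] for i in a for j in b) / (len(a) * len(b))


def upgma_tree_distances(matrix):
    """Return the population-to-population distances implied by a UPGMA tree."""
    n = len(matrix)
    clusters = [[i] for i in range(n)]        # start with one cluster per population
    tree = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest average distance
        # (ties go to the first pair found; an assumption of this sketch).
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(matrix, *pair))
        d = average_distance(matrix, a, b)
        # In the ultrametric UPGMA tree, every population in a is d away
        # from every population in b.
        for i in a:
            for j in b:
                tree[i][j] = tree[j][i] = d
        clusters = [c for c in clusters if c is not a and c is not b]
        clusters.append(a + b)
    return tree


def r_squared(observed, tree):
    """Assumed fit statistic: 1 - SS_residual / SS_total over all pairs."""
    n = len(observed)
    pairs = [(i, j) for i in range(n) for j in range(i)]
    obs = [observed[i][j] for i, j in pairs]
    fit = [tree[i][j] for i, j in pairs]
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical three-population matrix, made up for illustration.
observed = [[0.0, 2.0, 6.0],
            [2.0, 0.0, 8.0],
            [6.0, 8.0, 0.0]]
tree = upgma_tree_distances(observed)
print(round(r_squared(observed, tree), 3))
```

A value near 1 means the tree reproduces the observed distances closely; lower values suggest that a hierarchical tree is a poor description of the data.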
A manuscript describing the program is available at this link.
Input file format
TreeFit can read two file formats: a GENEPOP
file of genotypes or a text file of genetic distances. If a GENEPOP file is
used, TreeFit will calculate a matrix of pairwise Fst (Weir and Cockerham 1984) and use those genetic
distances to construct evolutionary trees. The GENEPOP file format is
described on the program's webpage. If genetic distances are used as input, the
distance file must have the following format:
1. The first line of the file contains the title of the data set.
2. Each subsequent line starts with the name of a population.
3. Population names cannot contain spaces.
4. The distance matrix must be in lower-left format (not upper-right).
5. Genetic distances are delimited by spaces.
6. The genetic distance for each population to itself is omitted.
7. There cannot be any extra text after the last line.
8. Note that the second line of the file has the name of the first population (e.g. "Cabin" below) but does not have a genetic distance after it.
Here is an example:
Pairwise FST for six populations of bighorn sheep
Cabin
CDome 0.2391
Davis 0.0179 0.2487
Eagle 0.1968 0.1319 0.1960
Canada 0.2270 0.2564 0.2366 0.2092
Kofa 0.2182 0.0234 0.2416 0.1007 0.2360
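For readers who want to check a distance file before running the program, the format rules above can be expressed as a short Python reader. This is a minimal sketch, not part of TreeFit: the function name read_distance_file is hypothetical, and the sketch is more lenient than the rules in one respect (it ignores blank lines, which TreeFit may not tolerate). It assumes the title occupies a single line, as rule 1 requires.

```python
def read_distance_file(text):
    """Parse a lower-left distance matrix; return (title, names, full matrix)."""
    # Blank lines are skipped here for convenience (an assumption of this sketch).
    lines = [line for line in text.splitlines() if line.strip()]
    title = lines[0].strip()
    names, rows = [], []
    for i, line in enumerate(lines[1:]):
        fields = line.split()                  # names cannot contain spaces
        name = fields[0]
        values = [float(v) for v in fields[1:]]
        # Row i of a lower-left matrix without the diagonal has i distances,
        # so the first population has none.
        if len(values) != i:
            raise ValueError(
                f"{name!r} has {len(values)} distances, expected {i}")
        names.append(name)
        rows.append(values)
    # Expand the lower-left triangle into a symmetric square matrix.
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i, values in enumerate(rows):
        for j, d in enumerate(values):
            matrix[i][j] = matrix[j][i] = d
    return title, names, matrix


SAMPLE = """Pairwise FST for six populations of bighorn sheep
Cabin
CDome 0.2391
Davis 0.0179 0.2487
Eagle 0.1968 0.1319 0.1960
Canada 0.2270 0.2564 0.2366 0.2092
Kofa 0.2182 0.0234 0.2416 0.1007 0.2360
"""

title, names, matrix = read_distance_file(SAMPLE)
print(title)
print(names)
```

A file that passes this check has one more distance on each line than on the line before it, which is the pattern the example shows.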
Download
TreeFit runs on Microsoft Windows with the .NET platform installed.
See my Software page for instructions on how to install .NET on your computer (it is probably already there). Click here to download a ZIP file containing
TreeFit and a library of functions (kalinowski_library.dll) that the program
needs. A sample distance matrix file is available here. A GENEPOP file for the same data is available here.
A manual for the program is available in PDF format here.
Installation / Uninstallation
To "install," place TreeFit.exe and kalinowski_library.dll in the
same folder. Click on TreeFit.exe
to run. Delete both files to "uninstall."
Citation
Please cite the following paper:
KALINOWSKI ST (2009) How well do evolutionary trees describe genetic relationships between populations? Heredity 102:506-513. pdf