Course Summary and Objectives:

High-throughput sequencing technology has made it possible to obtain large-scale genetic data sets for almost any organism, creating a need for the computational tools and skills to process these data. While the bioinformatics workflows for processing raw data into SNPs are typically well delineated, the path for analyzing and interpreting the resulting SNP data set can be less clear. In this workshop, students learn about classical population genetics statistics that test the neutral theory of evolution, and then get hands-on experience writing their own R code to perform each analysis on a realistic sample SNP data set. Emphasis is placed on programming fundamentals and algorithm design, skills that extend beyond the specific calculations learned in class. At the end of the semester, each student completes an independent project: running an analysis on their own, often with their own data, and presenting the findings to the class.


  • AFS Exercises
  • A brief introduction to coalescent theory.
  • Neutral (coalescent) theory expectations of allele frequency distributions
  • Selective and demographic forces causing deviations from neutrality.
  • Population Mutation Rates and Watterson's Theta
  • Running R on the Cluster
  • Brief overview of the Linux command line environment (navigating directories, creating, deleting, splitting, concatenating, and moving files).
  • Cluster basics: logging in, transferring files, using interactive nodes and submitting job scripts.
  • Running R in the cluster environment: installing packages, setting up scripts to use command line arguments.
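
As a rough illustration of two of the topics listed above, the allele frequency spectrum and Watterson's theta, the following R sketch computes both from a small 0/1 haplotype matrix and reads an optional input file from the command line. It is not taken from the course materials. The toy matrix, the input file format (whitespace-delimited 0/1 values, rows as sampled chromosomes, columns as SNP sites, 1 for the derived allele), and the function names allele_frequency_spectrum and wattersons_theta are all assumptions made for this example.

```r
# Sketch only: the toy data and the input file format are assumptions,
# not course materials.

# Optional command-line argument: path to a whitespace-delimited 0/1 matrix
# (rows = sampled chromosomes, columns = SNP sites, 1 = derived allele).
args <- commandArgs(trailingOnly = TRUE)

if (length(args) >= 1 && file.exists(args[1])) {
  haps <- as.matrix(read.table(args[1], header = FALSE))
} else {
  # Toy data: 4 chromosomes, 5 sites (the last site is monomorphic).
  haps <- matrix(c(1, 0, 0, 1, 0,
                   0, 0, 1, 1, 0,
                   0, 1, 0, 1, 0,
                   0, 0, 0, 0, 0),
                 nrow = 4, byrow = TRUE)
}

# Unfolded allele frequency spectrum: entry i counts the sites at which
# the derived allele appears on exactly i of the n chromosomes.
allele_frequency_spectrum <- function(haps) {
  n <- nrow(haps)
  counts <- colSums(haps)
  segregating <- counts[counts > 0 & counts < n]
  tabulate(segregating, nbins = n - 1)
}

# Watterson's estimator: S / a_n, where S is the number of segregating
# sites and a_n is the sum of 1/i for i = 1, ..., n - 1.
wattersons_theta <- function(haps) {
  n <- nrow(haps)
  S <- sum(allele_frequency_spectrum(haps))
  a_n <- sum(1 / seq_len(n - 1))
  S / a_n
}

afs <- allele_frequency_spectrum(haps)
theta_w <- wattersons_theta(haps)

# Under the standard neutral model, the expected number of sites with
# derived allele count i is theta / i.
expected_afs <- theta_w / seq_len(nrow(haps) - 1)

print(afs)                     # toy data: 3 0 1
print(theta_w)                 # toy data: 24/11, about 2.18
print(round(expected_afs, 3))
```

Saved under a hypothetical name such as afs_theta.R, the script could be run on a login or interactive node with `Rscript afs_theta.R haplotypes.txt`, where haplotypes.txt is a placeholder for a file in the assumed format. With no argument it falls back to the toy matrix. Comparing the observed spectrum with expected_afs gives a first impression of how far the data depart from neutral expectations.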