Tutorial

To run the package, first install julia (here). To create a local environment with the package cd into your work directory and launch julia, then install the package (and other useful packages to handle the data):

using Pkg; Pkg.activate(".")
Pkg.Registry.add(RegistrySpec(url = "https://github.com/ArndtLab/JuliaRegistry.git"))
Pkg.add("HetDister","HistogramBinnings","CSV","DataFrames","DataFramesMeta")

You can now load the installed packages:

using HetDister, HistogramBinnings, CSV, DataFrames, DataFramesMeta

Preparing input data

For example, suppose you have a .vcf file with called variants you want to analyze. Then, in the most basic case, you may compute distances between heterozygous SNPs as follows:

f = "/myproject/myfavouritespecies.vcf"
df = CSV.read(f, DataFrame, 
    delim='\t', 
    comment="##",
    missingstring=[".", "NaN"],
    normalizenames=true,
    ntasks = 1,
    drop = [:INFO, :ID, :FILTER],
)

# remove homozygous variants
@chain df begin
    @rsubset! (:SampleName[1] == '1' && :SampleName[3] == '0') || (:SampleName[1] == '0' && :SampleName[3] == '1')
end

ils = df.POS[2:end] .- df.POS[1:end-1]
@assert all(ils .> 0)

Now we have a vector of intervals ils.

Fitting demographic models

The tool require three inputs: a vector of IBS segments lengths, a mutation rate and a recombination rate (both per bp per generation). First, IBSs need to be placed into a histogram.

h = adapt_histogram(ils)
mu = 1e-8
rho = 1e-8

Then we set up the FitOptions object that contains several parameters for the optimization. We stick with default values and only initialize with the three required inputs:

fop = FitOptions(sum(ils), mu, rho)

And we fit 8 different model, with a number of epochs in the range 1 to 8:

results = demoinfer(h_obs, 1:8, fop)

The fitted models can be accessed with results.fits.