Tutorial
To run the package, first install julia (here). To create a local environment with the package cd into your work directory and launch julia, then install the package (and other useful packages to handle the data):
using Pkg; Pkg.activate(".")Pkg.Registry.add(RegistrySpec(url = "https://github.com/ArndtLab/JuliaRegistry.git"))Pkg.add("HetDister","HistogramBinnings","CSV","DataFrames","DataFramesMeta")You can now load the installed packages:
using HetDister, HistogramBinnings, CSV, DataFrames, DataFramesMetaPreparing input data
For example, suppose you have a .vcf file with called variants you want to analyze. Then, in the most basic case, you may compute distances between heterozygous SNPs as follows:
f = "/myproject/myfavouritespecies.vcf"
df = CSV.read(f, DataFrame,
delim='\t',
comment="##",
missingstring=[".", "NaN"],
normalizenames=true,
ntasks = 1,
drop = [:INFO, :ID, :FILTER],
)
# remove homozygous variants
@chain df begin
@rsubset! (:SampleName[1] == '1' && :SampleName[3] == '0') || (:SampleName[1] == '0' && :SampleName[3] == '1')
end
ils = df.POS[2:end] .- df.POS[1:end-1]
@assert all(ils .> 0)Now we have a vector of intervals ils.
Fitting demographic models
The tool require three inputs: a vector of IBS segments lengths, a mutation rate and a recombination rate (both per bp per generation). First, IBSs need to be placed into a histogram.
h = adapt_histogram(ils)
mu = 1e-8
rho = 1e-8Then we set up the FitOptions object that contains several parameters for the optimization. We stick with default values and only initialize with the three required inputs:
fop = FitOptions(sum(ils), mu, rho)And we fit 8 different model, with a number of epochs in the range 1 to 8:
results = demoinfer(h_obs, 1:8, fop)The fitted models can be accessed with results.fits.