HetDister

Documentation for HetDister.

Module to run demographic inference on diploid genomes, under the assumption of panmixia.

Data form of input and output

The genome must be SNP-called, and the genomic distance between consecutive heterozygous positions must be computed. Heterozygous positions are those with genotype 0/1 or 1/0 (phase does not matter). The input is then a vector containing these distances. A mutation rate and a recombination rate must also be chosen and passed as input. See the Tutorial for more details on preparing input data.
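As a minimal sketch of the distance computation described above, assuming the heterozygous positions of a single chromosome are already available as integer coordinates (the positions and the helper name `het_distances` are hypothetical, and reading them from a VCF file is not shown):

```julia
# Hypothetical coordinates (in base pairs) of heterozygous sites,
# i.e. sites with genotype 0/1 or 1/0, on one chromosome.
het_positions = [1_204, 1_950, 7_003, 7_010, 15_872]

# Illustrative helper, not part of HetDister: sort the positions, then take
# the difference between each pair of consecutive heterozygous sites.
het_distances(positions) = diff(sort(positions))

distances = het_distances(het_positions)
println(distances)  # [746, 5053, 7, 8862]
```

A vector of n positions yields n - 1 distances. How distances from several chromosomes should be combined is covered by the Tutorial rather than by this sketch.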

The demographic model underlying the inference consists of a variable number of epochs, with the population size held constant within each epoch.

The output is a vector of parameters of the form [L, N0, T1, N1, T2, N2, ...], where L is the total sequence length and N0 is the ancestral population size in the most ancient epoch, which extends to the infinite past. The subsequent pairs $(T_i, N_i)$ are the duration and population size of the following epochs, ordered from past to present. This format is referred to as the TN vector throughout. The fitted L should match the input sequence length; it is left floating (treated as a free parameter) to improve the fit.
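To make the TN vector layout concrete, here is a small sketch that unpacks a made-up vector using plain indexing. The numeric values and the variable names are hypothetical; the package also lists accessors such as `durations` and `pop_sizes`, which are not used here because their signatures are not shown in this section.

```julia
# Hypothetical TN vector with three epochs: [L, N0, T1, N1, T2, N2]
tn = [1.0e8, 10_000.0, 5_000.0, 2_000.0, 1_000.0, 20_000.0]

L  = tn[1]        # total sequence length
N0 = tn[2]        # ancestral size, extending to the infinite past
Ts = tn[3:2:end]  # epoch durations T1, T2, ... (past to present)
Ns = tn[4:2:end]  # epoch sizes N1, N2, ... (past to present)

# The ancestral epoch has no duration, so there is one more epoch than durations.
n_epochs = 1 + length(Ts)

# Time before present at which each finite epoch begins, in the same units as
# the durations: accumulate durations starting from the most recent epoch.
start_times = reverse(cumsum(reverse(Ts)))

println((n_epochs, Ts, Ns, start_times))
```

In this example the most recent epoch (size 20000) spans the last 1000 time units, the one before it (size 2000) spans 1000 to 6000 time units ago, and the ancestral size 10000 applies to everything older.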

HetDister.FitOptions
HetDister.FitResult
HetDister.Spectra.laplacekingman
HetDister.Spectra.mldsmcp
HetDister.adapt_histogram
HetDister.compare_mlds
HetDister.compare_models
HetDister.compute_residuals
HetDister.compute_residuals
HetDister.demoinfer
HetDister.demoinfer
HetDister.durations
HetDister.evd
HetDister.get_para
HetDister.pop_sizes
HetDister.pre_fit
HetDister.sds

HetDister.FitOptions — Method

FitOptions(Ltot, mu, rho; kwargs...)

Construct an object of type FitOptions, requiring the total genome length Ltot in base pairs, and the mutation rate mu and recombination rate rho, both per base pair per generation.

Optional Arguments

Tlow::Number=10, Tupp::Number=1e7: The lower and upper bounds for the duration of epochs.
Nlow::Number=10, Nupp::Number=1e8: The lower and upper bounds for the population sizes.
level::Float64=0.95: The confidence level for the confidence intervals on the parameter estimates.
solver: The solver to use for the optimization, default is LBFGS().
smallest_segment::Int=1: The smallest segment size present in the histogram to consider for the signal search.
force::Bool=true: if true, try to fit further epochs even when no signal is found.
maxnts::Int=10: The maximum number of new time splits to consider when adding a new epoch. Higher is greedier.
naive::Bool=true: if true, the expected weights are computed using the closed-form integral; otherwise, using higher-order transition probabilities from SMC' theory (slower).
order::Int=10: maximum number of higher order corrections to use when naive is false, i.e. number of intermediate recombination events plus one.
ndt::Int=800: number of Legendre nodes to use when naive is false.
locut::Int=1: index of the first histogram bin to consider in the fit.
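The keyword interface above can be sketched as follows. The genome length, rates, and override values are hypothetical placeholders, not recommendations. The call to `FitOptions` is left commented out so that the snippet runs without the package installed.

```julia
# Hypothetical inputs; replace with values appropriate for your organism.
Ltot = 3_000_000_000  # total genome length in base pairs
mu   = 1.25e-8        # mutation rate per base pair per generation
rho  = 1.0e-8         # recombination rate per base pair per generation

# Overrides for a few of the documented keywords, collected in a NamedTuple.
# Any keyword left out keeps the default listed above.
overrides = (Nlow = 100, Nupp = 1e7, level = 0.99, maxnts = 20)

# Basic sanity checks that follow from the documented meaning of the options.
@assert overrides.Nlow < overrides.Nupp
@assert 0 < overrides.level < 1

# With HetDister installed, the options object would then be built as:
# using HetDister
# opts = FitOptions(Ltot, mu, rho; overrides...)
```

Splatting a NamedTuple after the semicolon passes each field as a keyword argument, which keeps a set of non-default options in one place when several fits share them.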

Optim Arguments

Additional keywords are passed to Optimization.solve; see the Optimization.jl documentation, in particular the section on Optim.jl, which provides the default optimizer. Defaults are:

maxiters = 6000
maxtime = 60
g_tol = 5e-8