IBSpector

Documentation for IBSpector.

Module to run demographic inference on diploid genomes, under the assumption of panmixia.

Data form of input and output

The genome needs to be SNP-called and the genomic distance between consecutive heterozygous positions needs to be computed. Heterozygous positions are the ones with genotype 0/1 or 1/0 (Note that the phase is not important). The input is then a vector containind such distances. Additionally, mutation and recombination rates need to be chosen and passed as input as well. See Tutorial for more details on preparing input data.

The demographic model underlying the inference is composed of a variable number of epochs and the population size is constant along each epoch.

The output is a vector of parameters in the form [L, N0, T1, N1, T2, N2, ...] where L is the total sequence length, N0 is the ancestral population size in the furthermost epoch and extending to the infinite past, the subsequent pairs $(T_i, N_i)$ are the duration and size of following epochs going from past to present. This format is referred to as TN vector throughout. The length L should match the input sequence length and is floating to improve the fit.

IBSpector.FitOptions
IBSpector.FitResult
IBSpector.Spectra.laplacekingman
IBSpector.Spectra.mldsmcp
IBSpector.adapt_histogram
IBSpector.compare_mlds
IBSpector.compare_mlds!
IBSpector.compare_models
IBSpector.compute_residuals
IBSpector.compute_residuals
IBSpector.compute_residuals
IBSpector.demoinfer
IBSpector.demoinfer
IBSpector.durations
IBSpector.evd
IBSpector.flags
IBSpector.get_covar
IBSpector.get_para
IBSpector.loglike
IBSpector.pop_sizes
IBSpector.pre_fit!
IBSpector.residstructure
IBSpector.sample_model_epochs
IBSpector.sds
IBSpector.setOptimOptions!
IBSpector.setinit!
IBSpector.times

IBSpector.FitOptions — Method

FitOptions(Ltot, nhet, mu, rho; kwargs...)

Construct an an object of type FitOptions, requiring total genome length Ltot in base pairs, number of heterozygous sites nhet, mutation rate mu and recombination rate rho per base pair per generation.

Optional Arguments

Tlow::Number=10, Tupp::Number=1e7: The lower and upper bounds for the duration of epochs.
Nlow::Number=10, Nupp::Number=1e8: The lower and upper bounds for the population sizes.
force::Bool=true: if true try to fit further epochs even when no signal is found.
maxnts::Int=5: The maximum number of new time splits to consider when adding a new epoch. Higher is greedier.
order::Int=0: maximum number of higher order SMC' corrections to account for (i.e. number of intermediate recombination events plus one). When zero, it is set automatically.
ndt::Int=0: number of Legendre nodes to use for numerical integration. When zero, it is set automatically.
locut::Int=1: index of the first histogram bin to consider in the fit.

IBSpector.FitResult — Type

struct FitResult

A data structure to store the results of a fit.

See the introduction for how the model is parameterized Data form of input and output. Some methods are defined for this type to get the vector of parameters, std errors, model evidence, etc. See get_para, sds, evd, pop_sizes, durations.

IBSpector.adapt_histogram — Method

adapt_histogram(segments::AbstractVector{<:Integer}; lo::Int=1, hi::Int=50_000_000, nbins::Int=0, tailthr::Int=0)

Build an histogram from segments logbinned between lo and hi with nbins bins. nbins is automatically determined by default.

The upper limit is adapted to ensure logspacing with the requested nbins. The adaptive strategy is such that the last bin has at least tailthr segments.

IBSpector.compare_mlds! — Method

compare_mlds!(h1::Histogram, h2::Histogram, theta1::Float64, theta2::Float64)

The same as compare_mlds, except that it takes two histograms and mutates them. Return values are the same as compare_mlds.

IBSpector.compare_mlds — Method

compare_mlds(segs1::Vector{Int}, segs2::Vector{Int}; lo = 1, hi = 1_000_000, nbins = 200)

Build two histograms from segs1 and segs2, rescale them for number and average heterozygosity and return four vectors containing respectively common midpoints for bins, the two rescaled weights and variances of the difference between weights.

IBSpector.compare_models — Function

compare_models(models[, mask])

Compare the models parameterized by FitResults and return the best one. Takes an iterable of FitResult as input and optionally a boolean mask to reflect prior knowledge on models to discard.

IBSpector.compute_residuals — Method

compute_residuals(h::Histogram, yth::AbstractVector{<:Real})

Compute the residuals between the observed and expected weights.

IBSpector.compute_residuals — Method

compute_residuals(h::Histogram, mu, rho, TN::Vector; naive=true)

Compute the residuals between the observed and expected weights.

Optional arguments

naive::Bool=true: if true the expected weights are computed using the closed form integral, otherwise using higher order transition probabilities from SMC' theory.
order::Int=10: maximum number of higher order corrections to use when naive is false, i.e. number of intermediate recombination events plus one.
ndt::Int=800: number of Legendre nodes to use when naive is false.

IBSpector.compute_residuals — Method

compute_residuals(h1::Histogram, h2::Histogram)

When two histograms are given the std error of the difference is used for normalization.

IBSpector.demoinfer — Method

demoinfer(segments::AbstractVector{<:Integer}, epochrange::AbstractRange{<:Integer}, mu::Float64, rho::Float64; kwargs...)

Make an histogram with IBS segments and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange.

Return a named tuple which contains the fields:

fits: a vector of FitResult (see FitResult)
chains: a vector of vectors of FitResult, one for each iteration of the correction procedure, and one chain per model
yth: a vector of vectors of the expected weights, one for each model and one vector of expected weights per iteration of the correction procedure
corrections: a vector of vectors of corrections, one for each iteration of the correction procedure, and one vector of corrections per model. Corrections are histogram counts, therefore they have the same shape.
h_obs: the histogram of the observed segments
h_mods: a vector of modified histograms, one for each model, with higher order corrections applied.
ybest: a vector of expected weights corresponding to the best fit, one for each model
resid: a vector of vectors of residuals, one for each model
p: a vector of right-tail p-values for the autocorrelation of residuals, one for each model
llbest: a vector of the best log-likelihoods, one for each model
deltas: a vector of vectors of the maximum absolute difference between corrections in consecutive iterations, and for each model.
lls: a vector of vectors of log-likelihoods, one for each iteration and for each model. The output estimate is the one with the highest log-likelihood.
conv: a vector of booleans, one for each model, indicating whether the maximum iterations were reached (false) or whether the procedure converged before (true).

Optional Arguments

fop::FitOptions = FitOptions(sum(segments), mu, rho): the fit options, see FitOptions.
lo::Int=1: The lowest segment length to be considered in the histogram
hi::Int=50_000_000: The highest segment length to be considered in the histogram
nbins::Int=fop.ndt: The number of bins to use in the histogram
iters::Int=20: The number of iterations to perform after warmup. It might converge earlier. Warmup iterations are proportional to the rho/mu ratio.
reltol::Float64=1e-2: The relative tolerance to use for convergence, i.e. the maximum absolute difference between corrections in consecutive iterations. The convergence condition test this or relchange.
relchange::Float64=1e-4: The relative change in parameters to use for convergence. This is the maximum relative change in parameters between consecutive iterations. The convergence condition test this or reltol.
corcut::Int=fop.locut-1: The index of the last histogram bin to apply corrections to. This should not be changed in most cases.

IBSpector.demoinfer — Method

demoinfer(h::Histogram, epochrange, fop::FitOptions; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)
demoinfer(h, epochs, fop; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)

Take an histogram of IBS segments, fit options, and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange. Return a named tuple as above.

If epochrange is a integer, then it fits only the model with that number of epochs. In this case the returned named tuple contains only one element per field, instead of a vector.

IBSpector.durations — Method

durations(fit::FitResult)

Return the fitted durations of the epochs.

IBSpector.evd — Method

evd(fit::FitResult)

Return the log-evidence of the fit.

IBSpector.flags — Method

flags(fit::FitResult)

Return a named tuple of flags and diagnostics for the fit, including:

converged: whether the optimization converged
convex_optimum: whether the likelihood Hessian at the optimum is strictly positive definite.
at_any_boundary: whether any parameter is at its lower or upper bound
log_like: the log-likelihood of the fit
log_evidence: the log-evidence of the fit
optimizer_message: the original message from the optimizer, which can be useful for diagnosing optimization issues.

IBSpector.get_covar — Method

get_covar(fit::FitResult)

Return the covariance matrix of the parameters of the fit, computed as the inverse of the log-likelihood Hessian at the optimum.

IBSpector.get_para — Method

pars(fit::FitResult)

Return the parameters of the fit.

IBSpector.loglike — Method

loglike(fit::FitResult)

Return the log-likelihood of the fit.

IBSpector.pop_sizes — Method

pop_sizes(fit::FitResult)

Return the fitted population sizes, from past to present.

IBSpector.pre_fit! — Method

pre_fit!(fop::FitOptions, h::Histogram, nfits)

Preliminarily fit h with an approximate model of piece-wise constant epochs for each number of epochs from 1 to nfits.

See FitOptions for how to specify them. It modifies fop in place to adapt it to all the requested epochs. Return a vector of FitResult, one for each number of epochs, see also FitResult.

IBSpector.residstructure — Method

residstructure(residuals::AbstractVector{<:Real})

Compute the p-value for the autocorrelation of adjacent residuals. The p-value is the right tail of the t-distribution.

IBSpector.sample_model_epochs — Method

sample_model_epochs(options::FitOptions, h::Histogram{T,1,E}, fit::FitResult; nsamples = 10_000, naive = isnaive(options))

Sample nsamples from the posterior distribution of the parameters, starting from initial point in MLE fit obtained from demoinfer.

Requires the observed histogram h and the fit options options. Return a Chains object from the MCMCDiagnostics module of Turing, which contains the samples from the posterior distribution. If naive is false, the sampling will be done using the SMC' likelihood, which is more accurate but also more computationally intensive. If naive is true, the sampling will be done using the closed-form integral likelihood, which requires to use a modified histogram h_mod as output by demoinfer.

IBSpector.sds — Method

sds(fit::FitResult)

Return the standard deviations of the parameters of the fit.

IBSpector.setOptimOptions! — Method

setOptimOptions!(fop::FitOptions; kwargs...)

Set the options which are passed to Optimization.solve, see Optimization.jl. and the specific Optim.jl section, which is the default optimizer. Defaults are:

solver: The solver to use for the optimization, default is LBFGS().
maxiters = 6000
maxtime = 60 (in seconds)
g_tol = 5e-8

If given more parameters, they are passed to the optimizer.

IBSpector.setinit! — Method

setinit!(fop::FitOptions, init::AbstractVector{Float64})

Set the initial vector of parameters for the optimization which takes the FitOptions object fop.

IBSpector.times — Method

times(fit::FitResult)

Return the times of size changes.

IBSpector.Spectra.laplacekingman — Method

laplacekingman(r, mu, TN)

Compute the approximate number of segments of length r using the Laplace transform of the Kingman coalescent at frequency 2mu r, given mutation rate mu and population size history TN.

IBSpector.Spectra.mldsmcp — Method

mldsmcp(rs, edges, mu, rho, TN; order = 10, ndt = 800)

Compute the expected number of segments with length in each bin defined by edges, given the midpoints rs, mutation rate mu, recombination rate rho, and population size history TN.

The computation uses the SMC' higher order transition probabilities with order maximum number of intermediate recombination events plus one, and ndt Legendre nodes for the numerical integration.