IBSpector

Documentation for IBSpector.

Module to run demographic inference on diploid genomes, under the assumption of panmixia.

Data form of input and output

The genome needs to be SNP-called and the genomic distance between consecutive heterozygous positions needs to be computed. Heterozygous positions are the ones with genotype 0/1 or 1/0 (Note that the phase is not important). The input is then a vector containind such distances. Additionally, mutation and recombination rates need to be chosen and passed as input as well. See Tutorial for more details on preparing input data.

The demographic model underlying the inference is composed of a variable number of epochs and the population size is constant along each epoch.

The output is a vector of parameters in the form [L, N0, T1, N1, T2, N2, ...] where L is the total sequence length, N0 is the ancestral population size in the furthermost epoch and extending to the infinite past, the subsequent pairs $(T_i, N_i)$ are the duration and size of following epochs going from past to present. This format is referred to as TN vector throughout. The length L should match the input sequence length and is floating to improve the fit.

IBSpector.FitOptionsMethod
FitOptions(Ltot, nhet, mu, rho; kwargs...)

Construct an an object of type FitOptions, requiring total genome length Ltot in base pairs, number of heterozygous sites nhet, mutation rate mu and recombination rate rho per base pair per generation.

Optional Arguments

  • Tlow::Number=10, Tupp::Number=1e7: The lower and upper bounds for the duration of epochs.
  • Nlow::Number=10, Nupp::Number=1e8: The lower and upper bounds for the population sizes.
  • level::Float64=0.95: The confidence level for the confidence intervals on the parameters estimates.
  • force::Bool=true: if true try to fit further epochs even when no signal is found.
  • maxnts::Int=5: The maximum number of new time splits to consider when adding a new epoch. Higher is greedier.
  • order::Int=0: maximum number of higher order SMC' corrections to account for (i.e. number of intermediate recombination events plus one). When zero, it is set automatically.
  • ndt::Int=0: number of Legendre nodes to use for numerical integration. When zero, it is set automatically.
  • locut::Int=1: index of the first histogram bin to consider in the fit.
source
IBSpector.adapt_histogramMethod
adapt_histogram(segments::AbstractVector{<:Integer}; lo::Int=1, hi::Int=50_000_000, nbins::Int=0, tailthr::Int=0)

Build an histogram from segments logbinned between lo and hi with nbins bins. nbins is automatically determined by default.

The upper limit is adapted to ensure logspacing with the requested nbins. The adaptive strategy is such that the last bin has at least tailthr segments.

source
IBSpector.compare_mlds!Method
compare_mlds!(h1::Histogram, h2::Histogram, theta1::Float64, theta2::Float64)

The same as compare_mlds, except that it takes two histograms and mutates them. Return values are the same as compare_mlds.

source
IBSpector.compare_mldsMethod
compare_mlds(segs1::Vector{Int}, segs2::Vector{Int}; lo = 1, hi = 1_000_000, nbins = 200)

Build two histograms from segs1 and segs2, rescale them for number and average heterozygosity and return four vectors containing respectively common midpoints for bins, the two rescaled weights and variances of the difference between weights.

source
IBSpector.compare_modelsFunction
compare_models(models[, mask])

Compare the models parameterized by FitResults and return the best one. Takes an iterable of FitResult as input and optionally a boolean mask to reflect prior knowledge on models to discard.

source
IBSpector.compute_residualsMethod
compute_residuals(h::Histogram, yth::AbstractVector{<:Real})

Compute the residuals between the observed and expected weights.

source
IBSpector.compute_residualsMethod
compute_residuals(h::Histogram, mu, rho, TN::Vector; naive=true)

Compute the residuals between the observed and expected weights.

Optional arguments

  • naive::Bool=true: if true the expected weights are computed using the closed form integral, otherwise using higher order transition probabilities from SMC' theory.
  • order::Int=10: maximum number of higher order corrections to use when naive is false, i.e. number of intermediate recombination events plus one.
  • ndt::Int=800: number of Legendre nodes to use when naive is false.
source
IBSpector.compute_residualsMethod
compute_residuals(h1::Histogram, h2::Histogram)

When two histograms are given the std error of the difference is used for normalization.

source
IBSpector.demoinferMethod
demoinfer(segments::AbstractVector{<:Integer}, epochrange::AbstractRange{<:Integer}, mu::Float64, rho::Float64; kwargs...)

Make an histogram with IBS segments and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange.

Return a named tuple which contains the fields:

  • fits: a vector of FitResult (see FitResult)
  • chains: a vector of vectors of FitResult, one for each iteration of the correction procedure, and one chain per model
  • yth: a vector of vectors of the expected weights, one for each model and one vector of expected weights per iteration of the correction procedure
  • corrections: a vector of vectors of corrections, one for each iteration of the correction procedure, and one vector of corrections per model. Corrections are histogram counts, therefore they have the same shape.
  • h_obs: the histogram of the observed segments
  • h_mods: a vector of modified histograms, one for each model, with higher order corrections applied.
  • ybest: a vector of expected weights corresponding to the best fit, one for each model
  • resid: a vector of vectors of residuals, one for each model
  • p: a vector of right-tail p-values for the autocorrelation of residuals, one for each model
  • deltas: a vector of vectors of the maximum absolute difference between corrections in consecutive iterations, and for each model.
  • lls: a vector of vectors of log-likelihoods, one for each iteration and for each model. The output estimate is the one with the highest log-likelihood.
  • conv: a vector of booleans, one for each model, indicating whether the maximum iterations were reached (false) or whether the procedure converged before (true).

Optional Arguments

  • fop::FitOptions = FitOptions(sum(segments), mu, rho): the fit options, see FitOptions.
  • lo::Int=1: The lowest segment length to be considered in the histogram
  • hi::Int=50_000_000: The highest segment length to be considered in the histogram
  • nbins::Int=fop.ndt: The number of bins to use in the histogram
  • iters::Int=20: The number of iterations to perform after warmup. It might converge earlier. Warmup iterations are proportional to the rho/mu ratio.
  • reltol::Float64=1e-2: The relative tolerance to use for convergence, i.e. the maximum absolute difference between corrections in consecutive iterations. The convergence condition test this or relchange.
  • relchange::Float64=1e-4: The relative change in parameters to use for convergence. This is the maximum relative change in parameters between consecutive iterations. The convergence condition test this or reltol.
  • corcut::Int=fop.locut-1: The index of the last histogram bin to apply corrections to. This should not be changed in most cases.
source
IBSpector.demoinferMethod
demoinfer(h::Histogram, epochrange, fop::FitOptions; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)
demoinfer(h, epochs, fop; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)

Take an histogram of IBS segments, fit options, and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange. Return a named tuple as above.

If epochrange is a integer, then it fits only the model with that number of epochs. In this case the returned named tuple contains only one element per field, instead of a vector.

source
IBSpector.flagsMethod
flags(fit::FitResult)

Return a named tuple of flags and diagnostics for the fit, including:

  • converged: whether the optimization converged
  • convex_optimum: whether the likelihood Hessian at the optimum is strictly positive definite.
  • ci_low: whether the confidence interval of any parameter includes zero
  • at_any_boundary: whether any parameter is at its lower or upper bound
  • log_like: the log-likelihood of the fit
  • log_evidence: the log-evidence of the fit
  • optimizer_message: the original message from the optimizer, which can be useful for diagnosing optimization issues.
source
IBSpector.get_covarMethod
get_covar(fit::FitResult)

Return the covariance matrix of the parameters of the fit, computed as the inverse of the log-likelihood Hessian at the optimum.

source
IBSpector.pre_fit!Method
pre_fit!(fop::FitOptions, h::Histogram, nfits)

Preliminarily fit h with an approximate model of piece-wise constant epochs for each number of epochs from 1 to nfits.

See FitOptions for how to specify them. It modifies fop in place to adapt it to all the requested epochs. Return a vector of FitResult, one for each number of epochs, see also FitResult.

source
IBSpector.residstructureMethod
residstructure(residuals::AbstractVector{<:Real})

Compute the p-value for the autocorrelation of adjacent residuals. The p-value is the right tail of the t-distribution.

source
IBSpector.sample_model_epochsMethod
sample_model_epochs(options::FitOptions, h::Histogram{T,1,E}, fit::FitResult; nsamples = 10_000, naive = isnaive(options))

Sample nsamples from the posterior distribution of the parameters, starting from initial point in MLE fit obtained from demoinfer.

Requires the observed histogram h and the fit options options. Return a Chains object from the MCMCDiagnostics module of Turing, which contains the samples from the posterior distribution. If naive is false, the sampling will be done using the SMC' likelihood, which is more accurate but also more computationally intensive. If naive is true, the sampling will be done using the closed-form integral likelihood, which requires to use a modified histogram h_mod as output by demoinfer.

source
IBSpector.sdsMethod
sds(fit::FitResult)

Return the standard deviations of the parameters of the fit.

source
IBSpector.setOptimOptions!Method
setOptimOptions!(fop::FitOptions; kwargs...)

Set the options which are passed to Optimization.solve, see Optimization.jl. and the specific Optim.jl section, which is the default optimizer. Defaults are:

  • solver: The solver to use for the optimization, default is LBFGS().
  • maxiters = 6000
  • maxtime = 60
  • g_tol = 5e-8

If given more parameters, they are passed to the optimizer.

source
IBSpector.setinit!Method
setinit!(fop::FitOptions, init::AbstractVector{Float64})

Set the initial vector of parameters for the optimization which takes the FitOptions object fop.

source
IBSpector.Spectra.laplacekingmanMethod
laplacekingman(r, mu, TN)

Compute the approximate number of segments of length r using the Laplace transform of the Kingman coalescent at frequency 2mu r, given mutation rate mu and population size history TN.

source
IBSpector.Spectra.mldsmcpMethod
mldsmcp(rs, edges, mu, rho, TN; order = 10, ndt = 800)

Compute the expected number of segments with length in each bin defined by edges, given the midpoints rs, mutation rate mu, recombination rate rho, and population size history TN.

The computation uses the SMC' higher order transition probabilities with order maximum number of intermediate recombination events plus one, and ndt Legendre nodes for the numerical integration.

source