IBSpector
Documentation for IBSpector.
Module to run demographic inference on diploid genomes, under the assumption of panmixia.
Data form of input and output
The genome needs to be SNP-called and the genomic distance between consecutive heterozygous positions needs to be computed. Heterozygous positions are the ones with genotype 0/1 or 1/0 (Note that the phase is not important). The input is then a vector containind such distances. Additionally, mutation and recombination rates need to be chosen and passed as input as well. See Tutorial for more details on preparing input data.
The demographic model underlying the inference is composed of a variable number of epochs and the population size is constant along each epoch.
The output is a vector of parameters in the form [L, N0, T1, N1, T2, N2, ...] where L is the total sequence length, N0 is the ancestral population size in the furthermost epoch and extending to the infinite past, the subsequent pairs $(T_i, N_i)$ are the duration and size of following epochs going from past to present. This format is referred to as TN vector throughout. The length L should match the input sequence length and is floating to improve the fit.
IBSpector.FitOptionsIBSpector.FitResultIBSpector.Spectra.laplacekingmanIBSpector.Spectra.mldsmcpIBSpector.adapt_histogramIBSpector.compare_mldsIBSpector.compare_mlds!IBSpector.compare_modelsIBSpector.compute_residualsIBSpector.compute_residualsIBSpector.compute_residualsIBSpector.demoinferIBSpector.demoinferIBSpector.durationsIBSpector.evdIBSpector.flagsIBSpector.get_covarIBSpector.get_paraIBSpector.loglikeIBSpector.pop_sizesIBSpector.pre_fit!IBSpector.residstructureIBSpector.sample_model_epochsIBSpector.sdsIBSpector.setOptimOptions!IBSpector.setinit!IBSpector.times
IBSpector.FitOptions — Method
FitOptions(Ltot, nhet, mu, rho; kwargs...)Construct an an object of type FitOptions, requiring total genome length Ltot in base pairs, number of heterozygous sites nhet, mutation rate mu and recombination rate rho per base pair per generation.
Optional Arguments
Tlow::Number=10,Tupp::Number=1e7: The lower and upper bounds for the duration of epochs.Nlow::Number=10,Nupp::Number=1e8: The lower and upper bounds for the population sizes.level::Float64=0.95: The confidence level for the confidence intervals on the parameters estimates.force::Bool=true: if true try to fit further epochs even when no signal is found.maxnts::Int=5: The maximum number of new time splits to consider when adding a new epoch. Higher is greedier.order::Int=0: maximum number of higher order SMC' corrections to account for (i.e. number of intermediate recombination events plus one). When zero, it is set automatically.ndt::Int=0: number of Legendre nodes to use for numerical integration. When zero, it is set automatically.locut::Int=1: index of the first histogram bin to consider in the fit.
IBSpector.FitResult — Type
struct FitResultA data structure to store the results of a fit.
See the introduction for how the model is parameterized Data form of input and output. Some methods are defined for this type to get the vector of parameters, std errors, model evidence, etc. See get_para, sds, evd, pop_sizes, durations.
IBSpector.adapt_histogram — Method
adapt_histogram(segments::AbstractVector{<:Integer}; lo::Int=1, hi::Int=50_000_000, nbins::Int=0, tailthr::Int=0)Build an histogram from segments logbinned between lo and hi with nbins bins. nbins is automatically determined by default.
The upper limit is adapted to ensure logspacing with the requested nbins. The adaptive strategy is such that the last bin has at least tailthr segments.
IBSpector.compare_mlds! — Method
compare_mlds!(h1::Histogram, h2::Histogram, theta1::Float64, theta2::Float64)The same as compare_mlds, except that it takes two histograms and mutates them. Return values are the same as compare_mlds.
IBSpector.compare_mlds — Method
compare_mlds(segs1::Vector{Int}, segs2::Vector{Int}; lo = 1, hi = 1_000_000, nbins = 200)Build two histograms from segs1 and segs2, rescale them for number and average heterozygosity and return four vectors containing respectively common midpoints for bins, the two rescaled weights and variances of the difference between weights.
IBSpector.compare_models — Function
compare_models(models[, mask])Compare the models parameterized by FitResults and return the best one. Takes an iterable of FitResult as input and optionally a boolean mask to reflect prior knowledge on models to discard.
IBSpector.compute_residuals — Method
compute_residuals(h::Histogram, yth::AbstractVector{<:Real})Compute the residuals between the observed and expected weights.
IBSpector.compute_residuals — Method
compute_residuals(h::Histogram, mu, rho, TN::Vector; naive=true)Compute the residuals between the observed and expected weights.
Optional arguments
naive::Bool=true: if true the expected weights are computed using the closed form integral, otherwise using higher order transition probabilities from SMC' theory.order::Int=10: maximum number of higher order corrections to use whennaiveis false, i.e. number of intermediate recombination events plus one.ndt::Int=800: number of Legendre nodes to use whennaiveis false.
IBSpector.compute_residuals — Method
compute_residuals(h1::Histogram, h2::Histogram)When two histograms are given the std error of the difference is used for normalization.
IBSpector.demoinfer — Method
demoinfer(segments::AbstractVector{<:Integer}, epochrange::AbstractRange{<:Integer}, mu::Float64, rho::Float64; kwargs...)Make an histogram with IBS segments and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange.
Return a named tuple which contains the fields:
fits: a vector ofFitResult(seeFitResult)chains: a vector of vectors ofFitResult, one for each iteration of the correction procedure, and one chain per modelyth: a vector of vectors of the expected weights, one for each model and one vector of expected weights per iteration of the correction procedurecorrections: a vector of vectors of corrections, one for each iteration of the correction procedure, and one vector of corrections per model. Corrections are histogram counts, therefore they have the same shape.h_obs: the histogram of the observed segmentsh_mods: a vector of modified histograms, one for each model, with higher order corrections applied.ybest: a vector of expected weights corresponding to the best fit, one for each modelresid: a vector of vectors of residuals, one for each modelp: a vector of right-tail p-values for the autocorrelation of residuals, one for each modeldeltas: a vector of vectors of the maximum absolute difference between corrections in consecutive iterations, and for each model.lls: a vector of vectors of log-likelihoods, one for each iteration and for each model. The output estimate is the one with the highest log-likelihood.conv: a vector of booleans, one for each model, indicating whether the maximum iterations were reached (false) or whether the procedure converged before (true).
Optional Arguments
fop::FitOptions = FitOptions(sum(segments), mu, rho): the fit options, seeFitOptions.lo::Int=1: The lowest segment length to be considered in the histogramhi::Int=50_000_000: The highest segment length to be considered in the histogramnbins::Int=fop.ndt: The number of bins to use in the histogramiters::Int=20: The number of iterations to perform after warmup. It might converge earlier. Warmup iterations are proportional to therho/muratio.reltol::Float64=1e-2: The relative tolerance to use for convergence, i.e. the maximum absolute difference between corrections in consecutive iterations. The convergence condition test this orrelchange.relchange::Float64=1e-4: The relative change in parameters to use for convergence. This is the maximum relative change in parameters between consecutive iterations. The convergence condition test this orreltol.corcut::Int=fop.locut-1: The index of the last histogram bin to apply corrections to. This should not be changed in most cases.
IBSpector.demoinfer — Method
demoinfer(h::Histogram, epochrange, fop::FitOptions; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)
demoinfer(h, epochs, fop; iters=20, reltol=1e-2, corcut=fop.locut-1, finalize=false)Take an histogram of IBS segments, fit options, and infer demographic models with piece-wise constant epochs where the number of epochs is epochrange. Return a named tuple as above.
If epochrange is a integer, then it fits only the model with that number of epochs. In this case the returned named tuple contains only one element per field, instead of a vector.
IBSpector.durations — Method
durations(fit::FitResult)Return the fitted durations of the epochs.
IBSpector.evd — Method
evd(fit::FitResult)Return the log-evidence of the fit.
IBSpector.flags — Method
flags(fit::FitResult)Return a named tuple of flags and diagnostics for the fit, including:
converged: whether the optimization convergedconvex_optimum: whether the likelihood Hessian at the optimum is strictly positive definite.ci_low: whether the confidence interval of any parameter includes zeroat_any_boundary: whether any parameter is at its lower or upper boundlog_like: the log-likelihood of the fitlog_evidence: the log-evidence of the fitoptimizer_message: the original message from the optimizer, which can be useful for diagnosing optimization issues.
IBSpector.get_covar — Method
get_covar(fit::FitResult)Return the covariance matrix of the parameters of the fit, computed as the inverse of the log-likelihood Hessian at the optimum.
IBSpector.get_para — Method
pars(fit::FitResult)Return the parameters of the fit.
IBSpector.loglike — Method
loglike(fit::FitResult)Return the log-likelihood of the fit.
IBSpector.pop_sizes — Method
pop_sizes(fit::FitResult)Return the fitted population sizes, from past to present.
IBSpector.pre_fit! — Method
pre_fit!(fop::FitOptions, h::Histogram, nfits)Preliminarily fit h with an approximate model of piece-wise constant epochs for each number of epochs from 1 to nfits.
See FitOptions for how to specify them. It modifies fop in place to adapt it to all the requested epochs. Return a vector of FitResult, one for each number of epochs, see also FitResult.
IBSpector.residstructure — Method
residstructure(residuals::AbstractVector{<:Real})Compute the p-value for the autocorrelation of adjacent residuals. The p-value is the right tail of the t-distribution.
IBSpector.sample_model_epochs — Method
sample_model_epochs(options::FitOptions, h::Histogram{T,1,E}, fit::FitResult; nsamples = 10_000, naive = isnaive(options))Sample nsamples from the posterior distribution of the parameters, starting from initial point in MLE fit obtained from demoinfer.
Requires the observed histogram h and the fit options options. Return a Chains object from the MCMCDiagnostics module of Turing, which contains the samples from the posterior distribution. If naive is false, the sampling will be done using the SMC' likelihood, which is more accurate but also more computationally intensive. If naive is true, the sampling will be done using the closed-form integral likelihood, which requires to use a modified histogram h_mod as output by demoinfer.
IBSpector.sds — Method
sds(fit::FitResult)Return the standard deviations of the parameters of the fit.
IBSpector.setOptimOptions! — Method
setOptimOptions!(fop::FitOptions; kwargs...)Set the options which are passed to Optimization.solve, see Optimization.jl. and the specific Optim.jl section, which is the default optimizer. Defaults are:
solver: The solver to use for the optimization, default isLBFGS().maxiters = 6000maxtime = 60g_tol = 5e-8
If given more parameters, they are passed to the optimizer.
IBSpector.setinit! — Method
setinit!(fop::FitOptions, init::AbstractVector{Float64})Set the initial vector of parameters for the optimization which takes the FitOptions object fop.
IBSpector.times — Method
times(fit::FitResult)Return the times of size changes.
IBSpector.Spectra.laplacekingman — Method
laplacekingman(r, mu, TN)Compute the approximate number of segments of length r using the Laplace transform of the Kingman coalescent at frequency 2mu r, given mutation rate mu and population size history TN.
IBSpector.Spectra.mldsmcp — Method
mldsmcp(rs, edges, mu, rho, TN; order = 10, ndt = 800)Compute the expected number of segments with length in each bin defined by edges, given the midpoints rs, mutation rate mu, recombination rate rho, and population size history TN.
The computation uses the SMC' higher order transition probabilities with order maximum number of intermediate recombination events plus one, and ndt Legendre nodes for the numerical integration.